遍歷二進制數的真位（快速）-有解無憂

所以我試圖在 Python 中創建一個函式來回傳數字二進制中使用的所有 2 的冪。

例如：二進制的 123 是 1111011。我希望我的函式盡快回傳對應于 123 的真位（1、2、8、16、32 和 64）的 2 的冪。

現在我發現的最好的是：

def true_bits(num):
    while num:
        temp = num & -num
        num -= temp
        yield temp

uj5u.com熱心網友回復：

一些（非更快的）替代方案：

使用numpy并假設 8 位無符號整數：

import numpy as np
def numpy_bits(num):
    bits = np.unpackbits(np.uint8(num), bitorder='little')
    return 2**np.arange(8)[bits.astype(bool)]

numpy_bits(123)
# array([ 1,  2,  8, 16, 32, 64])
# 6.8 μs ± 293 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

使用 python 回圈（減少位順序）：

def python_bits(num):
    for i in range(7,-1,-1):
        if num >= (x:=2**i):
            yield x
            num -= x

list(python_bits(123))
# [64, 32, 16, 8, 2, 1]
# 2.26 μs ± 61.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

OP的方法：

list(true_bits(123))
# [1, 2, 8, 16, 32, 64]
# 1.14 μs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

uj5u.com熱心網友回復：

用一堆隨機的 64 位數字和 24 個真位進行基準測驗（基于您在評論中所說的）：

47.7 ms  true_bits_original
16.0 ms  true_bits_Kelly

45.6 ms  true_bits_original
15.7 ms  true_bits_Kelly

47.4 ms  true_bits_original
16.3 ms  true_bits_Kelly

我正在使用八個查找表，每個查找表負責八位。帶有基準的完整代碼（在線試用！）：

intern = {2**i: 2**i for i in range(64)}.get
table0 = [()]
for i in range(8):
    table0  = [(*bits, intern(2**i)) for bits in table0]
table0 = tuple(table0)
table1 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table0)
table2 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table1)
table3 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table2)
table4 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table3)
table5 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table4)
table6 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table5)
table7 = tuple(tuple(intern(bit << 8) for bit in bits) for bits in table6)

def true_bits_Kelly(num):
    return chain(table0[num & 0xff],
                 table1[(num >> 8) & 0xff],
                 table2[(num >> 16) & 0xff],
                 table3[(num >> 24) & 0xff],
                 table4[(num >> 32) & 0xff],
                 table5[(num >> 40) & 0xff],
                 table6[(num >> 48) & 0xff],
                 table7[num >> 56])

def true_bits_original(num):
    while num:
        temp = num & -num
        num -= temp
        yield temp

funcs = true_bits_original, true_bits_Kelly

import timeit
from itertools import repeat, chain
from random import getrandbits, sample
from collections import deque

# Correctness
for _ in range(1000):
    num = getrandbits(64)
    expect = list(funcs[0](num))
    for func in funcs:
        assert list(func(num)) == expect

# Speed
for _ in range(3):
    nums = [sum(2**i for i in sample(range(64), 24))
            for _ in range(10000)]
    for func in funcs:
        def run():
            gens = map(func, nums)
            consumes = map(deque, gens, repeat(0))
            deque(consumes, 0)
        t = min(timeit.repeat(run, number=1))
        print('%4.1f ms ' % (t * 1e3), func.__name__)
    print()

uj5u.com熱心網友回復：

您的初始代碼已經非常有效。問題是CPython 解釋器讓它變慢。

實際上，解釋器使用了管理起來很昂貴的參考計數的可變大小整數。因此，-num分配一個新的整數物件以及num & ...和num -= temp。這意味著完成了3 次昂貴的分配。yield 也是一個非常昂貴的操作（它會導致低級別的背景關系切換）。

這種開銷大部分可以通過即時編譯器(JIT) 消除。例如，PyPy 通用的基于 JIT 的解釋器能夠大部分消除物件分配的開銷（還要感謝快速的垃圾收集器），盡管 PyPy 還沒有yield很好地優化。或者，可以在此處使用Numba 。Numba 是一個 JIT 編譯器，旨在優化可在 CPython 執行的代碼中使用的數字代碼。例如下面的代碼有點快：

import numba as nb

@nb.njit('(uint64,)')
def true_bits(num):
    while num:
        temp = num & -num
        num -= temp
        yield temp

話雖如此，它僅限于 64 位整數（或更小），就像 Numpy 一樣。Cython還可以通過使用基本編譯器提前編譯代碼來提供幫助。這類似于撰寫自己的 C 模塊，但您不需要撰寫 C 代碼，而 Cython 使這個程序更容易。

如果您想進一步優化代碼，那么您當然需要在呼叫函式中使用此類工具，以免支付來自 CPython 解釋器的函式呼叫的昂貴開銷（這至少比本地解釋器慢 10 倍）。

如果這是不可能的（無望的情況），您可以對 Numba 使用以下方法：

@nb.njit('(uint64,uint64[::1])')
def true_bits(num, buffer):
    cur = 0
    buffer.fill(0)
    while num:
        temp = num & -num
        num -= temp
        buffer[cur] = temp
        cur  = 1
    return cur

buffer = np.empty(64, dtype=np.uint64)
written_items = true_bits(154781, buffer)
# Result stored in in buffer[:written_items]

這個想法是將結果寫入預分配的緩沖區（因為在這種情況下創建 Numpy 陣列很慢）。然后該函式在需要時將值寫入緩沖區并回傳寫入項的數量。您可以獲取實際專案，buffer[:written_items]也可以對陣列進行迭代，但請注意，這樣做幾乎與計算本身一樣昂貴（同樣由于 CPython 解釋器）。盡管如此，它還是比最初的解決方案更快。

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/459262.html

標籤：Python 表现数学二进制

上一篇：停止RecyclerViewMediaPlayer中的上一個專案

下一篇：如何為python中的Number類加速__init__？