Efficiently converting a NumPy 2D counts array into a zero-padded 2D index array?

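I have a NumPy 2D array with n rows (observations) and m columns (features), where each element is the number of times that feature was observed. I need to convert it into a zero-padded 2D array of feature_indices, where each feature_index is repeated as many times as its "count" in the original 2D array.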

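This seems like it should be a simple combination of np.where with indexing, or just np.repeat, but I'm not seeing it. Here is a very slow, loopy solution (too slow to use in practice):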

# Loopy solution (way too slow!)
import numpy as np

def convert_2Dcountsarray_to_zeropaddedindices(countsarray2D):
    rowsums = np.sum(countsarray2D, axis=1)
    max_rowsum = np.max(rowsums)
    out = []
    for row_idx, row in enumerate(countsarray2D):
        # Left-pad with zeros so all output rows have the same length
        out_row = [0] * int(max_rowsum - rowsums[row_idx])
        for ele_idx in range(len(row)):
            # Append ele_idx once per observed count
            out_row.extend([ele_idx] * int(row[ele_idx]))
        out.append(out_row)
    return np.array(out)

# Working example
countsarray2D = np.array( [[1,2,0,1,3],
                           [0,0,0,0,3],
                           [0,1,1,0,0]] )

# Shift all features up by 1 (i.e. add a dummy feature 0 we will use for padding)
countsarray2D = np.hstack( (np.zeros((len(countsarray2D),1), dtype=int), countsarray2D) )

print(convert_2Dcountsarray_to_zeropaddedindices(countsarray2D))

# Desired result:
array([[1, 2, 2, 4, 5, 5, 5],
       [0, 0, 0, 0, 5, 5, 5],
       [0, 0, 0, 0, 0, 2, 3]])

Answer from a uj5u.com user:

One solution is to flatten the array and use np.repeat.
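As a quick illustration of the building block (not part of the original answer), np.repeat accepts one repeat count per element, so an index with a count of 0 is dropped entirely. A minimal sketch on the first row of the question's example, using unshifted feature indices; the names row_counts, feature_idx and expanded are chosen here for illustration only:

```python
import numpy as np

row_counts = np.array([1, 2, 0, 1, 3])       # first row of the example counts
feature_idx = np.arange(len(row_counts))     # [0, 1, 2, 3, 4]

# Each index is repeated as many times as its count
expanded = np.repeat(feature_idx, row_counts)
print(expanded)  # [0 1 1 3 4 4 4]
```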

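This solution first requires prepending to countsarray2D a column holding the number of padding zeros each row needs. Starting from the original 5-column counts array (this column takes the place of the dummy all-zero feature column from the question), it can be done as follows: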

counts = countsarray2D.sum(axis=1)
max_count = max(counts)
zeros_to_add = max_count - counts
countsarray2D = np.c_[zeros_to_add, countsarray2D]

The new countsarray2D is then:

array([[0, 1, 2, 0, 1, 3],
       [4, 0, 0, 0, 0, 3],
       [5, 0, 1, 1, 0, 0]])

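Now we can flatten the array and use np.repeat. The index array A serves as the input array, while countsarray2D determines how many times each index value is repeated. Because every row of countsarray2D now sums to max_count, the flat result splits evenly into n_rows rows.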

n_rows, n_cols = countsarray2D.shape
A = np.tile(np.arange(n_cols), (n_rows, 1))
np.repeat(A, countsarray2D.flatten()).reshape(n_rows, -1)

Final result:

array([[1, 2, 2, 4, 5, 5, 5],
       [0, 0, 0, 0, 5, 5, 5],
       [0, 0, 0, 0, 0, 2, 3]])
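Putting the two steps together, the approach above could be wrapped in a single function. This is only a sketch: the name counts_to_padded_indices is hypothetical, it assumes a non-empty array of non-negative integer counts, and it passes an explicit width to reshape (instead of -1) so that an all-zero input does not raise.

```python
import numpy as np

def counts_to_padded_indices(counts2d):
    """Hypothetical helper: expand a 2D counts array into left-zero-padded,
    1-based feature indices (index 0 is reserved for padding)."""
    counts2d = np.asarray(counts2d, dtype=np.intp)
    n_rows = counts2d.shape[0]
    row_totals = counts2d.sum(axis=1)
    max_total = row_totals.max()
    # Column 0 holds the number of padding zeros each row needs
    padded = np.c_[max_total - row_totals, counts2d]
    n_cols = padded.shape[1]
    # Flat index grid: 0 for padding, 1..m for the features, once per row
    idx = np.tile(np.arange(n_cols), n_rows)
    return np.repeat(idx, padded.ravel()).reshape(n_rows, max_total)

counts = np.array([[1, 2, 0, 1, 3],
                   [0, 0, 0, 0, 3],
                   [0, 1, 1, 0, 0]])
result = counts_to_padded_indices(counts)
print(result)
```

For the example counts this reproduces the array shown above. The work is done by one np.repeat call over the flattened array, with no Python-level loop over rows or features.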

