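How can I create the master DataFrame through some vectorized procedure? If that is not possible, what is the most time-efficient way to do it (memory is not a concern)?
Can the for loop be replaced with something more efficient?
As you can see, the number of combinations grows very quickly, so I need a fast way to generate this DataFrame.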
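To put a number on that growth, here is a minimal sketch. It assumes 3 ids per name, as in the example in this question; `master_rows` is a hypothetical helper name used only for illustration.

```python
from math import comb

def master_rows(n_names, ids_per_name=3):
    """Rows in the final frame: one row per (unordered name pair, id)."""
    # comb(n, 2) == n * (n - 1) / 2 unordered pairs of names
    return comb(n_names, 2) * ids_per_name

for n_names in (26, 100, 1000):
    print(n_names, comb(n_names, 2), master_rows(n_names))
```

With 26 names this gives 325 pairs and 975 rows. The pair count grows quadratically, so 1000 names already give 499500 pairs.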
See the minimal reproducible example below:
%%time
import pandas as pd
import string
import numpy as np
from itertools import combinations

# create dummy data: three rows (id 0, 1, 2) of ten random values per name
# (rows are collected in a list because DataFrame.append was removed in pandas 2.0)
cols = list(string.ascii_uppercase)
rows = []
for col in cols:
    for id_ in range(3):
        rows.append([col, id_] + np.random.randint(2, 100, size=(1, 10)).tolist()[0])
dummy = pd.DataFrame(rows)
dummy.columns = ['name', 'id', 'v1', 'v2', 'v3', 'v4', 'v5', 'v1', 'v6', 'v7', 'v8', 'v9']

# create all possible unique combinations
combos = list(combinations(cols, 2))

# generate DataFrame with all combinations
master = pd.DataFrame()
for i, combo in enumerate(combos):
    A = dummy[dummy.name == combo[0]]
    B = dummy[dummy.name == combo[1]]
    joined = pd.merge(A, B, on=["id"], suffixes=('_A', '_B'))
    joined = joined.sort_values("id")
    joined['pair_id'] = i
    master = pd.concat([master, joined])
Output:
CPU times: total: 1.8 s
Wall time: 1.8 s
Thanks!
Answer from a uj5u.com user:
Since your data is structured, you can drop down to numpy to take advantage of vectorized operations.
import string
import numpy as np
import pandas as pd
from itertools import combinations

names = list(string.ascii_uppercase)
ids = [0, 1, 2]
columns = pd.Series(["v1", "v2", "v3", "v4", "v5", "v1", "v6", "v7", "v8", "v9"])

# Generate the random data, shape (n_names, n_ids, n_columns)
data = np.random.randint(2, 100, (len(names), len(ids), len(columns)))

# Pair data for every 2-combination of names
arr = [np.hstack([data[i], data[j]]) for i, j in combinations(range(len(names)), 2)]

# Assemble the data into the final dataframe
idx = pd.MultiIndex.from_tuples([
    (p, a, b, i) for p, (a, b) in enumerate(combinations(names, 2)) for i in ids
], names=["pair_id", "name_A", "name_B", "id"])
cols = pd.concat([columns + "_A", columns + "_B"])
master = pd.DataFrame(np.vstack(arr), index=idx, columns=cols)
Original code: 4 s. New code: 7 ms.
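The numpy approach still pairs the blocks in a Python list comprehension. As a rough sketch of how that step could also be pushed into numpy, `np.triu_indices` yields the index pairs (i, j) with i < j in the same order as `combinations`. The unique column names v1..v10 and the seeded generator are assumptions made for this illustration, not part of the answer above, and no timing is claimed for it.

```python
import string
import numpy as np
import pandas as pd

names = np.array(list(string.ascii_uppercase))
ids = np.arange(3)
# Hypothetical unique value-column names (assumption for this sketch)
value_cols = [f"v{k}" for k in range(1, 11)]

rng = np.random.default_rng(0)
data = rng.integers(2, 100, size=(len(names), len(ids), len(value_cols)))

# Index pairs (i, j) with i < j, in the same order as combinations(range(n), 2)
i, j = np.triu_indices(len(names), k=1)
n_pairs = len(i)

# Fancy indexing pairs all name blocks at once: shape (n_pairs, n_ids, 2 * n_cols)
paired = np.concatenate([data[i], data[j]], axis=2)

# Build the index from arrays instead of a Python loop over tuples
idx = pd.MultiIndex.from_arrays(
    [
        np.repeat(np.arange(n_pairs), len(ids)),
        np.repeat(names[i], len(ids)),
        np.repeat(names[j], len(ids)),
        np.tile(ids, n_pairs),
    ],
    names=["pair_id", "name_A", "name_B", "id"],
)
cols = [c + "_A" for c in value_cols] + [c + "_B" for c in value_cols]
master = pd.DataFrame(paired.reshape(-1, 2 * len(value_cols)), index=idx, columns=cols)
```

The reshape flattens the pair and id axes in pair-major order, which is the same row order the index arrays are built in.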
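If staying in pandas is preferable, one possible alternative to the for loop is a self-merge on id followed by a filter that keeps each unordered pair once. This is only a sketch on a small hypothetical frame (four names, two uniquely named value columns). It assumes the names sort in the same order as they are listed, and it builds every name pairing per id before filtering, so it trades memory for simplicity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
names = list("ABCD")  # hypothetical small example
dummy = pd.DataFrame({
    "name": np.repeat(names, 3),
    "id": np.tile([0, 1, 2], len(names)),
    "v1": rng.integers(2, 100, size=12),
    "v2": rng.integers(2, 100, size=12),
})

# Pair every row with every other row that shares the same id
joined = dummy.merge(dummy, on="id", suffixes=("_A", "_B"))

# Keep each unordered pair once (A-B but not B-A or A-A)
master = joined[joined["name_A"] < joined["name_B"]].copy()

# Number the pairs in sorted (name_A, name_B) order, like enumerate(combinations(...))
master["pair_id"] = master.groupby(["name_A", "name_B"], sort=True).ngroup()
master = master.sort_values(["pair_id", "id"]).reset_index(drop=True)
```

For sorted names, the `name_A < name_B` filter selects the same pairs that `combinations(names, 2)` produces.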