I have a DataFrame with a string column and an array (list) column:
| string_col | array_col |
|-------------|----------------------|
| fruits | ['apple', 'banaana'] |
| flowers | ['rose', 'sunflower']|
| animals | ['lion', 'tiger'] |
I want to repeat the string_col value once for each element of array_col, so that the output DataFrame looks like this:
| string_col | array_col | new_col |
|-------------|----------------------|----------------------|
| fruits | ['apple', 'banaana'] |['fruits', 'fruits'] |
| flowers | ['rose', 'sunflower']|['flowers', 'flowers']|
| animals | ['lion', 'tiger'] |['animals', 'animals']|
Answer from a uj5u.com community member:
Use a list comprehension that repeats each string by the length of the list in the same row:
df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
print(df)
string_col array_col new_col
0 fruits [apple, banaana] [fruits, fruits]
1 flowers [rose, sunflower] [flowers, flowers]
2 animals [lion, tiger] [animals, animals]
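For anyone who wants to run this end to end, here is a minimal sketch that first rebuilds the sample data. The DataFrame construction is my assumption based on the table in the question (the original post does not show how `df` was created); the list comprehension itself is the one from the answer, with the loop variables renamed for readability.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})

# For each row, build a list holding the row's string once per list element.
df["new_col"] = [
    [label] * len(items)
    for label, items in zip(df["string_col"], df["array_col"])
]

print(df)
```

Because `len(items)` is evaluated per row, the same line should also work when the lists have different lengths.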
If the data is small and performance is not a concern, use DataFrame.apply:
df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']) , axis=1)
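To make it clearer what `apply` with `axis=1` hands to the function, here is a small sketch using a named function instead of the lambda. The function name `repeat_label` and the uneven-length sample rows are hypothetical, added only for illustration; they are not from the original post.

```python
import pandas as pd

# Hypothetical data with lists of different lengths, including an empty one.
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple"], ["rose", "sunflower", "tulip"], []],
})

def repeat_label(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return [row["string_col"]] * len(row["array_col"])

# apply calls repeat_label once per row and collects the lists into a Series.
df["new_col"] = df.apply(repeat_label, axis=1)

print(df)
```

The per-row function call is what makes this approach slower than the list comprehension on larger frames.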
#3k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [311]: %timeit df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
1.94 ms ± 97.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [312]: %timeit df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']) , axis=1)
40.4 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [313]: %timeit df['new_col']=df[['string_col']].agg(list, axis=1)*df['array_col'].str.len()
132 ms ± 6.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
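The timings above come from IPython's `%timeit`. If you are not in IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The helper names `with_comprehension` and `with_apply` are hypothetical, the sample frame is reconstructed from the question, and the numbers you get will depend on your machine and pandas version, so treat this as a way to reproduce the comparison rather than as a benchmark result.

```python
import timeit

import pandas as pd

# Sample data reconstructed from the question, repeated to 3,000 rows as in the answer.
base = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})
df = pd.concat([base] * 1000, ignore_index=True)

def with_comprehension():
    # List comprehension over the two columns zipped together.
    return [[s] * len(arr) for s, arr in zip(df["string_col"], df["array_col"])]

def with_apply():
    # Row-wise DataFrame.apply, as in the second approach.
    return df.apply(lambda row: [row["string_col"]] * len(row["array_col"]), axis=1)

# Check that both approaches produce the same lists before timing them.
assert with_comprehension() == with_apply().tolist()

runs = 10
t_comp = timeit.timeit(with_comprehension, number=runs) / runs
t_apply = timeit.timeit(with_apply, number=runs) / runs
print(f"list comprehension: {t_comp * 1e3:.2f} ms per run")
print(f"DataFrame.apply:    {t_apply * 1e3:.2f} ms per run")
```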
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/520228.html