I have the following DataFrame:
import pandas as pd
import numpy as np
# np.nan is used instead of np.NaN, since the np.NaN alias was removed in NumPy 2.0
df1 = pd.DataFrame({'Name' : ['Jake', 'Nate', '', 'Alex', '', 'Max', 'Nate', 'Jake'],
                    'Color' : ['', 'red;blue', 'blue;pink', 'green;blue;red', '', '', 'blue', 'red;yellow'],
                    'Value_1' : [1211233.419, 4007489.726, 953474.6894, np.nan, 1761987.704, 222600361, 404419.2243, 606066.067],
                    'Value_2' : [np.nan, 1509907.457, 4792269.911, 43486.59312, np.nan, np.nan, 2066645.251, 60988660.37],
                    'Value_3' : [1175299.998, np.nan, 1888559.459, np.nan, 444689.0177, 405513.0572, 343704.0269, 2948494.383]})
---
   Name           Color       Value_1       Value_2       Value_3
0  Jake                  1.211233e+06           NaN  1.175300e+06
1  Nate        red;blue  4.007490e+06  1.509907e+06           NaN
2             blue;pink  9.534747e+05  4.792270e+06  1.888559e+06
3  Alex  green;blue;red           NaN  4.348659e+04           NaN
4                        1.761988e+06           NaN  4.446890e+05
5   Max                  2.226004e+08           NaN  4.055131e+05
6  Nate            blue  4.044192e+05  2.066645e+06  3.437040e+05
7  Jake      red;yellow  6.060661e+05  6.098866e+07  2.948494e+06
I need two things:
1) In the first case, I need to sum all the values (Value_1, Value_2, Value_3) of the rows that share the same Name, for example:
   Name       Value_1       Value_2       Value_3
0  Jake  1.817299e+06  6.098866e+07  4.123794e+06
1  Nate  4.411909e+06  3.576553e+06  3.437040e+05
2  Alex           NaN  4.348659e+04           NaN
3   Max  2.226004e+08           NaN  4.055131e+05
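2) I need the same thing, but grouped by the Name column together with each value obtained by splitting the Color column (only for rows that contain both a name and at least one color):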
   Name   Color       Value_1       Value_2       Value_3
0  Alex   green           NaN  4.348659e+04           NaN
1  Alex    blue           NaN  4.348659e+04           NaN
3  Alex     red           NaN  4.348659e+04           NaN
4  Jake     red  6.060661e+05  6.098866e+07  2.948494e+06
5  Jake  yellow  6.060661e+05  6.098866e+07  2.948494e+06
6  Nate     red  4.007490e+06  1.509907e+06           NaN
7  Nate    blue  4.411909e+06  3.576553e+06  3.437040e+05
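(Note that in this case the only combination that occurs twice in the input is Nate-blue.)
Reply from a uj5u.com user:
First replace the empty strings in the first two columns with missing values: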
df1[['Name','Color']] = df1[['Name','Color']].replace('', np.nan)
Then aggregate with sum, passing min_count=1 so that a group containing only missing values yields NaN instead of 0:
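# numeric_only=True keeps the string column Color out of the sum on recent pandas versions
df2 = df1.groupby('Name', as_index=False).sum(numeric_only=True, min_count=1)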
print (df2)
   Name       Value_1       Value_2       Value_3
0  Alex           NaN  4.348659e+04           NaN
1  Jake  1.817299e+06  6.098866e+07  4.123794e+06
2   Max  2.226004e+08           NaN  4.055131e+05
3  Nate  4.411909e+06  3.576553e+06  3.437040e+05
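To see what min_count=1 changes, here is a minimal sketch on a made-up two-group frame. The names demo, plain and strict are illustrative only and do not come from the question. With the default min_count=0, a group whose values are all missing sums to 0; with min_count=1 it stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "a" holds only missing values,
# group "b" holds one valid value and one missing value.
demo = pd.DataFrame({"key": ["a", "a", "b", "b"],
                     "val": [np.nan, np.nan, 1.5, np.nan]})

# Default (min_count=0): an all-NaN group sums to 0.0.
plain = demo.groupby("key")["val"].sum()

# min_count=1: at least one non-missing value is required,
# otherwise the result for that group is NaN.
strict = demo.groupby("key")["val"].sum(min_count=1)

print(plain)
print(strict)
```

This is why Alex keeps NaN in Value_1 and Value_3 in the output above rather than showing 0.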
For the second output, first use Series.str.split together with DataFrame.explode, then aggregate with sum:
df3 = (df1.assign(Color=df1['Color'].str.split(';'))
.explode('Color')
.groupby(['Name', 'Color'], as_index=False)
.sum(min_count=1))
print (df3)
   Name   Color       Value_1       Value_2       Value_3
0  Alex    blue           NaN  4.348659e+04           NaN
1  Alex   green           NaN  4.348659e+04           NaN
2  Alex     red           NaN  4.348659e+04           NaN
3  Jake     red  6.060661e+05  6.098866e+07  2.948494e+06
4  Jake  yellow  6.060661e+05  6.098866e+07  2.948494e+06
5  Nate    blue  4.411909e+06  3.576553e+06  3.437040e+05
6  Nate     red  4.007490e+06  1.509907e+06           NaN
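The intermediate split-and-explode step can be inspected on its own. The sketch below uses a small made-up frame (demo, exploded and summed are hypothetical names) to show that str.split turns each cell into a list, explode emits one row per list element while copying the other columns, and the grouped sum then merges the rows that end up with the same Name and Color.

```python
import pandas as pd

# Hypothetical toy data: "Nate" has "blue" in two different rows.
demo = pd.DataFrame({"Name": ["Nate", "Nate"],
                     "Color": ["red;blue", "blue"],
                     "Value_1": [10.0, 1.0]})

# Split "red;blue" into ["red", "blue"], then emit one row per color.
exploded = (demo.assign(Color=demo["Color"].str.split(";"))
                .explode("Color"))
print(exploded)

# Rows sharing the same (Name, Color) pair are summed together.
summed = exploded.groupby(["Name", "Color"], as_index=False).sum(min_count=1)
print(summed)
```

In this toy frame only the Nate/blue pair receives contributions from two source rows, which mirrors the Nate-blue case in the question.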
Reply from a uj5u.com user:
You can use:
(df1.assign(Color=df1['Color'].str.split(';'))
.explode('Color')
.groupby(['Name', 'Color'], as_index=False)
.sum()
.replace('', pd.NA).dropna()
)
Output:
    Name   Color       Value_1       Value_2       Value_3
3   Alex    blue  0.000000e+00  4.348659e+04  0.000000e+00
4   Alex   green  0.000000e+00  4.348659e+04  0.000000e+00
5   Alex     red  0.000000e+00  4.348659e+04  0.000000e+00
7   Jake     red  6.060661e+05  6.098866e+07  2.948494e+06
8   Jake  yellow  6.060661e+05  6.098866e+07  2.948494e+06
10  Nate    blue  4.411909e+06  3.576553e+06  3.437040e+05
11  Nate     red  4.007490e+06  1.509907e+06  0.000000e+00
Reply from a uj5u.com user:
df1['Color'] = df1['Color'].apply(lambda x: x.split(';'))
df1.explode('Color')
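This snippet only splits and explodes the Color column; it does not aggregate, and calling x.split inside apply assumes every Color entry is a string. As a rough sketch on made-up data (colors, lambda_ok and split_colors are hypothetical names): once empty strings have been replaced with NaN, the lambda raises AttributeError because NaN is a float, whereas Series.str.split passes missing values through unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry.
colors = pd.Series(["red;blue", np.nan, "blue"])

# The lambda calls .split on every element, including the float NaN.
try:
    colors.apply(lambda x: x.split(";"))
    lambda_ok = True
except AttributeError:
    lambda_ok = False

# The vectorised string accessor leaves missing values as missing.
split_colors = colors.str.split(";")

print(lambda_ok)
print(split_colors.explode())
```

On the original df1, where missing colors are empty strings rather than NaN, the apply version runs without error.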
If you repost this, please credit the source. Link to this article: https://www.uj5u.com/qiye/505760.html