I have the following dataframe (example):
import pandas as pd
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 12:00:00', True, 3], ['A', '2022-09-01 14:00:00', False, 1],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4], ['B', '2022-09-01 18:00:00', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
   group                 date  indicator  value
0      A  2022-09-01 10:00:00      False      2
1      A  2022-09-01 12:00:00       True      3
2      A  2022-09-01 14:00:00      False      1
3      B  2022-09-01 13:00:00      False      1
4      B  2022-09-01 16:00:00       True      4
5      B  2022-09-01 18:00:00      False      3
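I want to fill in the missing dates for each group at an hourly frequency, taking value from the previous existing row. When the previous existing row has indicator True, the filled rows should get indicator False instead of True, while still keeping the same value. This is the desired output: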
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 11:00:00', False, 2], ['A', '2022-09-01 12:00:00', True, 3], ['A', '2022-09-01 13:00:00', False, 3], ['A', '2022-09-01 14:00:00', False, 1],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 14:00:00', False, 1], ['B', '2022-09-01 15:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4], ['B', '2022-09-01 17:00:00', False, 4], ['B', '2022-09-01 18:00:00', False, 3]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
    group                 date  indicator  value
0       A  2022-09-01 10:00:00      False      2
1       A  2022-09-01 11:00:00      False      2
2       A  2022-09-01 12:00:00       True      3
3       A  2022-09-01 13:00:00      False      3
4       A  2022-09-01 14:00:00      False      1
5       B  2022-09-01 13:00:00      False      1
6       B  2022-09-01 14:00:00      False      1
7       B  2022-09-01 15:00:00      False      1
8       B  2022-09-01 16:00:00       True      4
9       B  2022-09-01 17:00:00      False      4
10      B  2022-09-01 18:00:00      False      3
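As you can see, the dates are filled in hourly within each group, and the filled rows get indicator False even when the previous indicator was True.
So I was wondering if anyone knows how to fill these missing dates hourly per group, while handling the rows where the indicator is True, using pandas?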
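Answer from a uj5u.com user:
First create a DatetimeIndex with DataFrame.set_index, then add the missing hours per group with DataFrame.asfreq inside a lambda function. Finally, replace the missing indicator values with Series.fillna and forward-fill the missing value entries: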
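# Parse the strings so the column can serve as a DatetimeIndex
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
        .groupby('group')[['indicator', 'value']]
        # insert the missing hours per group ('H' = hourly; pandas 2.2+ prefers 'h')
        .apply(lambda x: x.asfreq('H'))
        # inserted rows get indicator False and the previous row's value
        .assign(indicator = lambda x: x['indicator'].fillna(False),
                value = lambda x: x['value'].ffill())
        .reset_index())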
print (df)
    group                 date  indicator  value
0       A  2022-09-01 10:00:00      False    2.0
1       A  2022-09-01 11:00:00      False    2.0
2       A  2022-09-01 12:00:00       True    3.0
3       A  2022-09-01 13:00:00      False    3.0
4       A  2022-09-01 14:00:00      False    1.0
5       B  2022-09-01 13:00:00      False    1.0
6       B  2022-09-01 14:00:00      False    1.0
7       B  2022-09-01 15:00:00      False    1.0
8       B  2022-09-01 16:00:00       True    4.0
9       B  2022-09-01 17:00:00      False    4.0
10      B  2022-09-01 18:00:00      False    3.0
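Note that value comes back as float (2.0, 3.0, ...) in the output above, because asfreq introduces NaN before the forward fill. If integer values matter, one possible alternative is to reindex each column separately so that no NaN is ever created. The following is only a sketch of that idea, not the answerer's method: the per-group loop, the names `pieces`, `hours` and `result`, and the `'60min'` frequency string (used here in place of `'H'`) are choices made for this illustration.

```python
import pandas as pd

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 12:00:00', True, 3],
        ['A', '2022-09-01 14:00:00', False, 1], ['B', '2022-09-01 13:00:00', False, 1],
        ['B', '2022-09-01 16:00:00', True, 4], ['B', '2022-09-01 18:00:00', False, 3]]
df = pd.DataFrame(data=data, columns=['group', 'date', 'indicator', 'value'])
df['date'] = pd.to_datetime(df['date'])

pieces = []
for name, g in df.groupby('group'):
    g = g.set_index('date')
    # Full hourly range between the first and last timestamp of this group
    hours = pd.date_range(g.index.min(), g.index.max(), freq='60min')
    filled = pd.DataFrame({
        'group': name,
        # new rows get False directly, so the column stays boolean
        'indicator': g['indicator'].reindex(hours, fill_value=False),
        # new rows copy the previous existing value, so the column stays integer
        'value': g['value'].reindex(hours, method='ffill'),
    })
    pieces.append(filled.rename_axis('date').reset_index())

result = pd.concat(pieces, ignore_index=True)[['group', 'date', 'indicator', 'value']]
print(result)
```

This assumes the dates inside each group are unique and sorted, which reindex with method='ffill' requires.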
Tags: Python, pandas, dataframe