I have a pandas DataFrame that stores stock prices and timestamps. The time column is a datetime type (converted with pd.to_datetime).
Here is a demo:
import pandas as pd
df = pd.DataFrame([['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.], ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]], columns=['time', 'price'])
df['time'] = pd.to_datetime(df['time'])
In [11]: df
Out[11]:
                 time  price
0 2022-09-01 09:33:00  100.0
1 2022-09-01 09:33:14  101.0
2 2022-09-01 09:33:16   99.4
3 2022-09-01 09:33:30  100.9
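I want to compute the future return over a 15-second horizon, i.e. (the first price observed at least 15 seconds later) minus (the current price).
What I want is: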
In [13]: df
Out[13]:
                 time  price  return
0 2022-09-01 09:33:00  100.0    -0.6  // the future price is 99.4, period is 16s
1 2022-09-01 09:33:14  101.0    -0.1  // the future price is 100.9, period is 16s
2 2022-09-01 09:33:16   99.4     NaN
3 2022-09-01 09:33:30  100.9     NaN
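I know df.diff can take the difference between rows by index position, but it does not take the timestamps into account. Is there a good way to do this?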
Answer from a uj5u.com user:
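merge_asof to the rescue.
Subtract a 15s timedelta from the time column of the right dataframe, then use merge_asof with direction='forward'. For each row of the left dataframe, this selects the first row in the right dataframe whose on key is greater than or equal to the left row's on key. Finally, subtract the price columns to calculate the return.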
df1 = pd.merge_asof(
    left=df,
    right=df.assign(time=df['time'] - pd.Timedelta('15s')),
    on='time', direction='forward', suffixes=['', '_r']
)
df1['return'] = df1.pop('price_r') - df1['price']
Result:
                 time  price  return
0 2022-09-01 09:33:00  100.0    -0.6
1 2022-09-01 09:33:14  101.0    -0.1
2 2022-09-01 09:33:16   99.4     NaN
3 2022-09-01 09:33:30  100.9     NaN
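As a cross-check on the merge_asof approach, the same "first price at least 15 seconds later" lookup can be sketched with numpy's searchsorted. This is not part of the answer above, only an alternative under the assumption that the frame is sorted by time; the names times, prices, pos and future are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.],
     ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]],
    columns=['time', 'price'])
df['time'] = pd.to_datetime(df['time'])
# searchsorted requires sorted input, so sort by time first
df = df.sort_values('time').reset_index(drop=True)

times = df['time'].to_numpy()
prices = df['price'].to_numpy()

# For each row, the position of the first row whose time is >= time + 15s
pos = np.searchsorted(times, times + np.timedelta64(15, 's'), side='left')

# Append a NaN so that "no such row" (pos == len(df)) maps to NaN
future = np.append(prices, np.nan)[pos]
df['return'] = future - prices
print(df)
```

On the demo data this should give the same return column as the merge_asof version: a value for the first two rows and NaN for the last two, since no price exists 15 seconds or more after them.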
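Answer from a uj5u.com user:
Please try this (though I am not convinced the output makes much sense :-( ). Is this what you expect? (I realize this code assigns the return of the previous 15 seconds rather than the next 15 seconds. But that is how returns are usually indexed: at the time they are realized, not while they are still expected in the future.)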
import numpy as np
import pandas as pd
df = pd.DataFrame([['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.], ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]], columns=['time', 'price'])
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values('time').reset_index(drop=True)
df.loc[:, 'return'] = df['price'].diff()
df['time_diff'] = df['time'].diff()
df['15sec_or_more'] = (df['time_diff'] >= np.timedelta64(15, 's'))
for k, i in enumerate(df.index):
    if k:
        if not df.loc[i,'15sec_or_more']:
            temp = df.iloc[k:].loc[:,['return','time_diff']].cumsum(axis=0)
            conds = (temp['time_diff'] >= np.timedelta64(15, 's'))
            if conds.sum():
                true_return_index = conds.idxmax()
                df.loc[i, 'return'] = df.loc[true_return_index, 'return']
            else:
                df.loc[i, 'return'] = np.nan
df = df[['time', 'price' ,'return']]
print(df)
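The indexing convention mentioned in this answer (a return recorded at the time it is realized) can also be sketched with merge_asof by mirroring the forward-looking version. This is not the loop above and is not claimed to reproduce its output; it is a minimal sketch that assumes "realized return" means the current price minus the last price observed at least 15 seconds earlier. The names realized and price_past are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.],
     ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]],
    columns=['time', 'price'])
df['time'] = pd.to_datetime(df['time'])

# Shift the right-hand times forward by 15s, then match backward:
# each left row picks the last right row whose shifted time is <= its own,
# i.e. the most recent price that is at least 15s old.
realized = pd.merge_asof(
    left=df,
    right=df.assign(time=df['time'] + pd.Timedelta('15s')),
    on='time', direction='backward', suffixes=('', '_past'),
)
realized['return'] = realized['price'] - realized.pop('price_past')
print(realized)
```

With the demo data, the first two rows have no price that is 15 seconds old yet, so their return is NaN, while the last two rows carry the same -0.6 and -0.1 that the forward-looking version attaches to the first two rows.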