I'm trying to concatenate values from multiple columns and take a set of them, and I'm running into a problem.
Here is a sample df:
import numpy as np
import pandas as pd

df = pd.DataFrame({'customer id': [1, 2, 3, 4, 5],
                   'email1': ['[email protected]', np.nan, '[email protected]', np.nan, np.nan],
                   'email2': ['[email protected]', np.nan, '[email protected]', '[email protected]', np.nan],
                   'email3': ['[email protected]', np.nan, '[email protected]', '[email protected]', '[email protected]']})
df:
   customer id             email1             email2             email3
0            1  [email protected]  [email protected]  [email protected]
1            2                NaN                NaN                NaN
2            3  [email protected]  [email protected]  [email protected]
3            4                NaN  [email protected]  [email protected]
4            5                NaN                NaN  [email protected]
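I want to create a new column holding the unique values from all of the email columns (email1, email2 and email3), so that each customer gets a set of unique emails. Some emails differ only in letter case (upper, lower, etc.).
This is what I have done so far: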
df['ALL_EMAILS'] = df[['email1','email2','email3']].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1)
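For a df with more than 500K customers, this takes about 3 minutes!
Then I wrote a function to process the output and get the unique values when the cell is not empty: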
def checkemail(x):
    if x:
        # to_lower
        lower_x = x.lower()
        y = lower_x.split(',')
        return set(y)
Then I applied it to the column:
df['ALL_EMAILS'] = df['ALL_EMAILS'].apply(checkemail)
But I got the wrong output in the ALL_EMAILS column!
ALL_EMAILS
0 { [email protected], [email protected], [email protected]}
1 None
2 { [email protected], [email protected]}
3 { [email protected], [email protected]}
4 {[email protected]}
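The leading spaces inside the sets above hint at a likely cause: the values were joined with ', ' but split on ',' alone, so every element after the first keeps a leading space and no longer compares equal to its duplicate. A minimal sketch of that effect in plain Python, using hypothetical example.com addresses rather than the real data:

```python
# Hypothetical addresses, not the ones from the question.
joined = ', '.join(['A@example.com', 'a@example.com', 'b@example.com'])

# Splitting on ',' alone keeps the space that followed each comma,
# so ' a@example.com' and 'a@example.com' count as different strings.
naive = set(joined.lower().split(','))

# Splitting on the same ', ' separator that was used for joining removes it.
fixed = set(joined.lower().split(', '))

print(sorted(naive))
print(sorted(fixed))
```

Stripping each piece with str.strip() would have a similar effect, assuming the addresses themselves contain no meaningful leading or trailing whitespace.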
Answer from a uj5u.com user:
Let us filter the email-like columns, then stack them to convert to a Series, convert to lowercase, and aggregate with set on level=0:
email = df.filter(like='email')
df['all_emails'] = email.stack().str.lower().groupby(level=0).agg(set)
   customer id             email1             email2             email3                              all_emails
0            1  [email protected]  [email protected]  [email protected]  {[email protected], [email protected]}
1            2                NaN                NaN                NaN                                     NaN
2            3  [email protected]  [email protected]  [email protected]  {[email protected], [email protected]}
3            4                NaN  [email protected]  [email protected]                     {[email protected]}
4            5                NaN                NaN  [email protected]                     {[email protected]}
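The stack-and-group approach can be tried end to end with a small self-contained sketch. The data below is hypothetical (example.com placeholders), and the explicit dropna() is an added precaution that is not part of the answer: it is there on the assumption that stack() may keep NaN values in some pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the addresses are placeholders, not the question's.
df = pd.DataFrame({
    'customer id': [1, 2, 3],
    'email1': ['A@example.com', np.nan, np.nan],
    'email2': ['a@example.com', np.nan, 'c@example.com'],
    'email3': ['b@example.com', np.nan, 'C@EXAMPLE.COM'],
})

email = df.filter(like='email')

# One long Series indexed by (row label, column name).
# dropna() is an added precaution in case stack() keeps NaN values.
stacked = email.stack().dropna()

# Lowercase, then collect the values of each original row into a set.
# Rows with no emails have no group, so they become NaN on assignment.
df['all_emails'] = stacked.str.lower().groupby(level=0).agg(set)

print(df['all_emails'])
```

Note that the new column name all_emails itself contains 'email', so running df.filter(like='email') again afterwards would also pick up that column.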
Answer from a uj5u.com user:
Try working on the values directly instead of joining them and then splitting them again:
df['ALL_EMAILS'] = df.filter(like='email').apply(lambda x: set(x.dropna().str.lower()) or None, axis=1)
Output:
   customer id             email1             email2             email3                              ALL_EMAILS
0            1  [email protected]  [email protected]  [email protected]  {[email protected], [email protected]}
1            2                NaN                NaN                NaN                                    None
2            3  [email protected]  [email protected]  [email protected]  {[email protected], [email protected]}
3            4                NaN  [email protected]  [email protected]                     {[email protected]}
4            5                NaN                NaN  [email protected]                     {[email protected]}
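The None in row 1 comes from the `or None` part: an empty set is falsy in Python, so a customer with no emails ends up with None instead of set(). A small sketch of the same row-wise idea, spelled out as a named function; the helper name unique_emails and the example.com data are hypothetical and only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the addresses are placeholders, not the question's.
df = pd.DataFrame({
    'customer id': [1, 2, 3],
    'email1': ['A@example.com', np.nan, np.nan],
    'email2': ['a@example.com', np.nan, 'c@example.com'],
    'email3': ['b@example.com', np.nan, 'C@EXAMPLE.COM'],
})

def unique_emails(row):
    """Hypothetical helper: lowercase the non-null emails in one row."""
    emails = set(row.dropna().str.lower())
    # An empty set is falsy, so rows without any email yield None.
    return emails or None

df['ALL_EMAILS'] = df.filter(like='email').apply(unique_emails, axis=1)

print(df['ALL_EMAILS'])
```

This still calls a Python function once per row, so on a very large frame it is worth timing it against the alternatives rather than assuming it is faster.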
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/507952.html
Tags: Python, python-3.x, pandas, concatenation