I have a DataFrame that looks like this:
+-----------+--------------+----------------+--------------------+-----------------------+
|CUSTOMER_ID|mkt_channel_id|mkt_channel_name|mkt_channel_category|mkt_channel_subcategory|
+-----------+--------------+----------------+--------------------+-----------------------+
|    1794405|             8|           email|               e_cat|                   send|
|   19911215|             9|           email|               e_cat|               delivery|
|   18907679|            10|           email|               e_cat|                   open|
|   18106624|            11|           email|               e_cat|                  click|
|    8335735|             8|           email|               e_cat|                   send|
|   17912034|            11|           email|               e_cat|                  click|
+-----------+--------------+----------------+--------------------+-----------------------+
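I need to create 3 columns for each customer: the total number of emails sent (send + delivery), open, and click.
I am using the code below, but it only creates a single column: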
df = d.groupBy('CUSTOMER_ID','mkt_channel_id').agg(F.count('mkt_channel_subcategory'))
My final table should have the following columns:
CUSTOMER_ID|mkt_channel_id|mkt_channel_name|mkt_channel_category|mkt_channel_subcategory|sent|open|click
Can anyone tell me how to do this?
Reply from a uj5u.com user:
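It looks like you want to pivot the column mkt_channel_subcategory. You can first merge the event values send and delivery into one value, then pivot the column and count.
Something like this works for your input sample: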
from pyspark.sql import functions as F

# Treat "delivery" as "send" so both events land in the same pivoted column.
df = df.withColumn(
    "mkt_channel_subcategory",
    F.when(
        F.col("mkt_channel_subcategory") == "delivery",
        F.lit("send")
    ).otherwise(F.col("mkt_channel_subcategory"))
).groupby(
    "CUSTOMER_ID", "mkt_channel_id", "mkt_channel_name", "mkt_channel_category"
).pivot(
    "mkt_channel_subcategory"
).agg(F.count("*")).fillna(0)  # missing combinations come back as null, so fill with 0

df.show()
# +-----------+--------------+----------------+--------------------+-----+----+----+
# |CUSTOMER_ID|mkt_channel_id|mkt_channel_name|mkt_channel_category|click|open|send|
# +-----------+--------------+----------------+--------------------+-----+----+----+
# |   18907679|            10|           email|               e_cat|    0|   1|   0|
# |    1794405|             8|           email|               e_cat|    0|   0|   1|
# |   17912034|            11|           email|               e_cat|    1|   0|   0|
# |   19911215|             9|           email|               e_cat|    0|   0|   1|
# |    8335735|             8|           email|               e_cat|    0|   0|   1|
# |   18106624|            11|           email|               e_cat|    1|   0|   0|
# +-----------+--------------+----------------+--------------------+-----+----+----+
The same result can be achieved with conditional aggregation:
df = df.groupby(
    "CUSTOMER_ID", "mkt_channel_id", "mkt_channel_name", "mkt_channel_category"
).agg(
    F.sum(F.when(F.col("mkt_channel_subcategory").isin("delivery", "send"), 1).otherwise(0)).alias("sent")
    # ... same for other columns
)