I have the following DataFrames:
Dataframe1:
+----+------------------------------------------------------------------+
|id  |records                                                           |
+----+------------------------------------------------------------------+
|473 |[{1, 8414932001257, 34301, 70.0}, {2, 015878529935, 34301, 140.0}]|
|1529|[{1, 54490802, 34301, 70.0}]                                      |
|2052|[{1, 016229901172, 34301, 70.0}, {1, 8410793206138, 34304, 40.0}] |
|894 |[{1, 8429359001223, 34301, 70.0}]                                 |
|2053|[{1, 8480012007242, 34304, 40.0}, {1, 8420030011050, 34301, 70.0}]|
+----+------------------------------------------------------------------+
whose schema is:
StructType(
StructField(id,LongType,true),
StructField(records,ArrayType(
StructType(
StructField(count,LongType,true),
StructField(barcode,StringType,true),
StructField(height,LongType,true),
StructField(weight,DoubleType,true)),
true),
true)
)
Dataframe2:
+------------+--------+
|barcode     |itemType|
+------------+--------+
|015878529935|Box     |
|015878539989|Box     |
|016229901141|Can     |
|016229901172|Box     |
|016229901189|Can     |
+------------+--------+
whose schema is:
StructType(
StructField(barcode,StringType,true),
StructField(itemType,StringType,true)
)
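I want to join these two DataFrames on barcode, so that each record in Dataframe1 also carries the itemType column from Dataframe2. The resulting schema would look like this: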
StructType(
StructField(id,LongType,true),
StructField(records,ArrayType(
StructType(
StructField(count,LongType,true),
StructField(barcode,StringType,true),
StructField(material,LongType,true),
StructField(weight,DoubleType,true),
StructField(itemType,StringType,true)),
true),
true)
)
I tried the following code:
dataframe1.join(dataframe2, dataframe1("records.barcode") === dataframe2("barcode"), "leftouter")
Result:
AnalysisException: cannot resolve '(spark_catalog.database.table1.`records`.`barcode` = spark_catalog.database.table2.`barcode`)' due to data type mismatch: differing types in '(spark_catalog.database.table1.`records`.`barcode` = spark_catalog.database.table2.`barcode`)' (array<string> and string).;
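Logically it makes sense that this fails: the records column is an ArrayType of StructType, so records.barcode resolves to array<string>, while barcode in Dataframe2 is a StringType. My problem is that I don't know how to reach records.barcode at the level of each element in the array.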
Answer:
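Your code is nearly right. It only needs a small change to work as expected: records is an array column, so explode it into one row per element first, then join on the barcode of each element.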
import org.apache.spark.sql.functions.{col, explode_outer}

dataframe1
  // records is an array column: explode it so each element gets its own row
  .withColumn("records", explode_outer(col("records")))
  .join(
    dataframe2,
    // col(...) resolves against the exploded struct column, not the original array
    col("records.barcode") === dataframe2("barcode"),
    "leftouter"
  )
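Note that the join above leaves one row per record, with itemType as a top-level column. If you need the nested shape from the question, one possible follow-up is to rebuild the array after the join. The following is only a sketch under assumptions: the local SparkSession, the two small sample frames, and the names record, exploded, joined and nested are illustrative and not part of the original code. The struct fields follow the input schema (count, barcode, height, weight).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, explode_outer, struct}

// Assumption: a throwaway local session and two tiny sample frames, for illustration only.
val spark = SparkSession.builder().master("local[1]").appName("nest-itemType").getOrCreate()
import spark.implicits._

val dataframe1 = Seq(
  (473L, Seq((1L, "8414932001257", 34301L, 70.0), (2L, "015878529935", 34301L, 140.0))),
  (1529L, Seq((1L, "54490802", 34301L, 70.0)))
).toDF("id", "records")
  // Rename the tuple fields (_1.._4) to the names used in the question's schema.
  .withColumn(
    "records",
    col("records").cast("array<struct<count:bigint,barcode:string,height:bigint,weight:double>>")
  )

val dataframe2 = Seq(("015878529935", "Box"), ("016229901172", "Box")).toDF("barcode", "itemType")

// One row per array element, kept in a separate (hypothetical) column named "record".
val exploded = dataframe1
  .withColumn("record", explode_outer(col("records")))
  .drop("records")

// Left join so that records without a matching barcode keep a null itemType.
val joined = exploded.join(dataframe2, col("record.barcode") === dataframe2("barcode"), "leftouter")

// Rebuild the array per id, now with itemType inside each struct.
val nested = joined
  .groupBy("id")
  .agg(
    collect_list(
      struct(
        col("record.count").as("count"),
        col("record.barcode").as("barcode"),
        col("record.height").as("height"),
        col("record.weight").as("weight"),
        col("itemType")
      )
    ).as("records")
  )

nested.printSchema()
```

Two caveats with this approach: collect_list does not guarantee the original element order after the shuffle, and an id whose records array is null or empty ends up with a single struct of nulls because of explode_outer. Using explode instead avoids that struct but drops such ids entirely.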