I'm using the latest sbt, with sbt.version=1.5.7. My assembly.sbt is nothing more than addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0").
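For context, here is a minimal sketch of the two build files mentioned so far (the paths are the standard sbt locations and are my assumption, not something stated above):

# project/build.properties
sbt.version=1.5.7

// project/assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")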
Because of my requirements I have to work with sub-projects, and I'm running into the same problem with "provided"-scoped Spark dependencies as described in this post: How to work efficiently with SBT, Spark and "provided" dependencies? As that post suggests, I can make Compile / run work under the root project, but Compile / run fails in the sub-project. Here is my build.sbt in detail:
val deps = Seq(
  "org.apache.spark" %% "spark-sql"   % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-core"  % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-avro"  % "3.1.2" % "provided"
)

lazy val analyticsFrameless =
  (project in file("."))
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))
    .settings(libraryDependencies ++= deps)

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
    // .dependsOn(analytics)
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava"            %  "guava"         % "30.1.1-jre",
        "io.delta"                    %% "delta-core"    % "1.0.0",
        "com.google.cloud.bigdataoss" %  "gcs-connector" % "hadoop2-2.1.3"
      )
    )
// Make `run` use the Compile classpath (which still contains the "provided"
// dependencies) instead of the default Runtime classpath, which drops them.
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

// The same override, scoped to the sub-project.
impressionModelEtl / Compile / run := Defaults
  .runTask(
    impressionModelEtl / Compile / fullClasspath,
    impressionModelEtl / Compile / run / mainClass,
    impressionModelEtl / Compile / run / runner
  )
  .evaluated
Then I run impressionModelEtl / Compile / run with a simple program:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkRead {
  def main(args: Array[String]): Unit = {
    val spark =
      SparkSession
        .builder()
        .master("local[*]")
        .appName("SparkReadTestProvidedScope")
        .getOrCreate()
    spark.stop()
  }
}
and it returns:
[error] java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
[error] at SparkRead$.main(SparkRead.scala:7)
[error] at SparkRead.main(SparkRead.scala)
[error] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[error] Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
[error] at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
This has had me stuck for days. Please help... thanks a lot.
Answer from a forum user:
Please try adding dependsOn:

lazy val analyticsFrameless =
  (project in file("."))
    .dependsOn(sqlChoreography, impressionModelEtl)
    .aggregate(sqlChoreography, impressionModelEtl)
    .settings(
      libraryDependencies ++= deps
    )
If you are also sharing test classes across the projects, add:

.dependsOn(sqlChoreography    % "compile->compile;test->test",
           impressionModelEtl % "compile->compile;test->test")
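For what it's worth, the impressionModelEtl / Compile / run override from the question can equivalently live inside the sub-project's own .settings block; a minimal sketch using the same deps value (this is just a re-scoping of the question's override, not part of the original answer):

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
    .settings(
      libraryDependencies ++= deps,
      // same trick as at the top level: run with the Compile classpath so the
      // "provided" Spark jars are on the classpath for local runs
      Compile / run := Defaults
        .runTask(
          Compile / fullClasspath,
          Compile / run / mainClass,
          Compile / run / runner
        )
        .evaluated
    )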
Answer from a forum user:
In the end I came up with a solution: simply split the parent project's build.sbt into its sub-projects. Like this, in ./build.sbt (a summary of the resulting layout follows this snippet):
import Dependencies._

// Prefer the already-packaged jars of internal (sub-project) dependencies and
// only rebuild them when the artifact is missing.
ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing
// Put each sub-project's packaged jar (rather than its classes directory) on
// dependent projects' classpaths.
ThisBuild / exportJars := true

ThisBuild / scalaVersion := "2.12.12"
ThisBuild / version := "0.0.1"

ThisBuild / Test / parallelExecution := false
ThisBuild / Test / fork := true
ThisBuild / Test / javaOptions ++= Seq(
  "-Xms512M",
  "-Xmx2048M",
  "-XX:MaxPermSize=2048M",
  "-XX:+CMSClassUnloadingEnabled"
)

lazy val analyticsFrameless =
  (project in file("."))
    // .dependsOn(sqlChoreography % "compile->compile;test->test", impressionModelEtl % "compile->compile;test->test")
    .settings(
      libraryDependencies ++= deps
    )

lazy val sqlChoreography =
  (project in file("sql-choreography"))

lazy val impressionModelEtl =
  (project in file("impression-model-etl"))
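To make the resulting structure concrete, this is the layout the rest of this answer builds up (my own summary, using only the file names that appear in this post):

// build.sbt                        <- the ThisBuild settings and project list above
// project/Dependencies.scala       <- the shared Spark dependencies (shown at the end)
// project/assembly.sbt             <- addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")
// sql-choreography/
// impression-model-etl/build.sbt   <- sub-project-specific settings (next snippet)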
Then, in the impression-model-etl directory, create another build.sbt file:
import Dependencies._

lazy val impressionModelEtl =
  (project in file("."))
    .settings(
      libraryDependencies ++= deps ++ Seq(
        "com.google.guava"            %  "guava"         % "30.1.1-jre",
        "io.delta"                    %% "delta-core"    % "1.0.0",
        "com.google.cloud.bigdataoss" %  "gcs-connector" % "hadoop2-2.1.3"
      )
      // , assembly / assemblyExcludedJars := {
      //     val cp = (assembly / fullClasspath).value
      //     cp filter { _.data.getName == "org.apache.spark" }
      //   }
    )

// Run with the Compile classpath so the "provided" Spark dependencies are
// available for local runs.
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated

assembly / assemblyOption := (assembly / assemblyOption).value.withIncludeBin(false)
assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar"

name := "impression"
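As a quick sanity check on the assemblyJarName expression above (plugging in the values used in this answer: name "impression", Scala 2.12.12, version 0.0.1, and sparkVersion 3.1.2 from Dependencies.scala, and assuming the parent's ThisBuild settings are in effect), the packaged jar name works out to:

// s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar"
//   => "impression_2.12-3.1.2_0.0.1.jar"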
And make sure the shared Spark libraries are pulled out into a Dependencies.scala file under the parent build's project directory:
import sbt._

object Dependencies {
  // Versions
  lazy val sparkVersion = "3.1.2"

  val deps = Seq(
    "org.apache.spark" %% "spark-sql"   % sparkVersion % "provided",
    "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-avro"  % sparkVersion % "provided",
    ...
  )
}
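One caveat worth spelling out (this is my reading of how the split builds are meant to be used, not something the answer states explicitly): if you launch sbt from inside impression-model-etl so that its build.sbt is loaded as a standalone build, that build picks up plugins and build sources from its own project/ directory, so the assembly plugin and Dependencies.scala have to be visible there too, for example:

// impression-model-etl/project/assembly.sbt  (hypothetical path; same plugin as the parent build)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")

// impression-model-etl/project/Dependencies.scala would likewise need to be a copy of
// (or a symlink to) the parent's project/Dependencies.scala, so that the
// `import Dependencies._` at the top of impression-model-etl/build.sbt still resolves.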
After all these steps, running the Spark code locally from the sub-project folder works as expected, while keeping the Spark dependencies in the "provided" scope.
Tags: scala, apache-spark, sbt, sbt-assembly, sbt-plugin