When trying to test something locally with Scala Spark, I noticed the following problem and was wondering what causes it, and whether there exists a workaround.
Consider the following build configuration and code:
// build.sbt
ThisBuild / scalaVersion := "2.13.14"
libraryDependencies += ("org.apache.spark" %% "spark-sql" % "3.5.1")
//test.sc
import org.apache.spark.sql._
val spark: SparkSession = SparkSession.builder()
  .master("local")
  .getOrCreate()
import spark.implicits._
val data = Seq(
  (1, 10),
  (2, 20)
)
val frame: DataFrame = data.toDF("id", "value")
frame.show()
This works and correctly prints an ASCII-art rendering of the dataframe. In Scala 3, however, implicit resolution goes wrong. Consider the new build configuration:
//build.sbt
ThisBuild / scalaVersion := "3.4.2"
libraryDependencies += ("org.apache.spark" %% "spark-sql" % "3.5.1").cross(CrossVersion.for3Use2_13)
lazy val root = (project in file("."))
  .settings(
    name := "InScala3"
  )
The same code as presented above now fails on the line
val frame: DataFrame = data.toDF("id", "value")
with the error: value toDF is not a member of Seq[(Int, Int)].
If I expand the implicit conversion by hand and write
val frame: DataFrame = localSeqToDatasetHolder(data).toDF("id", "value")
I instead get
2 |val frame: DataFrame = localSeqToDatasetHolder(data).toDF("id", "value")
| ^
|Unable to find encoder for type (Int, Int). An implicit Encoder[(Int, Int)] is needed to store (Int, Int) instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases..
|I found:
|
| spark.implicits.newProductEncoder[(Int, Int)](
| /* missing */summon[scala.reflect.runtime.universe.TypeTag[(Int, Int)]])
|
|But no implicit values were found that match type scala.reflect.runtime.universe.TypeTag[(Int, Int)].
The error seems to point toward a reflection problem. According to the message, this should work: I did everything it states to be necessary, namely importing spark.implicits._, and yet it doesn't.
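For context, one workaround that appears to sidestep the missing TypeTag is to construct the Encoder explicitly from Spark's primitive encoders (via Encoders.tuple) rather than relying on the TypeTag-based derivation in spark.implicits._. A minimal sketch, assuming the same Scala 3 build.sbt as above:

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession}

val spark: SparkSession = SparkSession.builder()
  .master("local")
  .getOrCreate()

// Build the Encoder by hand from Spark's primitive encoders, so no
// scala.reflect.runtime.universe.TypeTag needs to be summoned.
implicit val pairEncoder: Encoder[(Int, Int)] =
  Encoders.tuple(Encoders.scalaInt, Encoders.scalaInt)

val data = Seq((1, 10), (2, 20))

// createDataset takes the implicit Encoder directly, bypassing the
// TypeTag-based newProductEncoder from spark.implicits._
val frame: DataFrame = spark.createDataset(data).toDF("id", "value")
frame.show()
```

This trades the convenience of spark.implicits._ for an explicit encoder per tuple shape, but it compiles without runtime-reflection TypeTags.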