From a single full raw dataset, I have to extract several subsets and dump them as multiple types of data.
A simple example:
case class A (id: Int, age: Int, name: String)
case class B (id: Int, age: Int, married: Boolean)
case class C (id: Int, age: Int, gender: String)
Dumping them sequentially, one type at a time, would be the naive solution.
But since those case classes share several common fields, I think that is quite inefficient.
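To make the naive approach concrete, here is a minimal sketch of a sequential dump. The `Raw` case class, the `SequentialDump` object, the Parquet format, and the `/tmp/dump-sequential` path are assumptions for illustration only; the real raw schema and output location are not shown here.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical shape of the raw input (an assumption, not the real schema).
case class Raw(id: Int, age: Int, name: String, married: Boolean, gender: String)

case class A(id: Int, age: Int, name: String)
case class B(id: Int, age: Int, married: Boolean)
case class C(id: Int, age: Int, gender: String)

object SequentialDump {
  // Pure projections from one raw record to each output type.
  def toA(r: Raw): A = A(r.id, r.age, r.name)
  def toB(r: Raw): B = B(r.id, r.age, r.married)
  def toC(r: Raw): C = C(r.id, r.age, r.gender)

  // Writes each type with its own write action, one after another.
  def dump(raw: Dataset[Raw], basePath: String): Unit = {
    val spark = raw.sparkSession
    import spark.implicits._
    raw.map(r => toA(r)).write.mode("overwrite").parquet(s"$basePath/A")
    raw.map(r => toB(r)).write.mode("overwrite").parquet(s"$basePath/B")
    raw.map(r => toC(r)).write.mode("overwrite").parquet(s"$basePath/C")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sequential-dump")
      .getOrCreate()
    import spark.implicits._

    val raw = Seq(Raw(1, 30, "kim", married = true, gender = "F")).toDS()
    dump(raw, "/tmp/dump-sequential") // hypothetical output path
    spark.stop()
  }
}
```

Each `write` is a separate action, so the raw data is read and projected three times unless it is cached first.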
What I thought of was making a parent trait or abstract class and extending it.
Then add a category field to each case class.
Finally, use partitionBy(category) to dump each category to a different path.
For example:
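// Scala 2 traits cannot take constructor parameters, so the common fields are declared as abstract members.
sealed trait P { def category: String; def id: Int; def age: Int }
case class A(category: String, id: Int, age: Int, name: String) extends P
case class B(category: String, id: Int, age: Int, married: Boolean) extends P
case class C(category: String, id: Int, age: Int, gender: String) extends P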
In my IDE, building an RDD[P] seems to work, but converting it into a Dataset (through the .toDS() method) does not work.
Is there any way to solve this?
(Preferably without writing a class with complex methods like compare, etc…)
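For illustration, one direction that avoids a Dataset of a trait type is to flatten everything into a single case class, with the category-specific fields as Option, and then partition on category when writing. This is only a sketch under assumptions: `Raw`, `Record`, `PartitionedDump`, `explode`, the Parquet format, and the output path are hypothetical names chosen for the example, and it swaps the trait hierarchy for a flat type rather than making the trait itself encodable.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical shape of the raw input (an assumption, not the real schema).
case class Raw(id: Int, age: Int, name: String, married: Boolean, gender: String)

// One flat record type instead of a trait hierarchy.
// Fields that a given category does not use stay None.
case class Record(
  category: String,
  id: Int,
  age: Int,
  name: Option[String] = None,
  married: Option[Boolean] = None,
  gender: Option[String] = None
)

object PartitionedDump {
  // Turns one raw record into one row per category.
  def explode(r: Raw): Seq[Record] = Seq(
    Record("A", r.id, r.age, name = Some(r.name)),
    Record("B", r.id, r.age, married = Some(r.married)),
    Record("C", r.id, r.age, gender = Some(r.gender))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partitioned-dump")
      .getOrCreate()
    import spark.implicits._

    val raw = Seq(Raw(1, 30, "kim", married = true, gender = "F")).toDS()

    raw.flatMap(r => explode(r))
      .write
      .mode("overwrite")
      .partitionBy("category")          // one sub-directory per category value
      .parquet("/tmp/dump-partitioned") // hypothetical output path

    spark.stop()
  }
}
```

With this layout the output directory contains category=A, category=B, and category=C sub-directories written from a single write action. The trade-off is that every partition shares the same schema, so the columns a category does not use are still present and filled with nulls.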