I am having a hard time in transforming the ABC class to CustomClass.
Logic is, I want the count in CustomClass to be the total count of the groupyBy of a & b where date matches 1 year filter, while the t30Ycount and t30Ncount to be the count of groupBy but with filters applied for 30 days and flag.
However with the below logic i am getting
`[scalatest] org.apache.spark.sql.AnalysisException: expression ‘flag’ is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don’t care which value you get.;
case class ABC(a: String , b: Long, flag: String, date: Timestamp) extends Product {}
info
.filter(col(s"${abc.date}") > oneYear)
.groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
.agg(
// Should be the count of total no of rows of groupBy above
count("a").as(s"${customClass.count}"),
// Should be the count of no of rows of group by above where flag is Y and date matches filter
when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays,
count("a")).otherwise(lit(0)).as(s"${customClass.t30Ycount}"),
// Should be the count of no of rows of group by above where flag is N and date matches filter
when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays,
count("a")).otherwise(lit(0)).as(s"${customClass.t30Ncount}")
).as[CustomClass]
case class CustomClass(a: String
, b: Long
, count: Long
, t30Ycount: Long
, t30NCount: Long
)
2
This seems to be working unless someone can find a better solution
info
.filter(col(s"${abc.date}") > oneYear)
.groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
.agg(
// Should be the count of total no of rows of groupBy above
count("*").as(s"${customClass.count}"),
// Should be the count of no of rows of group by above where flag is Y and date matches filter
sum(when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays,
, 1).otherwise(lit(0)).as(s"${customClass.t30Ycount}"),
// Should be the count of no of rows of group by above where flag is N and date matches filter
sum(when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays,
, 1).otherwise(lit(0)).as(s"${customClass.t30Ncount}")
).as[CustomClass]