
Tag Archive for apache-spark-sql, amazon-redshift

Calculate running sum in Spark SQL

I am working on logic that calculates totalscan, last5dayscan and month2dayscan from the dailyscan count. Currently I re-sum the dailyscan counts every day, but the growing data volume is making this expensive to compute. As a new approach I am thinking of using a running sum, but I cannot figure out how to calculate the running sum for totalscan, i.e. today's totalscan = last totalscan value + today's scan count, where the last totalscan value may be from one or even two months back.
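A minimal sketch of the carry-forward idea follows. It is written in plain Python rather than Spark so that it runs on its own. It makes three assumptions. Scans are counted per key, such as a hypothetical device or item id, because the question does not say what is being scanned. month2dayscan means month-to-date. A snapshot of the previous totals is kept between daily runs. `ScanState`, `update_state` and `month_to_date` are illustrative names, not part of any Spark API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class ScanState:
    """Hypothetical per-key snapshot carried from one daily run to the next."""
    totalscan: int       # all-time cumulative scans
    month2dayscan: int   # cumulative scans within the calendar month of last_date
    last_date: date      # last day this key had any scans


def update_state(
    state: Dict[str, ScanState],
    today: date,
    daily_counts: Dict[str, int],
) -> Dict[str, ScanState]:
    """Fold today's per-key dailyscan counts into the previous snapshot.

    Keys with no scans today keep their old totals, however far back their
    last_date is, so no historical daily rows have to be re-read.
    """
    new_state = dict(state)  # keys not scanned today carry forward unchanged
    for key, count in daily_counts.items():
        prev = state.get(key)
        if prev is None:
            # First time this key appears: totals start at today's count.
            new_state[key] = ScanState(count, count, today)
            continue
        same_month = (prev.last_date.year, prev.last_date.month) == (today.year, today.month)
        new_state[key] = ScanState(
            totalscan=prev.totalscan + count,  # last totalscan + today's count
            month2dayscan=(prev.month2dayscan if same_month else 0) + count,
            last_date=today,
        )
    return new_state


def month_to_date(s: ScanState, as_of: date) -> int:
    """Month-to-date value as of a given day (0 if the key was idle this month)."""
    same_month = (s.last_date.year, s.last_date.month) == (as_of.year, as_of.month)
    return s.month2dayscan if same_month else 0


# Usage: key "b" is idle for over a month, yet its total still carries forward.
state: Dict[str, ScanState] = {}
state = update_state(state, date(2023, 1, 30), {"a": 3, "b": 2})
state = update_state(state, date(2023, 1, 31), {"a": 1})
state = update_state(state, date(2023, 3, 2), {"b": 4})
print(state["b"].totalscan, state["b"].month2dayscan)
```

In Spark SQL the same daily step could be expressed as a full outer join of the previous snapshot table with today's aggregated dailyscan counts. It computes `coalesce(prev.totalscan, 0) + coalesce(t.dailyscan, 0)` and writes the result as the new snapshot. A trailing window such as last5dayscan cannot be derived from the running total alone. It still needs the last five days of daily counts, or the difference between today's totalscan and a snapshot kept from five days earlier.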