PySpark with RDD – How to calculate and compare averages?
I need to solve a problem in which a company wants to offer k different users two months of free use of its application (a kind of coupon). The goal is to identify users who are likely to churn (leave the system) and to select from among them the k users who should be retained. A “retention” user is defined as someone who brings a lot of value to the company by listening to music frequently, but whose usage frequency has recently dropped significantly.
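To make the definition concrete, here is a minimal sketch of one possible reading of it with plain RDD operations. Everything specific in it is an assumption and not part of the problem statement: the record layout `(user_id, day, plays)`, the `cutoff_day` that separates "earlier" from "recent" activity, the `min_early_avg` threshold for "a lot of value", and all function names. It computes two averages per user in a single `aggregateByKey` pass and ranks users by how far the recent average fell below the earlier one.

```python
# Sketch only. Assumed (hypothetical) record layout: (user_id, day, plays),
# where `day` is an integer day index and `plays` is the number of songs played.

# Per-user accumulator: (early_sum, early_count, recent_sum, recent_count)
ZERO = (0.0, 0, 0.0, 0)


def make_seq_op(cutoff_day):
    """Build the function that folds one (day, plays) record into an accumulator."""
    def seq_op(acc, rec):
        day, plays = rec
        early_sum, early_n, recent_sum, recent_n = acc
        if day < cutoff_day:
            return (early_sum + plays, early_n + 1, recent_sum, recent_n)
        return (early_sum, early_n, recent_sum + plays, recent_n + 1)
    return seq_op


def comb_op(a, b):
    """Merge two partial accumulators coming from different partitions."""
    return tuple(x + y for x, y in zip(a, b))


def usage_drop(acc):
    """Return (early_avg, recent_avg, drop); an empty period counts as average 0."""
    early_sum, early_n, recent_sum, recent_n = acc
    early_avg = early_sum / early_n if early_n else 0.0
    recent_avg = recent_sum / recent_n if recent_n else 0.0
    return (early_avg, recent_avg, early_avg - recent_avg)


def top_k_retention(rdd, k, cutoff_day, min_early_avg):
    """Pick the k users with high earlier usage and the largest recent drop."""
    return (
        rdd.map(lambda r: (r[0], (r[1], r[2])))            # key by user_id
           .aggregateByKey(ZERO, make_seq_op(cutoff_day), comb_op)
           .mapValues(usage_drop)
           # keep valuable users whose usage actually went down
           .filter(lambda kv: kv[1][0] >= min_early_avg and kv[1][2] > 0)
           .takeOrdered(k, key=lambda kv: -kv[1][2])       # largest drop first
    )


def demo():
    """Run the sketch on a few made-up records (requires a local Spark install)."""
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    records = sc.parallelize([
        ("alice", 1, 8), ("alice", 2, 12), ("alice", 11, 2), ("alice", 12, 4),
        ("bob", 1, 1), ("bob", 11, 1),
    ])
    return top_k_retention(records, k=1, cutoff_day=10, min_early_avg=5.0)
```

Using `aggregateByKey` rather than `groupByKey` keeps only four numbers per user during the shuffle instead of every record. Ranking by the absolute drop is just one choice; a ratio such as `recent_avg / early_avg` would favor different users, and which one fits depends on how "significant decrease" is meant.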