How can I efficiently filter and retrieve specific records from big data using Python/PySpark in Google Colab? I’m struggling with a data engineering problem: