I am working in JupyterLab with Python 3.7 and PySpark 3, and I want to export the result of a SQL query to CSV via PySpark and pandas:
import gc
import time

while True:
    sql_query = '''select * from table'''
    pdf = spark.sql(sql_query).toPandas()
    pdf.to_csv('name.csv', sep=';')
    del pdf        # drop the last reference first...
    gc.collect()   # ...then collect, so the DataFrame is actually reclaimable
    time.sleep(10)
The problem is that when I run this in a while True loop, the kernel's memory consumption grows with every iteration.
How do I free the memory after writing the CSV?
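As a side note, a minimal sketch of a pandas-free variant I could try, assuming a directory of part files is an acceptable output (Spark writes a directory, not a single file; the coalesce(1) call is only there to force one part file and is optional):

# Sketch: write the query result straight from Spark, bypassing pandas,
# so the full result never sits in the driver as a pandas DataFrame.
spark.sql('select * from table') \
    .coalesce(1) \
    .write \
    .mode('overwrite') \
    .csv('name_csv_dir', sep=';', header=True)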
By the way, I have also tried using lru_cache and wrapping part of the code in a function:
import gc
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def func():
    # defined once, outside the loop, so the decorator's cache persists
    sql_query = '''select * from table'''
    pdf = spark.sql(sql_query).toPandas()
    return pdf

while True:
    func().to_csv('name.csv', sep=';')
    func.cache_clear()
    time.sleep(10)
This reduces memory consumption, but only slightly.
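Another variant, sketched under the assumption that plain csv.writer output is acceptable (toLocalIterator is standard PySpark API; stream_to_csv is just a hypothetical helper name): rows are pulled back one partition at a time, so the driver never materializes the full result as a pandas DataFrame.

import csv

def stream_to_csv(df, path, sep=';'):
    # Stream rows to disk incrementally instead of calling toPandas(),
    # keeping peak driver memory roughly one partition's worth of rows.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=sep)
        writer.writerow(df.columns)        # header row
        for row in df.toLocalIterator():   # Row objects are iterable
            writer.writerow(row)

stream_to_csv(spark.sql('select * from table'), 'name.csv')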