Polars memory usage just building up and not coming down
I have an API that does a lot of computation, writes the output to a database, and also returns the output. This was largely in pandas, and I am trying to move it to Polars. A very basic piece of code that mimics this API is below. With this service running, memory usage just goes up and up. I suspect the workers will soon restart after running out of memory, and that will cause downtime. Past threads have explained that the memory allocator keeps memory for the duration of the process, but since this runs as a supervisor worker, the process is always on unless it is restarted manually or killed (SIGTERM/SIGKILL, etc.).
How do I solve this? One option that is mentioned is having the worker process die once it has finished processing, but I am not sure how to do that. I am also not sure it is ideal, since a new worker coming up has its own overhead.
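One approach, sketched below, is to run the Polars-heavy work in short-lived child processes so that whatever the allocator holds is returned to the OS when the child exits. This uses the standard-library `ProcessPoolExecutor` with `max_tasks_per_child` (Python 3.11+); `compute_report` and the parquet path are hypothetical stand-ins for the real computation.

```python
from concurrent.futures import ProcessPoolExecutor

import polars as pl

def compute_report(path: str) -> list[dict]:
    # Hypothetical placeholder for the real Polars computation.
    df = pl.read_parquet(path)
    return df.select(pl.col("amount").sum()).to_dicts()

# max_tasks_per_child=1 recycles a worker after every task, so any memory the
# allocator kept around dies with the child process (requires Python 3.11+).
executor = ProcessPoolExecutor(max_workers=4, max_tasks_per_child=1)

def handle_request(path: str) -> list[dict]:
    return executor.submit(compute_report, path).result()
```

If the API runs behind gunicorn, its `max_requests` setting achieves the same recycling at the worker level, trading periodic restart overhead for a hard cap on memory growth.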
Polars + Python: extracting from a column containing a list of structs, using another column whose values a field of the structs must match
Consider the example:
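A minimal sketch of one way to do this, with made-up data (the column names "key" and "items" and the struct fields "name"/"value" are assumptions): explode the list so each struct gets its own row, keep the structs whose field matches the row's key, then pull out the wanted field.

```python
import polars as pl

df = pl.DataFrame(
    {
        "key": ["a", "b"],
        "items": [
            [{"name": "a", "value": 1}, {"name": "b", "value": 2}],
            [{"name": "b", "value": 3}, {"name": "c", "value": 4}],
        ],
    }
)

# One row per struct, filtered to the structs whose "name" equals the
# row's "key", then the matching struct's "value" is extracted.
result = (
    df.explode("items")
    .filter(pl.col("items").struct.field("name") == pl.col("key"))
    .with_columns(pl.col("items").struct.field("value").alias("value"))
)
print(result)
```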
Python + Polars: parse string column containing date with only month and year
For example:
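A strptime-style format needs a complete date, so one safe sketch (assuming the strings look like "%m-%Y", e.g. "01-2021") is to prepend a literal day before parsing:

```python
import polars as pl

df = pl.DataFrame({"period": ["01-2021", "11-2023"]})

# Prepend a day-of-month so the string becomes a full date, then parse;
# every resulting date is pinned to the first of the month.
result = df.with_columns(
    (pl.lit("01-") + pl.col("period")).str.to_date("%d-%m-%Y").alias("date")
)
print(result)
```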
Rolling KPI calculations in polars, index not visible
How do I add rolling KPIs to the original dataframe in polars? When I do a group by, I don't see an index, so I can't join. I want to keep all the original columns in the dataframe intact and just add the rolling KPI to it.
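Computing the rolling expression inside `with_columns` avoids the group-by/join round trip entirely: every original column stays, and only the new KPI column is added. A sketch with hypothetical column names ("group", "sales") and a window of 2:

```python
import polars as pl

df = pl.DataFrame(
    {"group": ["a", "a", "a", "b", "b"], "sales": [1, 2, 3, 4, 5]}
)

# rolling_mean as an expression keeps the frame's shape, so no index or
# join is needed; .over() restarts the window per group.
result = df.with_columns(
    pl.col("sales").rolling_mean(window_size=2).over("group").alias("sales_roll_mean")
)
print(result)
```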
How can I convert a float value to datetime with higher precision in polars?
The restrictions I have:
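Whatever the exact restrictions, one common pattern (assuming the floats are Unix epoch seconds) is to scale to the target precision while the value is still a float, then cast through Int64 to a Datetime of that unit:

```python
import polars as pl

df = pl.DataFrame({"ts": [1_700_000_000.123456, 1_700_000_001.654321]})

# Multiply first so the fractional seconds survive, then cast: an Int64 is
# interpreted as microseconds since the epoch for a Datetime("us") column.
result = df.with_columns(
    (pl.col("ts") * 1_000_000)
    .cast(pl.Int64)
    .cast(pl.Datetime(time_unit="us"))
    .alias("dt")
)
print(result)
```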
Cast a column to another type in Polars, when there is a possibility that the column does not exist
I want to cast a column to another type, but there is a possibility that the entire column does not exist in the df.
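A simple sketch: guard the cast with a membership check on `df.columns` (the column name "maybe_missing" is a placeholder), so the expression is only built when the column is actually present.

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2]})
target = "maybe_missing"  # hypothetical column that may or may not exist

# Only emit the cast expression when the column is present; otherwise the
# frame passes through untouched.
if target in df.columns:
    df = df.with_columns(pl.col(target).cast(pl.Float64))
```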
Best way to aggregate an iterable of `polars.DataFrame` or `polars.Series` objects
I am looking for the best way to compute a per-row running sum (or average) over a large number of polars.DataFrames, where each of the frames can potentially have a large number of rows. I’d like the implementation to be efficient (fast), but I want to keep the memory footprint in check, e.g. never assemble all the frames in memory before doing the aggregation.
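A sketch under the assumption that all frames share the same shape and schema: fold the iterable with element-wise DataFrame addition, so at most the running total and the current frame are ever in memory at once.

```python
from typing import Iterable

import polars as pl

def running_mean(frames: Iterable[pl.DataFrame]) -> pl.DataFrame:
    """Element-wise mean over an iterable of same-shaped DataFrames."""
    it = iter(frames)
    total = next(it)           # running sum; one frame held at a time
    count = 1
    for frame in it:
        total = total + frame  # element-wise addition of equal shapes
        count += 1
    return total / count       # element-wise division by a scalar
```

Because the function consumes an iterator, the frames can come from a generator that loads them lazily, keeping the footprint bounded.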
Calculating specific measures of lists using polars
I have a dataframe in polars that looks like this:
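If "measures" means per-row statistics over a list column, the list namespace covers most of them. A minimal sketch with made-up data:

```python
import polars as pl

df = pl.DataFrame({"id": [1, 2], "values": [[1.0, 2.0, 3.0], [4.0, 5.0]]})

# Each expression aggregates within the row's own list.
result = df.with_columns(
    pl.col("values").list.mean().alias("mean"),
    pl.col("values").list.max().alias("max"),
    pl.col("values").list.len().alias("n"),
)
print(result)
```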
Parse a pretty-printed string representation of a DataFrame back into a Polars DataFrame?
I have a string representation of a DataFrame:
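There is no built-in round trip for the box-drawn repr, but a rough sketch (assuming the default repr, where the two rows after the header are the "---" marker and the dtype line) can split the body rows on "┆" and rebuild a string-typed frame:

```python
import polars as pl

repr_str = """shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ x   │
│ 2   ┆ y   │
└─────┴─────┘"""

# Keep only the box-drawn rows and split on the column separator; row 0 is
# the header, rows 1-2 are the "---" marker and dtype line, the rest is data.
rows = [
    [cell.strip() for cell in line.strip().strip("│").split("┆")]
    for line in repr_str.splitlines()
    if line.lstrip().startswith("│")
]
header, data = rows[0], rows[3:]
df = pl.DataFrame({h: [row[i] for row in data] for i, h in enumerate(header)})
print(df)  # every column comes back as a string; cast dtypes as needed
```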