I have an API that does a lot of computation, writes the output to a database, and also returns the output. This was largely in pandas; I am trying to use Polars for it. A very basic piece of code that mimics this API is below. With this service running, memory usage just keeps going up, and I suspect the workers will soon run out of memory and be restarted, causing downtime. Past threads have explained that the memory allocator keeps memory for the duration of the process, but since this runs as a supervisor worker, the process stays alive unless restarted manually or killed by a signal (SIGTERM/SIGKILL, etc.).
How do I solve this? One option that has been mentioned is having a worker process die once it has processed some number of requests, but I am not sure how to do that (see the sketch after the supervisor config below). I am also not sure whether that is ideal, since a new worker coming up has its own overhead.
Working code:
from flask import Flask, request, jsonify
import polars as pl
import numpy as np

app = Flask(__name__)

@app.route('/generate', methods=['GET'])
def generate_dataframe():
    try:
        seed = int(request.args.get('seed', 0))
    except ValueError:
        return jsonify({"error": "Invalid seed value. Must be an integer."}), 400

    # Build a 1M-row frame of random data to mimic the real computation
    np.random.seed(seed)
    data = {
        'column1': np.random.randint(0, 100, 1_000_000),
        'column2': np.random.rand(1_000_000),
        'column3': np.random.choice(['A', 'B', 'C', 'D'], 1_000_000),
    }
    df = pl.DataFrame(data)

    # write_json() with no file target returns the JSON string
    # (replaces the old DataFrame.to_json)
    return df.write_json(), 200

if __name__ == '__main__':
    app.run(debug=True)
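To reproduce the growth, a loop like the one below can hammer the endpoint while you watch the workers' RSS in top/htop. A minimal sketch, assuming the service is reachable at 127.0.0.1:5010 (the bind address from the supervisor config below) and that the requests package is installed:

import requests

# Fire repeated requests with varying seeds; the workers' resident
# memory can be watched between iterations (e.g. in top/htop).
for seed in range(50):
    resp = requests.get('http://127.0.0.1:5010/generate', params={'seed': seed})
    resp.raise_for_status()
    print(f'request {seed}: {len(resp.content):,} bytes returned')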
Supervisor config:
[program:app]
command=gunicorn -w 100 -b 127.0.0.1:5010 app:app
directory=/home/master/
user=root
autostart=true
autorestart=true
stderr_logfile=/var/log/app/polarsapp.err.log
stdout_logfile=/var/log/app/polarsapp.out.log
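For the "worker dies after processing" option: Gunicorn supports this out of the box with --max-requests (a worker is restarted after serving that many requests) and --max-requests-jitter (randomises the threshold so all workers don't recycle at once). A sketch of the adjusted command line, with the thresholds as placeholder values to tune:

; recycle each worker after roughly 500 requests (placeholder values)
command=gunicorn -w 100 --max-requests 500 --max-requests-jitter 50 -b 127.0.0.1:5010 app:app

Since the Gunicorn master forks a replacement worker in place, the per-recycle overhead is a fork plus re-importing the app, not a full service restart, and the remaining workers keep serving requests in the meantime.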