Our Python application hangs on two particular machines after 10-20 minutes of use. htop shows 100% CPU usage. I used PyStack to get a stack trace of the running process. The Python side of the stack trace shows nothing interesting: it is just a dictionary lookup, and each time the process hangs it is at a different place in the Python code. But for the last call, PyStack shows that it is stuck at this particular line in the CPython source code (the while loop):
https://github.com/python/cpython/blob/v3.12.3/Modules/_asynciomodule.c#L3594
module_traverse(PyObject *mod, visitproc visit, void *arg)
{
    asyncio_state *state = get_asyncio_state(mod);
    Py_VISIT(state->FutureIterType);
    Py_VISIT(state->TaskStepMethWrapper_Type);
    Py_VISIT(state->FutureType);
    Py_VISIT(state->TaskType);
    Py_VISIT(state->asyncio_mod);
    Py_VISIT(state->traceback_extract_stack);
    Py_VISIT(state->asyncio_future_repr_func);
    Py_VISIT(state->asyncio_get_event_loop_policy);
    Py_VISIT(state->asyncio_iscoroutine_func);
    Py_VISIT(state->asyncio_task_get_stack_func);
    Py_VISIT(state->asyncio_task_print_stack_func);
    Py_VISIT(state->asyncio_task_repr_func);
    Py_VISIT(state->asyncio_InvalidStateError);
    Py_VISIT(state->asyncio_CancelledError);
    Py_VISIT(state->scheduled_tasks);
    Py_VISIT(state->eager_tasks);
    Py_VISIT(state->current_tasks);
    Py_VISIT(state->iscoroutine_typecache);
    Py_VISIT(state->context_kwname);
    // Visit freelist.
    PyObject *next = (PyObject*) state->fi_freelist;
    while (next != NULL) {
        // stuck inside this loop
        PyObject *current = next;
        Py_VISIT(current);
        next = (PyObject*) ((futureiterobject*) current)->future;
    }
    return 0;
}
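My reading of the loop is that it follows the `future` pointer of each freelist entry until it reaches NULL, so it can only spin forever if the chain never ends, for example if it links back on itself. I have not confirmed that this is what happens in my process; it is only a hypothesis. To make it concrete, here is a small Python model of the walk. The `Node` class and both function names are made up for illustration and are not part of CPython.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a futureiterobject sitting on the freelist."""
    name: str
    future: Optional["Node"] = None  # reused as the "next" pointer, as in the C loop


def walk_freelist(head: Optional[Node], max_steps: int = 1000) -> int:
    """Mimic the C while loop, but give up after max_steps instead of spinning."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("freelist walk did not reach NULL")
        node = node.future
    return steps


def has_cycle(head: Optional[Node]) -> bool:
    """Floyd's tortoise-and-hare check for a chain that loops back on itself."""
    slow = fast = head
    while fast is not None and fast.future is not None:
        slow = slow.future
        fast = fast.future.future
        if slow is fast:
            return True
    return False


# A healthy three-element chain ends at None, so the walk terminates.
c = Node("c")
b = Node("b", c)
a = Node("a", b)
healthy_steps = walk_freelist(a)

# If the last entry pointed back to the first, the walk would never end.
c.future = a
corrupted = has_cycle(a)
```

In this model the healthy chain is walked in three steps, while the chain with `c.future = a` is reported as cyclic and would make `walk_freelist` hit its step limit. The real C loop has no such limit, which would match the 100% CPU usage I see.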
I believe this part of the code has something to do with garbage collection. What can I learn from this to troubleshoot the issue? Where should I look next?
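One thing I am considering, to confirm that the hang really happens inside a garbage collection pass, is registering a callback in `gc.callbacks` that logs when each collection starts and stops. This is only a minimal sketch; the `log_gc` function and the `gc_events` list are names I made up, and the snippet would have to be added to the application's startup code.

```python
import gc
import sys
import time

gc_events = []  # (phase, generation, timestamp), kept for later inspection


def log_gc(phase, info):
    """gc callback: record and print every collection start and stop."""
    gc_events.append((phase, info["generation"], time.monotonic()))
    print(f"gc {phase} generation={info['generation']}", file=sys.stderr, flush=True)


gc.callbacks.append(log_gc)

# Force one full collection to check that the callback is wired up.
gc.collect()
```

If the last line in the log before a hang is a "gc start" with no matching "gc stop", that would suggest the process is stuck inside the collector rather than in the Python code PyStack happens to show.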