Are there any good debuggers specifically for Jupyter Notebooks?
I’m building a C extension for Python. It implements a well-established algorithm that operates on a set of arrays to produce a single output array, which can have either 1 or 2 dimensions.
Currently my code is stable when I test it with a basic .py script, even when repeating the operation 1000 times. But I run into segmentation faults/memory errors when I use Jupyter notebooks (where I expect most people to want to use this module) or the Python terminal, where it crashes randomly. The crash seems to happen only after I call a function from my C module multiple times, but the number of calls needed to trigger it varies. In the terminal the crash is either accompanied by a free invalid pointer or malloc ... error message.
The module is large, around 2000 lines of code, and despite my best efforts I haven’t (yet) been able to reproduce the error in a minimal code example; I only get the error in a “complete” piece of code.
So I’m looking for a good debugger to see where the issue lies. Any resources that detail how memory is allocated/deallocated would also be helpful.
I’ve tried adding printf statements throughout the code to isolate where the errors occurred, which let me fix some issues, but others still persist.
I have tried isolating components in different files and, conversely, keeping everything in the same file.
I’ve tried swapping calls to malloc for PyDataMem_NEW; nothing seems to work.
Are there any good [C-level] debuggers specifically for Jupyter Notebooks?
There’s no existing tool for this situation. There are debuggers that integrate nicely with Jupyter. There are debuggers that can debug C calls made by Python. There is not, to my knowledge, any debugger which does both.
Valgrind
In the terminal the crash is either accompanied by a free invalid pointer or malloc ... error message.
The first message suggests to me that you are freeing a pointer which was not given to you by malloc. (Or that the pointer is being corrupted somehow.) The second message suggests to me that you are corrupting metadata about an allocation, which the memory allocator usually puts right before the allocation.
Both of those things are something that Valgrind can detect. To run this under Valgrind, you’d need to run valgrind <command you use to start jupyter notebook>. However, there is a complication: the process you want to trace is a child of the notebook process. To trace children, you will need to use the --trace-children=yes option. See also.
Be aware that Jupyter will be much slower, as Valgrind needs to track every memory access being made. You’ll also see some false positives, as not all of the errors will be related to your module.
You might also want to try running your code as a separate .py file under Valgrind. It might be able to detect a memory error that does not crash your program.
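As a rough sketch of the standalone approach, the Python below builds the Valgrind command line without running it. The helper names and the script name test_module.py are hypothetical. Setting PYTHONMALLOC=malloc is an addition not mentioned above: it is a CPython environment variable (3.6 and later) that makes the interpreter use the system allocator, so Valgrind can track Python's allocations individually.

```python
import os
import shlex
import sys


def valgrind_command(script, trace_children=False):
    """Build an argv list that runs `script` with this interpreter under Valgrind."""
    cmd = ["valgrind"]
    if trace_children:
        # Needed when the code of interest runs in a child process,
        # as with a Jupyter kernel.
        cmd.append("--trace-children=yes")
    cmd += [sys.executable, script]
    return cmd


def valgrind_env(base=None):
    """Copy the environment and ask CPython to use the system allocator."""
    env = dict(os.environ if base is None else base)
    # Assumption: CPython 3.6+, where PYTHONMALLOC=malloc bypasses pymalloc.
    env["PYTHONMALLOC"] = "malloc"
    return env


if __name__ == "__main__":
    # "test_module.py" is a hypothetical file name.
    print("PYTHONMALLOC=malloc " + shlex.join(valgrind_command("test_module.py")))
```

The printed line can be pasted into a terminal, or the two helpers can be passed to subprocess.run as the args and env arguments.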
GDB
GDB can also work in this situation, and it’s a little more flexible, because it can attach to an existing process.
You can find the process ID of your notebook's kernel by running the following Python in a notebook cell:
import os
os.getpid()
You can attach to the process in GDB using this terminal command:
gdb -p PID
where PID is the process ID you found before.
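To avoid copying the PID by hand, the two steps can be combined in a single notebook cell. This is only a convenience sketch; gdb_attach_command is a hypothetical helper name, and the command it prints still has to be run in a separate terminal.

```python
import os


def gdb_attach_command(pid=None):
    """Return the shell command that attaches GDB to `pid` (default: this process)."""
    if pid is None:
        pid = os.getpid()
    return f"gdb -p {pid}"


# Run inside the notebook, this prints the command for the current kernel process.
print(gdb_attach_command())
```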
You may find the command break malloc_error_break to be helpful. This will let you get a C stack trace of where your program is crashing, if it is crashing inside malloc.
However, this has a problem: the malloc call where your program is crashing is probably not the cause of your problem. As I mentioned above, you are probably writing past the end or before the beginning of some buffer, which causes the next call to malloc to fail. You are looking for some incorrect memory usage before this point.
I’ve tried swapping calls to malloc for PyDataMem_NEW; nothing seems to work.
That seems unlikely to matter either way, as PyDataMem_NEW is essentially just calling malloc and optionally tracking memory usage. See the NumPy source code.