I am running data processing jobs that consist of:
– Inputs (Pandas Series)
– Python function
I’d like to set this up so that if the inputs (function arguments) and the function itself (its body and code) are unchanged, I do not rerun the calculation but instead load the result cached on disk.
While function results are easy to cache in process memory, in my case there may be multiple runs spread over several months, and the underlying Python source code may change between runs.
So I need a hash for each function that is derived from its code, so that I can detect when the code has changed. If the function body changes, I need to rerun the calculation and write the new result to the disk cache.
For example, these two versions should produce different hashes:
def my_task(a: pd.Series, b: pd.Series):
    return a+b
And:
def my_task(a: pd.Series, b: pd.Series):
    return a + b - 2
How can I calculate a hash of a Python function’s code? Or anything that is close enough to this?
Furthermore, these Python functions can reside within Jupyter Notebooks, not just in Python modules.
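One possible direction, as a minimal sketch rather than a definitive answer: hash the function's compiled code object (`fn.__code__`) instead of its source text. This works for functions defined in Jupyter cells, and pure formatting changes do not alter the hash. The names `function_hash`, `_feed_code` and `_stable_repr` are hypothetical helpers made up for this illustration. Assumptions to be aware of: bytecode differs between Python versions, so the hash is only stable within one interpreter version, and changes in helper functions or globals that the task calls are not detected.

```python
import hashlib
import types


def _stable_repr(const) -> str:
    """repr() that does not depend on set iteration order."""
    if isinstance(const, frozenset):
        # Set literals used with `in` may compile to frozenset constants,
        # whose iteration order can vary between interpreter runs.
        return repr(sorted(map(repr, const)))
    return repr(const)


def _feed_code(hasher, code: types.CodeType) -> None:
    """Feed the behaviour-defining parts of a code object into the hasher."""
    hasher.update(code.co_code)                      # raw bytecode
    hasher.update(repr(code.co_names).encode())      # globals/attributes used
    hasher.update(repr(code.co_varnames).encode())   # arguments and locals
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions, lambdas, comprehensions: recurse, because
            # repr() of a code object contains a memory address.
            _feed_code(hasher, const)
        else:
            hasher.update(_stable_repr(const).encode())


def function_hash(fn) -> str:
    """Return a hex SHA-256 digest derived from the function's code object."""
    hasher = hashlib.sha256()
    _feed_code(hasher, fn.__code__)
    return hasher.hexdigest()
```

A cache key could then combine `function_hash(my_task)` with a hash of the inputs. If comment or whitespace edits should also invalidate the cache, hashing the text returned by `inspect.getsource(fn)` is an alternative, provided the source is available to `inspect` in the environment in question.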