I am working with MATLAB on a model reduction algorithm. It is basically a data processing pipeline.
ckt = generate_ckt(ckt_properties);
freq = generate_fpoints(fconfig);
result = freq_dom_sim(ckt,freq);
red_ckt = run_PRIMA(ckt, red_order);
Each of these is potentially time-consuming, since the data I work with is pretty big (10000 × 10000 matrices). So in a previous implementation I had all of these as separate scripts that I had to execute one by one (manually or via a master script). Each of them stored its data in .mat files. The next program would read from these and write its own result in another directory, and so on.
What I would like to use is a framework that can store the dependencies between various pieces of data, such that at any point of time I can just ask it to generate the output.
It should:
- Check if the variable is present in the workspace.
- If it is, check if it's consistent with the expected properties (check against the config data).
- If not, load it from file (the exact path to the file will be pre-specified).
- Check if it's consistent with the expected properties.
- If not, compute it from the (pre-specified) command associated with it.
I would like this to be recursive, so that effectively I run the last module and it automatically runs checks and actually computes only those pieces of data that are not already available and consistent.
Can you give some suggestions on how to design this? If it is already called something (I assume it must be), please point me to it.
What you are describing in your ideal solution is very similar to what is provided by the make program and makefiles. A makefile essentially expresses a dependency graph from a set of output files, through a set of intermediate files, to a set of input files, along with commands to transform a file at one step to the next.
Inferring names for the various functions you mention above, you might get something like this:
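# Assumes each script is a function that takes its input .mat file names and
# saves the target .mat file itself. Recipe lines must start with a tab.
# -batch needs MATLAB R2019a or later; on older releases use -r "...; exit".
ckt.mat : ckt_properties.mat
	matlab -batch "generate_ckt('ckt_properties.mat')"
freq.mat : fconfig.mat
	matlab -batch "generate_fpoints('fconfig.mat')"
result.mat : ckt.mat freq.mat
	matlab -batch "freq_dom_sim('ckt.mat', 'freq.mat')"
red_ckt.mat : ckt.mat red_order.mat
	matlab -batch "run_PRIMA('ckt.mat', 'red_order.mat')"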
This says that ckt.mat depends on ckt_properties.mat, and you can generate ckt.mat when you need to by running the command listed beneath it on the command line. “When you need to” means when the target (ckt.mat) does not exist yet, or when the modification time of the source (ckt_properties.mat) is newer than that of the target.
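The same timestamp rule can be sketched inside MATLAB. The helper below is hypothetical: the name is_stale and its arguments are assumptions for illustration, not an existing MATLAB function. It relies only on the datenum field of the struct returned by dir.

```matlab
function stale = is_stale(target, sources)
% IS_STALE  True if TARGET is missing or older than any file in SOURCES.
% Hypothetical helper that mirrors make's timestamp rule.
%   target  : char path to the output file, e.g. 'ckt.mat'
%   sources : cell array of char paths, e.g. {'ckt_properties.mat'}
    t = dir(target);
    if isempty(t)
        stale = true;              % a missing target must always be built
        return
    end
    stale = false;
    for k = 1:numel(sources)
        s = dir(sources{k});
        if isempty(s)
            error('is_stale:missingSource', ...
                  'Source file not found: %s', sources{k});
        end
        if s.datenum > t.datenum   % source was modified after the target
            stale = true;
            return
        end
    end
end
```

With this, a call such as is_stale('ckt.mat', {'ckt_properties.mat'}) decides whether generate_ckt has to run again. File timestamps have limited resolution, so two files written within the same second may compare as equal.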
Now maybe you can do everything with files and makefiles, but this keeps you largely outside of Matlab’s IDE. You could also do something purely within Matlab by creating a structure that mimics the aspects of the filesystem that make relies upon, namely file names, modification times, and contents. In other words, create structures that bind a matrix and a modification time (perhaps held as a simple scalar) under a name. Then you would need another structure that encodes the dependency relationships, which is essentially a list of tuples containing a target structure, a list of source structures, and a transformation function. All this is doable (and might even have been done, I don’t know), but it might be easier to just use makefiles.
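A minimal sketch of that in-memory variant follows, under several assumptions. Every name in it (ensure, next_stamp, rules, store) is hypothetical. The "modification time" is a simple counter rather than a wall-clock time. The build functions are toy stand-ins for generate_ckt and run_PRIMA. It is written as a script with local functions at the end, which needs MATLAB R2016b or later.

```matlab
% Dependency rules: target name -> its source names and a build function.
rules = containers.Map();
rules('ckt')     = struct('sources', {{'ckt_properties'}}, ...
                          'build', @(n) eye(n));            % toy generate_ckt
rules('red_ckt') = struct('sources', {{'ckt', 'red_order'}}, ...
                          'build', @(A, q) A(1:q, 1:q));    % toy run_PRIMA

% Data store: name -> value plus a scalar "modification time".
store = containers.Map();
store('ckt_properties') = struct('value', {4}, 'stamp', next_stamp());
store('red_order')      = struct('value', {2}, 'stamp', next_stamp());

red = ensure('red_ckt', rules, store);   % builds ckt, then red_ckt
red = ensure('red_ckt', rules, store);   % reuses both stored values

function value = ensure(name, rules, store)
% Return the up-to-date value of NAME, rebuilding only what is stale.
% STORE is a handle object, so updates made here are visible to the caller.
    if ~isKey(rules, name)
        entry = store(name);             % leaf input: must already be stored
        value = entry.value;
        return
    end
    rule = rules(name);
    n = numel(rule.sources);
    args = cell(1, n);
    newest = -Inf;
    for k = 1:n
        args{k} = ensure(rule.sources{k}, rules, store);   % sources first
        src = store(rule.sources{k});
        newest = max(newest, src.stamp);
    end
    if isKey(store, name)
        entry = store(name);
        if entry.stamp > newest          % newer than every source: reuse
            value = entry.value;
            return
        end
    end
    value = rule.build(args{:});
    store(name) = struct('value', {value}, 'stamp', next_stamp());
end

function s = next_stamp()
% Logical clock: a counter that increases on every call.
    persistent counter
    if isempty(counter)
        counter = 0;
    end
    counter = counter + 1;
    s = counter;
end
```

Storing a new value for red_order with a fresh stamp makes the next ensure('red_ckt', rules, store) rebuild red_ckt only, since ckt is still newer than its own source. Loading from and saving to .mat files, and checking values against the config data, could be added inside ensure, but are left out of this sketch.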