Consider the following code (condensed for brevity):
```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining
from torch.optim import Adam
from torch.utils.data import DataLoader

def modelTrainer(config):
    model = myModel(dim1=config['dim1'], dim2=config['dim2'], dropout=config['dropout'])
    optimizer = Adam(model.parameters(), lr=config['lr'])
    trainDataset, valDataset, testDataset = loadData()
    trainData = DataLoader(trainDataset, shuffle=True)
    valData = DataLoader(valDataset, shuffle=True)
    if checkpoint:
        model.load_state_dict(...)
    model.train()  # train model on trainDataset
    ...
    model.eval()  # evaluate model on valDataset
    ...

if __name__ == "__main__":
    # grid-search over the two model configurations
    paramSpace = {'dim1': tune.grid_search([10]), 'dim2': tune.grid_search([32, 64])}
    pbtParamSpace = {'dropout': tune.uniform(0.1, 0.3), 'lr': tune.loguniform(1e-4, 1e-1)}
    pbtScheduler = PopulationBasedTraining(hyperparam_mutations=pbtParamSpace)
    tuner = tune.Tuner(modelTrainer, param_space=paramSpace,
                       tune_config=tune.TuneConfig(scheduler=pbtScheduler))
    results = tuner.fit()
```
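For reference, the checkpoint handling and reporting that I condensed with `...` above roughly follow the pattern below (a minimal sketch, assuming the Ray 2.x `ray.train` checkpoint API; `myModel`, the actual train/eval loop, and the `state.pt` file name are just placeholders on my side):

```python
import os
import tempfile

import torch
from ray import train

def modelTrainer(config):
    model = myModel(dim1=config['dim1'], dim2=config['dim2'], dropout=config['dropout'])
    optimizer = Adam(model.parameters(), lr=config['lr'])

    start_epoch = 0
    checkpoint = train.get_checkpoint()
    if checkpoint:
        # Restore model/optimizer state when the trial is resumed or cloned by PBT.
        with checkpoint.as_directory() as ckpt_dir:
            state = torch.load(os.path.join(ckpt_dir, "state.pt"))
            model.load_state_dict(state["model"])
            optimizer.load_state_dict(state["optimizer"])
            start_epoch = state["epoch"] + 1

    for epoch in range(start_epoch, 10):
        val_loss = ...  # train on trainData, evaluate on valData (omitted, as above)
        with tempfile.TemporaryDirectory() as ckpt_dir:
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "epoch": epoch},
                os.path.join(ckpt_dir, "state.pt"),
            )
            # Report the metric PBT compares on and attach a checkpoint so clones can restore.
            train.report({"val_loss": val_loss},
                         checkpoint=train.Checkpoint.from_directory(ckpt_dir))
```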
I am tuning `dropout` and `lr` for 2 possible model configurations of `[dim1, dim2]`: `[10, 32]` and `[10, 64]`. I have the following questions:
- Does every trial in Ray execute every line of code in the Trainable (`modelTrainer`) when starting, cloning & restoring trials?
- Should the line `trainDataset, valDataset, testDataset = loadData()` be outside the Trainable? Otherwise, when a trial is resumed or cloned, this can cause the 3 sets to mix across epochs (see the sketch after this list).
- If, say, after the perturbation interval the `[10, 64]` model has a better metric than the `[10, 32]` model, then (with `quantile_fraction=0.5`), from the next epoch, is Tune going to run 2 versions of the `[10, 64]` model (one the original and the other cloned from the original with perturbed `dropout` and `lr` values)?
- Should I add extra lines of code for the perturbed parameters to be applied to the cloned trial (i.e., what prevents the original and the cloned trial from running with the same set of parameters)?
- With the PBT scheduler, is the implicit assumption that all the parameters in `hyperparam_mutations` should apply to every possible combination of parameters in `param_space`?
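For question 2, this is the kind of change I have in mind (a minimal sketch, assuming `tune.with_parameters` is the appropriate way to pass a fixed split into the Trainable; the rest of `modelTrainer`, plus `paramSpace` and `pbtScheduler`, stay as defined above):

```python
from ray import tune

def modelTrainer(config, trainDataset=None, valDataset=None):
    # The datasets arrive pre-split, so resumed/cloned trials reuse the same split.
    trainData = DataLoader(trainDataset, shuffle=True)
    valData = DataLoader(valDataset, shuffle=True)
    ...  # build model, restore checkpoint, train/eval as above

if __name__ == "__main__":
    trainDataset, valDataset, testDataset = loadData()  # called once, outside the Trainable
    trainable = tune.with_parameters(modelTrainer,
                                     trainDataset=trainDataset,
                                     valDataset=valDataset)
    tuner = tune.Tuner(trainable,
                       param_space=paramSpace,
                       tune_config=tune.TuneConfig(scheduler=pbtScheduler))
    results = tuner.fit()
```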