I am creating some scripts using Python and I wanted to utilize Hydra (https://hydra.cc/). I am following the structured config pattern, where I have a config.py and a config.yaml in a conf directory, and I am validating my config using dataclasses, for example:
#### config.yaml
visualisation_conf:
  degradation:
    data_path: /measurements
    input_data_formats: .csv
#### config.py
from dataclasses import dataclass

# DegradationConfig is defined first because VisualisationConfig references it.
@dataclass
class DegradationConfig:
    data_path: str
    input_data_formats: str

@dataclass
class VisualisationConfig:
    degradation: DegradationConfig

@dataclass
class MainHydraConfig:
    visualisation_conf: VisualisationConfig
#### Usage in script:
import hydra
from conf.config import MainHydraConfig

@hydra.main(version_base=None, config_path="../conf", config_name="config")
def main(base_cfg: MainHydraConfig):
    input_type = base_cfg.visualisation_conf.degradation.input_data_formats
    ...
This worked fine until I needed to load a JSON file and add it to my configuration. I have an external JSON file (called “schema.json”, residing in the conf directory) that I would like to load at runtime and merge with my main config. I have tried something like this:
#### config.yaml
dataset_schema:
  schema_path: ${hydra:runtime.cwd}/conf/schema.json
  schema: ...
#### config.py
import json
import os
from dataclasses import dataclass, field

from omegaconf import OmegaConf

@dataclass
class JSONSchemaConfig:
    schema_path: str
    schema: dict = field(default_factory=dict)

    def load_schema(self):
        if os.path.exists(self.schema_path):
            with open(self.schema_path, "r") as f:
                self.schema = json.load(f)

@dataclass
class MainHydraConfig:
    json_schema: JSONSchemaConfig = field(default_factory=JSONSchemaConfig)

    def __post_init__(self):
        self.json_schema.load_schema()
        merged_conf = OmegaConf.create(self.json_schema.schema)
        OmegaConf.merge(self, merged_conf)
However, nothing is loaded at runtime, because __post_init__ is never called. Is what I want to achieve even possible in OmegaConf/Hydra? Maybe I should try a different approach to my end goal, that is, loading a JSON file as a dict and merging it with the rest of the configuration? I know I can move the JSON-loading logic into the script itself, but I am pretty sure I am just missing some small detail that prevents this from working as expected.
OmegaConf.merge() operates on OmegaConf configs. In some cases it converts its inputs to OmegaConf configs using OmegaConf.create().
It does not change the inputs; instead, it returns a merged config object.
It will certainly not modify your dataclass instance directly.
Your design seems incompatible with that.
The dataclasses are only used for schema backing and duck typing. While the config object looks like a MainHydraConfig instance, it is actually a DictConfig without any of the functionality of the schema classes.
If you want __post_init__ to be called, you actually need to create one of the real MainHydraConfig instances.
As a solution, you could write an extra function that takes care of it. I think with the function below, __post_init__ is not necessary.
@dataclass
class MainHydraConfig:
    ...

    @staticmethod
    def postprocess(settings: omegaconf.DictConfig):
        """
        Passes settings.json_schema (a DictConfig) as `self` to load_schema,
        which replaces the settings.json_schema.schema node.
        """
        JSONSchemaConfig.load_schema(settings.json_schema)

# ----

@hydra.main(version_base=None, config_path="../conf", config_name="config")
def main(base_cfg: MainHydraConfig):
    MainHydraConfig.postprocess(base_cfg)
    # Obviously you could also use the one-liner directly, but IMO a descriptive name reads better.
    # JSONSchemaConfig.load_schema(base_cfg.json_schema)
If you really want to use __post_init__, you could use
    OmegaConf.create(OmegaConf.to_object(base_cfg))  # -> real MainHydraConfig -> DictConfig
as one of many possible ways.
You could pack everything into a decorator to keep your main function cleaner: https://hydra.cc/docs/advanced/decorating_main/
@hydra.main(...)
@postprocess  # write a decorator that takes care of parsing and inserting the JSON file
def main(base_cfg):
    ...
(Edit Hydra only) – use a callback for postprocessing
To modify the config without extra Python code in your script, the only other way I see is to use callbacks. Note: if you run your script remotely or with multirun, check that the callback is set up correctly.
from typing import Any

from hydra.experimental.callback import Callback
from hydra.types import TaskFunction
from omegaconf import DictConfig

from config import JSONSchemaConfig

class ParseJsonCallback(Callback):
    def on_job_start(self, config: DictConfig, *, task_function: TaskFunction, **kwargs: Any) -> None:
        """
        Called in both RUN and MULTIRUN modes, once for each Hydra job (before running application code).
        The `task_function` argument is the function
        decorated with `@hydra.main`.
        """
        # modify the config
        JSONSchemaConfig.load_schema(config.json_schema)
To execute the callback, register it, e.g. by adding
# conf.yaml
hydra:
  callbacks:
    insert_json:
      _target_: <modules to>.ParseJsonCallback