I need help using get_column_pair_plot because I am having difficulty understanding how to use MultiTableMetadata. Consider the following data:
import numpy as np
import pandas as pd
from sdv.metadata import SingleTableMetadata, MultiTableMetadata
# Generate the data
n = 1000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)
original_data = pd.DataFrame({'x': x, 'y': y})
m = 800
x = np.random.normal(0, 1, m)
y = np.random.normal(0, 1, m)
synthetic_data = pd.DataFrame({'x': x, 'y': y})
I would like to see what kind of heatmap Synthetic Data Vault (SDV) offers, as described in https://docs.sdv.dev/sdv/multi-table-data/evaluation/visualization. So I tried to write:
metadata = MultiTableMetadata()
metadata.detect_from_dataframes(
    data={
        'my table': original_data,
        'my other table': synthetic_data
    }
)
from sdv.evaluation.multi_table import get_column_pair_plot
fig = get_column_pair_plot(
    real_data=original_data,
    synthetic_data=synthetic_data,
    table_name='my other table',
    column_names=['x', 'y'],
    metadata=metadata,
    plot_type="heatmap"
)
fig.show()
However, I am really confused about how to make multi-table metadata work. I cannot find any meaningful examples or documentation online apart from https://docs.sdv.dev/sdv/multi-table-data/data-preparation/multi-table-metadata-api. As I understand it, "multi-table" just means I need to pass a dictionary containing each DataFrame I have, and it then generates (?) metadata for each DataFrame? The visualization code does not run. I would greatly appreciate an explanation of how exactly multi-table metadata works, and help modifying my code so that it runs.