I am constructing a dataset with 6 columns and 6700 rows. The data are photon dose conversion coefficients extracted from Monte Carlo simulations in different studies. The columns include energy, organ name, organ mass, organ density, dose from AP, dose from PA, and dose from lateral. Each row gives the calculated dose for one energy bin; the energy range from 1 keV to 20 MeV is divided into 20 bins. The organ name, organ mass, and organ density are repeated for each bin.
The data can be used for numerical fitting if we exclude the organ name. However, if we encode the organ names with a one-hot encoder, the data can be used more effectively, and cross-validation shows better results than with the numerical data only. The doses for different organs are taken from different phantoms, and some phantoms have more organs than others. After one-hot encoding, the total number of organs across all phantoms is 32.
The issue arises when trying to predict organs from new phantoms: the models show a symmetry error between the input and target files. My question is, how can we solve this problem?
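To illustrate what I suspect is happening, here is a minimal sketch. It assumes the error comes from a mismatch in the number of encoded columns when each file is one-hot encoded on its own. The organ names and the `ALL_ORGANS` list are hypothetical placeholders; the real data has 32 organs.

```python
import pandas as pd

# Hypothetical organ list; the real combined list has 32 entries.
ALL_ORGANS = ["brain", "liver", "lungs", "thyroid"]

train = pd.DataFrame({"organ": ["brain", "liver", "lungs", "thyroid"]})
new_phantom = pd.DataFrame({"organ": ["liver", "lungs"]})

# Encoding each file separately yields a different number of columns.
train_naive = pd.get_dummies(train["organ"])
new_naive = pd.get_dummies(new_phantom["organ"])
print(train_naive.shape[1], new_naive.shape[1])  # 4 2


def encode(frame):
    """One-hot encode against the full organ list so the layout is fixed."""
    dtype = pd.CategoricalDtype(categories=ALL_ORGANS)
    return pd.get_dummies(frame["organ"].astype(dtype))


train_fixed = encode(train)
new_fixed = encode(new_phantom)
print(train_fixed.shape[1], new_fixed.shape[1])  # 4 4
```

With the full category list declared up front, organs missing from a phantom simply become all-zero columns, so both files end up with the same width.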
The approach works if, for example, I use one phantom with 1000 bins for 20 organs and predict another phantom with 1000 bins for exactly the same 20 organs, with different organ masses and densities.
How can I use the whole categorical dataset (6700 bins from 32 organs) to predict new data (1000 bins for 32 organs)?
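The setup I have in mind can be sketched as below, assuming scikit-learn. The encoder is fitted once on the whole training set and then reused on the new phantom inside a single pipeline. The column names, the random forest regressor, and the synthetic values from `make_rows` are all placeholders chosen for illustration, not my real data or model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["energy", "organ", "mass", "density"]  # hypothetical names
TARGETS = ["dose_ap", "dose_pa", "dose_lat"]       # hypothetical names
rng = np.random.default_rng(0)


def make_rows(organs, n_bins):
    """Build synthetic placeholder rows: one row per (organ, energy bin)."""
    rows = []
    for organ in organs:
        mass, density = rng.uniform(10, 2000), rng.uniform(0.3, 1.9)
        for energy in np.logspace(-3, np.log10(20), n_bins):  # MeV
            rows.append((energy, organ, mass, density, *rng.random(3)))
    return pd.DataFrame(rows, columns=FEATURES + TARGETS)


# One-hot encode the organ column; pass the numeric columns through.
# handle_unknown="ignore" maps organs unseen in training to all zeros.
preprocess = ColumnTransformer(
    [("organ", OneHotEncoder(handle_unknown="ignore"), ["organ"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("reg", RandomForestRegressor(n_estimators=20, random_state=0)),
])

train = make_rows(["brain", "liver", "lungs", "thyroid"], n_bins=20)
new_phantom = make_rows(["liver", "lungs"], n_bins=20)  # fewer organs

model.fit(train[FEATURES], train[TARGETS])
pred = model.predict(new_phantom[FEATURES])
print(pred.shape)  # (40, 3)
```

Because the encoder lives inside the pipeline, the new phantom is always transformed to the same column layout the model was trained on, regardless of how many organs it contains.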