I’m trying to format my data for an ML program. There are 33,000 events, and each event has three things I want to consider: a mass, an energy, and a coordinate.
The mass array has shape (33000,) and looks like: [188.9 189.0 125.7 … 127.4 201.0 210.1].
The energy array also has shape (33000,) and looks similar: [1 2 3 … 8 9 10]. I also have a coordinate array of shape (33000,10).
Each row of it is a 10-dimensional vector holding the 10 coordinate values for one event:
Coordinate Array:
[[19.9 613.0 6.5 127.4 486.4 54.3 194.0 19.4 194.0 32.3]
[1.89 1.01 4.9 ... 2.3 2.3 2.3]
[1.2 6.1 4.0 ... 1.7 1.7 1.7]
...
]
I want to feed these into a machine-learning program. However, I don’t want to create an array that squashes the 10-dimensional coordinate into a flat set of float values that looks like:
[188.9 1 19.9 613.0 6.5 127.4 486.4 54.3 194.0 19.4 194.0 32.3]
[189.0 2 1.89 1.01 4.9 ... 2.3 2.3 2.3]
...
This would lose the information that the last 10 values are intrinsically tied together, as they are a coordinate. Instead, I would like to create a NumPy array that holds the vector as a single element of each row:
[188.9 1 [19.9 613.0 6.5 127.4 486.4 54.3 194.0 19.4 194.0 32.3]]
[189.0 2 [1.89 1.01 4.9 ... 2.3 2.3 2.3]]
...
This way the machine-learning program knows to treat the coordinate vector as a single feature, not as 10 separate features. So the actual shape might be (33000,3) rather than (33000,12). Is this possible?
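To make the layout I have in mind concrete, here is a minimal sketch using a NumPy structured array, which seems to be the closest thing to it. The field names `mass`, `energy`, and `coord` are hypothetical, and the data is random placeholder data with the shapes described above, not my real values. Note that the result has shape (33000,) with three named fields rather than (33000,3), and I have not checked whether a given ML library would accept an array like this.

```python
import numpy as np

n_events = 33000  # number of events, matching the shapes described above
rng = np.random.default_rng(0)

# Placeholder data (made-up values) with the same shapes as the real arrays
mass = rng.uniform(100.0, 250.0, size=n_events)                   # (33000,)
energy = rng.integers(1, 11, size=n_events).astype(np.float64)    # (33000,)
coords = rng.uniform(0.0, 700.0, size=(n_events, 10))             # (33000, 10)

# Hypothetical field names; the third field is a length-10 sub-array
event_dtype = np.dtype([
    ("mass", np.float64),
    ("energy", np.float64),
    ("coord", np.float64, (10,)),
])

# One record per event; each record keeps its coordinate vector intact
events = np.empty(n_events, dtype=event_dtype)
events["mass"] = mass
events["energy"] = energy
events["coord"] = coords

print(events.shape)              # (33000,)
print(events["coord"].shape)     # (33000, 10)
print(events[0]["coord"].shape)  # (10,)
```

Each element of `events` is then one event, and `events[i]["coord"]` returns the whole 10-value vector for event `i`.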
I’ve tried dstack, concatenate, stack, etc. All have the issue that “the axis must match exactly”. In my case, the axes do not match: the coordinate array has a second axis of length 10, while the other two have either no second axis, with shape (33000,), or a second axis of length 1, with shape (33000,1), if I force one. I’m not sure whether there is a NumPy feature I’m missing or whether this just isn’t possible.
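For reference, here is a minimal sketch of the mismatch as I understand it, using a small placeholder data set rather than my real arrays. Joining along the default axis 0 fails because the second dimensions (1 and 10) differ. Joining along axis 1 does work, but it produces exactly the flat 12-column layout I am trying to avoid.

```python
import numpy as np

n_events = 5  # small placeholder size; the real data has 33000 events
rng = np.random.default_rng(0)

# Placeholder data (made-up values) with the same layout as the real arrays
mass = rng.uniform(100.0, 250.0, size=n_events)         # (5,)
energy = np.arange(1, n_events + 1, dtype=np.float64)   # (5,)
coords = rng.uniform(0.0, 700.0, size=(n_events, 10))   # (5, 10)

# Force the 1-D arrays to have a second axis of length 1
mass_col = mass[:, None]      # (5, 1)
energy_col = energy[:, None]  # (5, 1)

# Default axis=0: every dimension except axis 0 must match, and 1 != 10
error_message = None
try:
    np.concatenate([mass_col, energy_col, coords])
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# axis=1 works, but flattens everything into separate float columns
flat = np.concatenate([mass_col, energy_col, coords], axis=1)
print(flat.shape)  # (5, 12)
```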