I’m working on a simulation that runs a certain operation upwards of hundreds of millions of times per frame, so performance in this loop is critical. I’m nearly at my performance target, but I’d like advice on whether this is the best approach or whether I’m overlooking something.
The general setup is a large pool of objects that interact with each other. Each object holds an equation set by another class as a Func&lt;T, T&gt;, and takes an arbitrary number of inputs in the form of binary bits packed into an int. It’s a large looping network that iterates every frame, very similar to how a computer functions, or a binary neural net. The logic in each object can be expected to be somewhat random; most will only have 1–2 bits of input/output, but some will use the full 32. Some will also require memory, fitting within an int. The logic itself is already well optimized, taking only ~10 ms per billion ops for the smaller ones; 95% of my slowdown seems to come from packing/unpacking the signals the operations run on.
I’ve tried many ways of doing this, here’s my current best:
public class Block
{
    public Func<Signal, Signal> logic;
    public int memory;

    // Inputs and Output connections...
    // Get/Sets...

    public Signal Process(int input)
    {
        return logic(new Signal(input, memory));
    }
}

public struct Signal
{
    public int data;
    public int memory;

    // Get/Set methods...
    // Constructor...
}
Each Block receives several bit flags from one or more inputs, packs them into a single int, converts that int along with its memory into a Signal struct, processes it with its logic Func, and outputs a new Signal struct, which is then split among its outputs. Hundreds of millions of these every frame. The input/output-to-signal conversion isn’t shown here, but it’s the same process as what happens inside Process(); I’ve already measured it, and it takes the same time on both ends.
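For context, the packing step described above (not shown in the post) presumably looks something like this sketch; the `inputBits` representation and names are hypothetical:

```csharp
using System;

class PackingSketch
{
    // Hypothetical: each input connection contributes one bit,
    // OR'd into a single int at its assigned position.
    static int Pack(bool[] inputBits)
    {
        int packed = 0;
        for (int i = 0; i < inputBits.Length; i++)
        {
            if (inputBits[i])
                packed |= 1 << i; // set bit i
        }
        return packed;
    }

    static void Main()
    {
        // bits 0 and 2 set -> 0b101 = 5
        Console.WriteLine(Pack(new[] { true, false, true }));
    }
}
```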
Some notes: removing the memory int from the Signal struct, even if no blocks ever call for it, makes this run 4x as fast, but I’m not sure why the improvement is so drastic. Since most blocks don’t need memory, I tried using an interface to define multiple Signal types, so signals that don’t need memory could run faster, but this made it run around 100x slower, even when only the memory-free signals were used. Shown below.
public interface Signal
{
    int GetVal();
    // Rest of the Get/Sets...
}

public struct Signal32 : Signal
{
    private int data;

    public Signal32(int _data)
    {
        data = _data;
    }

    // Get/Sets...
}
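For what it’s worth, the ~100x slowdown is consistent with boxing: a struct accessed through an interface-typed reference is copied to the heap on every such assignment, and its methods become virtual calls. A minimal illustration:

```csharp
using System;

public interface Signal
{
    int GetVal();
}

public struct Signal32 : Signal
{
    private readonly int data;
    public Signal32(int data) { this.data = data; }
    public int GetVal() => data;
}

class BoxingDemo
{
    static void Main()
    {
        var s = new Signal32(5);

        // Assigning a struct to an interface-typed variable boxes it:
        // a copy is allocated on the heap, and GetVal() becomes a
        // virtual call. Done hundreds of millions of times per frame,
        // the allocations and indirection dominate.
        Signal boxed = s;
        Console.WriteLine(boxed.GetVal()); // prints 5
    }
}
```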
I’ve read that implementing interfaces on structs is bad for performance (each call through the interface boxes the struct), so that route probably won’t work. My constraint is that the Func&lt;T, T&gt; needs access to the memory, but it can only read whatever the input gives it. It can’t access the memory int in the Block, since its method is created in a different class, so that route won’t work either. If you know of a way to bypass this, that would be great.
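One possible workaround (a sketch, assuming the class that authors the logic can accept a second parameter; untested against the real network) is to change the delegate signature so data and memory are passed directly, with no intermediate struct:

```csharp
using System;

public class Block
{
    // Hypothetical alternative signature: the logic receives the packed
    // input and the block's memory directly, and returns both updated.
    public Func<int, int, (int data, int memory)> logic;
    public int memory;

    public int Process(int input)
    {
        var result = logic(input, memory);
        memory = result.memory; // the block keeps the updated memory
        return result.data;
    }
}

class DelegateDemo
{
    static void Main()
    {
        var block = new Block
        {
            // Example logic: XOR the input into memory and echo it out.
            logic = (data, mem) => (data ^ mem, data ^ mem)
        };

        Console.WriteLine(block.Process(0b101)); // 5 ^ 0 = 5
        Console.WriteLine(block.memory);         // now 5
    }
}
```

Memory-free blocks would simply ignore the second parameter, so no separate Signal type is needed.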
Any feedback would be appreciated. My current code works well enough, but I’d definitely prefer it to be faster.