In general, the question is: is there a more efficient way to implement a table with a structure like Dictionary<int, Dictionary<string, string>>?
The reason I am asking is that I have run a few performance tests, and it does not perform well for data with more than 5M rows. I don't really need this amount of data, but I was wondering whether there is a more efficient way. It could also help performance for smaller tables with thousands of rows. Last but not least, I am interested in what COULD be done to improve it.
One idea I had is to use string[][] and have some method translate the row/column keys into numeric indices. However, that would require quite a significant rewrite of my work so far. Is there something simpler? I need rows to be able to handle gaps.
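A minimal sketch of the string[][] idea, assuming the set of columns is known up front. The class name ArrayTable and its methods are hypothetical. A List<string[]> stands in for string[][] so rows can be appended, and two small dictionaries map row keys and column names to array indices, so gaps in the row keys cost nothing. Deleting rows is not handled here.

```csharp
using System.Collections.Generic;

// Hypothetical array-backed table: one string[] per stored row, plus two
// dictionaries that translate row keys and column names into indices.
public sealed class ArrayTable
{
    private readonly Dictionary<string, int> columnIndex = new Dictionary<string, int>();
    private readonly Dictionary<int, int> rowIndex = new Dictionary<int, int>(); // row key -> slot in rows
    private readonly List<string[]> rows = new List<string[]>();
    private readonly int columnCount;

    public ArrayTable(IEnumerable<string> columnNames)
    {
        // Columns get indices 0, 1, 2, ... in the order given.
        foreach (var name in columnNames)
            columnIndex.Add(name, columnIndex.Count);
        columnCount = columnIndex.Count;
    }

    // Row keys need not be contiguous: keys 3 and 1000 each occupy one slot.
    public void Set(int rowKey, string column, string value)
    {
        if (!rowIndex.TryGetValue(rowKey, out int slot))
        {
            slot = rows.Count;
            rows.Add(new string[columnCount]);
            rowIndex.Add(rowKey, slot);
        }
        rows[slot][columnIndex[column]] = value; // unknown column -> KeyNotFoundException
    }

    // Returns false when the row key is absent (a gap) or the cell was never set.
    public bool TryGet(int rowKey, string column, out string value)
    {
        value = null;
        if (!rowIndex.TryGetValue(rowKey, out int slot)) return false;
        value = rows[slot][columnIndex[column]];
        return value != null;
    }
}
```

Each stored row then costs one array instead of one dictionary, while the column-name lookup exists only once per table.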
Background on my project:
I have a home-brewed structure of objects that represents a table, along with some additional functionality that I need. I have a table class called T, and it stores data (rows) in Dictionary<int, TRow>. Each TRow has another Dictionary<string, TCell> that represents the row data, in which TCells are indexed by column names. TCell is basically a wrapper around a simple string. The table and each row have a schema definition (column -> {INT, DOUBLE, STRING, BOOL, …}) that is parsed when getting data from the table by methods like .getBool( int row, string column ) etc. Each object (T, TRow, TCell) has quite a lot of helper methods that I use, so they are not just simple wrappers with get methods.
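For reference, the design described above might look roughly like this. All member names (Schema, Rows, Cells, Value, GetBool, ColumnType) are guesses for illustration rather than the actual code, and the schema is kept only on the table for brevity. The sketch makes the memory cost visible: every TRow allocates its own Dictionary, and every cell is a separate TCell object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical column types mirroring the schema described above.
public enum ColumnType { Int, Double, String, Bool }

// TCell: essentially a wrapper around a string.
public sealed class TCell
{
    public string Value { get; set; }
    public TCell(string value) { Value = value; }
}

// TRow: cells indexed by column name (one Dictionary allocated per row).
public sealed class TRow
{
    public Dictionary<string, TCell> Cells { get; } = new Dictionary<string, TCell>();
}

// T: rows indexed by an int key; typed getters consult the schema, then parse the string.
public sealed class T
{
    public Dictionary<string, ColumnType> Schema { get; } = new Dictionary<string, ColumnType>();
    public Dictionary<int, TRow> Rows { get; } = new Dictionary<int, TRow>();

    public bool GetBool(int row, string column)
    {
        if (Schema[column] != ColumnType.Bool)
            throw new InvalidOperationException($"Column '{column}' is not BOOL.");
        return bool.Parse(Rows[row].Cells[column].Value);
    }
}
```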
EDIT TO ANSWER FOLLOW-UP QUESTIONS:
The table is meant for general-purpose use, with no special focus on reading or writing only. It is often initially loaded from a result set produced by a stored procedure in the database and then only read from, but not exclusively. The composite key is an interesting idea, but I am afraid it would break my T, TRow, TCell structure. The Dictionary INT X STRING -> STRING is only a simplification: as written in my last paragraph, the table T has Dictionary<int, TRow> and TRow has Dictionary<string, string>. The reason I need to keep Table, Row and Cell separate is that sometimes I work directly with rows, e.g. some method can return a single row. Any ideas? Or is there nothing better? :/
It all depends on which operations you perform most often. Writing? Reading? Do you work separately with Dictionary<string, string>, or do you always work with Dictionary<int, Dictionary<string, string>> as a whole? Also, you say you have rows and columns. Do you add/remove them, or are they constant for each instance of this dictionary?
The first thing that comes to mind is to create a composite key from the int and the string, and use that as the dictionary key.
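using System;

public readonly struct IntStringKey : IEquatable<IntStringKey>
{
    public int A { get; }
    public string B { get; }
    public IntStringKey(int a, string b)
    {
        A = a;
        B = b;
    }
    // Value equality, so two keys with the same row and column find the same entry
    public bool Equals(IntStringKey other) => A == other.A && string.Equals(B, other.B);
    public override bool Equals(object obj) => obj is IntStringKey other && Equals(other);
    // Combine both fields; unchecked avoids an exception if overflow checking is enabled
    public override int GetHashCode()
    {
        unchecked { return (A * 397) ^ (B?.GetHashCode() ?? 0); }
    }
}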
And then use it:
Dictionary<IntStringKey, string> data;
This is easier on the GC, because you no longer create one inner dictionary per row, and each get/set is a single dictionary lookup instead of two.
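If a custom key struct feels like too much boilerplate, a ValueTuple (C# 7 and later) can serve as the composite key, since it already implements value equality and GetHashCode. The TupleKeyTable class below is a hypothetical sketch, not part of the original answer. Its GetRow method illustrates the trade-off raised in the follow-up: without a per-row object, reading a whole row means probing each column name.

```csharp
using System.Collections.Generic;

// Hypothetical table keyed directly by a (row, column) ValueTuple.
public sealed class TupleKeyTable
{
    private readonly Dictionary<(int Row, string Column), string> data =
        new Dictionary<(int Row, string Column), string>();

    // One hash lookup per access, no inner dictionaries.
    public string this[int row, string column]
    {
        get => data[(row, column)];
        set => data[(row, column)] = value;
    }

    public bool TryGet(int row, string column, out string value) =>
        data.TryGetValue((row, column), out value);

    // Rebuilding a row requires one lookup per known column name.
    public Dictionary<string, string> GetRow(int row, IEnumerable<string> columns)
    {
        var result = new Dictionary<string, string>();
        foreach (var column in columns)
            if (data.TryGetValue((row, column), out var value))
                result[column] = value;
        return result;
    }
}
```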
I’m not sure this will help speed, but memory-wise, if the keys in TRow are the column IDs (i.e. the same for every row), you could do something like this:
class TColumnData
{
    private readonly IDictionary<string, int> nameToIndex;
    public int IndexFor(string id) => nameToIndex[id];
}
class TRow
{
    private IList<TCell> cells;
    private TColumnData columnData;
    public TCell this[string id] => cells[columnData.IndexFor(id)];
}
Add constructors and so on as appropriate.
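Filled in, the shared-column idea might look like the following sketch. The constructors, the Count property, the indexer setter and the TCell stub are assumptions added for illustration, and a fixed-size TCell[] replaces IList<TCell> because the column count is known when the row is created. One TColumnData instance is meant to be shared by every row of a table, so the name-to-index lookup is stored once rather than per row.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's TCell.
public sealed class TCell
{
    public string Value { get; set; }
    public TCell(string value) { Value = value; }
}

// Shared by all rows of one table: maps column names to array positions.
public sealed class TColumnData
{
    private readonly Dictionary<string, int> nameToIndex = new Dictionary<string, int>();

    public TColumnData(IEnumerable<string> columnNames)
    {
        foreach (var name in columnNames)
            nameToIndex.Add(name, nameToIndex.Count);
    }

    public int Count => nameToIndex.Count;
    public int IndexFor(string id) => nameToIndex[id];
}

// Each row holds only a cell array plus a reference to the shared column data.
public sealed class TRow
{
    private readonly TCell[] cells;
    private readonly TColumnData columnData;

    public TRow(TColumnData columnData)
    {
        this.columnData = columnData;
        cells = new TCell[columnData.Count];
    }

    public TCell this[string id]
    {
        get => cells[columnData.IndexFor(id)];
        set => cells[columnData.IndexFor(id)] = value;
    }
}
```

Because TRow stays a standalone object, a method can still return a single row, which keeps the T, TRow, TCell split the follow-up asks for.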