I have a ~280GB SQLite DB with company financial info which I am running various queries against. Yes it’s SQLite because it works fine and is simpler to work with. I have a MySQL version of this DB too but I’d rather avoid using it if I can.
There is a table called ItemValues
with a billion or so rows. Every single column shown in the two tables below has an index which is built at the time of DB build job (overnight)
CompanyId | ItemTypeId | NumericValue | DateEpoch |
---|---|---|---|
AB1234 | 1 | 100 | 1569884400 |
AB1234 | 2 | 200 | 1569884400 |
G17895 | 7 | 50 | 1632956400 |
The ItemTypeId
is a foreign key and the table it refers to , called ItemTypes
looks like this
Id | ItemShortDescription |
---|---|
1 | Cost |
2 | Revenue |
3 | SomeOtherFinancialMetric1 |
4 | SomeOtherFinancialMetric2 |
I can lookup companies with a specific Cost
OR Revenue
quite easily and the search takes milliseconds to return results.
Here is the SQL Fiddle
The problem arises when I have to lookup companies which meet a specific Cost
AND Revenue
criteria. While technically I can lookup all companies with a specific Cost
, then lookup all companies with a specific Revenue
and then combine the two, selecting only the subset of companies that meet both my criteria, this approach returns a ton of data as there millions upon millions of rows that match Cost
(Revenue
isn’t found in as many rows of ItemValues
, in relative terms).
What options might I have to solve this challenge? At present I am only searching on Cost and Revenue and I suppose I could add these two values to each ItemValues
but the fields I need to search on may grow. I’m wondering if my table design needs rethinking, or perhaps I should look into column oriented DBs but I believe I should first try to optimise querying against the existing db structure.