Given the following table structure:
Table A
a |
---|
1 |
2 |
3 |
Table AB
a | b |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
3 | 2 |
3 | 3 |
Table B
b |
---|
1 |
2 |
3 |
Problem description: Given a list of b
‘s, I want to query those a
‘s which are associated with ALL b
‘s in AB
.
Or formulated more generally, I want to retrieve all a
‘s which match a given criteria list b
. The length of the list of b
‘s (i.e. the criteria count) can be anywhere from 1 to 20.
Example: Given a b
list of (2, 3)
, I want to retrieve a
‘s 1 and 3 because they are each associated with 2 and 3 in AB
.
Question: The described problem sounds like a very generic problem which would come up often – is there a best solution on how to efficiently solve this query pattern in a relational database?
Is there a more optimal way to set up the table structure in order to answer this particular query pattern? What would be the best types of indices to use here?
My current approach
Currently I use the following approach, however I’m not sure if that’s the best way most efficiently select the data.
I use B-tree indices on A(a)
, B(b)
, AB(a,b)
, AB(b,a)
.
Let’s call the criteria list b_criteria=(b1, b2, ...)
, I query the data with:
SELECT DISTINCT ab.a
FROM AB ab
WHERE
ab.b = `b_criteria[b_lowest_cardinality_index]`
AND EXISTS (SELECT 1 FROM AB q WHERE ab.a = q.a AND q.b = `b_criteria[1]`)
AND EXISTS (SELECT 1 FROM AB q WHERE ab.a = q.a AND q.b = `b_criteria[2]`)
...
AND EXISTS (SELECT 1 FROM AB q WHERE ab.a = q.a AND q.b = `b_criteria[n]`)
So I select a
‘s which all match the EXISTS
clause.
Additionally I try to already start with the lowest possible count of rows by preselecting only those rows with b=b_lowest_cardinality
. I can choose b_lowest_cardinality
beforehand, by counting how many count(b)
grouped by b
are in total in AB
. The cardinalities of b
differ strongly in AB
(i.e. there might be many more b=3
in AB
as b=2
, for example).
Background info:
In my instance I have
A
row count: 400 millionAB
row count: 4 billionB
row count: 4000- Database: Oracle
6