The scenario:
Let’s say we have a database table ‘TABLE’ with entities identified by ‘ID’. Each entity has several features stored in columns (‘COLUMN_1’ to ‘COLUMN_N’). There is an additional feature of the entities ‘FEATURE_X’ that is not in the database.
An SQL query with a WHERE clause filters the entities based on conditions involving the columns and ‘FEATURE_X’.
Now we can’t include ‘FEATURE_X’ in the actual condition of the query that goes to the database, because that feature is not represented by a column in the database, but let’s say is determined by a very slow algorithm.
The problem:
I want to split the logical condition into two parts. One that involves only columns in order to send to the database and one condition involving ‘FEATURE_X’.
I want to construct a query to retrieve entities that form a superset of the actual result set, so that I can start the very slow algorithm with an already prefiltered set of potential result entities.
Is there a logical operation that ensures that we always get a superset regardless on how the logical expression is?
An example:
Given an expression
Φ = (COLUMN_1<10) AND ((COLUMN_2=4) OR (COLUMN_3>5) OR NOT (COLUMN_4=0 AND FEATURE_X=1))
the SQL statement would look like this:
SELECT ... FROM ... WHERE Φ
How can I exclude ‘FEATURE_X’ without compromising the correctness of the result?
My pseudo code that doesn’t work:
- Identify Φ_database
Φ = (COLUMN_1<10) AND ((COLUMN_2=4) OR (COLUMN_3>5) OR NOT (COLUMN_4=0 AND TRUE))
- Identify Φ_feature_x
Φ = FEATURE_X=1
- Construct SQL query
SELECT "ID" FROM "TABLE" WHERE (COLUMN_1<10) AND ((COLUMN_2=4) OR (COLUMN_3>5) OR NOT (COLUMN_4=0 AND TRUE)))
- Use the result as an input for the slow algorithm and get the result.
The error here is, that the first sql query does not always give me back a super set but sometimes also an empty set, which means that the logical expressions in the where clause are not preserved.