I’m trying to utilize Calcite as a SQL query transformer on top of PostgreSQL database.
The schema is somewhat non-trivial. It’s basically a schema of user-defined entities and columns:
CREATE TABLE entity (
id SERIAL PRIMARY KEY,
entity_type TEXT, -- let's say it's a type of entity a user defined
field_1 TEXT,
field_2 TEXT,
field_3 TEXT
-- or just use JSONB
fields JSONB
);
Since the table above cannot be easily indexed – I have a separate table used for selective indexing (only certain fields):
CREATE TABLE entity_index (
id SERIAL PRIMARY KEY,
entity_id INT REFERENCES entity(id),
field_id INT,
string_value TEXT,
numeric_value NUMERIC,
date_value TIMESTAMP WITH TIME ZONE
);
There’s also schema mapping user-defined columns to the actual columns in the database:
CREATE TABLE schema_mapping (
id SERIAL PRIMARY KEY,
entity_type TEXT,
column_name TEXT,
column_type TEXT,
field_number INT,
is_indexed BOOLEAN
);
I’d like to allow users to query their entities using SQL queries, but I’d like to transform these queries into the actual SQL queries that can be executed on the database. For example, if a user has an entity of type user
with fields name
, age
, and email
, I’d like to allow them to query their entities like this:
SELECT name, age FROM user WHERE age > 18;
But I’d like to transform this query into the actual SQL query that can be executed on the database:
SELECT field_1, field_2 FROM entity WHERE entity_type = 'user' AND CAST(field_2 as NUMERIC) > 18;
And a more complex example:
only user.age and company.region are indexed
SELECT name, age FROM user u
JOIN company c on c.id = u.company_id
WHERE name LIKE 'Joe%' and age > 18 and region = 'US';
This should be transformed into:
SELECT u.field_1 as name, u.field_2 as age FROM entity u
JOIN entity_index ui on u.id = ui.entity_id
JOIN entity_index ui2 on u.id = ui2.entity_id
WHERE u.field_1 LIKE 'Joe%' AND ui.numeric_value > 18 AND ui2.string_value = 'US';
My goal is just to get a transformed SQL query. Not run it against the database.
I was able to do some of this using Calcite’s different components, but I’m not sure what is the best way to do this:
- Manipulate the logical plan (using a bunch of RelShuttle and RexVisitors), and then render back to SQL. (This works ok for simple plans, but manipulating complex plans is a nightmare and I’m not sure if this is the right way to go)
- Implement an adapter that receives the simple Logical plan – and puts the complexity here in a component similar to what RelToSqlConverter does, which is not a simple task either.
Appreciate any help or guidance on how to approach this problem. Thanks!