I have a table called event_source that holds a mix of data from multiple sources, produced by multiple teams, and I want to create team-specific tables from it.
The table is fed from Kafka and stores each message's whole JSON payload as a string:
CREATE MATERIALIZED VIEW events.event_source_mv TO events.event_source
(
    `message` String
)
AS SELECT message
FROM events.kafka_engine;

CREATE TABLE events.event_source
(
    `ingest_time` DateTime DEFAULT now() CODEC(LZ4HC(9)),
    `message` String CODEC(LZ4HC(9))
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ingest_time)
ORDER BY ingest_time
TTL toDateTime(ingest_time) + toIntervalMonth(6)
SETTINGS index_granularity = 8192, storage_policy = 'hot_to_cold';
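For completeness, the Kafka engine table the view above reads from looks roughly like this; this is a sketch, and the broker, topic, and consumer-group values are placeholders rather than the real settings:

```sql
-- Sketch of the Kafka source table; connection settings are placeholders.
CREATE TABLE events.kafka_engine
(
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list  = 'events_topic',
         kafka_group_name  = 'events_consumer',
         kafka_format      = 'JSONAsString';  -- each message kept as one raw JSON string
```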
I am trying to create a materialized view like the following, which handles new incoming data:
CREATE MATERIALIZED VIEW event_source_business1_mv TO event_source_business1
AS SELECT
ingest_time as ingest_time,
JSONExtractString(message,'source') as source,
JSONExtractString(message,'eventName') as event_name,
JSONExtractString(JSONExtractString(message,'entityContext'),'type') as ec_type,
JSONExtractString(JSONExtractString(message,'additionalInfo'),'ip_address') as ai_ip_address
FROM event_source;
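The target table for this view (not shown above) is roughly the following; a sketch assuming plain String columns and the same daily partitioning scheme as the source table:

```sql
-- Sketch of the per-team target table the MV writes into.
CREATE TABLE event_source_business1
(
    `ingest_time` DateTime,
    `source` String,
    `event_name` String,
    `ec_type` String,
    `ai_ip_address` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ingest_time)
ORDER BY ingest_time;
```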
What is the best way to backfill the new table with the historical data already present in the source table?
The data in the source table is close to 6 TB uncompressed, spanning about 3 months. I'd be glad to provide more details if needed.
I tried a plain INSERT ... SELECT (something along the lines of inserting into event_source_business1 from event_source where the date is within the last 3 months), but it fails partway through after running for a while. Is there a better way?
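The failing attempt was roughly the single statement below. One alternative I'm considering is splitting the backfill into per-partition (per-day) inserts so each statement moves a bounded amount of data; this is a sketch, and the partition value shown is illustrative:

```sql
-- What I tried: one big INSERT ... SELECT over ~3 months (fails partway through).
INSERT INTO event_source_business1
SELECT
    ingest_time,
    JSONExtractString(message, 'source'),
    JSONExtractString(message, 'eventName'),
    JSONExtractString(JSONExtractString(message, 'entityContext'), 'type'),
    JSONExtractString(JSONExtractString(message, 'additionalInfo'), 'ip_address')
FROM event_source
WHERE ingest_time > now() - INTERVAL 3 MONTH;

-- Alternative sketch: backfill one daily partition at a time, repeating
-- (or scripting) this statement for each day in the 3-month window.
INSERT INTO event_source_business1
SELECT
    ingest_time,
    JSONExtractString(message, 'source'),
    JSONExtractString(message, 'eventName'),
    JSONExtractString(JSONExtractString(message, 'entityContext'), 'type'),
    JSONExtractString(JSONExtractString(message, 'additionalInfo'), 'ip_address')
FROM event_source
WHERE toYYYYMMDD(ingest_time) = 20240101;  -- illustrative partition key value
```

Restricting each insert to a single daily partition would at least make failures cheap to retry, since only one day's data has to be redone.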