I have a SummingMergeTree table in ClickHouse that collects counters that are streamed in from some source:
```
CREATE TABLE counters
(
    hour DateTime,
    id String,
    reqcount UInt32
)
ENGINE = ReplicatedSummingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (hour, id)
TTL hour + toIntervalHour(3) DELETE WHERE reqcount < 100
```
I’ve simplified the example a bit here; in reality these are web requests counting HTTP status codes, and id can be a number of things.
Records are inserted continuously and in batches. I want to collect totals per hour and get rid of the records with a low count per id/hour. It would be nice, though not strictly necessary, if those low-count records could be added up into one ‘low count ids’ bin.
My problem is that ClickHouse doesn’t always merge records. If two batches come in for the same (hour, id) with reqcount=60 each, they should add up to 120 and therefore not be deleted when the TTL expires. But if they haven’t been merged yet, ClickHouse just seems to look at each record individually and deletes both because reqcount < 100.
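To make it concrete, this is the kind of scenario I mean (timestamp and id are made up):

```
-- Two batches arrive for the same (hour, id), 60 requests each.
INSERT INTO counters VALUES ('2024-05-01 10:00:00', 'some-id', 60);
INSERT INTO counters VALUES ('2024-05-01 10:00:00', 'some-id', 60);

-- If the parts are merged before the TTL fires, the row becomes reqcount = 120 and survives.
-- If they are not merged, each row is checked against reqcount < 100 on its own
-- and both are deleted when the TTL expires.
```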
I’ve tried several ways to solve this:
a) Add a second GROUP BY TTL: TTL hour + toIntervalHour(2) GROUP BY (hour, id) SET reqcount = SUM(reqcount), hour + toIntervalHour(3) DELETE WHERE reqcount < 100. The first TTL should sum up the records before the second one discards those that still have low counts. When I tested this with data more than 3 hours old (to simulate what happens if the TTL doesn’t run for a while), it worked on one machine but failed on the CI server, which may have slightly different settings and a different ClickHouse version. It seems I can’t depend on ClickHouse processing the TTL rule with the earlier time first? That felt a bit shaky from the start, but if there’s a way to force the order it would solve the problem.
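Putting the simplified schema and both TTL rules together, the definition I was testing looks roughly like this:

```
CREATE TABLE counters
(
    hour DateTime,
    id String,
    reqcount UInt32
)
ENGINE = ReplicatedSummingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (hour, id)
-- first rule: roll up per (hour, id) after 2 hours
-- second rule: drop whatever is still below 100 after 3 hours
TTL hour + toIntervalHour(2) GROUP BY hour, id SET reqcount = sum(reqcount),
    hour + toIntervalHour(3) DELETE WHERE reqcount < 100
```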
b) Removing the TTL completely and doing it myself: first collect the data in a _prepare table, then periodically run a script that moves it to a _final table while doing the aggregation and filtering I want. This has the added advantage that I can sum up all the records with reqcount < 100 into a ‘low count ids’ record. But since ClickHouse doesn’t seem to have transactions, I was in doubt how to do this reliably, and a plain INSERT INTO ... SELECT followed by a DELETE sometimes seemed to hang on the delete. In the real setup there is a Distributed table grouping several shards, and the DELETE FROM ... ON CLUSTER ... would occasionally just hang and time out.
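For reference, the periodic move in (b) is essentially this, with counters_prepare / counters_final as placeholder names and the 3-hour cutoff just an example:

```
-- Fold everything older than the cutoff into hourly totals,
-- binning ids whose total stays below 100 into one 'low count ids' row.
INSERT INTO counters_final (hour, id, reqcount)
SELECT
    hour,
    if(total < 100, 'low count ids', id) AS final_id,
    sum(total)
FROM
(
    SELECT hour, id, sum(reqcount) AS total
    FROM counters_prepare
    WHERE hour < now() - INTERVAL 3 HOUR
    GROUP BY hour, id
)
GROUP BY hour, final_id;

-- Then drop the moved rows from the staging table. Without transactions, rows
-- inserted for an already-moved hour between these two statements would be lost,
-- which is exactly the reliability problem I mean.
DELETE FROM counters_prepare WHERE hour < now() - INTERVAL 3 HOUR;
```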
c) I could just schedule an OPTIMIZE TABLE counters FINAL every hour to force the SummingMergeTree to actually do the merge, but that doesn’t feel right.
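For completeness, (c) is just this, run from a scheduled job every hour or so:

```
-- Force a merge so SummingMergeTree collapses the rows for each (hour, id)
-- before the DELETE TTL looks at them. The downside is that FINAL rewrites
-- whole partitions even when little has changed.
OPTIMIZE TABLE counters FINAL;
```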
So I have three options that all seem either unreliable or just not the right way to do it, even though the use case doesn’t seem that exotic to me. What I need is either (a) a way to sum things up over time and have a TTL get rid of the low counts after a timeframe, or (b) a reliable pattern to periodically move data from one table to another while applying some transformation (in this case mapping ids to “low count ids” and grouping in the same way as before). Both seem like things ClickHouse should be good at, yet I can’t get either to work.
I found one old question, “Table TTL on SummingMergeTree”, that suggests there is no solution, but that was 5 years ago, so I’m hoping there’s a better way to do this now.