Building an automated notification system

Automated notification system: detects outages in the system and, once an outage settles, triggers a notification to every user who was affected during the outage period.

Assumption: in 96% of cases, an outage lasts <= 60 minutes.
So we will only support notifying users whose failures fall within the last 60 minutes of the outage window; users affected earlier than that will not be notified once the system recovers.

Constraints: at most around 2*10^5 orders can fail in an hour, which equals the order_api request volume per hour.
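To put that ceiling in perspective, here is a rough back-of-the-envelope calculation. It assumes the failures are spread evenly over the hour, which a real outage will not guarantee, so treat the numbers as averages rather than peaks.

```python
# Worst case from the constraint above: every order_api request in an hour fails.
MAX_FAILED_ORDERS_PER_HOUR = 2 * 10**5

# Assumption: failures are spread uniformly across the hour.
per_second = MAX_FAILED_ORDERS_PER_HOUR / 3600   # average inserts per second
per_5m_window = MAX_FAILED_ORDERS_PER_HOUR / 12  # rows added per scheduler tick

print(f"{per_second:.1f} inserts/s, {per_5m_window:.0f} rows per 5-minute window")
```

With a 60-minute retention, the table never holds more than about 2*10^5 rows under this constraint.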

I am planning to use a state-machine approach that tracks the live order failure rate over 5-minute windows using a scheduler (it runs every 5 minutes and checks the failure rate of the last 5 minutes of orders). Based on that, we detect whether there is an outage in the system.

If, at any 5 * nth minute, the order failure rate total_5xx_order_api_5m * 100 / total_rps_order_api_5m exceeds a certain threshold t, we update the order outage state in the DB to {order_outage: active}, which means the system is down for most users.
On every 5xx from order_api for a user, the system saves user_id | timestamp in a table named failed_orders, where timestamp is the time at which order_api failed.
Because we support outages of up to 60 minutes, every user_id | timestamp record gets a time-to-live of 60 minutes on insertion.
Identify whether the outage has settled (tricky): the system should be able to identify when the outage has settled. For that I am planning to keep a list of the last 3 order failure rates in memory, for example [..., r1, r2, r3], where r1, r2, r3 > t. If at some later 5 * nth minute the list becomes [r4, r5, r6], where r4, r5, r6 < t, the failure rate has been under the threshold for the last 15 minutes. In this case the system assumes that the outage has settled, so it moves the state from active -> inactive: {order_outage: inactive}
As soon as the order_outage state moves active -> inactive, the system will trigger a notification to all the users whose orders failed during this period. We will traverse the users with the query:
select * from failed_orders where timestamp < cur_time - 10min
and trigger a notification to each of these users, deleting users from the table in parallel once their notification has been triggered.
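The detection steps above can be sketched as a small state machine. This is a minimal sketch, not a definitive implementation: the names failure_rate and OutageDetector, the threshold of 20.0, and the sample rates are all hypothetical, and the state is kept in memory here rather than in the DB.

```python
from collections import deque


def failure_rate(total_5xx: int, total_requests: int) -> float:
    """Percentage of order_api requests in the window that returned 5xx."""
    if total_requests == 0:
        return 0.0  # assumption: a window with no traffic counts as healthy
    return total_5xx * 100 / total_requests


class OutageDetector:
    """Tracks the order_outage state from one failure-rate sample per tick."""

    def __init__(self, threshold: float, settle_ticks: int = 3):
        self.threshold = threshold
        self.active = False
        # Only the most recent `settle_ticks` samples are kept.
        self.recent = deque(maxlen=settle_ticks)

    def observe(self, rate: float) -> str:
        """Record one sample; return 'activated', 'settled' or 'unchanged'."""
        self.recent.append(rate)
        if not self.active:
            if rate > self.threshold:
                self.active = True  # inactive -> active
                return "activated"
            return "unchanged"
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and all(r < self.threshold for r in self.recent):
            self.active = False  # active -> inactive: time to notify users
            return "settled"
        return "unchanged"


# Hypothetical threshold t = 20% and one sample per 5-minute tick.
detector = OutageDetector(threshold=20.0)
samples = [5.0, 35.0, 40.0, 30.0, 10.0, 8.0, 3.0]
events = [detector.observe(rate) for rate in samples]
print(events)
```

In this run the detector activates on the second sample and settles only on the last one, once three consecutive samples are below the threshold. Because the sample list lives in memory, a scheduler restart would lose it; whether that matters depends on how the scheduler is deployed.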

A few design decisions:

We need to keep inserting users with failed orders into the table even when no outage has been detected, because if we only start capturing order failures at some 5 * nth minute, we will have no data for users whose orders failed before that minute but still within the outage period.
Because of the first decision, we need a TTL on our rows. Without a TTL, the failed-order data would keep growing for as long as the outage threshold is never met.
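The two decisions above, together with the notification query, can be illustrated with a single table. This is only a sketch under stated assumptions: SQLite is used purely to keep the example self-contained (it is not one of the stores being considered), it has no native TTL so expiry is emulated with a periodic DELETE, timestamps are epoch seconds, and the function names are hypothetical.

```python
import sqlite3

TTL_SECONDS = 60 * 60  # 60-minute retention, matching the outage assumption


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS failed_orders ("
        " user_id TEXT NOT NULL,"
        " timestamp INTEGER NOT NULL)"  # epoch seconds of the 5xx
    )
    # The index keeps range scans and range deletes on timestamp cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_failed_orders_ts"
        " ON failed_orders (timestamp)"
    )
    return conn


def record_failure(conn: sqlite3.Connection, user_id: str, now: int) -> None:
    """Called on every order_api 5xx, whether or not an outage is active."""
    conn.execute(
        "INSERT INTO failed_orders (user_id, timestamp) VALUES (?, ?)",
        (user_id, now),
    )


def purge_expired(conn: sqlite3.Connection, now: int) -> int:
    """Emulated TTL: drop rows older than 60 minutes; returns rows removed."""
    cur = conn.execute(
        "DELETE FROM failed_orders WHERE timestamp < ?", (now - TTL_SECONDS,)
    )
    return cur.rowcount


def pop_users_to_notify(conn: sqlite3.Connection, now: int,
                        offset_seconds: int = 10 * 60) -> list:
    """Mirror of: select ... where timestamp < cur_time - 10min, then delete."""
    cutoff = now - offset_seconds
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM failed_orders"
        " WHERE timestamp < ? ORDER BY user_id",
        (cutoff,),
    ).fetchall()
    conn.execute("DELETE FROM failed_orders WHERE timestamp < ?", (cutoff,))
    return [row[0] for row in rows]
```

The purge could run from the same 5-minute scheduler that computes the failure rate. In a real system the delete in pop_users_to_notify would more safely happen only after each notification has actually been sent.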

Better alternative suggestions:

Any suggestions to improve the current design further, or to make it simpler?
Could there be an edge case that this system would miss, such as failing to send notifications to some particular type of user?
We have evaluated Redis (for TTL) + MySQL (for traversal), but the issue is that while traversing the failed_orders table in MySQL we have to keep checking in Redis (network call 1) whether the key (user_id) has expired, and if it has, we also have to delete the record from MySQL (network call 2). We want to avoid this kind of setup.
Can anyone suggest a better single-DB alternative that has good TTL support and good performance on range queries (on timestamp in this case)? I have heard good things about Bigtable and ScyllaDB, but both seem to be suited to large volumes of data. In our case the data is small, around 2 * 10^5 records, but we need the same TTL and range-query features.


Filed under: Programming knowledge - @ 00:30

Tags: redis, database-design, architecture, google-cloud-bigtable, scylla
