Data architecture for event log metrics?

My service has a large ongoing number of user events, and we would like to do things like “count occurrence of event type T since date D.”

We are trying to make two basic decisions:

  1. What to store? Storing every event vs. only storing aggregates

    • (Event log style) log every event and count them later, vs.
    • (Time-series style) store a single aggregated “count of event E for date D” for every day
  2. Where to store the data

    • In a relational database (particularly MySQL)
    • In a non-relational (NoSQL) database
    • In flat log files (collected centrally over the network via syslog-ng)

What is standard practice / where can I read more about comparing the different types of systems?


Additional details:

  • The total event stream is large, potentially hundreds of thousands of entries per day
  • But our current need is only to count certain types of events within it
  • We don’t necessarily need real-time access to the raw data or aggregation results

IMHO, “log all events to files, crawl them at a later time to filter and aggregate the stream” is a pretty standard UNIX Way, but my Rails-y compatriots seem to think that nothing is real unless it’s in MySQL.

5

It always depends, but here is my advice, to offer you another perspective.

What to store? Storing every event vs. only storing aggregates

(Event log style) log every event and count them later, vs.

If you don't want to miss any detail, even details that seem irrelevant now, this is in my eyes the best approach. As results come in, you often find that an event you ignored for reason X or Y, because it seemed to bring no extra information, turns out to matter after some analysis, and you need to track it as well. Because it was already recorded, just not counted, it only takes you some processing time to add it to the picture.
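
As an illustration of "record everything, count later", here is a minimal sketch of the question's "count occurrences of event type T since date D" query run directly over raw events. The line format (ISO timestamp, event type and payload separated by tabs) and the name `count_events` are assumptions made for the example, not something given in the question.

```python
from datetime import date, datetime
from typing import Iterable


def count_events(lines: Iterable[str], event_type: str, since: date) -> int:
    """Count events of `event_type` dated on or after `since`.

    Assumed line format (hypothetical): "<ISO timestamp>\t<event type>\t<payload>".
    """
    total = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip lines that do not have a timestamp and a type
        try:
            ts = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines whose timestamp cannot be parsed
        if parts[1] == event_type and ts.date() >= since:
            total += 1
    return total


sample = [
    "2024-01-01T09:00:00\tlogin\tuser=1",
    "2024-01-02T10:30:00\tsignup\tuser=2",
    "2024-01-03T11:15:00\tlogin\tuser=3",
    "not a valid line",
]
print(count_events(sample, "login", date(2024, 1, 2)))
```

Because every event is kept, counting a type that nobody cared about last month only means calling the same function with a different `event_type`.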

(Time-series style) store a single aggregated “count of event E for date D” for every day

If you want to implement it and start using it tomorrow, this can work. But if a new requirement appears, or you find a correlation with another event that you omitted for whatever reason, you have to start collecting that event and then wait a long time before you have enough history for useful aggregates.

Where to store the data

In a relational database (particularly MySQL)

Recording every event can be heavy for a DB, so I'm afraid MySQL may turn out to be too small. If you want to go for an RDBMS solution, you may need to think bigger: PostgreSQL, or a proprietary system such as Oracle or DB2.

For the aggregates, though, it would be a good choice. Depending on the load generated, you can aggregate in code and insert the aggregated rows into the DB.
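
A minimal sketch of "aggregate in code, then insert the aggregates" could look like the following. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table name `daily_counts`, its columns, and the helper names are assumptions made for the example. On MySQL the two-statement upsert would more likely be a single `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from collections import Counter


def aggregate(events):
    """Collapse (day, event_type) pairs into per-day counts."""
    return Counter(events)


def store_counts(conn, counts):
    """Add a batch of counts to whatever is already stored for each key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_counts ("
        "day TEXT NOT NULL, event_type TEXT NOT NULL, n INTEGER NOT NULL, "
        "PRIMARY KEY (day, event_type))"
    )
    for (day, event_type), n in counts.items():
        # Make sure the row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO daily_counts (day, event_type, n) "
            "VALUES (?, ?, 0)",
            (day, event_type),
        )
        conn.execute(
            "UPDATE daily_counts SET n = n + ? WHERE day = ? AND event_type = ?",
            (n, day, event_type),
        )
    conn.commit()


def count_since(conn, event_type, since_day):
    """Answer 'count of event type T since date D' from the aggregates."""
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM daily_counts "
        "WHERE event_type = ? AND day >= ?",
        (event_type, since_day),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
batch = [
    ("2024-01-01", "login"),
    ("2024-01-01", "login"),
    ("2024-01-02", "login"),
    ("2024-01-02", "signup"),
]
store_counts(conn, aggregate(batch))
print(count_since(conn, "login", "2024-01-02"))
```

The database only ever sees one row per day and event type, so the write load stays small no matter how many raw events arrive.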

In a non-relational (NoSQL) database

If you go for this solution, you need to decide which approach you want to follow; a nice read on Wikipedia may help you. I can't help you much on that topic because I simply don't have enough experience; I mostly use RDBMSs.

In flat log files (collected centrally over the network via syslog-ng)

I personally would discourage you from going for that option. If a file grows too large, it becomes more difficult to parse. Then again, I don't know your main purpose: is it to follow up on a system, or simply to check a log file?

Hope it helps!

1

I think that your idea to parse logs, count and store results in a DB is valid. Not sure you’d want all those raw logs in the DB anyway (I think that’s what you said your compatriots are suggesting). You’ve already got the logs in files, correct? You could just archive those. I suppose that bit really depends on your use case(s).

Also agree with @Thorbjørn Ravn Andersen about moving your “comment answer” to the question.

Depends on your intended usage. If you have a standard graph or report showing aggregate values, then you’ll want to simply filter the events as they come in and aggregate them into the appropriate bucket. If you need to drill down into specific events, or if you think you might want to go back and re-analyze / re-categorize events later, then you should store the individual events.

If you’ve got the time and space, what I typically like to do is aggregate the data, but store the details in a (compressed) file. The details don’t have to be easily accessible, since I almost never need them, but they’re available for bulk re-processing if the classification criteria change.
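
A rough sketch of that "keep aggregates, archive the details compressed" setup, using only the standard library. The file name, the space-separated line format and the function names are made up for the example; the point is only that the archive can be re-read in bulk with a different classification function.

```python
import gzip
import os
import tempfile
from collections import Counter


def archive_events(path, lines):
    """Append raw event lines to a gzip-compressed archive."""
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


def reaggregate(path, classify):
    """Re-read the whole archive and count events per label.

    `classify` maps a raw line to a label, or to None to ignore the line.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            label = classify(line.rstrip("\n"))
            if label is not None:
                counts[label] += 1
    return counts


archive_path = os.path.join(tempfile.mkdtemp(), "events.log.gz")
archive_events(archive_path, ["login ok", "login failed", "signup ok"])

# Original classification: by action.
by_action = reaggregate(archive_path, lambda line: line.split()[0])
# Changed criteria later on: by outcome, using the same archived details.
by_outcome = reaggregate(archive_path, lambda line: line.split()[1])
print(dict(by_action), dict(by_outcome))
```

Day-to-day reporting would read the stored aggregates; `reaggregate` is only for the rare case where the classification criteria change.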

3

Any architecture decision should be driven by business needs. In your case, before deciding how to store the data, you should have a clearer idea of what information you want to obtain from your log system, how often you will need it, and how long you can wait for the result. This is what drives the design of log collectors, event correlators and similar applications.

Rather than giving you my opinion, I suggest you look at some applications similar to what you are trying to develop. Some of them may be far more powerful than what you intend to build, but it won't hurt to look at their architecture and storage policies. On the professional side, you have SIEM applications like RSA and ArcSight, and on the open source side you have initiatives like Kiwi or OSSIM (which also has a professional, appliance-based version).

Another thing to consider: once people start using the results produced by the tool, you will very likely start receiving many requests from your management for more, and more detailed, information. So use it carefully and plan with an eye on the horizon. It may give you more work, but you may well get a lot of support and visibility (pressure comes with the package).
