Exception handling in a program that needs to run 24/7

I have read that we should only catch exceptions that we can handle, which makes catching the base exception class (C# in this case) a bad idea (on top of other reasons). I am currently part of a project in which I have yet to see anything but the base exception being caught. I mentioned that this is considered bad practice, but the response was “This service needs to run 24/7, so that is the way it is.”

Since I did not have a good answer for how to properly handle exceptions in a program that needs to run 24/7, I am now here. I have not managed to find any information or suggestions on how to deal with exception handling in “critical” programs or services that need to run around the clock (and in this case I believe it may be OK if the service is down for a minute or two, so it is not even critical). I understand it depends on the exact nature of the program. The requirements for a program that can cause life-threatening issues are quite different from those for a log scanner for an online game.

Two examples:

1: A type-ahead service for customers of the British railways, used when they search online for railway stations.

2: A program that automatically controls the railway switches for the above railways based on realtime information provided from various sensors in the tracks, trains etc.

The first program would probably not cause a major issue if it went down for a minute or two, whereas the latter could cause human casualties. Suggestions on how to deal with each? Pointers to where I can find more information and thoughts on this issue?

4

Certain language features like

  • Garbage Collection
  • Exception Systems
  • Lazy Evaluation

are not generally useful in a real-time system, because they make timing and memory use hard to predict. One should probably choose a language without these features, and try to prove certain properties such as maximum memory usage or maximum response time.


When a program needs to run continuously, but short and non-global failures are acceptable, then we could use an Erlang-like strategy. Erlang is a concurrent, functional programming language. Usually, a program written in Erlang will consist of multiple worker processes which communicate with each other (actor model). If one worker process encounters an exception, it is restarted. While this does imply a short downtime for that worker, the other actors can carry on as usual.

To summarize this: In a robust program, various parts are isolated from each other and can be restarted or scaled independently.

So basically we need a piece of code equivalent to this:

while (true) {
  try {
    DoWork();   // one unit of work; may throw
  }
  catch (Exception e) {
    Log(e);     // record the failure, then loop around and start over
  }
}

plus a way to terminate the loop. Such a loop would then drive each worker thread.
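One possible way to add that termination is a `CancellationToken`. This is only a minimal sketch: `WorkerLoop`, `Run`, and the `doWork` and `log` delegates are made-up names for illustration, not anything from the project in the question.

```csharp
using System;
using System.Threading;

public static class WorkerLoop
{
    // Calls doWork repeatedly until cancellation is requested.
    // Any exception is logged and the loop carries on with the next iteration.
    // Returns the number of failed iterations.
    public static int Run(Action doWork, Action<Exception> log, CancellationToken token)
    {
        int failures = 0;
        while (!token.IsCancellationRequested)
        {
            try
            {
                doWork();
            }
            catch (Exception e)
            {
                failures++;
                log(e);
            }
        }
        return failures;
    }
}
```

Each worker thread would call `WorkerLoop.Run` with its own unit of work, and the service would cancel the token on shutdown.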


A problem with ignoring errors via a catch-all is that invariants of your program might have been violated by whatever caused the error, and that subsequent operations could be useless. A good solution to this is to share no data between independent workers. Restarting a worker will then rebuild all necessary invariants. This means that workers must communicate differently, e.g. through message passing. An actor’s state may not be part of other actors’ invariants.
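To make the share-nothing idea a bit more concrete, here is a rough sketch using a `BlockingCollection<int>` as the mailbox. `CounterWorker` and `Mailbox.Drain` are hypothetical names, and a negative message stands in for "something went wrong". The point is only that the worker's state is private, so throwing the worker away and creating a new one restores its invariants.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical worker: owns its state and is only reachable through messages.
public sealed class CounterWorker
{
    private int _total; // private state, never shared with other workers

    public int Total => _total;

    public void Handle(int message)
    {
        // A negative message stands in for any failure inside the worker.
        if (message < 0) throw new ArgumentOutOfRangeException(nameof(message));
        _total += message;
    }
}

public static class Mailbox
{
    // Processes messages until the inbox is marked complete and empty.
    // On failure the worker is discarded and replaced by a fresh instance,
    // so whatever state the failure left behind cannot leak into later work.
    public static (int total, int restarts) Drain(BlockingCollection<int> inbox)
    {
        var worker = new CounterWorker();
        int restarts = 0;
        foreach (int message in inbox.GetConsumingEnumerable())
        {
            try
            {
                worker.Handle(message);
            }
            catch (Exception)
            {
                worker = new CounterWorker(); // "restart": rebuild state from scratch
                restarts++;
            }
        }
        return (worker.Total, restarts);
    }
}
```

Note that the restart also throws away everything the old worker had accumulated, which is exactly the trade-off of this strategy.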

Another problem with catching too many exceptions is that not all exceptions are fixable by restarting, even when taking such precautions. Otherwise-hard problems like running out of memory can be handled by restarting. But a restart won’t help you to regain internet connectivity when a physical cable was pulled out.
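One way to keep a restart loop from spinning forever on such an unfixable error is to cap the number of restarts and then let the exception escape to an outer layer or an operator. The following is a sketch under that assumption; `Supervisor` and `RunWithRestartLimit` are illustrative names, not an existing API.

```csharp
using System;

public static class Supervisor
{
    // Runs the worker and restarts it after a failure, at most maxRestarts times.
    // Once the limit is reached, the exception filter no longer matches and the
    // exception propagates to the caller instead of being swallowed.
    // Returns the number of restarts that were needed.
    public static int RunWithRestartLimit(Action worker, int maxRestarts)
    {
        int restarts = 0;
        while (true)
        {
            try
            {
                worker();
                return restarts; // worker finished normally
            }
            catch (Exception) when (restarts < maxRestarts)
            {
                restarts++; // try again with a fresh run
            }
        }
    }
}
```

A real service would probably also wait between restarts, but that is left out here to keep the sketch short.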

1

To answer your question, one has to understand what exceptions are, and how they work.

Exceptions are usually thrown when an error occurs that requires the user’s assistance. In such cases, it doesn’t matter how long it takes to unwind the stack and handle the exception.

Without catch handlers, the program stops execution. Depending on your setup and requirements, that may be acceptable.

In your specific cases:

  1. if the query cannot be executed (for example, wrong city name), then inform the user of the error and ask them to fix it.
  2. if you are not getting information from a critical sensor, there isn’t much sense in continuing without asking the operator to fix the issue.

That means that in both cases it may make sense to use exceptions, taking more care in a real-time program to use them only for serious problems where it is not possible to continue execution.

“I so far have yet to see anything but the base exception being caught.”

It sounds like there is a problem here, inasmuch as exceptions aren’t being dealt with appropriately. Catching exceptions at the appropriate point and taking appropriate action (depending on the type of exception) will keep the service running in a much more reliable fashion.

If the service must continue, presumably it’s important that it is working as intended. Given your example, if a program which controls railway switches throws an exception, it may indicate that there’s a problem communicating with safety-related sensors. If you catch the base exception and continue, the service may keep running but not function as intended, leading to disaster.

Alternatively, if you catch the exception thrown when there’s a communication failure with the sensor and deal with it appropriately (e.g. stop the trains in the affected area), your service is still running and you haven’t killed anyone.

So, as I understand the question, I’d suggest that in the first instance you would be better looking to add more specific exception handling rather than to remove the base-exception-type handlers.
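As a rough illustration of what that could look like, the sketch below puts a specific handler in front of the existing catch-all. Everything here is hypothetical: `SensorCommunicationException`, `SwitchController`, and the delegates passed to `RunCycle` are invented for the example and stand in for whatever the real system provides.

```csharp
using System;

// Hypothetical exception type, defined only for this example.
public sealed class SensorCommunicationException : Exception
{
    public SensorCommunicationException(string message) : base(message) { }
}

public static class SwitchController
{
    // Runs one control cycle and reports how it ended.
    public static string RunCycle(
        Action readSensorsAndDecide,
        Action stopTrainsInAffectedArea,
        Action<Exception> log)
    {
        try
        {
            readSensorsAndDecide();
            return "ok";
        }
        catch (SensorCommunicationException e)
        {
            // Known failure with a known safe reaction.
            log(e);
            stopTrainsInAffectedArea();
            return "safe-stop";
        }
        catch (Exception e)
        {
            // The base handler stays as a last resort. The caller still has to
            // decide what to do, because the cause is unknown at this point.
            log(e);
            return "unknown-failure";
        }
    }
}
```

The specific handler must come before the base one, since C# picks the first matching `catch` clause.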

With regard to point 2: don’t use C#. It’s not a real-time language and you will get hurt if you try to use it as such.

For point 1: you could go the Erlang way: let it crash, then restart.

2

Disclaimer: these are only thoughts; I don’t have the experience.

I would guess that a program satisfying the requirements of the second example should be extremely modular. Consequently, modules will be capable of being restarted without destabilizing the system.

For example, an object that fails an assert on its internal state should be able to be destroyed and re-created, notifying all its consumers and suppliers in the process. More concretely, if the program is controlling the switches of the railway and fails an assert in the decision loop, it can still run an emergency module, which stops all involved trains and waits for the main decision module to re-initialize.

More realistically, one would introduce redundancy – duplication of the hardware and software. One instance is wired to the controlled system, and the other is free-running. If an error is detected, the systems are switched.

An example is two processes on the same machine which monitor one another: if one is killed, the other re-spawns it and disassociates its parent PID from itself.
