I am busy writing a Python application using ZeroMQ and implementing a variation of the Majordomo pattern as described in the ZGuide.
I have a broker as an intermediary between a set of workers and clients. I want to do some extensive logging for every request that comes in, but I do not want the broker to waste time doing that. The broker should pass that logging request to something else.
I have thought of two ways:
- Create workers that are only for logging and use the ZeroMQ IPC transport
- Use Multiprocessing with a Queue
I am not sure which one is better or faster for that matter. The first option does allow me to use the current worker base classes that I already use for normal workers, but the second option seems quicker to implement.
I would like some advice or comments on the above or possibly a different solution.
I like the approach of using standard tools, like what Jonathan proposed. You didn't mention which OS you are doing the work on, but another alternative in the same spirit is to use Python's standard logging module together with logging.handlers.SysLogHandler and send the log messages to an rsyslog service (available on any Linux/Unix; I believe there is also a Windows option, but I have never used it).
Essentially that whole system implements the same thing you are thinking of. Your local process queues up log messages to be handled/processed/written by someone else. In this case, that someone else (rsyslog) is a well-known, proven service with a lot of built-in functionality and flexibility.
Another advantage of this approach is that your product will integrate that much better with other sysadmin tools that are built on top of syslog. And it wouldn’t even require you to write any code to get that option.
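As a rough sketch, assuming an rsyslog daemon listening on the default UDP port 514 (the address, facility and message format below are placeholders to adapt to your setup):

```python
import logging
import logging.handlers

logger = logging.getLogger("broker")
logger.setLevel(logging.INFO)

# UDP handler pointed at a local rsyslog daemon; the address and
# facility are assumptions -- adjust them to your rsyslog configuration.
handler = logging.handlers.SysLogHandler(
    address=("localhost", 514),
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
)
handler.setFormatter(logging.Formatter("broker: %(levelname)s %(message)s"))
logger.addHandler(handler)

# The broker just fires a UDP datagram and moves on; rsyslog does the I/O.
logger.info("request received from client %s", "client-42")
```

Because the default transport is UDP, the emitting process never blocks on the logging backend, which matches your goal of keeping the broker fast.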
You might want to consider a third possibility for implementing remote logging. If you use the standard Python logging module, you can use the logging.handlers.QueueHandler class in your workers, clients and broker, and the logging.handlers.QueueListener class in your remote logging process.
Instead of using the normal Python multiprocessing.Queue as the transport between your application processes and your logging process, implement your own Queue replacement class using ZeroMQ, duck-typed so that your class is a drop-in replacement for the standard Python Queue. In this way your application will be able to run unaltered in any environment, from a single multi-core computer to distributed data centres.
To summarize, use a standard Python logger with a QueueHandler in all your workers, clients and brokers, and create an independent process based on a QueueListener and the Python logging handler(s) of your choice to handle the heavy lifting of logging.
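Wired together, that looks roughly like this. A plain queue.Queue keeps the sketch self-contained and runnable in one process; in production you would share a multiprocessing.Queue (or a ZeroMQ-backed equivalent) between the processes:

```python
import logging
import logging.handlers
import queue

# Stand-in for a cross-process queue; see the note above.
log_queue = queue.Queue()

# Application side: every worker/client/broker logger just enqueues,
# so emitting a record is nearly free for the broker.
app_logger = logging.getLogger("worker")
app_logger.setLevel(logging.INFO)
app_logger.addHandler(logging.handlers.QueueHandler(log_queue))

# Logging-process side: the listener thread does the slow I/O.
# Swap StreamHandler for FileHandler, SysLogHandler, etc. as needed.
sink = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

app_logger.info("processed request %d", 7)

listener.stop()  # drains and flushes any remaining records
```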
These are radically different approaches, each with its own set of pros and cons, whose trade-offs will most likely only become apparent at a later development stage:
I have thought of two ways:
- Create workers that are only for logging and use the ZeroMQ IPC transport
- Use Multiprocessing with a Queue
One way you could try is to have an additional logging worker, as in approach 1. You could let your workers log to a memcached logging cluster, and have the logging worker monitor the current resource load; when the load drops below a given threshold, the worker writes the logs out to an IOPS-limited device (e.g. a hard disk).
I also like Jonathan's approach, with the caveat that I too mostly use Python 2.x, and that you would likely have to set up your own logging backend to really push the performance envelope.
Correct me if I am wrong, but my take is that you are doing a really data-intensive task, with storage IOPS being your bottleneck.
A convenient way would still be to let the broker do the brokerage of logging, in the form described above, with all the disadvantages of a central broker instance.
For instance, if the broker is in such high demand that it never gets any breathing room to write the memcached logs back to storage, you would need to take another approach.
You may ultimately end up with a brokerless model, i.e. with the workers managing their work among themselves, in the simplest case through a distributed round-robin algorithm.