How should we load balance listeners to a 3rd party service?
e.g. We need to read Google Talk messages via an XMPP listener and build an application from the user's interactions with a chatbot. In such a case, how would we run XMPP listeners on multiple servers and load-balance the messages between the listeners?
Thanks in advance.
1
You might want to look at Twitter’s Storm framework. It uses ZooKeeper to manage clusters of JVMs that can be configured to process incoming events.
From their fine tutorial:
- The core abstraction in Storm is the “stream”. A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, you may transform a stream of tweets into a stream of trending topics.
- A spout is a source of streams. For example, a spout may read tuples off of a Kestrel queue and emit them as a stream. Or a spout may connect to the Twitter API and emit a stream of tweets.
- A bolt consumes any number of input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything from run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.
- Networks of spouts and bolts are packaged into a “topology” which is the top-level abstraction that you submit to Storm clusters for execution. A topology is a graph of stream transformations where each node is a spout or bolt. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
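As a rough mental model of the spout/bolt idea (plain Python, not Storm’s actual Java API — in a real topology each spout and bolt runs as parallel tasks across the cluster):

```python
from collections import Counter

# Hypothetical, single-process illustration of the spout/bolt model.
# Storm would run many instances of each of these in parallel.

def tweet_spout():
    """Spout: a source of tuples (here, a fixed list of fake tweets)."""
    for tweet in ["storm is great", "storm scales", "zeromq is fast"]:
        yield tweet

def split_bolt(tweets):
    """Bolt: transform a stream of tweets into a stream of words."""
    for tweet in tweets:
        for word in tweet.split():
            yield word

def count_bolt(words):
    """Bolt: aggregate the word stream into running counts."""
    counts = Counter()
    for word in words:
        counts[word] += 1
    return counts

# "Topology": wire the spout into the chain of bolts.
counts = count_bolt(split_bolt(tweet_spout()))
print(counts["storm"])  # "storm" appears in two of the fake tweets
```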
http://storm-project.net
It might be overkill, but looking at it might also inspire you to think of your problem differently even if you don’t use it.
2
If you can’t modify the 3rd-party service, then you have to do the load balancing on your side using a message queue.
One server listens for incoming XMPP messages and puts them on a message queue. Multiple workers consume from this queue, so the messages are automatically load-balanced between the workers. Once a worker is done processing a message, it posts the result back to the server.
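A minimal single-process sketch of that pattern, using Python’s standard-library queue (all names here are hypothetical; a real deployment would use a networked queue and workers on separate servers):

```python
import queue
import threading

# Sketch: one listener feeds a shared queue; several worker threads pull
# from it. Whichever worker is free takes the next message -- that is
# the load balancing.

task_queue = queue.Queue()
results = queue.Queue()

def worker(worker_id):
    while True:
        msg = task_queue.get()
        if msg is None:               # sentinel: shut down
            task_queue.task_done()
            break
        # ... process the chatbot interaction here (toy stand-in) ...
        results.put((worker_id, msg.upper()))
        task_queue.task_done()

workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

# The XMPP listener would call task_queue.put(...) per incoming message.
for msg in ["hello", "help", "weather?"]:
    task_queue.put(msg)

task_queue.join()                     # wait until everything is processed
for _ in workers:
    task_queue.put(None)              # stop the workers
for w in workers:
    w.join()

processed = sorted(results.get()[1] for _ in range(3))
print(processed)
```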
Using a non-blocking server like Node.js and a brokerless message queue like ZeroMQ would be a low-overhead, high-performance solution. You may want to read more on Node.js (non-blocking I/O and event loops) and ZeroMQ (ØMQ is a library, not a server, so you don’t need to run a separate broker for it).
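For a concrete feel of the ZeroMQ wiring, here is a small PUSH/PULL sketch using pyzmq (the Python binding; the Node.js version is structurally the same). A PUSH socket round-robins messages across all connected PULL sockets, so running one PULL worker per server gives you the load balancing for free. For brevity this runs a single worker in-process over the `inproc` transport:

```python
import threading
import zmq

# Sketch assuming pyzmq is installed. In production the listener and
# workers would be separate processes using tcp:// endpoints, and the
# PUSH socket would round-robin across many connected PULL workers.

ctx = zmq.Context()

pull = ctx.socket(zmq.PULL)           # the "worker" end
pull.bind("inproc://xmpp-messages")   # bind before connect for inproc

def listener():
    # The XMPP listener end: push each incoming message downstream.
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://xmpp-messages")
    for msg in ["hello", "help", "weather?"]:
        push.send_string(msg)
    push.close()

t = threading.Thread(target=listener)
t.start()

received = [pull.recv_string() for _ in range(3)]
t.join()
pull.close()
ctx.term()
print(received)
```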