I have moved to a new project, and as the only ‘software guy’ I’m welcome, and expected, to suggest design improvements to the existing code. (The code just got out of the prototype phase and needs to be extended into a final releasable version, plus added functionality.)
The code has multiple standalone components which are supposed to run in concert, but each one can be rebooted on its own. Messages are passed between these daemons via a few hand-written classes which implement their own server with message queues. It works well enough: the project made it past the prototype phase, and while stability is one of the team's biggest long-term focuses, the system is not notoriously unstable; it's just not quite up to par for a final release. Still, every time I see the code all I can think is “I would have done this using JMS.” If I went back in time to their implementation days I would have told them to do this in JMS from the start, and I am very tempted to insist on that now.
The only problem is that, even if reinventing the wheel was suboptimal at the time, what they have mostly works now, and I would hate to throw out large chunks of code. I have no doubt that if we kept their message passing logic I would have to spend some time working on it. I’m not certain it is 100% reliable: there are some minor data races that they get away with because the system should be able to recover even if one happens, some ‘code smells’ I want to investigate further, and so on. Still, replacing the whole system with JMS would be a pretty big overhaul. It would require writing a lot of new code and some debugging to fix issues they already ran into and corrected in their home-brewed system, and it would ultimately add a dependency on JMS when we release this project, which may be undesirable.
To give a slightly better idea of the system as it is: it’s mostly a direct pipeline with two-way asynchronous calls. Each separate part of the system (called a daemon) speaks with at most two other daemons, the one in front of it in the pipeline and the one behind it. Each daemon passes requests further ‘down’ the pipeline to the low-level architecture and wants some response back (but doesn’t block waiting for it); in addition, asynchronous messages must be able to propagate back up the system. Multiple people can be making requests at once, but we don’t expect an absurdly high number of total messages. Each customer will generate messages in a small burst, but the bursts aren’t huge and I don’t think this has to scale up to an absurdly high number of customers. Loss of a message would likely result in inconsistent state. Lost messages are quasi-detected currently, but nothing is really done to ‘clean up’ when it happens. The system is basic enough that inconsistent state isn’t likely to break things as horribly as it could in some systems (plus we have our own tiny closed system, so we probably aren’t going to lose messages). The current communication code is generic: any daemon can send any message to any other, and we just trust that each daemon only sends messages to the correct daemons and that no daemon sends a message of a type another daemon doesn’t know how to handle (all messages are enumerated, but some are only viable when sent to specific daemons).
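To make the last two points concrete, here is a minimal sketch of the kind of check a receiving daemon could run on each inbound message: reject message types it does not handle, and notice missing sequence numbers from a sender. Everything in it is an assumption for illustration, not the existing code: the `MessageType` values, the `Verdict` enum, the `InboundGuard` class, and the idea that each sender stamps its messages with a counter starting at 0.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical message types; the real system has its own enumeration.
enum MessageType { REQUEST, RESPONSE, STATUS_EVENT, SHUTDOWN }

// Outcome of checking one inbound message.
enum Verdict { ACCEPTED, UNSUPPORTED_TYPE, GAP_DETECTED, DUPLICATE_OR_STALE }

// Guards one daemon's inbox (hypothetical helper, single-threaded use assumed).
final class InboundGuard {
    private final Set<MessageType> accepted;
    // Next sequence number expected from each sender; assumed to start at 0.
    private final Map<String, Long> nextExpected = new HashMap<>();

    InboundGuard(EnumSet<MessageType> accepted) {
        this.accepted = EnumSet.copyOf(accepted);
    }

    Verdict check(String sender, MessageType type, long seq) {
        long expected = nextExpected.getOrDefault(sender, 0L);
        if (seq < expected) {
            // Already seen, or arrived after a later message was processed.
            return Verdict.DUPLICATE_OR_STALE;
        }
        // Advance the counter even for unsupported types, so that a
        // rejected message does not show up later as a false gap.
        nextExpected.put(sender, seq + 1);
        if (!accepted.contains(type)) {
            return Verdict.UNSUPPORTED_TYPE;
        }
        // A jump past the expected number means something was lost on the way.
        return seq == expected ? Verdict.ACCEPTED : Verdict.GAP_DETECTED;
    }
}
```

A daemon that only handles requests and status events would construct it as `new InboundGuard(EnumSet.of(MessageType.REQUEST, MessageType.STATUS_EVENT))` and call `check` before dispatching. What to do on `GAP_DETECTED` (resend, resync, or just log) is the ‘clean up’ decision that is still open either way.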
So the question is, can I justify a switch to JMS? I will feel more comfortable about stability if I know we’re using an enterprise system, and I think it would make some of my other “crazy, probably never get around to implementing, but would be so cool” ideas a bit easier to implement later. However, we have a small team and can’t possibly do everything I would love to do anyway. Will JMS provide enough benefit to warrant the cost of rewriting mostly-working code? (I say mostly working because we don’t have an obvious “this doesn’t work!” example. I know from code review that there are minor issues and a few ‘smells’ that I need to look into further, and that it’s decent but nowhere close to enterprise level. But the stuff I am certain I have to address wouldn’t take as long as writing the JMS logic, much less testing and debugging it.) How ‘bad’ is it to create a dependency on an external system like JMS when we currently can run without it? Are there other useful advantages to JMS I’m not seeing? (Guaranteed message delivery seems a big one to me, but I'm not sure yet just how much of a benefit that will be for this project.)
Incidentally, one of our eventual “long term, cool to have, we’ll see if we get around to it” goals would be to replace the current use of SQL calls to fetch data by storing everything in memory. If we do store data in memory we will need very reliable message passing to keep state consistent between multiple daemons, and my inclination right now is that this should never be tried without something like JMS. The state wouldn’t change very often, but when it does change we’ll have to stay consistent, worry about data races, and so on. We may not do that at all, but would JMS make it substantially easier to keep consistent state and handle data races, or will I still be writing the same logic in either case?
Are there certain things particularly prone to failing with quickly written message passing systems I should be looking for explicitly?
Okay, finally read it all through. I agree, what you describe is a perfect scenario for using JMS. The benefit of leveraging the API is that you can remove the code that deals with message passing and wiring up your service bus and leave it all to your JMS provider. The providers on the market have their own configuration tools and built-in recovery mechanisms that will simplify your code.
Here’s the hitch: making that change to, as you mentioned, “mostly working” code is hard to justify. But in my opinion, the less code you have to write and maintain the better, and it sounds like a great deal of effort was invested just to handle the infrastructure of passing the messages (and correlation, and session handling). Look at it this way. You can sell the change as “we can eliminate all of this code and let a vendor whose primary focus is on messaging infrastructure handle it.” That’s a big sell for some people. For others, the thought of just throwing away all that effort (and money spent) is blasphemy.
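For a sense of what “correlation” plumbing tends to look like when written by hand, here is a minimal sketch of matching asynchronous responses to earlier requests without blocking the sender. It is an illustration only, not the code in question: the `Correlator` class, its method names, and the `String` payload are all made up. With JMS the same bookkeeping is usually keyed on the message's `JMSCorrelationID` header instead of a home-grown id.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: matches asynchronous responses to earlier requests.
final class Correlator {
    private final AtomicLong nextId = new AtomicLong();
    // Requests that have been sent but not yet answered, keyed by request id.
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Registers a request and returns its id; the caller sends the id along
    // with the message and carries on without waiting.
    long register(CompletableFuture<String> reply) {
        long id = nextId.incrementAndGet();
        pending.put(id, reply);
        return id;
    }

    // Called when a response arrives. Completes the matching future and
    // returns false for unknown or already-answered ids.
    boolean onResponse(long id, String payload) {
        CompletableFuture<String> reply = pending.remove(id);
        return reply != null && reply.complete(payload);
    }

    // Requests still waiting for a response (candidates for timeout handling).
    int pendingCount() {
        return pending.size();
    }
}
```

Even a sketch this small leaves open questions (timeouts for requests that never get an answer, what happens to pending entries when a daemon reboots), which is the sort of code a provider would otherwise take off your hands.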