I have a Java web application that makes use of certain libraries, for example Ghostscript for converting PDFs to TIFFs and Tesseract for OCR. There are Java wrappers for both of these: Ghost4J and Tess4J.
What are some things I should think about when deciding whether to use the libraries or issue a command line process from my web application?
Off the bat, what I’m noticing is that command line operations are slightly faster and don’t take a toll on my application. For example, 100 users sending PDFs that need to be converted to TIFFs using Ghost4J makes Java run at more than 100% CPU, which makes the entire web application unresponsive.
The following factors should be considered:
- portability
  - the command line interface will be different per environment,
  - and also the configuration
- performance
- re-usability
- maintainability
- scalability
If possible, consider creating the conversion component as a separate service running in its own JVM (possibly several) and forwarding conversion requests to this service (re-usability).
This lets you keep using the wrapper libraries, which keeps the code portable (portability) and easier to maintain (maintainability).
It also allows you to scale the number of conversion processes as load grows (scalability), and to host them on different hardware when the existing hardware resources reach their limit (performance).
A simple web-service interface or equivalent between the two components is easy to implement, and it separates your web interface behaviour from the grunt work of performing document conversion.
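As a rough illustration of such an interface, here is a minimal sketch using the HTTP server bundled with the JDK (`com.sun.net.httpserver`). The class name `ConversionServer`, the `/convert` path, and the `converter` hook are assumptions made for this example; in a real service the hook would call Ghost4J, Tess4J, or an external process. The fixed-size worker pool caps how many conversions run at once.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import java.util.function.Function;

class ConversionServer {
    // Starts an HTTP endpoint at /convert (hypothetical path).
    // "converter" is a placeholder for the real PDF-to-TIFF logic.
    static HttpServer start(int port, int workers,
                            Function<byte[], byte[]> converter) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/convert", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // -1: no response body
                    return;
                }
                byte[] pdf = exchange.getRequestBody().readAllBytes();
                byte[] tiff;
                try {
                    tiff = converter.apply(pdf);
                } catch (RuntimeException e) {
                    exchange.sendResponseHeaders(500, -1); // conversion failed
                    return;
                }
                exchange.getResponseHeaders().add("Content-Type", "image/tiff");
                exchange.sendResponseHeaders(200, tiff.length);
                try (OutputStream body = exchange.getResponseBody()) {
                    body.write(tiff);
                }
            } finally {
                exchange.close();
            }
        });
        // A fixed pool bounds the number of simultaneous conversions.
        // Daemon threads so idle workers do not block JVM exit after stop().
        server.setExecutor(Executors.newFixedThreadPool(workers, runnable -> {
            Thread thread = new Thread(runnable, "conversion-worker");
            thread.setDaemon(true);
            return thread;
        }));
        server.start();
        return server;
    }
}
```

The web application would then POST the PDF bytes to this service and read the TIFF from the response, so the heavy CPU work happens in a different JVM (or on a different machine).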
I agree with all the comments and the answer above on the non-functional requirements that will influence whether to go with solution A or B.
Additional information on whether you want the conversions to be done offline or in real time (e.g. the user sends a PDF and gets a TIFF back as soon as it's done) would be useful.
Regardless, let's assume that at this point you are constrained by the available system resources (conversion consuming all of the available CPU). From personal experience I can say that the command-line tooling will be much more efficient here, and I recommend using that over pure Java.
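For illustration, a minimal sketch of the command-line route using `ProcessBuilder`. The class name `GhostscriptCli`, the executable name passed in by the caller (`gs` on Linux/macOS; Windows builds typically ship `gswin64c`), and the choice of the `tiffg4` output device are assumptions for this example, so adjust them to your installation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

class GhostscriptCli {
    // Builds the Ghostscript argument list. The executable name differs per
    // platform, which is the portability cost of the command-line route.
    static List<String> buildCommand(String gsExecutable, Path pdf,
                                     Path tiffPattern, int dpi) {
        return List.of(
            gsExecutable,
            "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER", // quiet, non-interactive, sandboxed
            "-sDEVICE=tiffg4",                       // assumed device: bilevel G4 TIFF
            "-r" + dpi,                              // output resolution
            "-sOutputFile=" + tiffPattern,           // e.g. page-%03d.tif, one file per page
            pdf.toString());
    }

    // Runs the command in a separate OS process and returns its exit code.
    // The timeout stops a stuck conversion from tying up a worker forever.
    static int convert(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            .redirectErrorStream(true)                      // merge stderr into stdout
            .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on a full pipe
            .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Ghostscript timed out after " + timeoutSeconds + "s");
        }
        return process.exitValue();
    }
}
```

A call might look like `GhostscriptCli.convert(GhostscriptCli.buildCommand("gs", Path.of("in.pdf"), Path.of("page-%03d.tif"), 300), 60)`. Because the work runs in its own process, the operating system schedules it separately from the JVM serving web requests, though 100 simultaneous conversions will still compete for the same cores unless you limit how many run at once.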
I can name a number of known Java-based systems out there that use ImageMagick to do similar conversions (create, resize, convert PDFs). It also comes with a nice Java API to work with. Reference: http://www.imagemagick.org/script/api.php
In addition, if your main app’s performance is taking a huge hit because of the conversion, then I would also recommend moving the whole conversion to dedicated node(s) or a cluster.