I am currently prototyping a piece of software which must be able to generate different types of documents in different file formats. The document could be a letter or a receipt, for example, requested as a Word document, a PDF, or both.
I am using a Word library and a PDF library, each with its own separate API. Each type of document can have many different layouts and formats, including custom fonts, images, headers and footers.
I want the design to be as abstract as possible, and avoid exponential growth of concrete implementations for every new type of document, layout, and file format.
First, I think I should abstract the PDF and Word document APIs into a more generic API. I feel the adapter pattern is the most suitable. For example, I might have an interface called DocumentAdapter, with two concrete implementations called PdfDocumentAdapter and WordDocumentAdapter.
I then want to be able to build any type of document, in any format, with any layout. I’ve therefore started to design a DocumentBuilder class with two dependencies: a document adapter, and a document specification. The specification should define the layout of the document – and be file format agnostic.
The DocumentBuilder will have a concrete method called BuildDocument, which builds the document for the appropriate adapter and specification – then writes it to the filesystem.
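One way the pieces described above might fit together is sketched below in Python. This is only a rough illustration under assumptions: the method names (`add_heading`, `add_paragraph`, `to_bytes`) and the shape of the specification are hypothetical, and the two adapters merely record calls rather than driving a real Word or PDF library. Writing the result to the filesystem is left out to keep the sketch short.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class DocumentAdapter(ABC):
    """Format-neutral operations; each subclass would wrap one library."""

    extension = ""

    @abstractmethod
    def add_heading(self, text: str) -> None: ...

    @abstractmethod
    def add_paragraph(self, text: str) -> None: ...

    @abstractmethod
    def to_bytes(self) -> bytes: ...


class _RecordingAdapter(DocumentAdapter):
    """Stand-in (assumption): records calls instead of using a real library."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def add_heading(self, text: str) -> None:
        self.calls.append(("heading", text))

    def add_paragraph(self, text: str) -> None:
        self.calls.append(("paragraph", text))

    def to_bytes(self) -> bytes:
        lines = [f"{kind}: {text}" for kind, text in self.calls]
        return "\n".join(lines).encode("utf-8")


class PdfDocumentAdapter(_RecordingAdapter):
    extension = ".pdf"


class WordDocumentAdapter(_RecordingAdapter):
    extension = ".docx"


@dataclass
class DocumentSpecification:
    """Layout expressed as data: ordered (element kind, text) pairs."""

    name: str
    elements: list[tuple[str, str]] = field(default_factory=list)


class DocumentBuilder:
    def __init__(self, adapter: DocumentAdapter,
                 specification: DocumentSpecification) -> None:
        self.adapter = adapter
        self.specification = specification

    def build_document(self) -> bytes:
        # Replay the format-agnostic specification against the adapter.
        for kind, text in self.specification.elements:
            if kind == "heading":
                self.adapter.add_heading(text)
            elif kind == "paragraph":
                self.adapter.add_paragraph(text)
            else:
                raise ValueError(f"unknown element kind: {kind}")
        return self.adapter.to_bytes()
```

With this shape, a new layout is a new DocumentSpecification value and a new file format is one new adapter, so the number of classes grows with the sum of formats and layouts rather than their product.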
The problem is that the APIs for building Word and PDF documents are so different I don’t really know how to best solve this design problem.
I could have something like LetterDocumentInterface -> PdfLetterDocument, WordLetterDocument, StandardPdfLetterDocument, ShortPdfLetterDocument, StandardWordLetterDocument, ShortWordLetterDocument, but this would lead to a ridiculous number of concrete classes.
Any hints or experience from building similar designs would be immensely appreciated.
When faced with a similar challenge several years ago, I went with DocBook. There were translators available for generating PDF, Word, HTML and other output formats.
I had a number of difficulties with this approach.
- DocBook itself is very much oriented to producing books. Other document types are specializations, but you still carry all the cruft. There is no such thing as a lightweight DocBook.
- Producing other document formats from a DocBook source required an XSLT transformation step. At the time, XSLT 2.0 processors were not widespread and were tricky to work with.
- The pre-packaged DocBook XSLT packages had been developed for the particular needs of their developers. Hence, there were obscure and tricky things that had to be worked around.
I eventually developed transformers to produce Word (OOXML) documents, PDF documents (by generating input for Apache FOP), and HTML. The results were quite good looking. However, by this time the train had left the station, and the project could not afford to wait for me to develop these things to the point of robustness.
Looking back, I would suggest using a simplified document format, perhaps resembling DocBook, and using XSLT to generate the contents in the different formats. Injecting style sheets would then take care of the rest of your needs.
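As a rough illustration of that idea (not the code from that project), the sketch below keeps one simplified XML source and applies one transformer per output format. To stay within the Python standard library, plain functions stand in for the XSLT stylesheets, and HTML and plain text stand in for Word and PDF. The element names `letter`, `title` and `para` are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical simplified document format: a handful of elements only.
SOURCE = """<letter>
  <title>Payment received</title>
  <para>Thank you for your order.</para>
  <para>Your receipt is attached.</para>
</letter>"""


def to_html(root: ET.Element) -> str:
    """Stand-in for an XSLT stylesheet that targets HTML."""
    parts = []
    for node in root:
        text = escape(node.text or "")
        if node.tag == "title":
            parts.append(f"<h1>{text}</h1>")
        elif node.tag == "para":
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


def to_plain_text(root: ET.Element) -> str:
    """Stand-in for a second stylesheet that targets another format."""
    parts = []
    for node in root:
        text = node.text or ""
        if node.tag == "title":
            parts.append(text.upper())
        elif node.tag == "para":
            parts.append(text)
    return "\n\n".join(parts)


# One transformer per output format; the source document never changes.
TRANSFORMERS = {"html": to_html, "text": to_plain_text}


def render(source: str, target: str) -> str:
    return TRANSFORMERS[target](ET.fromstring(source))
```

Supporting another output format then means adding one transformer, while every document type written in the simplified format is picked up without further changes.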