Question

I'm building an application that compiles a single PDF document from multiple source PDF documents as follows: it takes the first page of each source document, stamps certain information on top of each of those pages, and then combines all those "first pages" into an output PDF document. Assume the source PDFs already exist.

I'm using a third-party class library to manipulate the PDFs, i.e. extract the first pages, apply the stamped information, and combine pages to output the resulting PDF. My goal is to keep those PDF manipulation operations independent of my domain logic layer so it will be easier to swap out the third-party library if needed in the future. For this, I'd like to make use of an anti-corruption layer.

I imagine my domain will consist of one class (at this point): PdfDocument. The domain logic layer will load a collection of PdfDocument objects from file or another type of input stream and make use of services exposed by the anti-corruption layer to produce a single output document which has the characteristics I mentioned earlier. I can envision two possible ways of architecting this:

  • expose the following distinct services on the anti-corruption layer: 1.) extract the first page of a PDF document, i.e. return a PdfDocument that is one page, 2.) stamp provided text at the top of a provided PdfDocument, i.e. return a new PdfDocument that contains the stamped text, and 3.) combine multiple PdfDocument objects into one PdfDocument, i.e. return a new PdfDocument that is all pages from the provided PdfDocuments combined.

  • expose a single service on the anti-corruption layer which takes a collection of PdfDocuments and returns a single PdfDocument with the aforementioned characteristics.
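To make the two options concrete, here is a minimal sketch of both service shapes. Every name in it (`FineGrainedPdfService`, `CoarseGrainedPdfService`, `compile_cover_pages`, the `content` field) is hypothetical and chosen only for illustration; none of it comes from the third-party library.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class PdfDocument:
    """Domain object: opaque PDF bytes, no third-party types (assumed shape)."""
    content: bytes


class FineGrainedPdfService(Protocol):
    """Option 1: three small operations; the domain decides how to combine them."""

    def extract_first_page(self, doc: PdfDocument) -> PdfDocument: ...

    def stamp(self, doc: PdfDocument, text: str) -> PdfDocument: ...

    def combine(self, docs: Sequence[PdfDocument]) -> PdfDocument: ...


class CoarseGrainedPdfService(Protocol):
    """Option 2: one operation; the page and stamp rules live inside the layer."""

    def compile_first_pages(
        self, docs: Sequence[PdfDocument], stamps: Sequence[str]
    ) -> PdfDocument: ...


def compile_cover_pages(
    service: FineGrainedPdfService,
    docs: Sequence[PdfDocument],
    stamps: Sequence[str],
) -> PdfDocument:
    """Domain logic under option 1: which page, which stamp, which order."""
    pages = [
        service.stamp(service.extract_first_page(doc), text)
        for doc, text in zip(docs, stamps)
    ]
    return service.combine(pages)
```

With option 1 the rules ("first page only", "one stamp per page") stay in `compile_cover_pages` on the domain side; with option 2 they would have to move behind `compile_first_pages`.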

The first approach seems more in line with separation of concerns because the second approach would have to take on a lot of the domain logic considerations such as "how many pages do I extract?", "do I stamp every page?", and "which of the extracted pages do I include in the output?". However, the first approach is much less efficient because each service returns a PdfDocument. The third-party library I'm using has a Document class and an intermediate class, Page, which represents a page belonging to a whole PDF Document. If I use the first approach, the third-party library would have to bundle everything up into a Document and then output a PdfDocument for each service. However, if I use the second approach, I can operate on each PDF more efficiently because the input PDFs can be broken down into Page objects to which the stamps could then be directly applied, and those Page objects could then be combined into the resulting Document and only then returned as a PdfDocument.

I've considered adding a PdfDocumentPage class to my model as one solution to this, but then my model is taking on concerns that it shouldn't necessarily have. I have no need for the notion of a "page" in my model other than to facilitate more efficient use of the third-party library, and that, to me, defeats the purpose of the anti-corruption layer.

Please help me work through this!


Solution

It sounds like you're trying to create an adapter around a third-party library to keep it from becoming coupled to your business logic. Of the two approaches you mention, the first seems the better idea: it is the most flexible, and it lets you separate your business logic from the adapter (i.e. the adapter will basically be a pass-through to the third-party library).

However, you'd like to take advantage of a possible intermediate state of processing offered by the library. One way of doing this might be to use the builder pattern.

Adapter.CreateBuilder() -> returns the builder object
Builder.AddFirstPageFromDocument(binary data) -> returns the page number
Builder.StampPage(page number, message)
Adapter.CreateDocument(builder)

That's just a brief example. You could hide the whole adapter inside the builder object, or each method could live in the adapter and you would pass the builder to each method (in which case it becomes more of a state/context object than a builder object).
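A rough sketch of the builder idea, under some loud assumptions: `VendorPage`, `vendor_load_pages` and `vendor_save` are toy stand-ins for the third-party library (pages are just text separated by form feeds), not a real PDF API, and page numbers are 0-based positions in the output. The point is only that the intermediate page objects never leave the builder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PdfDocument:
    """Domain object: opaque bytes only (assumed shape)."""
    content: bytes


# --- Toy stand-in for the third-party library (hypothetical) ---
@dataclass
class VendorPage:
    text: str
    stamps: List[str] = field(default_factory=list)


def vendor_load_pages(data: bytes) -> List[VendorPage]:
    # Toy format: pages are separated by form feed characters.
    return [VendorPage(text) for text in data.decode().split("\f")]


def vendor_save(pages: List[VendorPage]) -> bytes:
    # Each stamp is rendered as "[stamp]" in front of the page text.
    rendered = ["".join(f"[{s}]" for s in p.stamps) + p.text for p in pages]
    return "\f".join(rendered).encode()


class FirstPagesBuilder:
    """Keeps vendor pages internal; only documents, ints and strings cross the boundary."""

    def __init__(self) -> None:
        self._pages: List[VendorPage] = []

    def add_first_page_from_document(self, doc: PdfDocument) -> int:
        self._pages.append(vendor_load_pages(doc.content)[0])
        return len(self._pages) - 1  # 0-based page number in the output

    def stamp_page(self, page_number: int, message: str) -> None:
        self._pages[page_number].stamps.append(message)

    def build(self) -> PdfDocument:
        # The only place where a whole document is assembled.
        return PdfDocument(vendor_save(self._pages))


class PdfAdapter:
    def create_builder(self) -> FirstPagesBuilder:
        return FirstPagesBuilder()

    def create_document(self, builder: FirstPagesBuilder) -> PdfDocument:
        return builder.build()
```

The domain code would call `create_builder()`, then `add_first_page_from_document` and `stamp_page` once per source document, and finally `create_document(builder)`, so only one output document is ever materialized.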

If you're still not comfortable with introducing additional objects into your domain, however, I would definitely sacrifice efficiency for a clean separation (i.e. stick with the first option and return the whole document each time), purely from a maintenance perspective.

OTHER TIPS

Your first approach looks much cleaner than the second, but I do not think you necessarily have to sacrifice performance by choosing this design.

Assuming it turns out that the approach really is not efficient enough, you could use your PdfDocument as an abstraction over your preferred PDF representation "behind the scenes". What you call a PdfDocument does not need to be exactly what your third-party library calls a document, so it could instead be a certain page range within the latter. If you follow that route, extracting the first page from a PdfDocument can be implemented internally just by setting the page range to "page 1 to page 1", without creating a new "third-party document object". The stamp operation then has to respect that page range internally, as does the final merge.
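Here is one way the page-range idea might look, as a sketch only. The `_Part` record, the `render` function and the form-feed "page" format are all assumptions made up for this example; a real implementation would call the third-party library inside `render`. Extract, stamp and combine only record intent, and the actual work happens once at the end.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class _Part:
    source: bytes                    # untouched input bytes
    start: int = 0                   # first page, 0-based, inclusive
    stop: Optional[int] = None       # one past the last page; None means "to the end"
    stamps: Tuple[str, ...] = ()     # stamps to apply to every page in the range


@dataclass(frozen=True)
class PdfDocument:
    """Domain-facing handle; the page ranges are an internal detail."""
    _parts: Tuple[_Part, ...]

    @staticmethod
    def from_bytes(data: bytes) -> "PdfDocument":
        return PdfDocument((_Part(data),))


def extract_first_page(doc: PdfDocument) -> PdfDocument:
    # Assumes the first part is non-empty; only narrows the range, copies nothing.
    first = doc._parts[0]
    return PdfDocument((replace(first, stop=first.start + 1),))


def stamp(doc: PdfDocument, text: str) -> PdfDocument:
    return PdfDocument(tuple(replace(p, stamps=p.stamps + (text,)) for p in doc._parts))


def combine(docs: Sequence[PdfDocument]) -> PdfDocument:
    return PdfDocument(tuple(part for doc in docs for part in doc._parts))


def render(doc: PdfDocument) -> bytes:
    """The only step that would touch the vendor library (toy format here)."""
    out = []
    for part in doc._parts:
        pages = part.source.decode().split("\f")[part.start:part.stop]
        prefix = "".join(f"[{s}]" for s in part.stamps)
        out.extend(prefix + page for page in pages)
    return "\f".join(out).encode()
```

From the domain's point of view every operation still takes and returns a PdfDocument, so the three-service interface from the question is unchanged.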

So I guess it should be possible to have both: a simple representation of PdfDocument objects in your adapter layer, and the use of internal optimizations provided by your library vendor without exposing them to your domain layer.

Licensed under: CC-BY-SA with attribution