I am working on code that will check each of the images in a PDF and do some image processing.
As a proof of concept, to check that my image processing worked, I wrote a simple page parser that checks for a PDImageXObject and processes it, replacing the image with the processed one. That worked, so I am looking to extend the code because, as we all know, real PDFs can have PDFormXObjects that reference PDFormXObjects that reference PDImageXObjects, and so on.
Researching this, I looked at the PrintImageLocations.java example as a starting point. That appeared to be a good path to go down until I came across this, where @mkl implied that it would not “find” all the images in a PDF, specifically those in a PDFormXObject.
Can anyone enlighten me, provide guidance or point me at some sample code?
Thanks in advance.
What I tried: parsing the PDF page, and researching other techniques, e.g. PrintImageLocations.java.
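    import java.awt.image.BufferedImage;
    import java.io.IOException;
    import java.util.List;
    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.contentstream.operator.OperatorName;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.graphics.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
    {
        // Check for "Do", then look up the named XObject and, if it is a PDImageXObject, process its content
        String operation = operator.getName();
        if (OperatorName.DRAW_OBJECT.equals(operation))
        {
            COSName objectName = (COSName) operands.get(0);
            PDXObject xobject = getResources().getXObject(objectName);
            if (xobject instanceof PDImageXObject)
            {
                PDImageXObject image = (PDImageXObject) xobject;
                BufferedImage bi = image.getImage();
                // renderBufferedImage is my own image processing method
                BufferedImage newBufferedImage = renderBufferedImage(bi);
            }
        }
    }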
The example referenced earlier “put” the modified PDImageXObject back into the page-level resources. That works for “simple” cases, but it isn’t the right way to handle an image inside a PDFormXObject, where the replacement should be placed in the resources of that PDFormXObject.
Is there a way to determine the parent of the PDImageXObject so that I can replace it at that level? Or do I need to create some sort of structure (a Map?) to hold the name of the original PDImageXObject and the processed image, and then replace everything once processOperator has finished?
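To make the second option concrete, here is a minimal sketch of the "collect first, replace afterwards" idea. It deliberately uses hypothetical stand-in classes (Resources, Form, Image) rather than the real PDFBox types, so it only illustrates the traversal: walk the resources recursively, remember which resources dictionary owns each image and under which name, then apply the replacements once the walk is complete. In PDFBox itself I assume the equivalent calls would be PDResources.getXObjectNames(), PDResources.getXObject(COSName), PDFormXObject.getResources() and PDResources.put(COSName, PDXObject). The sketch covers form XObjects only and ignores other places images can live, such as tiling patterns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for PDFBox types; not the real API.
class XObjectModel {
    // Stand-in for PDXObject.
    interface XObject {}

    // Stand-in for PDImageXObject; 'pixels' is a placeholder for image data.
    static final class Image implements XObject {
        final String pixels;
        Image(String pixels) { this.pixels = pixels; }
    }

    // Stand-in for PDResources: maps XObject names to XObjects.
    static final class Resources {
        final Map<String, XObject> xobjects = new LinkedHashMap<>();
    }

    // Stand-in for PDFormXObject: a form carries its own resources.
    static final class Form implements XObject {
        final Resources resources = new Resources();
    }

    // A pending replacement: the owning resources, the name, and the new image.
    static final class Replacement {
        final Resources owner;
        final String name;
        final Image image;
        Replacement(Resources owner, String name, Image image) {
            this.owner = owner;
            this.name = name;
            this.image = image;
        }
    }

    // Processes every image reachable from 'root' and returns how many were replaced.
    static int replaceImages(Resources root, UnaryOperator<Image> process) {
        List<Replacement> pending = new ArrayList<>();
        // Identity-based set so shared or self-referencing resources are visited once.
        Set<Resources> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        collect(root, process, pending, visited);
        // Replacements are applied only after the walk, so no map is modified while iterated.
        for (Replacement r : pending) {
            r.owner.xobjects.put(r.name, r.image);
        }
        return pending.size();
    }

    private static void collect(Resources res, UnaryOperator<Image> process,
                                List<Replacement> pending, Set<Resources> visited) {
        if (!visited.add(res)) {
            return; // already handled
        }
        for (Map.Entry<String, XObject> entry : res.xobjects.entrySet()) {
            XObject xobject = entry.getValue();
            if (xobject instanceof Image) {
                // 'res' is the parent we need: the resources that own this image name.
                pending.add(new Replacement(res, entry.getKey(), process.apply((Image) xobject)));
            } else if (xobject instanceof Form) {
                // Recurse into the form's own resources.
                collect(((Form) xobject).resources, process, pending, visited);
            }
        }
    }
}
```

Because the walk goes through the resources rather than the content stream, the "parent" of each image is simply the resources object currently being iterated. The deferred list plays the role of the Map mentioned above. One limitation of the sketch: an image object shared by two different resources would be processed twice.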