I’m trying to create a PDF/A-3 file as the basis for a ZUGFeRD invoice, using these two libraries:
- JasperReports (6.21.3)
- pdfbox/xmpbox (3.0.2)
To check whether my output file is compliant, I use the Mustang validator (on another system):
<groupId>org.mustangproject</groupId>
<artifactId>validator</artifactId>
<version>2.11.0</version>
I managed to create a file (attempt.pdf) with JasperReports that declares itself as PDF/A-3B, but it doesn’t validate as compliant. Mustang reports the following errors:
<?xml version="1.0" encoding="UTF-8"?>
<validation filename="attempt.pdf" datetime="2024-07-24 17:00:53">
<pdf>ValidationResult [flavour=3b, totalAssertions=382, assertions=[TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=7], status=failed, message=All properties specified in XMP form shall use either the predefined schemas defined in the XMP Specification, ISO 19005-1 or this part of ISO 19005, or any extension schemas that comply with 6.6.2.3.2., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/Properties[5](urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# - fx:ConformanceLevel)], locationContext=null, errorMessage=An XMP property is either not not pre-defined, is not defined in any extension schema, or has invalid type.
], TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=7], status=failed, message=All properties specified in XMP form shall use either the predefined schemas defined in the XMP Specification, ISO 19005-1 or this part of ISO 19005, or any extension schemas that comply with 6.6.2.3.2., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/Properties[6](urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# - fx:DocumentFileName)], locationContext=null, errorMessage=An XMP property is either not not pre-defined, is not defined in any extension schema, or has invalid type.
], TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=7], status=failed, message=All properties specified in XMP form shall use either the predefined schemas defined in the XMP Specification, ISO 19005-1 or this part of ISO 19005, or any extension schemas that comply with 6.6.2.3.2., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/Properties[7](urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# - fx:DocumentType)], locationContext=null, errorMessage=An XMP property is either not not pre-defined, is not defined in any extension schema, or has invalid type.
], TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=7], status=failed, message=All properties specified in XMP form shall use either the predefined schemas defined in the XMP Specification, ISO 19005-1 or this part of ISO 19005, or any extension schemas that comply with 6.6.2.3.2., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/Properties[8](urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0# - fx:Version)], locationContext=null, errorMessage=An XMP property is either not not pre-defined, is not defined in any extension schema, or has invalid type.
], TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=3], status=failed, message=The Schema type is an XMP structure containing the definition of an extension schema. The field namespace URI is "http://www.aiim.org/pdfa/ns/schema#". The required field namespace prefix is pdfaSchema. The Schema type includes the following fields: pdfaSchema:schema (Text), pdfaSchema:namespaceURI (URI), pdfaSchema:prefix (Text), pdfaSchema:property (Seq Property), pdfaSchema:valueType (Seq ValueType)., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/ExtensionSchemasContainers[0]/ExtensionSchemaDefinitions[0]], locationContext=null, errorMessage=Invalid Extension Schema definition
], TestAssertion [ruleId=RuleId [specification=ISO 19005-3: 2012, clause=6.6.2.3, testNumber=1], status=failed, message=Extension schemas shall be specified using the PDF/A extension schema container schema defined in 6.6.2.3.3. All fields described in each of the tables in 6.6.2.3.3 shall be present in any extension schema container schema., location=Location [level=CosDocument, context=root/document[0]/metadata[0](21 0 obj PDMetadata)/XMPPackage[0]/ExtensionSchemasContainers[0]/ExtensionSchemaDefinitions[0]], locationContext=null, errorMessage=An extension schema object contains field(s) schema(http://www.aiim.org/pdfa/ns/extension/),namespaceURI(http://www.aiim.org/pdfa/ns/extension/),prefix(http://www.aiim.org/pdfa/ns/extension/),property(http://www.aiim.org/pdfa/ns/extension/) not defined by the specification]], isCompliant=false]
<info>
<signature>unknown</signature>
<duration unit="ms">2291</duration>
</info>
<summary status="invalid"/>
</pdf>
<messages>
<exception type="17">XML could not be extracted</exception>
</messages>
<summary status="invalid"/>
</validation>
As I read it, every XMP property must belong either to one of the predefined schemas or to an extension schema that I declare myself. I went with the latter: I compared my file with a valid ZUGFeRD PDF (valid.pdf) that I found online and tried to add the same metadata to mine.
valid.pdf has the following XMP metadata:
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/"
rdf:about=""
xmp:CreateDate="2022-02-18T17:13:43+01:00"
xmp:CreatorTool="intarsys ZUGFeRD Toolkit 2.1"
xmp:ModifyDate="2022-02-18T17:13:43+01:00"/>
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
pdf:Producer="intarsys PDF/A Live!"
rdf:about=""/>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
pdfaid:conformance="B"
pdfaid:part="3"
rdf:about=""/>
<rdf:Description xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#"
rdf:about="">
<fx:DocumentType>INVOICE</fx:DocumentType>
<fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>
<fx:Version>1.0</fx:Version>
<fx:ConformanceLevel>EN 16931</fx:ConformanceLevel>
</rdf:Description>
<rdf:Description xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
xmlns:pdfaField="http://www.aiim.org/pdfa/ns/field#"
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
xmlns:pdfaType="http://www.aiim.org/pdfa/ns/type#"
rdf:about="">
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:schema>Factur-X PDFA Extension Schema</pdfaSchema:schema>
<pdfaSchema:namespaceURI>urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>fx</pdfaSchema:prefix>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>DocumentFileName</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
<pdfaProperty:category>external</pdfaProperty:category>
<pdfaProperty:description>name of the embedded XML invoice file</pdfaProperty:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>DocumentType</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
<pdfaProperty:category>external</pdfaProperty:category>
<pdfaProperty:description>INVOICE</pdfaProperty:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>Version</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
<pdfaProperty:category>external</pdfaProperty:category>
<pdfaProperty:description>The actual version of the ZUGFeRD data</pdfaProperty:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:name>ConformanceLevel</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
<pdfaProperty:category>external</pdfaProperty:category>
<pdfaProperty:description>The conformance level of the ZUGFeRD data</pdfaProperty:description>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
whereas my attempt.pdf has the following:
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
<pdfaid:part>3</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
<xmp:CreatorTool>JasperReports Library version 6.21.3-4a3078d20785ebe464f18037d738d12fc98c13cf</xmp:CreatorTool>
<xmp:CreateDate>2024-07-24T16:59:07+02:00</xmp:CreateDate>
<xmp:ModifyDate>2024-07-24T16:59:07+02:00</xmp:ModifyDate>
</rdf:Description>
<rdf:Description xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/" rdf:about="">
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaExtension:schema>Factur-X PDFA Extension Schema</pdfaExtension:schema>
<pdfaExtension:namespaceURI>urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#</pdfaExtension:namespaceURI>
<pdfaExtension:prefix>fx</pdfaExtension:prefix>
<pdfaExtension:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaExtension:name>DocumentFileName</pdfaExtension:name>
<pdfaExtension:valueType>Text</pdfaExtension:valueType>
<pdfaExtension:category>external</pdfaExtension:category>
<pdfaExtension:description>name of the embedded XML invoice file</pdfaExtension:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaExtension:name>DocumentFileName</pdfaExtension:name>
<pdfaExtension:valueType>Text</pdfaExtension:valueType>
<pdfaExtension:category>external</pdfaExtension:category>
<pdfaExtension:description>INVOICE</pdfaExtension:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaExtension:name>DocumentFileName</pdfaExtension:name>
<pdfaExtension:valueType>Text</pdfaExtension:valueType>
<pdfaExtension:category>external</pdfaExtension:category>
<pdfaExtension:description>The actual version of the ZUGFeRD data</pdfaExtension:description>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaExtension:name>DocumentFileName</pdfaExtension:name>
<pdfaExtension:valueType>Text</pdfaExtension:valueType>
<pdfaExtension:category>external</pdfaExtension:category>
<pdfaExtension:description>The actual conformance level of the ZUGFeRD data</pdfaExtension:description>
</rdf:li>
</rdf:Seq>
</pdfaExtension:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
<rdf:Description xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#"
rdf:about="">
<fx:ConformanceLevel>EN 16931</fx:ConformanceLevel>
<fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>
<fx:DocumentType>INVOICE</fx:DocumentType>
<fx:Version>1.0</fx:Version>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
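One difference between the two packets can be checked mechanically: in valid.pdf the four declared property names are DocumentFileName, DocumentType, Version and ConformanceLevel, while in attempt.pdf all four entries are named DocumentFileName. The following is a minimal sketch of such a check using only the JDK's DOM parser (not xmpbox). The class name, method name and the shortened sample packet are made up for illustration, and whether this mismatch accounts for the testNumber=7 failures is only a guess.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DeclaredPropertyCheck {
    static final String EXT_NS = "http://www.aiim.org/pdfa/ns/extension/";
    static final String FX_NS = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";

    // Shortened, hypothetical sample modelled on attempt.pdf: the name is declared twice.
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
            + " xmlns:pdfaExtension='" + EXT_NS + "' xmlns:fx='" + FX_NS + "'>"
            + "<rdf:Description rdf:about=''><pdfaExtension:schemas><rdf:Bag><rdf:li>"
            + "<pdfaExtension:property><rdf:Seq>"
            + "<rdf:li><pdfaExtension:name>DocumentFileName</pdfaExtension:name></rdf:li>"
            + "<rdf:li><pdfaExtension:name>DocumentFileName</pdfaExtension:name></rdf:li>"
            + "</rdf:Seq></pdfaExtension:property>"
            + "</rdf:li></rdf:Bag></pdfaExtension:schemas></rdf:Description>"
            + "<rdf:Description rdf:about=''>"
            + "<fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>"
            + "<fx:Version>1.0</fx:Version>"
            + "</rdf:Description></rdf:RDF>";

    /** Local names of fx:* elements that no extension schema entry declares by name. */
    static Set<String> undeclared(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

            // Names declared anywhere below pdfaExtension:schemas. The namespace of the
            // "name" field is ignored on purpose so that both packets can be compared.
            Set<String> declared = new TreeSet<>();
            NodeList containers = doc.getElementsByTagNameNS(EXT_NS, "schemas");
            for (int i = 0; i < containers.getLength(); i++) {
                NodeList names = ((Element) containers.item(i)).getElementsByTagNameNS("*", "name");
                for (int j = 0; j < names.getLength(); j++) {
                    declared.add(names.item(j).getTextContent().trim());
                }
            }

            // Properties actually used in the fx namespace (element form only,
            // properties written as attributes are not considered here).
            Set<String> used = new TreeSet<>();
            NodeList fx = doc.getElementsByTagNameNS(FX_NS, "*");
            for (int i = 0; i < fx.getLength(); i++) {
                used.add(fx.item(i).getLocalName());
            }
            used.removeAll(declared);
            return used;
        } catch (Exception e) {
            throw new IllegalArgumentException("XMP packet could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(undeclared(SAMPLE));
    }
}
```

Run against the full attempt.pdf packet instead of the sample, the same method would list every fx property that has no matching declaration.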
The prefixes differ: valid.pdf uses “pdfaSchema” and “pdfaProperty” for the fields inside the extension schema, while my file uses “pdfaExtension” for all of them. I tried to use the same prefixes as valid.pdf, but when I switch to them I get a RuntimeException: java.lang.IllegalStateException: No binding for namespace prefix pdfaSchema.
Here is my current Java code that produces the XMP metadata:
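The validator message for testNumber=3 names "http://www.aiim.org/pdfa/ns/schema#" as the required namespace for the schema fields, so it is the namespace URI rather than the prefix text that seems to matter. Below is a small sketch, again with the plain JDK DOM parser, that lists the fields of an extension schema which sit in an unexpected namespace. The class name, method names and sample fragment are hypothetical; the expected namespace for the property fields is taken from the xmlns:pdfaProperty declaration in valid.pdf.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ExtensionSchemaNamespaceCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EXT_NS = "http://www.aiim.org/pdfa/ns/extension/";
    static final String SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#";
    static final String PROPERTY_NS = "http://www.aiim.org/pdfa/ns/property#";

    // Shortened, hypothetical fragment modelled on attempt.pdf.
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf='" + RDF_NS + "' xmlns:pdfaExtension='" + EXT_NS + "'>"
            + "<rdf:Description rdf:about=''>"
            + "<pdfaExtension:schemas><rdf:Bag><rdf:li rdf:parseType='Resource'>"
            + "<pdfaExtension:schema>Factur-X PDFA Extension Schema</pdfaExtension:schema>"
            + "<pdfaExtension:prefix>fx</pdfaExtension:prefix>"
            + "<pdfaExtension:property><rdf:Seq><rdf:li rdf:parseType='Resource'>"
            + "<pdfaExtension:name>DocumentFileName</pdfaExtension:name>"
            + "</rdf:li></rdf:Seq></pdfaExtension:property>"
            + "</rdf:li></rdf:Bag></pdfaExtension:schemas>"
            + "</rdf:Description></rdf:RDF>";

    /** Qualified names of schema and property fields that are not in the expected namespace. */
    static List<String> misplacedFields(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            List<String> problems = new ArrayList<>();
            NodeList containers = doc.getElementsByTagNameNS(EXT_NS, "schemas");
            for (int i = 0; i < containers.getLength(); i++) {
                for (Element schema : items((Element) containers.item(i))) {
                    for (Element field : childElements(schema)) {
                        if (!SCHEMA_NS.equals(field.getNamespaceURI())) {
                            problems.add(field.getNodeName());
                        }
                        if ("property".equals(field.getLocalName())) {
                            // Each rdf:li in the Seq describes one declared property.
                            for (Element property : items(field)) {
                                for (Element propertyField : childElements(property)) {
                                    if (!PROPERTY_NS.equals(propertyField.getNamespaceURI())) {
                                        problems.add(propertyField.getNodeName());
                                    }
                                }
                            }
                        }
                    }
                }
            }
            return problems;
        } catch (Exception e) {
            throw new IllegalArgumentException("XMP packet could not be parsed", e);
        }
    }

    /** The rdf:li entries of the rdf:Bag or rdf:Seq directly below the given element. */
    static List<Element> items(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Element array : childElements(parent)) {
            for (Element li : childElements(array)) {
                if (RDF_NS.equals(li.getNamespaceURI()) && "li".equals(li.getLocalName())) {
                    result.add(li);
                }
            }
        }
        return result;
    }

    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(misplacedFields(SAMPLE));
    }
}
```

For a packet laid out like valid.pdf the returned list should be empty, since there the schema fields are bound to the pdfaSchema namespace and the property fields to the pdfaProperty namespace.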
private PDDocumentCatalog makeA3compliant(PDDocument doc)
        throws IOException, TransformerException, javax.xml.transform.TransformerException, BadFieldValueException, XmpParsingException, XMPException, XmpSchemaException {
    PDDocumentCatalog cat = doc.getDocumentCatalog();

    // Document Information Dictionary
    PDDocumentInformation info = doc.getDocumentInformation();

    // XMP-Metadata
    XMPMetadata xmp = XMPMetadata.createXMPMetadata();

    // PDF/A Identification Schema
    PDFAIdentificationSchema pdfaSchema = xmp.createAndAddPDFAIdentificationSchema();
    pdfaSchema.setPart(3);
    pdfaSchema.setConformance("B");

    // XMP Basic Schema
    XMPBasicSchema xmpBasicSchema = xmp.createAndAddXMPBasicSchema();
    xmpBasicSchema.setCreatorTool(info.getCreator());
    xmpBasicSchema.setCreateDate(info.getCreationDate());
    xmpBasicSchema.setModifyDate(info.getCreationDate());

    // PDFA Extension Schema
    PDFAExtensionSchema pdfaExtensionSchema = xmp.createAndAddPDFAExtensionSchemaWithNS(Map.of(
            "pdfaExtension", "http://www.aiim.org/pdfa/ns/extension/",
            "pdfaField", "http://www.aiim.org/pdfa/ns/field#",
            "pdfaProperty", "http://www.aiim.org/pdfa/ns/property#",
            "pdfaType", "http://www.aiim.org/pdfa/ns/type#"
    ));
    pdfaExtensionSchema.setAbout(new Attribute("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "about", ""));
    var bagArray = new ArrayProperty(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "schemas", Cardinality.Bag);
    var resourceSchema = new PDFASchemaType(xmp);
    resourceSchema.setAttribute(new Attribute(resourceSchema.getNamespace(), "rdf:parseType", "Resource"));
    resourceSchema.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "schema", "Factur-X PDFA Extension Schema"));
    resourceSchema.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "namespaceURI", "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#"));
    resourceSchema.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "prefix", "fx"));
    var sequence = new ArrayProperty(xmp, "http://www.aiim.org/pdfa/ns/property#", pdfaExtensionSchema.getPrefix(), "property", Cardinality.Seq);
    sequence.addProperty(createSchemaPropertyList(xmp, pdfaExtensionSchema, "DocumentFileName", "Text", "external", "name of the embedded XML invoice file"));
    sequence.addProperty(createSchemaPropertyList(xmp, pdfaExtensionSchema, "DocumentFileName", "Text", "external", "INVOICE"));
    sequence.addProperty(createSchemaPropertyList(xmp, pdfaExtensionSchema, "DocumentFileName", "Text", "external", "The actual version of the ZUGFeRD data"));
    sequence.addProperty(createSchemaPropertyList(xmp, pdfaExtensionSchema, "DocumentFileName", "Text", "external", "The actual conformance level of the ZUGFeRD data"));
    resourceSchema.addProperty(sequence);
    bagArray.addProperty(resourceSchema);
    pdfaExtensionSchema.addProperty(bagArray);

    // Custom Schema for ZUGFeRD
    String zugferdNS = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";
    String zugferdPrefix = "fx";
    XMPSchema zugferdSchema = new XMPSchema(xmp, zugferdNS, zugferdPrefix);
    zugferdSchema.setTextPropertyValue("ConformanceLevel", "EN 16931");
    zugferdSchema.setTextPropertyValue("DocumentFileName", "factur-x.xml");
    zugferdSchema.setTextPropertyValue("DocumentType", "INVOICE");
    zugferdSchema.setTextPropertyValue("Version", "1.0");
    xmp.addSchema(zugferdSchema);

    // XMP-Metadata serializing
    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);
    PDMetadata metadata = new PDMetadata(doc);
    metadata.importXMPMetadata(baos.toByteArray());
    cat.setMetadata(metadata);
    return cat;
}

private PDFAPropertyType createSchemaPropertyList(XMPMetadata xmp, PDFAExtensionSchema pdfaExtensionSchema, String name, String valueType, String category, String description) throws XmpSchemaException {
    PDFAPropertyType propertyList = new PDFAPropertyType(xmp);
    propertyList.setAttribute(new Attribute(propertyList.getNamespace(), "rdf:parseType", "Resource"));
    propertyList.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "name", name));
    propertyList.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "valueType", valueType));
    propertyList.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "category", category));
    propertyList.addProperty(new TextType(xmp, pdfaExtensionSchema.getNamespace(), pdfaExtensionSchema.getPrefix(), "description", description));
    return propertyList;
}
Any help is very much appreciated. Is this perhaps the wrong approach altogether?
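For comparison, one alternative that avoids the xmpbox type classes for this part would be to assemble the extension schema block with the JDK's DOM API, binding each field to the namespaces used in valid.pdf, and to hand the resulting bytes to importXMPMetadata as in the code above. The following is only a sketch under those assumptions: the class and helper names are hypothetical, it produces just the pdfaExtension description (not a complete packet with the pdfaid, xmp and fx descriptions), and it has not been run through Mustang.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FacturXExtensionSchemaSketch {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EXT_NS = "http://www.aiim.org/pdfa/ns/extension/";
    static final String SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#";
    static final String PROPERTY_NS = "http://www.aiim.org/pdfa/ns/property#";
    static final String FX_NS = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";

    /** Builds the extension schema description, using the values shown for valid.pdf. */
    static String build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element rdf = doc.createElementNS(RDF_NS, "rdf:RDF");
            rdf.setAttributeNS(XMLNS_NS, "xmlns:rdf", RDF_NS);
            doc.appendChild(rdf);

            // Declare the prefixes once on the description, as valid.pdf does.
            Element description = add(rdf, RDF_NS, "rdf:Description", null);
            description.setAttributeNS(RDF_NS, "rdf:about", "");
            description.setAttributeNS(XMLNS_NS, "xmlns:pdfaExtension", EXT_NS);
            description.setAttributeNS(XMLNS_NS, "xmlns:pdfaSchema", SCHEMA_NS);
            description.setAttributeNS(XMLNS_NS, "xmlns:pdfaProperty", PROPERTY_NS);

            Element schemas = add(description, EXT_NS, "pdfaExtension:schemas", null);
            Element schema = addResourceItem(add(schemas, RDF_NS, "rdf:Bag", null));
            add(schema, SCHEMA_NS, "pdfaSchema:schema", "Factur-X PDFA Extension Schema");
            add(schema, SCHEMA_NS, "pdfaSchema:namespaceURI", FX_NS);
            add(schema, SCHEMA_NS, "pdfaSchema:prefix", "fx");

            Element property = add(schema, SCHEMA_NS, "pdfaSchema:property", null);
            Element seq = add(property, RDF_NS, "rdf:Seq", null);
            addProperty(seq, "DocumentFileName", "name of the embedded XML invoice file");
            addProperty(seq, "DocumentType", "INVOICE");
            addProperty(seq, "Version", "The actual version of the ZUGFeRD data");
            addProperty(seq, "ConformanceLevel", "The conformance level of the ZUGFeRD data");

            // Serialize without an XML declaration and without added indentation.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build XMP fragment", e);
        }
    }

    /** Appends a namespaced child element, optionally with text content. */
    static Element add(Element parent, String namespace, String qualifiedName, String text) {
        Element element = parent.getOwnerDocument().createElementNS(namespace, qualifiedName);
        if (text != null) {
            element.setTextContent(text);
        }
        parent.appendChild(element);
        return element;
    }

    /** Appends an rdf:li with rdf:parseType="Resource" to a Bag or Seq. */
    static Element addResourceItem(Element array) {
        Element li = add(array, RDF_NS, "rdf:li", null);
        li.setAttributeNS(RDF_NS, "rdf:parseType", "Resource");
        return li;
    }

    /** Declares one Text property of category "external". */
    static void addProperty(Element seq, String name, String description) {
        Element li = addResourceItem(seq);
        add(li, PROPERTY_NS, "pdfaProperty:name", name);
        add(li, PROPERTY_NS, "pdfaProperty:valueType", "Text");
        add(li, PROPERTY_NS, "pdfaProperty:category", "external");
        add(li, PROPERTY_NS, "pdfaProperty:description", description);
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

The design choice here is to control the prefix-to-namespace bindings directly instead of relying on what a schema class registers; the trade-off is that the rest of the packet would have to be produced the same way or merged in separately.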