Accessing PDF document metadata

PDFxStream allows your applications to access both varieties of document-level metadata that might be available in a PDF file: "DocumentInfo" name/value mappings, and Adobe XMP data.

"DocumentInfo" Name / Value Metadata

Sometimes referred to as "classic" metadata, "DocumentInfo" name/value pairs typically include creation and modification dates, the name of the document’s author, sometimes a document title, and other potentially useful attributes. Retrieving the metadata attributes contained in a PDF file is straightforward, as shown in this code segment:

Document doc = PDF.open(pdfFilePath);
for (String key : doc.getAttributeKeys()) {
    System.out.printf("%s: %s%n", key, doc.getAttribute(key));
}

// print the value of the Author attribute to System.out
String authorName = (String)doc.getAttribute(Document.ATTR_AUTHOR);
System.out.println("Author: " + authorName);

doc.close();

A few notes about this code:

Adobe XMP metadata

A PDF document may also contain metadata in the form of an Adobe XMP (Extensible Metadata Platform) stream. XMP streams are XML documents that adhere to the XMP metadata schema defined by Adobe. They typically contain the same set of metadata attributes that are available through the "classic" metadata attribute accessors described above. However, some specialized PDF generators and workflows add metadata constructs to a document’s XMP stream that do not fit within the simple name/value pair structure of "classic" metadata.

PDFxStream allows your application to access XMP streams very easily, as shown in this example:

import java.io.*;

import com.snowtide.PDF;
import com.snowtide.pdf.Document;

public class ExtractXMPMetadata {
    public static void main (String[] args) throws IOException {
        String pdfFilePath = args[0];
        
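        Document doc = PDF.open(pdfFilePath);
        byte[] xmp = doc.getXmlMetadata();
        doc.close();

        // defensive check: assumes a document without an XMP stream yields null or an empty array
        if (xmp == null || xmp.length == 0) {
            System.out.println("No Adobe XMP metadata found in " + pdfFilePath);
            return;
        }

        String outPath = pdfFilePath + ".xmp.xml";
        // try-with-resources closes the output stream even if the write fails
        try (FileOutputStream s = new FileOutputStream(outPath)) {
            s.write(xmp);
        }

        System.out.println("Wrote Adobe XMP metadata to " + outPath);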
    }
}
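
Because an XMP stream is an XML document, the bytes returned for it can be handed to any standard XML parser. The following is a minimal sketch of that idea, not part of PDFxStream: the class name XmpFieldReader, the firstText helper, and the embedded sample packet are all hypothetical, and the code uses only the JDK's built-in XML APIs. It assumes the property of interest is serialized as an element (as Dublin Core properties such as dc:title usually are); XMP also permits simple properties to be written as attributes of rdf:Description, which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmpFieldReader {
    // Dublin Core namespace, which XMP uses for properties such as title and creator
    public static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical sample packet, standing in for the bytes of a real XMP stream
    public static final String SAMPLE =
        "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
        + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
        + "<rdf:Description rdf:about='' xmlns:dc='" + DC_NS + "'>"
        + "<dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Sample Title</rdf:li></rdf:Alt></dc:title>"
        + "</rdf:Description>"
        + "</rdf:RDF>"
        + "</x:xmpmeta>";

    /**
     * Returns the trimmed text of the first element with the given namespace
     * and local name, or null if the input is null or no such element exists.
     */
    public static String firstText(byte[] xmp, String namespace, String localName) throws Exception {
        if (xmp == null) {
            return null;
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XMP relies on namespaces, so the parser must be namespace-aware
        factory.setNamespaceAware(true);
        // XMP packets carry no DOCTYPE; rejecting one guards against XXE attacks
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document dom = builder.parse(new ByteArrayInputStream(xmp));

        NodeList nodes = dom.getElementsByTagNameNS(namespace, localName);
        if (nodes.getLength() == 0) {
            return null;
        }
        // getTextContent() concatenates the text of all descendant nodes
        return nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        byte[] xmp = SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println(firstText(xmp, DC_NS, "title")); // prints "Sample Title"
    }
}
```

In an application, the byte array passed to firstText would be the XMP data obtained from the PDF document rather than the embedded sample. Note that org.w3c.dom.Document shares its simple name with PDFxStream's Document interface, so code that uses both in one class needs to refer to one of them by its fully qualified name.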

Things to consider when retrieving XMP data: