Accessing PDF annotations
Some PDF documents contain annotations: pieces of data associated with specific regions of the document's pages.
All annotations share a base set of attributes, and the methods for accessing them are defined by the com.snowtide.pdf.annot.Annotation interface. These attributes include an annotation's name (typically unique within the page on which the annotation is found), its text contents (also used as a description when the annotation's primary content is not text, as with a file attachment), and the region of the page where the annotation is placed.
PDFxStream provides richer implementations for four types of annotations:
PDF annotation type | PDFxStream type |
---|---|
Text notes ("stickies") | com.snowtide.pdf.annot.TextAnnotation |
Styled text attachments | com.snowtide.pdf.annot.FreeTextAnnotation |
Links (referring to a position within the PDF document or to local or network resources) | com.snowtide.pdf.annot.LinkAnnotation |
File attachments | com.snowtide.pdf.annot.FileAttachmentAnnotation |
The additional attributes these classes provide are documented in the PDFxStream API reference for each class.
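Because every annotation is exposed through the base interface, client code typically distinguishes the richer types with `instanceof` checks, as the link-extraction sample in this section does. The self-contained sketch below shows only that dispatch pattern. `Annot`, `TextNote`, `FreeText`, `Link` and `FileAttachment` are hypothetical stand-ins for the types in the table, not PDFxStream classes, and the "other" fallback assumes that annotation types without a richer implementation are seen only through the base interface.

```java
import java.util.Arrays;
import java.util.List;

class AnnotKinds {
    // Hypothetical stand-in for the base annotation interface (NOT PDFxStream's).
    interface Annot {
        String contents();
    }

    // Hypothetical stand-ins for the four richer annotation types.
    static final class TextNote implements Annot {
        public String contents() { return "Check this figure"; }
    }

    static final class FreeText implements Annot {
        public String contents() { return "Styled callout"; }
    }

    static final class Link implements Annot {
        public String contents() { return "Project home page"; }
    }

    static final class FileAttachment implements Annot {
        public String contents() { return "Attached spreadsheet"; }
    }

    // Returns a label for the most specific known type, or "other" for
    // annotations that only implement the base interface.
    static String kindOf(Annot a) {
        if (a instanceof TextNote) return "text note";
        if (a instanceof FreeText) return "free text";
        if (a instanceof Link) return "link";
        if (a instanceof FileAttachment) return "file attachment";
        return "other";
    }

    public static void main(String[] args) {
        // A lambda stands in for an annotation type with no richer implementation.
        Annot stamp = () -> "Approved";
        List<Annot> annots = Arrays.asList(
            new TextNote(), new Link(), new FileAttachment(), stamp);
        for (Annot a : annots) {
            System.out.println(kindOf(a) + ": " + a.contents());
        }
    }
}
```

Checking the most specific types first and falling back to the base interface keeps the loop working even when a document contains annotation types the code does not specifically handle.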
Annotations are retrieved using either com.snowtide.pdf.Document.getAllAnnotations(), which returns a list of all annotations in the document, or com.snowtide.pdf.Document.getAnnotations(int), which returns a list of the annotations on a given page.
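When a program needs the annotations of many pages, one option is to call getAllAnnotations() once and group the results by page rather than calling getAnnotations(int) page by page. The sketch below shows only that grouping step. `Annot` is a hypothetical stand-in holding a page number and a name, not PDFxStream's Annotation interface; the grouping assumes each annotation can report its page (as `pageNumber()` does on LinkAnnotation in the sample below), and the zero-based page numbers in the example are an assumption to check against the API reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AnnotsByPage {
    // Hypothetical stand-in for an annotation: only its page number and name.
    static final class Annot {
        final int page;
        final String name;
        Annot(int page, String name) { this.page = page; this.name = name; }
    }

    // Buckets a document-wide list by page, preserving the original order
    // within each page; TreeMap keeps the pages in ascending order.
    static Map<Integer, List<Annot>> byPage(List<Annot> all) {
        Map<Integer, List<Annot>> index = new TreeMap<>();
        for (Annot a : all) {
            index.computeIfAbsent(a.page, p -> new ArrayList<>()).add(a);
        }
        return index;
    }

    // Per-page lookup against the index; pages without annotations yield an empty list.
    static List<Annot> onPage(Map<Integer, List<Annot>> index, int page) {
        return index.getOrDefault(page, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Annot> all = Arrays.asList(
            new Annot(0, "note-1"), new Annot(2, "link-1"), new Annot(0, "link-2"));
        Map<Integer, List<Annot>> index = byPage(all);
        for (Map.Entry<Integer, List<Annot>> e : index.entrySet()) {
            System.out.println("page " + e.getKey() + ": " + e.getValue().size() + " annotation(s)");
        }
        System.out.println("page 1: " + onPage(index, 1).size() + " annotation(s)");
    }
}
```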
The following code sample retrieves every link annotation in a PDF document and, for each link that has a URI, prints its page number, bounds, and URI to standard output:
    import com.snowtide.PDF;
    import com.snowtide.pdf.Document;
    import com.snowtide.pdf.annot.*;

    public class ExtractLinks {
        public static void main(String[] args) throws java.io.IOException {
            if (args.length != 1) {
                System.err.println("usage: java ExtractLinks <file.pdf>");
                System.exit(1);
            }
            String pdfFilePath = args[0];
            // try-with-resources closes the Document when extraction is done
            try (Document doc = PDF.open(pdfFilePath)) {
                for (Annotation a : doc.getAllAnnotations()) {
                    // Only link annotations are of interest here
                    if (a instanceof LinkAnnotation) {
                        LinkAnnotation link = (LinkAnnotation) a;
                        // Skip links with no URI, such as those that point to
                        // a position within the document
                        if (link.getURI() != null) {
                            System.out.printf(
                                "Found outgoing link on page %s, bounds %s, uri: %s%n",
                                link.pageNumber(), link.bounds(), link.getURI());
                        }
                    }
                }
            }
        }
    }
Note that a LinkAnnotation may also refer to a position within the PDF document, using a page number and precise bounding coordinates as some bookmarks do; this data can be used to drive a selective text extraction process.
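One way to use such a target is to keep only the text that falls inside the target rectangle on the target page. The sketch below illustrates that idea in isolation and is not PDFxStream's API for selective extraction: `Rect`, `TextRun` and `select()` are hypothetical, and the coordinates assume PDF user space, where y increases upward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RegionSelect {
    // Hypothetical axis-aligned rectangle in PDF user space (y grows upward).
    static final class Rect {
        final double left, bottom, right, top;
        Rect(double left, double bottom, double right, double top) {
            this.left = left; this.bottom = bottom; this.right = right; this.top = top;
        }
        // True when the interiors overlap; rectangles that only share an edge do not count.
        boolean intersects(Rect o) {
            return left < o.right && o.left < right && bottom < o.top && o.bottom < top;
        }
    }

    // Hypothetical piece of extracted text with its page and bounding box.
    static final class TextRun {
        final int page;
        final Rect bounds;
        final String text;
        TextRun(int page, Rect bounds, String text) {
            this.page = page; this.bounds = bounds; this.text = text;
        }
    }

    // Keeps the text of runs on the target page whose bounds overlap the target region.
    static List<String> select(List<TextRun> runs, int page, Rect region) {
        List<String> out = new ArrayList<>();
        for (TextRun r : runs) {
            if (r.page == page && r.bounds.intersects(region)) {
                out.add(r.text);
            }
        }
        return out;
    }

    static List<TextRun> sampleRuns() {
        return Arrays.asList(
            new TextRun(2, new Rect(72, 700, 300, 720), "Section 4"),
            new TextRun(2, new Rect(72, 600, 540, 690), "Body text of section 4"),
            new TextRun(3, new Rect(72, 700, 300, 720), "Section 5"));
    }

    public static void main(String[] args) {
        // Hypothetical link target: page 2, a band around the section heading.
        Rect target = new Rect(60, 690, 560, 730);
        System.out.println(select(sampleRuns(), 2, target)); // prints [Section 4]
    }
}
```

Requiring strict overlap means the body text, whose top edge only touches the target's bottom edge, is excluded; a real extraction might instead use a minimum-overlap threshold.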