Accessing PDF annotations
Some PDF documents contain annotations, bits of data that are associated with specific regions of a PDF document’s pages. Annotations include:
- Text notes ("stickies")
- Styled text attachments
- Links (referring to a position within the PDF document or to local or network resources)
- File, audio, and video attachments
- Drawings and in-line graphics
All annotations share a base set of possible attributes; the
com.snowtide.pdf.annot.Annotation interface defines the methods for
accessing them. These attributes include an annotation’s name (typically
unique within the page on which the annotation is found), its text contents
(also used as a description field when the annotation’s primary content is
non-text, as in a file attachment), and the region on the document’s page
where the annotation is placed.
PDFTextStream provides richer implementations for three types of
annotations: text notes (via the com.snowtide.pdf.annot.TextAnnotation
class), styled text attachments (via the com.snowtide.pdf.annot.FreeTextAnnotation
class), and links (via the com.snowtide.pdf.annot.LinkAnnotation
class). The additional attributes provided by these classes are
documented in their API reference pages. The following code sample
retrieves all of the link annotations from a PDF document and prints their
URIs to standard output:
    public void printURILinks (PDFTextStream stream) throws IOException {
        for (int i = 0, len = stream.getPageCnt(); i < len; i++) {
            // Annotations found on page i (pages are indexed from 0 here)
            List annots = stream.getAnnotations(i);
            for (int c = 0, clen = annots.size(); c < clen; c++) {
                Annotation annot = (Annotation)annots.get(c);
                if (annot instanceof LinkAnnotation) {
                    LinkAnnotation link = (LinkAnnotation)annot;
                    // Only handle links whose action name is "URI"; the constant-first
                    // equals avoids a NullPointerException if the name is null
                    if ("URI".equals(link.getLinkActionName())) {
                        String uri = link.getURI();
                        if (uri != null) {
                            System.out.println("URL link found on page " + (i+1) + ":" + uri);
                        }
                    }
                }
            }
        }
    }
Note that LinkAnnotation instances may also refer to a position within the
PDF document using a page number and precise bounding coordinates, as some
bookmarks do; this data can be used to drive a selective text extraction
process.
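To illustrate what driving a selective extraction from a link target can look like, here is a minimal, library-independent sketch. It assumes the link target has already been reduced to a page number and a bounding rectangle, and that extracted text is available as positioned runs. The Region and TextRun types and the textInRegion method are hypothetical stand-ins invented for this example; they are not part of the PDFTextStream API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionFilterSketch {

    /** Hypothetical axis-aligned rectangle in page coordinates. */
    static final class Region {
        final float left, bottom, right, top;

        Region(float left, float bottom, float right, float top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        /** True if the point (x, y) lies inside or on the edge of this region. */
        boolean contains(float x, float y) {
            return x >= left && x <= right && y >= bottom && y <= top;
        }
    }

    /** Hypothetical positioned piece of text, as an extraction pass might yield. */
    static final class TextRun {
        final int page;
        final float x, y;
        final String text;

        TextRun(int page, float x, float y, String text) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Keeps only the runs on the given page whose anchor point falls inside the region. */
    static List<String> textInRegion(List<TextRun> runs, int page, Region region) {
        List<String> hits = new ArrayList<String>();
        for (TextRun run : runs) {
            if (run.page == page && region.contains(run.x, run.y)) {
                hits.add(run.text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TextRun> runs = Arrays.asList(
            new TextRun(2, 72f, 700f, "Section 3"),
            new TextRun(2, 72f, 100f, "Footer"),
            new TextRun(3, 72f, 700f, "Other page"));
        // Stand-in for the page number and bounds taken from a link's target
        Region target = new Region(50f, 650f, 550f, 750f);
        System.out.println(textInRegion(runs, 2, target)); // prints [Section 3]
    }
}
```

In a real application, the page number and bounds would come from the LinkAnnotation, and the positioned text from whatever extraction output is in use; only the filtering step is shown here.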