Introducing PDFxStream

PDFxStream is a Java library that enables applications to access text, image, tabular, metadata, and form content in PDF documents quickly, easily, and accurately.

While there are many excellent tools and libraries available for generating PDF documents, PDFxStream was the first and remains the only library to focus exclusively on the extraction of data from PDF files.

PDFxStream's capabilities include:

  • Support for versions 1.0 through 1.7 (ExtensionLevel 5) of the PDF document specification
  • Document decryption, including 40-bit, 128-bit, 256-bit, and variable bit-length RC4 and AES ciphers
  • Support for extracting bookmarks, annotations, and document attachments
  • Access to all document metadata contained in a PDF file, including Adobe XMP metadata streams
  • Unicode text extraction, including Chinese, Japanese, and Korean (CJK) support
  • Automatic detection of tabular data and inference of table structure
  • Extraction of images embedded in PDF documents for immediate display or for storage in PNG, JPEG, TIFF, GIF, or BMP format
  • Extraction and filling of interactive PDF form data

With these capabilities and its focus on performance, PDFxStream is well suited for use in a range of development environments, including:

  • High-volume enterprise environments that need to extract data from large numbers of PDF files
  • Content management systems (CMSs) that need access to the text of PDF files for categorization or summarization purposes
  • Full-text indexing and search systems that wish to add comprehensive support for searching PDF documents
  • Data conversion processes, especially those that aim to selectively extract and convert unstructured PDF content into structured data formats and databases
  • Alternative content delivery systems that need to provide access to PDF document content to devices that cannot readily open and view PDF content (e.g. mobile phones and PDAs)

The following sections provide the reference and tutorial information you need to integrate PDFxStream into your application.