PDFxStream Features & Capabilities

Everything in one box to make accessing content and data from PDFs easy.

Most developers don't need to read this page. PDFxStream supports nearly everything in the PDF specification (and hundreds of commonly-found constructs that aren't in the standard spec!), and is built so that your use of it requires zero knowledge of those details.

(If you care about a particular PDF data type or characteristic that is not listed below, then please feel free to contact us to confirm that PDFxStream supports what you need.)

That said, for those who do care about PDF internals and want to know how well PDFxStream supports them, read on and enjoy!

Making PDF data access simple and easy

Working with PDF documents is often a frustrating and difficult exercise. Few developers are familiar with the PDF file specification (and all of its incorporated sub-specifications), and even those who are usually don't want to think about those details when completing what should be an easy task like "store each uploaded PDF document's text in the database".

For this reason, one of PDFxStream's primary features is that it allows you to complete those sorts of tasks using an API that doesn't require knowledge of the particulars of the PDF file format. See for yourself just how little is necessary to access all of the essential pools of content and data within PDF documents using PDFxStream:

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextAllPages {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];

        // Open the document, pipe the text of every page into the buffer, then close the file.
        Document pdf = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        pdf.pipe(new OutputTarget(text));
        pdf.close();
        System.out.println(text);
    }
}

Give PDFxStream a try for your project's PDF data access needs!

Get Started

Baseline PDF format compatibility and basic data extraction capabilities

The official PDF file format specification is large and complex. PDF files can be rich, dynamic documents, and getting to all of the interesting and useful parts of them (their text, images, metadata, and other content) is a daunting task.

Further, Adobe's specification only provides normative descriptions of how PDF documents should be constructed. In practice, applications must often process PDF files from flawed generators that bend, and sometimes break, the "official" PDF specification, much as web browsers must handle broken and malformed HTML documents as best they can.

This is just one of the many reasons why continually supporting and maintaining PDFxStream is a never-ending task. Doing anything else would prevent us from guaranteeing maximum compatibility with all PDF document formats and variants "in the field", regardless of their source or to what degree they violate certain rules of good PDF file format etiquette.

PDF Format Support Details

The range of PDF file format features (and quirks!) that PDFxStream supports is broad and deep. Below is a partial list of the major facets of the PDF specification that PDFxStream supports.

  • Compatibility with all versions of the PDF document specification, from v1.0 (corresponding to Acrobat 1) to v1.7 (corresponding to Acrobat 8 and higher)
  • Support for decryption of PDF documents encrypted with or without a password using 40-bit, 128-bit, 256-bit, and variable bit-length ciphers (including RC4 and AES)
  • Automatic "repair" of PDF documents to account for common malformations and irregularities
  • Extraction of PDF annotations (links, text notes, etc)
  • Extraction of embedded files and attachments
  • Extraction of PDF bookmarks (a.k.a. outline, table of contents)
  • Extraction of document metadata, as either key/value pairs or XML
  • Extraction of raw character data
  • Extraction of image metadata, including image dimensions, locations, and types
  • PDF file merging
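
As a small illustration of the version range in the first item above, the sketch below reads the version that a PDF file declares in its header comment (for example, %PDF-1.7). It uses only the Java standard library and is not part of PDFxStream's API; the class name PdfHeaderVersion and its parse method are hypothetical names chosen for this example. It assumes the header sits within the first 1024 bytes of the file, and it does not look at the document catalog, which may declare a later version than the header does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a PDFxStream class.
public class PdfHeaderVersion {
    // A PDF file starts with a header comment such as "%PDF-1.7".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns the version declared in the leading bytes (e.g. "1.4"), if a header is found. */
    public static Optional<String> parse(byte[] leadingBytes) {
        // Assumption: only the first 1024 bytes are searched for the header.
        int len = Math.min(leadingBytes.length, 1024);
        // ISO-8859-1 maps each byte to exactly one char, so binary data cannot break decoding.
        String head = new String(leadingBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1) + "." + m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parse(sample).orElse("no header found")); // prints 1.7
    }
}
```

A check like this is only a first diagnostic step; real-world files may carry junk before the header or omit it entirely, which is the kind of irregularity the "repair" item above refers to.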

Text extraction features

Image extraction features

  • Decompression and decoding of dozens of PDF image types
  • Rendering of images to standard platform image objects for on-screen display (java.awt.image.BufferedImage on Java, or System.Drawing.Bitmap on .NET) and saving to disk in familiar formats:
    • JPEG
    • TIFF
    • GIF
    • PNG
    • BMP
  • Automatic stitching of image tiles and strips
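
Once an image is available as a java.awt.image.BufferedImage, it can be encoded in one of the formats listed above using the standard javax.imageio.ImageIO class. The sketch below shows only that last step, under the assumption that the image has already been obtained; it uses a blank stand-in image rather than any PDFxStream call, and the class name SaveRenderedImage and its encode method are hypothetical names chosen for this example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not a PDFxStream class.
public class SaveRenderedImage {
    static {
        // Keep ImageIO's intermediate data in memory instead of a temporary cache file.
        ImageIO.setUseCache(false);
    }

    /** Encodes the image in the named format ("png", "jpg", "bmp", ...) and returns the bytes. */
    public static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer is registered for the requested format.
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer available for format: " + format);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an image extracted from a PDF document.
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        byte[] png = encode(image, "png");
        System.out.println("Encoded " + png.length + " bytes as PNG");
    }
}
```

To write straight to disk instead, ImageIO.write also accepts a java.io.File as its third argument. Which formats are available depends on the Java runtime; TIFF writing, for instance, is only built in from Java 9 onward.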

Form data extraction features

  • Support for extracting "AcroForm" (interactive) form data from all types of fields:
    • Text
    • Dropdowns ("Choice" fields)
    • Radio buttons
    • Checkboxes
    • Pushbuttons
    • Signatures
  • Support for extracting XFA form data
  • Support for filling "AcroForm" fields and writing the updated PDF documents

Learn more

To get the most out of PDFxStream's capabilities and PDF file format support, please check out the developer's guide and API reference.
