Read PDF documents the easy, fast, accurate way

PDFxStream is used by the most demanding Java teams to extract data from billions of PDF documents every year.

Complete PDF compatibility and unbeatable performance integrated into your application in 10 minutes or less. Requires no knowledge of PDF internals or specifications to solve your PDF problems!

PDFxStream provides unmatched access to everything within a PDF

Text

In every encoding and language, including: non-Latin scripts; Chinese, Japanese, and Korean; vertical writing systems; right-to-left scripts like Arabic, Hebrew, Urdu, and others.

Images

All image formats and encodings, native Java raster support, and automatic stitching of image tiles and strips.

Metadata

Including key-value and XML-formatted (XMP) document metadata.

Form Data

Including all classic "interactive" form data and widget types, as well as XFA-style forms.

Bookmarks & Annotations

Supporting identification of content via native tables of contents, and post-publishing comments, markup, and other annotations.

If you want to know all of the gnarly details about PDFxStream's support for various PDF specifications, industry standards, and our constant work to support damaged and out-of-spec documents, start here.

See what's new in PDFxStream v4!

PDFxStream v4 adds outstanding support for right-to-left (RTL) and bidirectional (bidi) text extraction to complement its already-comprehensive PDF data access capabilities that have stood the test of time for Java teams for over 20 years.

There's even more though: a big price drop, a Java platform bump, and a raft of great performance and accuracy refinements.
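
To make "right-to-left" and "bidirectional" concrete, here is a minimal sketch. It is not PDFxStream code and says nothing about how PDFxStream works internally. It is a hypothetical helper (the class name `BidiCheck` and the method `describe` are our own) that uses only the JDK's standard `java.text.Bidi` class to report how a string you have already extracted would read.

```java
import java.text.Bidi;

public class BidiCheck {
    /**
     * Hypothetical helper: classifies already-extracted text as
     * "left-to-right", "right-to-left", or "mixed" using the JDK's
     * implementation of the Unicode bidirectional algorithm.
     */
    public static String describe(String text) {
        // The base direction is taken from the first strongly directional
        // character, falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) return "left-to-right";
        if (bidi.isRightToLeft()) return "right-to-left";
        return "mixed";
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello, PDF"));                   // left-to-right
        System.out.println(describe("\u05E9\u05DC\u05D5\u05DD"));     // right-to-left (Hebrew)
        System.out.println(describe("PDF \u05E9\u05DC\u05D5\u05DD")); // mixed
    }
}
```

A check like this could help decide how to display or index text after extraction, for example when a document mixes Hebrew or Arabic passages with Latin-script product names.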

Getting data out of PDF documents really is this easy

Each example below handles one PDF data extraction task:

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextAllPages {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];

        Document pdf = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        pdf.pipe(new OutputTarget(text));
        pdf.close();
        System.out.println(text);
    }
}

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.Page;
import com.snowtide.pdf.layout.Image;

import java.io.File;
import java.io.FileOutputStream;

public class ExtractImages {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        File outputDir = new File(args[1]);
        if (!outputDir.exists()) outputDir.mkdirs();

        Document pdf = PDF.open(pdfFilePath);
        for (Page p : pdf.getPages()) {
            int i = 0;
            for (Image img : p.getImages()) {
                // try-with-resources closes the stream even if the write fails
                try (FileOutputStream out = new FileOutputStream(
                        new File(outputDir, String.format("%s-%s.%s",
                                p.getPageNumber(), i, img.dataFormat().name().toLowerCase())))) {
                    out.write(img.data());
                }
                i++;
            }
            System.out.printf("Found %s images on page %s", p.getImages().size(), p.getPageNumber());
            System.out.println();
        }
        pdf.close();
    }
}

import java.io.IOException;

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextOnePage {
    public static void main (String[] args) throws IOException {
        String pdfFilePath = args[0];
        Document pdfts = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        pdfts.getPage(0).pipe(new OutputTarget(text));
        pdfts.close();
        System.out.println(text);
    }
}

import com.snowtide.PDF;
import com.snowtide.pdf.Document;

public class ExtractMetadata {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];

        System.out.println("All document metadata from " + pdfFilePath + ":");
        Document doc = PDF.open(pdfFilePath);
        for (String key : doc.getAttributeKeys()) {
            System.out.printf("%s: %s", key, doc.getAttribute(key));
            System.out.println();
        }
        doc.close();
    }
}

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.forms.*;

public class ExtractFormData {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        Document pdfts = PDF.open(pdfFilePath);
        AcroForm form = (AcroForm)pdfts.getFormData();
        
        // access specific fields directly
        AcroTextField projectName = (AcroTextField)form.getField("color.1");
        AcroCheckboxField isPrivateNonProfit =
                (AcroCheckboxField)form.getField("color.10-privatenonprofit");

        System.out.printf("Project %s %s run by a nonprofit organization",
                projectName.getValue(),
                isPrivateNonProfit.isChecked() ? "is" : "is not");
        System.out.println();
        
        // access all fields (just a sampling of available data/functionality)
        System.out.println(String.format("All form data from %s:", pdfFilePath));
        for (AcroFormField field : form) {
            Object ftype = field.getType();
            if (ftype == AcroFormField.FIELD_TYPE_TEXT) {
                System.out.printf("Field %s is a text box; value: %s",
                        field.getFullName(), field.getValue());
            } else if (ftype == AcroFormField.FIELD_TYPE_BUTTON) {
                switch (((AcroButtonField)field).getButtonType()) {
                    case AcroButtonField.BUTTON_TYPE_PUSHBUTTON:
                        System.out.printf("Field %s is a pushbutton; value: %s",
                                field.getFullName(), field.getValue());
                        break;
                    case AcroButtonField.BUTTON_TYPE_CHECKBOX:
                        System.out.printf("Field %s is a checkbox; value: %s; is checked? %s",
                                field.getFullName(), field.getValue(),
                                ((AcroCheckboxField)field).isChecked());                        
                        break;
                    case AcroButtonField.BUTTON_TYPE_RADIO_GROUP:
                        System.out.printf("Field %s is a radio button group; value: %s; possible values: %s",
                                field.getFullName(), field.getValue(),
                                ((AcroRadioButtonGroupField)field).getPossibleValues());
                        break;
                }
            } else if (ftype == AcroFormField.FIELD_TYPE_CHOICE) {
                System.out.printf("Field %s is 'select' dropdown; value: %s; display label: %s",
                        field.getFullName(), field.getValue(),
                        ((AcroChoiceField)field).getDisplayValue((String)field.getValue()));
            } else if (ftype == AcroFormField.FIELD_TYPE_SIGNATURE) {
                System.out.printf("Field %s is a signature; value: %s",
                        field.getFullName(), field.getValue());
            } else {
                System.out.printf("Field %s is of unknown type; value: %s",
                        field.getFullName(), field.getValue());
            }
            System.out.println();
        }

        pdfts.close();
    }
}

import com.snowtide.PDF;
import com.snowtide.pdf.Bookmark;
import com.snowtide.pdf.Document;

public class AccessBookmarks {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        
        Document doc = PDF.open(pdfFilePath);
        Bookmark root = doc.getBookmarks();
        if (root == null) {
            System.out.println(pdfFilePath + " does not contain any bookmarks.");
        } else {
            for (Bookmark b : root.getAllDescendants()) {
                System.out.printf("Bookmark '%s' points at page %s, bounds %s, %s, %s, %s",
                        b.getTitle(), b.getPageNumber(),
                        b.getLeftBound(), b.getBottomBound(),
                        b.getRightBound(), b.getTopBound());
                System.out.println();
            }
        }

        doc.close();
    }
}

import java.io.*;

import com.snowtide.PDF;
import com.snowtide.pdf.Document;

public class ExtractXMPMetadata {
    public static void main (String[] args) throws IOException {
        String pdfFilePath = args[0];
        
        Document doc = PDF.open(pdfFilePath);
        String outPath = pdfFilePath + ".xmp.xml";
        FileOutputStream s = new FileOutputStream(outPath);
        s.write(doc.getXmlMetadata());
        s.close();
        doc.close();
        
        System.out.println("Wrote Adobe XMP metadata to " + outPath);
    }
}

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class DecryptWithPassword {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        Document pdfts = PDF.open(pdfFilePath, args[1].getBytes());
        StringBuilder text = new StringBuilder(1024);
        pdfts.pipe(new OutputTarget(text));
        pdfts.close();
        System.out.println(text);
    }
}

Our customers are everything to us

PDFxStream provided near-perfect results; it dwarfed any other software that we evaluated.
Neil Gandhi
Junior Software Engineer, Zinio

PDFxStream reliably gave us the best results [of the PDF extraction libraries we tested]. It worked well with our foreign language documents.
Douglas Kadlecek
Director of Engineering, Zinio

PDFxStream was selected because of its capabilities, ease of integration, ease of use and performance.
Chris Weiss
System Architect, State of Michigan

PDFxStream was the only solution that clearly had the primary goal of quality text extraction, rather than handling that as an afterthought.
Michael Bell
Vice President of R & D, GWAVA

Snowtide is the perfect fit and solution for GWAVA users. Throughout our extensive testing, PDFxStream proved itself to be far and away the best PDF content extraction solution available on the market.
Charles Taite
CEO & Co-Founder, GWAVA

We ended up not using any of these open-source APIs because they could not provide the functionality and the quality technical support we needed. By the time the project is complete, [PDFxStream] will have saved us thousands of man-hours, while also improving text extraction accuracy.
Mark Yu
Senior Engineer, NIH

We feel Snowtide truly went the extra mile to make us happy — and their responsive, knowledgeable tech support was an unexpected benefit.
Shailender Chohan
Senior Developer, NIH

Working with Snowtide was a very positive experience. We will recommend PDFxStream to other projects with similar needs.
Chris Weiss
System Architect, State of Michigan

PDFxStream is used by companies and governments around the world to process billions of documents each year.

Let us tame the PDF monster

PDF documents are different: far more complex than plain-text or well-structured data sources. Most tools and libraries force you to become a PDF expert to get good data out of them.

It doesn't have to be that way. PDFxStream gives you an easy-to-use set of APIs that require zero intimate knowledge of PDF internals or any of the gnarly idiosyncrasies of poorly built PDF file generators.

And, if you need more active help, we're ready to put our 20+ years of PDF experience to use in building the PDF solution you need, integrated into your application for you thanks to our professional services.

Leave the mess to us, and focus on the work that matters most to you.
