Read PDF documents the easy, fast, accurate way

PDFxStream is used by the most demanding Java teams to extract data from billions of PDF documents every year.

Complete PDF compatibility and unbeatable performance integrated into your application in 10 minutes or less. Requires no knowledge of PDF internals or specifications to solve your PDF problems!

PDFxStream provides unmatched access to everything within a PDF

Text

In every encoding and language, including: non-Latin scripts; Chinese, Japanese, and Korean; vertical writing systems; right-to-left scripts like Arabic, Hebrew, Urdu, and others.

Images

All image formats and encodings, native Java raster support, and automatic stitching of image tiles and strips.

Metadata

Including key-value and XML-formatted (XMP) document metadata.

Form Data

Including all classic "interactive" form data and widget types, as well as XFA-style forms.

Bookmarks & Annotations

Supporting identification of content via native tables of contents, and post-publishing comments, markup, and other annotations.

Learn More

If you want to know all of the gnarly details about PDFxStream's support for various PDF specifications, industry standards, and our constant work to support damaged and out-of-spec documents, start here.

See what's new in PDFxStream v4!

PDFxStream v4 adds outstanding support for right-to-left (RTL) and bidirectional (bidi) text extraction to complement its already-comprehensive PDF data access capabilities that have stood the test of time for Java teams for over 20 years.

There's even more though: a big price drop, a Java platform bump, and a raft of great performance and accuracy refinements.

Learn more

Getting data out of PDF documents really is this easy

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextAllPages {
    public static void main (String[] args) throws java.io.IOException {
        // Path to the PDF file, taken from the first command-line argument
        String pdfFilePath = args[0];

        // Open the document, pipe the text of every page into a buffer, then close it
        Document pdf = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        pdf.pipe(new OutputTarget(text));
        pdf.close();

        // Print the extracted text
        System.out.println(text);
    }
}

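
Once the text is in a StringBuilder, as in the example above, ordinary JDK tools apply to it. As a minimal sketch, here is one way a caller might check whether extracted text reads left-to-right, right-to-left, or mixes both directions. The class name ExtractedTextDirection and its describe method are hypothetical, and java.text.Bidi belongs to the JDK, not to PDFxStream.

```java
import java.text.Bidi;

public class ExtractedTextDirection {

    /**
     * Classifies text by directionality using the JDK's Bidi analysis.
     * Hypothetical helper for illustration; not part of PDFxStream.
     */
    public static String describe(CharSequence extracted) {
        String s = extracted.toString();
        if (s.isEmpty()) {
            return "empty";
        }
        // Base direction follows the first strongly directional character,
        // falling back to left-to-right when there is none.
        Bidi bidi = new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isMixed()) {
            return "mixed";
        }
        return bidi.isRightToLeft() ? "right-to-left" : "left-to-right";
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF: a Hebrew word followed by a Latin one
        StringBuilder text = new StringBuilder("\u05E9\u05DC\u05D5\u05DD world");
        System.out.println(describe(text));
    }
}
```

A check like this could help decide how to display or post-process extracted text, for example whether a UI component needs right-to-left layout.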
Our customers are everything to us

PDFxStream is used by companies and governments around the world to process billions of documents each year.

Customers include the National Institutes of Health, Deloitte, the State of Michigan, Gwava, and Zinio.

Let us tame the PDF monster

PDF documents are just different: far more complex than plain-text or well-structured data sources. Most tools and libraries force you to become a PDF expert before you can get good data out of a PDF.

It doesn't have to be that way. PDFxStream gives you an easy-to-use set of APIs that require zero intimate knowledge of PDF internals or any of the gnarly idiosyncrasies of poorly-built PDF file generators.

And if you need more hands-on help, our professional services are ready to put our 20+ years of PDF experience to work, building the PDF solution you need and integrating it into your application for you.

Leave the mess to us, and focus on the work that matters most to you.