Read PDF documents the easy, fast, accurate way
PDFxStream is used by the most demanding Java teams to extract data from billions of PDF documents every year.
Complete PDF compatibility and unbeatable performance integrated into your application in 10 minutes or less. Requires no knowledge of PDF internals or specifications to solve your PDF problems!
PDFxStream provides unmatched access to everything within a PDF
Text
In every encoding and language, including: non-Latin scripts; Chinese, Japanese, and Korean; vertical writing systems; right-to-left scripts like Arabic, Hebrew, Urdu, and others.
Images
All image formats and encodings, native Java raster support, and automatic stitching of image tiles and strips.
Metadata
Including key-value and XML-formatted (XMP) document metadata.
Form Data
Including all classic "interactive" form data and widget types, as well as XFA-style forms.
Bookmarks & Annotations
Supporting identification of content via native tables of contents, and post-publishing comments, markup, and other annotations.
Learn More
If you want to know all of the gnarly details about PDFxStream's support for various PDF specifications and industry standards, and about our constant work to support damaged and out-of-spec documents, start here.
See what's new in PDFxStream v4!
PDFxStream v4 adds outstanding support for right-to-left (RTL) and bidirectional (bidi) text extraction to complement its already-comprehensive PDF data access capabilities that have stood the test of time for Java teams for over 20 years.
There's even more though: a big price drop, a Java platform bump, and a raft of great performance and accuracy refinements.
Getting data out of PDF documents really is this easy
Choose a PDF data extraction task:
// Extract the text of every page in a document
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextAllPages {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        Document pdf = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        pdf.pipe(new OutputTarget(text));
        pdf.close();
        System.out.println(text);
    }
}
// Save every image on every page into an output directory
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.Page;
import com.snowtide.pdf.layout.Image;
import java.io.File;
import java.io.FileOutputStream;

public class ExtractImages {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        File outputDir = new File(args[1]);
        if (!outputDir.exists()) outputDir.mkdirs();
        Document pdf = PDF.open(pdfFilePath);
        for (Page p : pdf.getPages()) {
            int i = 0;
            for (Image img : p.getImages()) {
                // files are named <page>-<index>.<format>; Locale.ROOT keeps the
                // extension stable regardless of the machine's default locale
                String ext = img.dataFormat().name().toLowerCase(java.util.Locale.ROOT);
                // try-with-resources closes the stream even if the write fails
                try (FileOutputStream out = new FileOutputStream(
                        new File(outputDir, String.format("%s-%s.%s",
                            p.getPageNumber(), i, ext)))) {
                    out.write(img.data());
                }
                i++;
            }
            System.out.printf("Found %s images on page %s", p.getImages().size(), p.getPageNumber());
            System.out.println();
        }
        // release the document once all pages are processed
        pdf.close();
    }
}
// Extract the text of a single page
import java.io.IOException;
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class ExtractTextOnePage {
    public static void main (String[] args) throws IOException {
        String pdfFilePath = args[0];
        Document pdfts = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        // pipe only the page at index 0 instead of the whole document
        pdfts.getPage(0).pipe(new OutputTarget(text));
        pdfts.close();
        System.out.println(text);
    }
}
// Print all key-value document metadata
import com.snowtide.PDF;
import com.snowtide.pdf.Document;

public class ExtractMetadata {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        System.out.println("All document metadata from " + pdfFilePath + ":");
        Document doc = PDF.open(pdfFilePath);
        for (String key : doc.getAttributeKeys()) {
            System.out.printf("%s: %s", key, doc.getAttribute(key));
            System.out.println();
        }
        doc.close();
    }
}
// Read interactive (AcroForm) form data
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.forms.*;

public class ExtractFormData {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        Document pdfts = PDF.open(pdfFilePath);
        AcroForm form = (AcroForm)pdfts.getFormData();

        // access specific fields directly
        // (these field names belong to the sample form; substitute your own)
        AcroTextField projectName = (AcroTextField)form.getField("color.1");
        AcroCheckboxField isPrivateNonProfit =
            (AcroCheckboxField)form.getField("color.10-privatenonprofit");
        System.out.printf("Project %s %s run by a nonprofit organization",
            projectName.getValue(),
            isPrivateNonProfit.isChecked() ? "is" : "is not");
        System.out.println();

        // access all fields (just a sampling of available data/functionality)
        System.out.println(String.format("All form data from %s:", pdfFilePath));
        for (AcroFormField field : form) {
            Object ftype = field.getType();
            if (ftype == AcroFormField.FIELD_TYPE_TEXT) {
                System.out.printf("Field %s is a text box; value: %s",
                    field.getFullName(), field.getValue());
            } else if (ftype == AcroFormField.FIELD_TYPE_BUTTON) {
                switch (((AcroButtonField)field).getButtonType()) {
                    case AcroButtonField.BUTTON_TYPE_PUSHBUTTON:
                        System.out.printf("Field %s is a pushbutton; value: %s",
                            field.getFullName(), field.getValue());
                        break;
                    case AcroButtonField.BUTTON_TYPE_CHECKBOX:
                        System.out.printf("Field %s is a checkbox; value: %s; is checked? %s",
                            field.getFullName(), field.getValue(),
                            ((AcroCheckboxField)field).isChecked());
                        break;
                    case AcroButtonField.BUTTON_TYPE_RADIO_GROUP:
                        System.out.printf("Field %s is a radio button group; value: %s; possible values: %s",
                            field.getFullName(), field.getValue(),
                            ((AcroRadioButtonGroupField)field).getPossibleValues());
                        break;
                }
            } else if (ftype == AcroFormField.FIELD_TYPE_CHOICE) {
                System.out.printf("Field %s is a 'select' dropdown; value: %s; display label: %s",
                    field.getFullName(), field.getValue(),
                    ((AcroChoiceField)field).getDisplayValue((String)field.getValue()));
            } else if (ftype == AcroFormField.FIELD_TYPE_SIGNATURE) {
                System.out.printf("Field %s is a signature; value: %s",
                    field.getFullName(), field.getValue());
            } else {
                System.out.printf("Field %s is of unknown type; value: %s",
                    field.getFullName(), field.getValue());
            }
            System.out.println();
        }
        pdfts.close();
    }
}
// List every bookmark with its target page and bounds
import com.snowtide.PDF;
import com.snowtide.pdf.Bookmark;
import com.snowtide.pdf.Document;

public class AccessBookmarks {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        Document doc = PDF.open(pdfFilePath);
        Bookmark root = doc.getBookmarks();
        if (root == null) {
            System.out.println(pdfFilePath + " does not contain any bookmarks.");
        } else {
            for (Bookmark b : root.getAllDescendants()) {
                System.out.printf("Bookmark '%s' points at page %s, bounds %s, %s, %s, %s",
                    b.getTitle(), b.getPageNumber(),
                    b.getLeftBound(), b.getBottomBound(),
                    b.getRightBound(), b.getTopBound());
                System.out.println();
            }
        }
        doc.close();
    }
}
// Write a document's XMP (XML) metadata to a file next to the PDF
import java.io.*;
import com.snowtide.PDF;
import com.snowtide.pdf.Document;

public class ExtractXMPMetadata {
    public static void main (String[] args) throws IOException {
        String pdfFilePath = args[0];
        Document doc = PDF.open(pdfFilePath);
        String outPath = pdfFilePath + ".xmp.xml";
        FileOutputStream s = new FileOutputStream(outPath);
        s.write(doc.getXmlMetadata());
        s.close();
        doc.close();
        System.out.println("Wrote Adobe XMP metadata to " + outPath);
    }
}
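Once the XMP packet has been written out, ordinary XML tooling can read it. The following is a minimal sketch, assuming the packet carries a Dublin Core title in the usual rdf:Alt form; the class name XmpTitleReader, the readTitle helper, and the SAMPLE packet are illustrative inventions and not part of PDFxStream. It uses only the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a PDFxStream class): pulls dc:title out of an XMP packet.
public class XmpTitleReader {
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Made-up sample packet used only for the demonstration in main().
    public static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:dc=\"" + DC_NS + "\">"
        + "<dc:title><rdf:Alt>"
        + "<rdf:li xml:lang=\"x-default\">Annual Report</rdf:li>"
        + "</rdf:Alt></dc:title>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    /** Returns the first dc:title value in the packet, or null if there is none. */
    public static String readTitle(byte[] xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XMP properties are identified by namespace
        // The packet comes from an arbitrary PDF, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document xml = builder.parse(new ByteArrayInputStream(xmp));

        NodeList titles = xml.getElementsByTagNameNS(DC_NS, "title");
        if (titles.getLength() == 0) return null;
        Element title = (Element) titles.item(0);
        // dc:title is normally a language alternative: rdf:Alt holding rdf:li entries
        NodeList items = title.getElementsByTagNameNS(RDF_NS, "li");
        if (items.getLength() > 0) return items.item(0).getTextContent().trim();
        return title.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readTitle(SAMPLE.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In practice the byte array would be the contents of the .xmp.xml file written by the example above, rather than the sample string.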
// Open a password-protected document and extract its text
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class DecryptWithPassword {
    public static void main (String[] args) throws java.io.IOException {
        String pdfFilePath = args[0];
        // the password is supplied as bytes in the second argument
        Document pdfts = PDF.open(pdfFilePath, args[1].getBytes());
        StringBuilder text = new StringBuilder(1024);
        pdfts.pipe(new OutputTarget(text));
        pdfts.close();
        System.out.println(text);
    }
}
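One detail worth knowing about the password example: String.getBytes() with no argument uses the JVM's default charset, so a password containing non-ASCII characters can turn into different bytes on different machines. The sketch below makes the charset explicit. The PasswordBytes class and toBytes helper are hypothetical names, and which encoding a particular protected document expects is an assumption you would need to confirm for your own files.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a PDFxStream class): encodes a password with an explicit charset.
public class PasswordBytes {
    /** Encodes the password using the given charset instead of the JVM default. */
    public static byte[] toBytes(String password, Charset charset) {
        return password.getBytes(charset);
    }

    public static void main(String[] args) {
        String password = "p\u00e4ssword"; // contains one non-ASCII character
        // The same password yields byte arrays of different lengths per charset.
        System.out.println(toBytes(password, StandardCharsets.ISO_8859_1).length); // prints 8
        System.out.println(toBytes(password, StandardCharsets.UTF_8).length);      // prints 9
    }
}
```

The resulting array can then be passed as the second argument to PDF.open, in place of args[1].getBytes() in the example above.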
PDFxStream is used by companies and governments around the world to process billions of documents each year.
Let us tame the PDF monster
PDF documents are just different: way more complex than plain-text or well-structured data sources. Most tools and libraries force you to become a PDF expert before you can get good data out of your documents.
It doesn't have to be that way. PDFxStream gives you an easy-to-use set of APIs that require zero intimate knowledge of PDF internals or any of the gnarly idiosyncrasies of poorly-built PDF file generators.
And, if you need more active help, we're ready to put our 20+ years of PDF experience to use in building the PDF solution you need, integrated into your application for you thanks to our professional services.
Leave the mess to us, and focus on the work that matters most to you.