Migrating to PDFxStream v3.x from PDFTextStream v2.x

This page provides an overview of what existing PDFTextStream v2.x customers need to know about PDFxStream v3.x.

What is PDFxStream?

PDFxStream is the umbrella name for Snowtide's PDF data extraction capabilities. Over its 10-year history, PDFTextStream had accumulated a variety of non-text-extraction functionality in response to customer feedback. Renaming the product to "PDFxStream" makes it easier for us and you to talk about it and its various capabilities without confusion.

Tangibly, PDFxStream is a single Java and .NET API for PDF data extraction that provides four distinct capabilities:

  • Text extraction, still referred to as PDFTextStream
  • Form data extraction, called PDFFormStream
  • Image extraction, called PDFImageStream
  • "Everything else", i.e. foundational PDF specification support and access to document bookmarks, annotations, metadata, etc.; this is PDFxStream Base

Aside from the name change, how does PDFxStream differ from PDFTextStream v2.x?

Beyond a raft of behind-the-scenes enhancements that continue our commitment to maximal support for the PDF specification and for documents found in the wild, PDFxStream makes three big changes:

  1. The addition of PDF image extraction. People have been asking about this for years, and it's finally here. PDFImageStream makes extracting images from PDF documents in Java or .NET incredibly easy. As with the text extraction API, you don't need to learn anything about how PDF documents store or encode images: just point the API at a document, and image data flows out. Learn more…
  2. The renaming and division of PDFxStream's different capabilities makes it possible for us to license featuresets on an à la carte basis, so you pay only for what you need and use. This means that you can opt to pay for just text extraction, just form data extraction, just image extraction, or any combination thereof. Learn more…
  3. In addition to all the improvements in v3.x, this major release cleans up and modernizes the PDFxStream API significantly (e.g. generics have been added wherever possible). The good news is that there is very little breakage going from the v2.x API to v3.x; if you do happen to be affected by a breaking API change, resolving it should be trivial (under 5 minutes).

If you're interested in the technical details, please refer to the full changelog.

What happens to our existing PDFTextStream v2.x licenses?

Assuming you have maintained your support enrollments, your PDFTextStream v2.x licenses will convert without charge into PDFxStream v3.x licenses and include the PDFTextStream and PDFFormStream featuresets. This isn't an automated process, but email us and we'll make it super-fast and painless.