Zinio Delivers Digital Magazines to 3.5 Million Readers and Powers Search with PDFxStream

Zinio uses PDFxStream as the foundation of its digital magazine and book search features, which index every issue of every magazine it distributes digitally, from over 250 publishers, to over 3.5 million readers each month.

Zinio is a leading online publisher and distributor of digital magazines and other publications, delivering over 20 million magazine issues to over 3.5 million readers each year. Their delivery platform lets readers easily search the content of magazines and other publications, giving readers a richer reading experience and giving publishers real-time, accurate usage statistics.

The Challenge

"PDFxStream reliably gave us the best results [of the PDF extraction libraries we tested]. It worked well with our foreign language documents." Douglas Kadlecek, Director of Engineering, Zinio

Zinio's publishing workflow is based on PDF documents it receives from publishers, so it needed to support searching over this PDF content from the start. Unfortunately, the original search effort was hampered by the PDF content extraction library Zinio used at first.

"The open source PDF text extraction solution we had been using was not producing good results," said Neil Gandhi, Zinio's Junior Software Engineer. "It was severely crippling our search functionality by not extracting much of the text that was available in the PDF document[s]." While Zinio's initial PDF indexing implementation had carried them for a while, they had outgrown it, and it was time to find a better PDF content extraction foundation.

In addition, Zinio did not want to abandon their investment in Apache Lucene, which ran the actual user queries based on the content extracted from each PDF document. So, the Zinio technical team began a search for a more reliable, more accurate PDF content extraction library for their "next-gen" search features.

The Solution

"PDFxStream provided near-perfect results; it dwarfed any other software that we evaluated." Neil Gandhi, Junior Software Engineer, Zinio

The search was led by Mr. Gandhi. He and his team selected eight PDF content extraction libraries they were able to find on the market (both open source and commercial), and subjected them to a rigorous testing process. Mr. Gandhi also performed a qualitative evaluation of each component's API and interface, and made a point of interacting with the vendor or maintainer of each component to gauge relative customer support quality.

The upshot of that multifaceted evaluation process was clear. "We chose PDFxStream primarily on its technical merits. It provided us with the most complete and correct content extracts from the documents we chose to test with," said Mr. Gandhi. "Also, it happened to be the most customizable and configurable, and Snowtide provided the best customer service overall."

The Results

PDFxStream now extracts content from every magazine and publication that Zinio distributes, supporting fast and reliable searching of those publications. This core feature of the Zinio service is regularly used by a majority of their 3.5 million customers, and is now a reliable cornerstone of Zinio's value proposition.

To top it all off, Mr. Gandhi and his team were also pleased to see that integrating PDFxStream into Zinio's Apache Lucene indexing and search framework was a snap. "PDFxStream was integrated into a task executor leveraging the powerful API that PDFxStream provides. The code on our end in fact is less than 10 lines long; all-told, the integration took about 10 minutes."