PDFxStream for .NET
PDFxStream.NET is produced by translating the PDFxStream for Java binary into a managed .NET assembly. This translation process is complete, preserving PDFxStream’s API, architecture, functionality, and performance characteristics.
This kind of translation is possible because the Java Virtual Machine (JVM) and the .NET Common Language Runtime (CLR) are very similar architecturally, and the Java and .NET object models are conceptually analogous. The actual translation is performed by IKVM's static compilation process. IKVM is an open source toolkit that makes it possible to run Java applications and libraries within the .NET environment.
IKVM and the included OpenJDK library both use a liberal open-source license that makes it possible to redistribute them with commercial products without constraining such products' own licenses.
Requirements
PDFxStream.NET requires v2.0 SP2 or higher of the .NET or Mono runtime. All DLLs for a given PDFxStream release are found in the lib directory of the PDFxStream.NET distribution:
PDFxStream.dll
IKVM.Runtime.dll
IKVM.AWT.WinForms.dll
IKVM.OpenJDK.Beans.dll
IKVM.OpenJDK.Charsets.dll
IKVM.OpenJDK.Core.dll
IKVM.OpenJDK.Media.dll
IKVM.OpenJDK.Security.dll
IKVM.OpenJDK.SwingAWT.dll
IKVM.OpenJDK.Text.dll
IKVM.OpenJDK.Util.dll
IKVM.OpenJDK.XML.API.dll
The IKVM DLL files are PDFxStream.NET's only dependencies. They provide the implementation of Java's standard library in .NET, as well as some runtime components that are required by any Java JAR that has been translated into a .NET assembly. No configuration or special initialization of these DLL files is necessary.
Installation
Using PDFxStream.NET within your .NET project is as simple as adding references to each of the DLL files indicated in the previous section.
Typical Usage
Using PDFxStream.NET is very straightforward, and mirrors typical PDFxStream for Java usage. Here's a sample text extraction function in C#:
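```csharp
using com.snowtide;
using com.snowtide.pdf;
using java.io;

class ExtractTextAllPages
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        // java.io.StringWriter is a java.io.Writer, and therefore a java.lang.Appendable
        StringWriter text = new StringWriter(1024);

        // The using block disposes of the Document once extraction is done
        using (Document doc = PDF.open(pdfFilePath))
        {
            // Send the text of the whole document to the StringWriter
            doc.pipe(new OutputTarget(text));
        }

        System.Console.WriteLine("The text extracted from {0} is:", pdfFilePath);
        System.Console.WriteLine(text.toString());
    }
}
```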
Without exception, all of the PDFxStream API is available in .NET. Because of this, the PDFxStream javadoc is the authoritative API reference for PDFxStream, whether it is used in Java or .NET.
Notes and Limitations
The sole minor difference between the documented PDFxStream API and its usage in .NET is how one obtains bitmap objects from extracted PDF image data. See this note for details.
Aside from this minor irregularity, PDFxStream.NET carries no limitations; it is a pure .NET assembly, through and through, and it acts like it.
For example, you can freely write com.snowtide.pdf.OutputHandler implementations in .NET. Here is a contrived example for illustration that will count the number of characters extracted from a PDF:
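```csharp
namespace SubclassingExample
{
    // Counts the text units passing through this OutputTarget while still
    // forwarding them to the wrapped Appendable
    class CharCountingTarget : com.snowtide.pdf.OutputTarget
    {
        private int cnt = 0;

        public CharCountingTarget(java.lang.Appendable sb) : base(sb) { }

        public override void textUnit(com.snowtide.pdf.layout.TextUnit tu)
        {
            base.textUnit(tu); // keep OutputTarget's normal text output
            cnt++;
        }

        // Returns the count accumulated so far, then resets it to zero
        public int getCount()
        {
            int _cnt = cnt;
            cnt = 0;
            return _cnt;
        }
    }
}
```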
An OutputHandler (or com.snowtide.pdf.OutputTarget, in this case) subclass like this can be used in conjunction with any pipe(OutputHandler) method, found on instances of com.snowtide.pdf.Document, com.snowtide.pdf.Page, and com.snowtide.pdf.layout.Block.
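As a usage sketch, the CharCountingTarget class can be handed to Document.pipe exactly like the plain OutputTarget in the text-extraction sample. The CountCharacters class name and the command-line handling are illustrative assumptions; only PDF.open, Document.pipe, and the OutputTarget-derived counter come from the samples on this page.

```csharp
using com.snowtide;
using com.snowtide.pdf;
using java.io;
using SubclassingExample;

// Hypothetical driver class, for illustration only
class CountCharacters
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        // The counter still forwards the extracted text to this writer
        StringWriter text = new StringWriter(1024);
        CharCountingTarget counter = new CharCountingTarget(text);

        using (Document doc = PDF.open(pdfFilePath))
        {
            // Any pipe(OutputHandler) method accepts the subclass
            doc.pipe(counter);
        }

        System.Console.WriteLine("{0} text units extracted from {1}",
            counter.getCount(), pdfFilePath);
    }
}
```

The same counter instance could be passed to the pipe method of a Page or Block instead, to count only the text of that part of the document.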
Snowtide Collection Method Extensions
The com.snowtide namespace provides a couple of extension methods to make it easier to use parts of the PDFxStream API in .NET.
Consuming collections as IEnumerable
Java collections all implement the java.lang.Iterable interface, which is analogous to .NET's IEnumerable interface. Unfortunately, the IKVM compilation process does not expose Java collections as IEnumerables. Without an appropriate extension method, collections returned by PDFxStream could not be traversed with e.g. foreach, or passed to any method that requires an IEnumerable.
Using the com.snowtide namespace brings the toIEnumerable<T>() extension method into scope, which makes it easy to treat any collection returned by PDFxStream as an IEnumerable. Here it is used to iterate through the keys of the document metadata in a PDF document:
```csharp
using com.snowtide;
using com.snowtide.pdf;

class ExtractMetadata
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        System.Console.WriteLine("All document metadata from {0}:", pdfFilePath);
        using (Document doc = PDF.open(pdfFilePath))
        {
            // toIEnumerable<T>() from the com.snowtide namespace adapts the
            // Java collection so that foreach can traverse it
            foreach (string attrKey in doc.getAttributeKeys().toIEnumerable<string>())
            {
                System.Console.WriteLine("{0}: {1}", attrKey, doc.getAttribute(attrKey));
            }
        }
    }
}
```
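To make the mechanism concrete, here is a minimal sketch of how an Iterable-to-IEnumerable adapter can be written as a C# extension method over the IKVM-translated java.lang.Iterable and java.util.Iterator types. This is not PDFxStream's actual implementation of toIEnumerable; the JavaIterableSketch class, the AsEnumerableSketch method, and the HypotheticalExtensions namespace are hypothetical names chosen so they do not clash with the real extension method.

```csharp
using System.Collections.Generic;

namespace HypotheticalExtensions
{
    // Hypothetical sketch; not the com.snowtide implementation
    public static class JavaIterableSketch
    {
        // Walks any Java Iterable (including every java.util.Collection),
        // casting each element to the caller-chosen type T
        public static IEnumerable<T> AsEnumerableSketch<T>(this java.lang.Iterable iterable)
        {
            java.util.Iterator it = iterable.iterator();
            while (it.hasNext())
            {
                // Java generics are erased, so next() returns object
                yield return (T)it.next();
            }
        }
    }
}
```

Because the method uses yield return, the adapter is lazy: the underlying Java iterator is only advanced as foreach asks for the next element.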
Using StringBuffer and StringBuilder as Appendables
Many implementations of OutputHandler provided by PDFxStream accept java.lang.Appendable objects as their principal constructor argument. This interface is implemented by a number of useful sinks for textual output, including java.lang.StringBuffer, java.lang.StringBuilder, java.nio.CharBuffer, any subclass of java.io.Writer, etc.
The one wrinkle is that StringBuffer and StringBuilder implement Appendable via a shared package-private superclass (java.lang.AbstractStringBuilder), whose methods and implemented interfaces are not visible to code using StringBuffer or StringBuilder in .NET. This means that the following C# code will not compile:
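```csharp
using com.snowtide;
using com.snowtide.pdf;
// ...
java.lang.StringBuilder sb = new java.lang.StringBuilder();
// Compile error: .NET code cannot see that StringBuilder implements Appendable
OutputTarget tgt = new OutputTarget(sb);
```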
The simplest solution is to avoid using java.lang.StringBuilder or java.lang.StringBuffer from .NET. Any usage of them in conjunction with PDFxStream can be replaced with e.g. java.io.StringWriter; all PDFxStream code samples demonstrate and recommend using StringWriter with OutputHandler implementations.
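For example, the snippet above that fails to compile can be rewritten with java.io.StringWriter, which extends java.io.Writer and is therefore visibly an Appendable from .NET. This is a minimal sketch; the TextExtraction class and ExtractAllText method are hypothetical names used for illustration.

```csharp
using com.snowtide.pdf;

// Hypothetical helper class, for illustration only
static class TextExtraction
{
    // StringWriter takes the place of StringBuilder: as a java.io.Writer,
    // the compiler can see that it implements java.lang.Appendable
    public static string ExtractAllText(string pdfFilePath)
    {
        java.io.StringWriter sw = new java.io.StringWriter();
        OutputTarget tgt = new OutputTarget(sw);
        using (Document doc = PDF.open(pdfFilePath))
        {
            doc.pipe(tgt);
        }
        // Read the accumulated text back, as one would with StringBuilder
        return sw.toString();
    }
}
```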
The other option is to use the .toAppendable() extension method provided by the com.snowtide namespace:
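```csharp
using com.snowtide;
using com.snowtide.pdf;
// ...
java.lang.StringBuilder sb = new java.lang.StringBuilder();
// toAppendable() from the com.snowtide namespace exposes sb as a java.lang.Appendable
OutputTarget tgt = new OutputTarget(sb.toAppendable());
```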