PDFxStream for .NET
PDFxStream.NET is produced by translating the PDFxStream for Java binary into a managed .NET assembly. This translation process is complete, preserving PDFxStream's API, architecture, functionality, and performance characteristics.
This kind of translation is possible because the Java Virtual Machine (JVM) and the .NET Common Language Runtime (CLR) are very similar architecturally, and the Java and .NET object models are conceptually analogous. The actual translation is performed by IKVM's static compilation process. IKVM is an open source toolkit that makes it possible to run Java applications and libraries within the .NET environment.
IKVM and the included OpenJDK library both use liberal open-source licenses that make it possible to redistribute them with commercial products without constraining those products' own licenses.
Requirements
PDFxStream.NET requires v2.0 SP2 or higher of the .NET or Mono runtime.
All DLLs for a given PDFxStream release are found in the lib directory of the PDFxStream.NET distribution. This includes a number of IKVM.*.dll files (e.g. IKVM.Runtime.dll), as well as two PDFxStream DLLs. You will use only one of them, depending on the .NET language you are using:
PDFxStreamVB.dll, for use only in VB.NET projects
PDFxStream.dll, for use with any language other than VB.NET
As indicated above, you should choose only one of the PDFxStream DLLs, based on which .NET language you are using: VB.NET projects should use PDFxStreamVB.dll, while all other languages should use PDFxStream.dll.
The IKVM DLL files are PDFxStream.NET's only dependencies. They provide the implementation of Java's standard library in .NET, as well as some runtime components that are required by any Java JAR that has been translated into a .NET assembly. No configuration or special initialization of these DLL files is necessary.
Why are there different PDFxStream DLLs for different .NET languages?
Symbols in VB are case-insensitive, which causes a collision between the com.snowtide.pdf namespace and our primary entry point, the com.snowtide.PDF class. In the PDFxStreamVB.dll library for use with VB.NET, the PDF class is renamed to com.snowtide.PDFxStream, eliminating any ambiguity. No other part of the API documented here or in our API reference is affected, so you can continue to use these resources while programming PDFxStream via VB.NET.
All other .NET languages (including C# and F#) treat namespace and class symbols case-sensitively, so they can use the standard PDFxStream API as-is.
Installation
Using PDFxStream.NET within your .NET project is as simple as adding references to each of the DLL files indicated in the previous section. That means all of the IKVM.*.dll files, and either PDFxStream.dll or PDFxStreamVB.dll, depending on the .NET language your project uses.
Typical Usage
Using PDFxStream.NET is very straightforward, and mirrors typical PDFxStream for Java usage. Here's a sample text extraction program in C#:
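using com.snowtide;
using com.snowtide.pdf;
using java.io;

class ExtractTextAllPages
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        // java.io.StringWriter implements java.lang.Appendable; 1024 is its initial buffer size
        StringWriter text = new StringWriter(1024);
        // The Document is closed automatically at the end of the using block
        using (Document doc = PDF.open(pdfFilePath))
        {
            // Send all of the document's extracted text into the StringWriter
            doc.pipe(new OutputTarget(text));
        }
        // Fully qualified because java.io also defines a Console class
        System.Console.WriteLine("The text extracted from {0} is:", pdfFilePath);
        System.Console.WriteLine(text.toString());
    }
}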
Without exception, all of the PDFxStream API is available in .NET. Because of this, the PDFxStream javadoc is the authoritative API reference for PDFxStream, whether it is used in Java or .NET.
Notes and Limitations
The sole minor difference between the documented PDFxStream API and its usage in .NET is how one obtains bitmap objects from extracted PDF image data. See this note for details.
Aside from this minor irregularity, PDFxStream.NET carries no limitations; it is a pure .NET assembly, through and through, and it acts like it.
For example, you can freely write com.snowtide.pdf.OutputHandler implementations in .NET. Here is a contrived example that counts the number of characters extracted from a PDF:
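namespace SubclassingExample
{
    // Counts the text units (characters) that pass through an OutputTarget
    class CharCountingTarget : com.snowtide.pdf.OutputTarget
    {
        private int cnt = 0;

        public CharCountingTarget (java.lang.Appendable sb) : base(sb)
        {
        }

        public override void textUnit (com.snowtide.pdf.layout.TextUnit tu)
        {
            base.textUnit(tu); // keep OutputTarget's normal text output
            cnt++;
        }

        // Returns the count accumulated since the last call, then resets it
        public int getCount ()
        {
            int _cnt = cnt;
            cnt = 0;
            return _cnt;
        }
    }
}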
An OutputHandler subclass like this one (here, a subclass of com.snowtide.pdf.OutputTarget) can be passed to any pipe(OutputHandler) method. Such methods are found on instances of com.snowtide.pdf.Document, com.snowtide.pdf.Page, and com.snowtide.pdf.layout.Block.
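For illustration, here is a minimal sketch of how the CharCountingTarget class above might be driven through Document.pipe(OutputHandler). The CountCharacters class name is hypothetical, and the sketch assumes CharCountingTarget is compiled into the same project.

```csharp
using com.snowtide;
using com.snowtide.pdf;
using java.io;

// Hypothetical driver for the CharCountingTarget example above
class CountCharacters
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        StringWriter text = new StringWriter();
        // The subclass still forwards each text unit to OutputTarget,
        // so the StringWriter receives the extracted text as well
        SubclassingExample.CharCountingTarget target =
            new SubclassingExample.CharCountingTarget(text);
        using (Document doc = PDF.open(pdfFilePath))
        {
            doc.pipe(target);
        }
        // Fully qualified because java.io also defines a Console class
        System.Console.WriteLine("{0} characters extracted from {1}",
            target.getCount(), pdfFilePath);
    }
}
```

The same target could instead be passed to the pipe method of a single Page or Block, which would count only the characters in that part of the document.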
Snowtide Collection Method Extensions
The com.snowtide namespace provides a couple of extension methods to make it easier to use parts of the PDFxStream API in .NET.
Consuming collections as IEnumerable
Java collections all implement the java.util.Iterable interface, which is analogous to .NET's IEnumerable interface. Unfortunately, the IKVM compilation process does not expose Java collections as IEnumerables. Without an appropriate extension method, any collection returned by PDFxStream could not be traversed with e.g. foreach, or passed to any method that requires an IEnumerable.
Using the com.snowtide namespace brings into scope an extension method that makes it easy to treat any collection returned by PDFxStream as an IEnumerable. Here, for example, it is used to iterate through the keys of a PDF document's metadata:
using com.snowtide;
using com.snowtide.pdf;

class ExtractMetadata
{
    public static void Main(string[] args)
    {
        string pdfFilePath = args[0];
        System.Console.WriteLine("All document metadata from {0}:", pdfFilePath);
        using (Document doc = PDF.open(pdfFilePath))
        {
            // toIEnumerable<string>(), the com.snowtide extension method,
            // lets foreach walk the Java collection of attribute keys
            foreach (string attrKey in doc.getAttributeKeys().toIEnumerable<string>())
            {
                System.Console.WriteLine("{0}: {1}", attrKey, doc.getAttribute(attrKey));
            }
        }
    }
}
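The adapted collection can also be handed to methods that take an IEnumerable<T>, such as LINQ operators. This is a minimal sketch that assumes toIEnumerable<string>() returns an IEnumerable<string>, as its use with foreach above suggests. It also assumes a project targeting .NET 3.5 or later, where System.Linq is available. The SortedMetadata class name is hypothetical.

```csharp
using System.Linq;
using com.snowtide;
using com.snowtide.pdf;

// Hypothetical example: print document metadata sorted by key
class SortedMetadata
{
    public static void Main(string[] args)
    {
        using (Document doc = PDF.open(args[0]))
        {
            // OrderBy (from System.Linq) accepts any IEnumerable<T>,
            // so the adapted Java collection can be sorted directly
            var sortedKeys = doc.getAttributeKeys().toIEnumerable<string>()
                .OrderBy(key => key, System.StringComparer.Ordinal);
            foreach (string key in sortedKeys)
            {
                System.Console.WriteLine("{0}: {1}", key, doc.getAttribute(key));
            }
        }
    }
}
```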
Using StringBuffer and StringBuilder as Appendables
Many implementations of OutputHandler provided by PDFxStream accept java.lang.Appendable objects as their principal constructor argument. This interface is implemented by a number of useful sinks for textual output, including java.lang.StringBuffer, java.lang.StringBuilder, java.nio.CharBuffer, any subclass of java.io.Writer, etc.
The one wrinkle is that StringBuffer and StringBuilder implement Appendable via a shared package-private superclass. That superclass's methods and implemented interfaces are not visible to code using StringBuffer or StringBuilder in .NET. This means that the following C# code will not compile:
using com.snowtide;
using com.snowtide.pdf;
// ...
java.lang.StringBuilder sb = new java.lang.StringBuilder();
OutputTarget tgt = new OutputTarget(sb); // error: sb is not visible as a java.lang.Appendable
The simplest solution is to not use java.lang.StringBuilder or java.lang.StringBuffer from .NET at all. Any usage of them in conjunction with PDFxStream can be replaced with e.g. java.io.StringWriter. All PDFxStream code samples demonstrate and recommend using StringWriter with OutputHandler implementations.
The other option is to use the .toAppendable() extension method provided by the com.snowtide namespace:
using com.snowtide;
using com.snowtide.pdf;
// ...
java.lang.StringBuilder sb = new java.lang.StringBuilder();
// toAppendable() (from com.snowtide) exposes sb as a java.lang.Appendable
OutputTarget tgt = new OutputTarget(sb.toAppendable());