PDFTextStream configuration options
PDFTextStream's configuration can be controlled in three different ways:
- Globally, by changing the state of the default instance of
com.snowtide.pdf.PDFTextStreamConfig, available via
- Globally, by setting particular system properties that
com.snowtide.pdf.PDFTextStreamConfig uses to initialize its default instance.
- Locally, on a per-document / per-PDFTextStream-instance basis, by
providing a separate instance of
com.snowtide.pdf.PDFTextStreamConfig, modified as desired, to each
Each of the options available in
com.snowtide.pdf.PDFTextStreamConfig is detailed in its API
documentation. The rest of this document will walk through how to set
system properties so that they will be picked up by
com.snowtide.pdf.PDFTextStreamConfig, as well as an enumeration
of the available system properties themselves.
Each of the following system properties must be set before referencing PDFTextStream in any way, as the properties are checked and their values (if any) are acted upon when PDFTextStream is statically initialized. Therefore, the safest way to use these configuration-related system properties is to set them when starting your application:
java -cp [classpath] -Dpdfts.config.property=value your.main.classname
You can also set system properties in your code as long as you do so before your first usage of PDFTextStream. Using Java on the JVM:
import java.io.File;
import com.snowtide.pdf.PDFTextStream;

System.setProperty("pdfts.config.property", "config_value");
PDFTextStream stream = new PDFTextStream(new File("c:\\some\\path.pdf"));
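When a property may be supplied either on the command line or in code, it can be useful to let the command-line value win. The following is a minimal sketch of that idea using only the standard java.lang.System API; the class name PdftsProperties and the helper setIfAbsent are hypothetical names introduced for illustration, and pdfts.config.property is the placeholder property name used throughout this document.

```java
final class PdftsProperties {
    // Sets a system property only if no value has been supplied already
    // (for example via -D on the command line). Returns the effective value.
    static String setIfAbsent(String key, String value) {
        String existing = System.getProperty(key);
        if (existing != null) {
            return existing;
        }
        System.setProperty(key, value);
        return value;
    }

    public static void main(String[] args) {
        // "pdfts.config.property" is a placeholder, not a real option name.
        String effective = setIfAbsent("pdfts.config.property", "config_value");
        System.out.println("pdfts.config.property=" + effective);
        // PDFTextStream should only be referenced after this point.
    }
}
```

As with any programmatic approach, this only has an effect if it runs before PDFTextStream is first referenced.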
Using C# on .NET:
using com.snowtide.pdf;

java.lang.System.setProperty("pdfts.config.property", "config_value");
PDFTextStream stream = new PDFTextStream(new java.io.File(@"c:\some\path.pdf"));
PDFTextStream.NET users can also set these properties in the application's
configuration file, which is equivalent to the Java convention of specifying
system properties on the command line using the -D option (note the ikvm:
prefix, which exposes the property to the Java namespaces):
<?xml version="1.0"?>
<configuration>
  <appSettings>
    <add key="ikvm:pdfts.config.property" value="config_value" />
  </appSettings>
</configuration>
Available system properties
Set this system property to the string you want PDFTextStream to use to
separate lines in text extracts. This defaults to your platform's default
line separator ("\n" on Linux/Unix/Mac OS X, and "\r\n" on Windows platforms).
Setting this system property to "N" will disable PDFTextStream’s ability to
extract Chinese, Japanese, or Korean (CJK) text.
This may be desirable if memory utilization is a concern – CJK character
maps are very large, and can consume significant amounts of memory. As
always, application profiling is recommended to determine the actual
source(s) of memory consumption.
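Before reaching for a full profiler, a rough first look at heap usage can be taken with the standard java.lang.Runtime API. The sketch below is an assumption-laden illustration, not part of PDFTextStream: the class name HeapSnapshot is hypothetical, the workload is left as a comment, and the figures it reports are approximate because garbage collection can run at any time.

```java
final class HeapSnapshot {
    // Approximate number of heap bytes currently in use by this JVM.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // ... run the extraction workload being investigated here ...
        long after = usedHeapBytes();
        System.out.printf("Heap delta: %d KiB%n", (after - before) / 1024);
    }
}
```

A measurement like this only indicates whether memory use is worth investigating; a profiler is still needed to attribute it to a specific source.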
PDFTextStream defaults to using java.util.logging or Log4J for logging
informational and error messages. However, many environments demand
customized logging frameworks. Therefore, PDFTextStream provides a pluggable
logging architecture that enables you to hook your custom logging framework
into PDFTextStream. To do so, simply implement the
interface, and set the
pdfts.logfactory system property to the
full classname of your implementation.
PDFTextStream normally defaults to using the java.util.logging logging
framework. To force PDFTextStream to default to using Log4J, set the
system property to
By default, PDFTextStream will attempt to detect tabular data on each
extracted page, and infer the structure of each table. This structure is
then materialized as rows of
This detection and inference can be disabled globally by setting the
pdfts.layout.detectTables system property to
By default, PDFTextStream does not memory-map opened PDF files. This feature
can be enabled by setting the
pdfts.mmap.enable system property
This option is deprecated, and will be removed in future releases of PDFTextStream.
Due to a bug in Java’s implementation of memory-mapped files in Windows environments, a PDF file opened and processed by PDFTextStream may remain locked even after the PDFTextStream instance’s close() method has been called and PDFTextStream has released all of the filesystem handles it allocated. This locking behaviour (which is known to occur only on Windows) prevents the PDF file from being deleted or moved until Java’s garbage collector reclaims certain JDK-internal objects that are used to track and manage the previously memory-mapped PDF file.
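If memory-mapping is enabled and a file must be deleted after processing, one common workaround is to retry the deletion a few times, hinting the garbage collector between attempts. The following is a sketch under stated assumptions rather than a guaranteed fix: the class name LockedFileCleanup and the method deleteWithRetry are hypothetical, System.gc() is only a request that the JVM may ignore, and the attempt count and pause are arbitrary values.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LockedFileCleanup {
    // Tries to delete the file, retrying if the deletion fails (for example
    // because a lingering memory mapping still holds a lock on Windows).
    // Returns true once the file is deleted or found to be already absent.
    static boolean deleteWithRetry(Path file, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(file);
                return true;
            } catch (IOException e) {
                // Hint (not force) a collection, then wait before retrying.
                System.gc();
                Thread.sleep(pauseMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Demonstrates the call on an ordinary, unlocked temporary file.
        Path tmp = Files.createTempFile("example", ".pdf");
        System.out.println("deleted: " + deleteWithRetry(tmp, 5, 200L));
    }
}
```

Leaving memory-mapping disabled, which is the default, avoids the problem entirely.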