PDFxStream configuration options

PDFxStream's configuration can be controlled in any of three ways:

  1. Globally, by changing the state of the default instance of com.snowtide.pdf.Configuration, available via com.snowtide.pdf.Configuration.getDefault().
  2. Globally, by setting environment or system properties that Configuration uses to initialize its default instance.
  3. Locally, on a per-document basis, by passing a separate, suitably modified instance of Configuration to each invocation of com.snowtide.PDF.open(), e.g. com.snowtide.PDF.open(String,byte[],Configuration).

All PDFxStream options are programmatically accessible via Configuration. The rest of this document explains how to set environment or system properties so that Configuration picks them up when PDFxStream is first initialized, and then enumerates the available options.

Each of the following environment or system properties must be set before PDFxStream is referenced in any way, because their values (if any) are read and captured when PDFxStream is statically initialized. The safest approach is therefore to set the corresponding system properties when starting your application:

java -cp [classpath] -Dpdfxs.config.property=value your.main.classname
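The requirement to set properties before the first reference follows from how static initialization works on the JVM: a class's static fields are evaluated once, the first time the class is used, and later changes to system properties are not seen. The sketch below demonstrates this with a hypothetical FakeLibrary class standing in for PDFxStream; it is not PDFxStream code, and pdfxs.config.property is the same placeholder name used above.

```java
// Demonstrates why options read during static initialization must be set
// before the first reference. FakeLibrary is a hypothetical stand-in, not
// PDFxStream; "pdfxs.config.property" is the placeholder name from the text.
public class StaticInitDemo {

    static final class FakeLibrary {
        // Evaluated exactly once, when FakeLibrary is first used.
        static final String CAPTURED =
                System.getProperty("pdfxs.config.property", "<unset>");

        static String option() {
            return CAPTURED;
        }
    }

    public static void main(String[] args) {
        System.setProperty("pdfxs.config.property", "first");
        // First reference: FakeLibrary is initialized now and captures "first".
        System.out.println(FakeLibrary.option());

        System.setProperty("pdfxs.config.property", "second");
        // Already initialized, so the later change is not seen.
        System.out.println(FakeLibrary.option());
    }
}
```

Running StaticInitDemo prints first twice: the second System.setProperty call comes too late to affect the value FakeLibrary already captured.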

You can also set system properties in your code as long as you do so before your first usage of PDFxStream. Using Java on the JVM:

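import com.snowtide.PDF;
import com.snowtide.pdf.Document;
System.setProperty("pdfxs.config.property", "config_value"); // must run before any PDFxStream class is used
Document stream = PDF.open("c:\\some\\path.pdf");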

Using C# on .NET:

using com.snowtide;
using com.snowtide.pdf;
java.lang.System.setProperty("pdfxs.config.property", "config_value"); // must run before any PDFxStream type is used
Document stream = PDF.open(@"c:\some\path.pdf");

PDFxStream.NET users can also set these properties in the app.config file, which is equivalent to the Java convention of specifying system properties on the command line with the -D option (note the ikvm: prefix, which exposes the property to the Java namespaces):

<?xml version="1.0"?>
<configuration>
  <appSettings>
    <add key="ikvm:pdfxs.config.property" value="config_value" />
  </appSettings>
</configuration>

Alternatively, you can set configuration options as environment variables, using whatever facilities your operating system or shell provides.
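The lookup that "environment or system property" implies can be sketched as follows. This is a hypothetical helper, not PDFxStream's implementation; in particular, checking the system property before the environment variable is an assumption, since this document does not say which wins when both are set. The environment map is passed in as a parameter so the lookup can be exercised without changing the real process environment.

```java
import java.util.Map;

// Hypothetical option lookup: JVM system property first, then environment
// variable, then a default. The precedence order is an assumption, not
// documented PDFxStream behavior.
public class OptionLookup {

    static String resolve(String name, String defaultValue, Map<String, String> env) {
        String fromProperty = System.getProperty(name);
        if (fromProperty != null) {
            return fromProperty;
        }
        String fromEnv = env.get(name);
        return fromEnv != null ? fromEnv : defaultValue;
    }

    public static void main(String[] args) {
        // "Y" stands for the documented default (table detection on).
        String value = resolve("pdfxs.layout.detectTables", "Y", System.getenv());
        System.out.println("pdfxs.layout.detectTables = " + value);
    }
}
```

Note that POSIX shells such as bash reject dotted names in export (for example, export pdfxs.cjk.enable=N fails with a "not a valid identifier" error). If PDFxStream expects the environment variable to carry the same dotted name, the env utility can still set it for a single run: env pdfxs.cjk.enable=N java -cp [classpath] your.main.classname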

Available configuration options

line.separator

Set this environment or system property to the string you want PDFxStream to use to separate lines in text extracts. This defaults to your platform's default line separator ("\n" on Linux/Unix/Mac OS X, and "\r\n" on Windows platforms).
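Because the separator is usually an invisible control sequence, it can be hard to tell which value is actually in effect. The small diagnostic below (a hypothetical helper, independent of PDFxStream) prints the JVM's line.separator property with \r and \n spelled out, which is one way to confirm that an override such as -Dline.separator=... took effect.

```java
// Prints the effective line.separator value with CR and LF made visible.
public class ShowLineSeparator {

    // Replaces '\r' and '\n' with the two-character escapes "\\r" and "\\n".
    static String visible(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\r': out.append("\\r"); break;
                case '\n': out.append("\\n"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(visible(System.getProperty("line.separator")));
    }
}
```

Without an override this typically prints \n on Linux/Unix/Mac OS X and \r\n on Windows. In bash, a Windows-style separator can be passed on the command line with ANSI-C quoting: java -Dline.separator=$'\r\n' -cp [classpath] your.main.classname. Keep in mind that line.separator is a JVM-wide property, so other code running in the same JVM may also read the overridden value.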

pdfxs.cjk.enable

Setting this environment or system property to "N" will disable PDFxStream’s ability to extract Chinese, Japanese, or Korean (CJK) text. This may be desirable if memory utilization is a concern – CJK character maps are very large, and can consume significant amounts of memory. As always, application profiling is recommended to determine the actual source(s) of memory consumption.
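To check whether disabling CJK support actually helps, you can compare runs with and without pdfxs.cjk.enable=N. The sketch below is a rough, hypothetical starting point: it reports whether the system property disables CJK extraction (assuming, per the text above, that exactly "N" disables it; case handling and the environment-variable form are not checked) and prints a used-heap figure from java.lang.Runtime. Garbage-collection timing makes such numbers noisy, so treat them as a hint and use a real profiler for conclusions.

```java
// Hypothetical memory check for comparing runs with and without
// -Dpdfxs.cjk.enable=N. Not part of PDFxStream.
public class CjkMemoryCheck {

    // Assumption: mirrors the documented rule that the value "N" disables
    // CJK extraction; anything else (including unset) leaves it enabled.
    static boolean cjkEnabled(String rawValue) {
        return !"N".equals(rawValue);
    }

    // Rough used-heap figure in bytes; depends heavily on GC timing.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        boolean cjk = cjkEnabled(System.getProperty("pdfxs.cjk.enable"));
        System.gc(); // only a hint to the JVM
        long before = usedHeapBytes();
        // ... extract text from a representative set of PDFs here ...
        long after = usedHeapBytes();
        System.out.printf("cjk=%b, used heap before=%d KiB, after=%d KiB%n",
                cjk, before / 1024, after / 1024);
    }
}
```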

pdfxs.logfactory

PDFxStream defaults to using java.util.logging (or, optionally, Log4J) for logging informational and error messages. However, many environments require a customized logging framework, so PDFxStream provides a pluggable logging architecture that lets you hook your own framework into it. To do so, implement the com.snowtide.util.logging.LogFactory interface, and set the pdfxs.logfactory environment or system property to the fully qualified class name of your implementation.
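Naming an implementation class in a property is a pattern typically implemented with reflection. The sketch below illustrates that general pattern only: DemoLogFactory and the demo.logfactory property are invented for the example, and the real com.snowtide.util.logging.LogFactory interface (whose methods are not listed here) should be taken from the PDFxStream API documentation.

```java
import java.util.function.Consumer;

// Hypothetical illustration of the "implementation class named in a property"
// pattern. DemoLogFactory is an invented stand-in, NOT PDFxStream's
// com.snowtide.util.logging.LogFactory, and "demo.logfactory" is not a
// PDFxStream property.
public class PluggableFactoryDemo {

    public interface DemoLogFactory {
        // Returns a message sink for the named component.
        Consumer<String> loggerFor(String component);
    }

    // Public with a public no-arg constructor so reflection can create it.
    public static class StdoutLogFactory implements DemoLogFactory {
        public StdoutLogFactory() {}

        @Override
        public Consumer<String> loggerFor(String component) {
            return msg -> System.out.println("[" + component + "] " + msg);
        }
    }

    // Instantiates the class named by the property, or returns the fallback
    // when the property is unset.
    static DemoLogFactory load(String propertyName, DemoLogFactory fallback) {
        String className = System.getProperty(propertyName);
        if (className == null) {
            return fallback;
        }
        try {
            Class<?> cls = Class.forName(className);
            return (DemoLogFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // getName() yields the binary name: "PluggableFactoryDemo$StdoutLogFactory".
        System.setProperty("demo.logfactory", StdoutLogFactory.class.getName());
        DemoLogFactory factory = load("demo.logfactory", component -> msg -> { });
        factory.loggerFor("parser").accept("factory loaded");
    }
}
```

If the library resolves the name with Class.forName, as this sketch does, a nested implementation class must be given by its binary name (Outer$Inner), and the class generally needs a public no-argument constructor.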

More details about PDFxStream's logging support are available in the PDFxStream logging documentation.

pdfxs.loggingtype

PDFxStream normally defaults to using the java.util.logging logging framework. To force PDFxStream to default to using Log4J, set the pdfxs.loggingtype environment or system property to "log4j".

pdfxs.layout.detectTables

By default, PDFxStream will attempt to detect tabular data on each extracted page and infer the structure of each table. This structure is then materialized as rows of com.snowtide.pdf.layout.Block objects within higher-level com.snowtide.pdf.layout.Table blocks.

This detection and inference can be disabled globally by setting the pdfxs.layout.detectTables environment or system property to "N".
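If you prefer to keep settings such as pdfxs.layout.detectTables and pdfxs.loggingtype in code rather than on the command line, one option is a small launcher that sets them and only then loads your real main class. The sketch below is hypothetical (PdfxsLauncher is not part of PDFxStream), and the values it applies are only examples; it relies on the launcher itself never referencing PDFxStream, so that PDFxStream is initialized after the properties are set.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical bootstrap launcher: sets PDFxStream system properties first,
// then loads the real application class named in args[0], so no PDFxStream
// class can be initialized before the properties are in place.
public class PdfxsLauncher {

    // Option names come from this page; the chosen values are only examples.
    static void applyDefaults() {
        setIfAbsent("pdfxs.layout.detectTables", "N");
        setIfAbsent("pdfxs.loggingtype", "log4j");
    }

    // Leaves any value already supplied with -D on the command line untouched.
    static void setIfAbsent(String name, String value) {
        if (System.getProperty(name) == null) {
            System.setProperty(name, value);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java PdfxsLauncher <main-class> [args...]");
            System.exit(2);
        }
        applyDefaults();
        // Class.forName loads and initializes the application class only now.
        Class<?> app = Class.forName(args[0]);
        Method appMain = app.getMethod("main", String[].class);
        appMain.invoke(null, (Object) Arrays.copyOfRange(args, 1, args.length));
    }
}
```

For example: java -cp [classpath] PdfxsLauncher your.main.classname arg1 arg2. Values passed with -D still take precedence, because setIfAbsent leaves existing properties alone.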