Configuration options

PDFxStream's configuration can be controlled in any of four ways:

  1. Globally, by setting environment variables or Java system properties (before starting your application) that com.snowtide.pdf.Configuration uses to initialize its default instance. This is the most common approach.
  2. Globally, by changing the state of the default instance of Configuration, available via com.snowtide.pdf.Configuration.getDefault().
  3. Locally, on a per-document (per-com.snowtide.pdf.Document) basis, by providing a separate instance of Configuration, modified as desired, with each invocation of com.snowtide.PDF.open(), e.g. com.snowtide.PDF.open(String, byte[], Configuration).
  4. Locally, by changing the Configuration on a particular Document, via com.snowtide.pdf.Document.setConfig(Configuration). Note that this will not affect configuration options that are only relevant when a document is first being opened/initialized.

PDFxStream checks environment variables and Java system properties only once, when com.snowtide.PDF is statically initialized. Therefore, the safest way to set these configuration options is to set the corresponding system properties when starting your application:

java -cp [classpath] -Dpdfxs.config.property=value your.main.classname
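
Where command-line flags are not practical, the same properties can in principle be set from code, as long as that happens before com.snowtide.PDF is first loaded (it reads them only once, during static initialization). The following is a minimal sketch using only the standard System.setProperty call; the class name PdfxsBootstrap and the choice of options are illustrative and not part of PDFxStream:

```java
public class PdfxsBootstrap {

    /**
     * Sets PDFxStream-related system properties.
     * Must run before any code references com.snowtide.PDF,
     * because the properties are read only once at that point.
     */
    public static void applySettings() {
        System.setProperty("pdfxs.cjk.enable", "N");          // disable CJK extraction
        System.setProperty("pdfxs.layout.detectTables", "N"); // disable table detection
    }

    public static void main(String[] args) {
        applySettings();
        System.out.println("pdfxs.cjk.enable = "
                + System.getProperty("pdfxs.cjk.enable"));
        // Only after this point should application code touch com.snowtide.PDF.
    }
}
```

Passing -D flags at startup remains the safer route, since it does not depend on class-loading order.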

Alternatively, you can set configuration options through environment variables, using whatever facilities your operating system or shell provides.
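
Many POSIX shells refuse to export variable names that contain dots, so one workaround is to launch the application from a small Java wrapper that populates the child process's environment directly. This is a sketch only: EnvLauncher, the classpath, and the main class name are hypothetical, and it assumes the environment variable uses the same name as the system property:

```java
import java.util.Map;

public class EnvLauncher {

    /** Builds (but does not start) a child JVM command with a PDFxStream option in its environment. */
    public static ProcessBuilder childJvm(String classpath, String mainClass) {
        ProcessBuilder pb = new ProcessBuilder("java", "-cp", classpath, mainClass);
        Map<String, String> env = pb.environment();
        // Assumption: the variable name matches the property name exactly.
        env.put("pdfxs.cjk.enable", "N");
        return pb.inheritIO(); // child shares this process's stdin/stdout/stderr
    }

    public static void main(String[] args) {
        ProcessBuilder pb = childJvm("app.jar", "your.main.classname");
        System.out.println(pb.command()
                + " with pdfxs.cjk.enable=" + pb.environment().get("pdfxs.cjk.enable"));
        // Call pb.start() to actually launch the child JVM.
    }
}
```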

Available configuration options

line.separator

Set this environment or system property to the string you want PDFxStream to use to separate lines in text extracts. This defaults to your platform's default line separator: \n on Linux/Unix/Mac OS X, and \r\n on Windows platforms.
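
To see which separator is currently in effect for a JVM, the standard line.separator system property can be printed with its control characters made visible. A small sketch; LineSeparatorCheck and escape are illustrative names:

```java
public class LineSeparatorCheck {

    /** Renders CR and LF visibly as the two-character sequences \r and \n. */
    public static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Reflects the platform default unless overridden, e.g. with -Dline.separator=...
        String sep = System.getProperty("line.separator");
        System.out.println("line.separator = " + escape(sep));
    }
}
```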

pdfxs.cjk.enable

Setting this environment or system property to N will disable PDFxStream's ability to extract Chinese, Japanese, or Korean (CJK) text. This may be desirable if memory utilization is a concern: CJK character maps are very large and can consume significant amounts of memory. As always, application profiling is recommended to determine the actual source(s) of memory consumption.

pdfxs.logfactory

PDFxStream defaults to using java.util.logging or Log4J for logging informational and error messages. However, many environments require customized logging frameworks. Therefore, PDFxStream provides a pluggable logging architecture that enables you to hook your custom logging framework into PDFxStream. To do so, simply implement the com.snowtide.util.logging.LogFactory interface, and set the pdfxs.logfactory environment or system property to the full classname of your implementation.

More details about PDFxStream's logging support are available here.

pdfxs.loggingtype

PDFxStream normally defaults to using the java.util.logging logging framework. To force PDFxStream to default to using Log4J, make sure a log4j jar file is present on your application's classpath, and set the pdfxs.loggingtype environment variable or system property to log4j.

pdfxs.layout.detectTables

By default, PDFxStream will attempt to detect tabular data on each extracted page, and infer the structure of each table. This structure is then materialized as rows of com.snowtide.pdf.layout.Blocks within higher-level com.snowtide.pdf.layout.Table blocks.

This detection and inference can be disabled globally by setting the pdfxs.layout.detectTables environment or system property to N.