Configuration options
PDFxStream's configuration can be controlled in any of four ways:
- Globally, by setting environment variables or Java system properties (before starting your application) that com.snowtide.pdf.Configuration uses to initialize its default instance. This is the most common approach.
- Globally, by changing the state of the default instance of Configuration, available via com.snowtide.pdf.Configuration.getDefault().
- Locally, on a per-document (per-com.snowtide.pdf.Document) basis, by providing a separate instance of Configuration, modified as desired, with each invocation of com.snowtide.PDF.open(), e.g. com.snowtide.PDF.open(String, byte[], Configuration).
- Locally, by changing the Configuration on a particular Document, via com.snowtide.pdf.Document.setConfig(Configuration). Note that this will not affect configuration options that are only germane when a document is first being opened/initialized.
PDFxStream checks environment variables and Java system properties
only once, when com.snowtide.PDF is statically initialized.
Therefore, the safest way to set these configuration
options is to set the corresponding system properties when starting your
application:
java -cp [classpath] -Dpdfxs.config.property=value your.main.classname
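When the launch command cannot be changed, the same properties can in principle be set from code. The sketch below is a hypothetical helper (PdfxsStartupProperties is not part of PDFxStream) that fills in defaults only for properties that are not already defined, so that any -D flag given on the command line still wins. It relies on the behavior described above: it should only have an effect if it runs before anything in the application causes com.snowtide.PDF to be initialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFxStream.
class PdfxsStartupProperties {

    /** Sets each property only if it is not already defined, so -D flags take precedence. */
    static void applyDefaults(Map<String, String> defaults) {
        for (Map.Entry<String, String> entry : defaults.entrySet()) {
            if (System.getProperty(entry.getKey()) == null) {
                System.setProperty(entry.getKey(), entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("pdfxs.cjk.enable", "N");
        defaults.put("pdfxs.layout.detectTables", "N");
        applyDefaults(defaults);

        // Only after this point should the application reference com.snowtide.PDF.
        System.out.println("pdfxs.cjk.enable=" + System.getProperty("pdfxs.cjk.enable"));
    }
}
```

Passing the properties on the command line remains the safer route, because it does not depend on class-loading order.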
Alternatively, you can set configuration options through environment variables, using whatever facilities your operating system or shell provides.
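Some shells do not accept dots in variable names, so one portable way to supply such a variable is to launch the application as a child process from Java. The following is a minimal sketch under two assumptions: that the environment variable carries exactly the same name as the option (pdfxs.cjk.enable is used here), and that the hypothetical PdfxsChildLauncher class and its arguments fit your deployment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of PDFxStream.
class PdfxsChildLauncher {

    /** Builds a child JVM command with a PDFxStream option set as an environment variable. */
    static ProcessBuilder buildLauncher(String classpath, String mainClass) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";

        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);

        ProcessBuilder builder = new ProcessBuilder(command);
        // Assumption: the variable name matches the option name exactly.
        builder.environment().put("pdfxs.cjk.enable", "N");
        builder.inheritIO(); // forward the child's stdout/stderr to this process
        return builder;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PdfxsChildLauncher <classpath> <main class>");
            return;
        }
        int exitCode = buildLauncher(args[0], args[1]).start().waitFor();
        System.out.println("child exited with " + exitCode);
    }
}
```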
Available configuration options
line.separator
Set this environment or system property to the string you want
PDFxStream to use to separate lines in text extracts. This defaults to
your platform's default line separator: \n on Linux/Unix/Mac OS X,
and \r\n on Windows platforms.
pdfxs.cjk.enable
Setting this environment or system property to N will disable
PDFxStream's ability to extract Chinese, Japanese, or Korean (CJK) text.
This may be desirable if memory utilization is a concern -- CJK
character maps are very large, and can consume significant amounts of
memory. As always, application profiling is recommended to determine the
actual source(s) of memory consumption.
pdfxs.logfactory
PDFxStream defaults to using java.util.logging or Log4J for logging
informational and error messages. However, many environments require
customized logging frameworks. Therefore, PDFxStream provides a
pluggable logging architecture that enables you to hook your custom
logging framework into PDFxStream. To do so, simply implement the
com.snowtide.util.logging.LogFactory interface, and set the
pdfxs.logfactory environment or system property to the full classname
of your implementation.
More details about PDFxStream's logging support are available here.
pdfxs.loggingtype
PDFxStream normally defaults to using the java.util.logging logging
framework. To force PDFxStream to default to using Log4J, make sure
a log4j jar file is present on your application's classpath, and set the
pdfxs.loggingtype environment variable or system property to log4j.
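Because this option only makes sense when a log4j jar is actually on the classpath, an application may want to check for it before selecting it. The sketch below is a hypothetical helper (Log4jSelector is not part of PDFxStream). It assumes that the presence of the Log4J 1.x class org.apache.log4j.Logger is a reasonable proxy for "a log4j jar is present"; which Log4J version PDFxStream expects is not stated here. As with any system property, it would need to run before com.snowtide.PDF is initialized.

```java
// Hypothetical helper, not part of PDFxStream.
class Log4jSelector {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, Log4jSelector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Sets pdfxs.loggingtype to log4j only when a Log4J class is found. */
    static boolean preferLog4jIfAvailable() {
        // Assumption: org.apache.log4j.Logger (Log4J 1.x) indicates a usable log4j jar.
        if (isClassPresent("org.apache.log4j.Logger")) {
            System.setProperty("pdfxs.loggingtype", "log4j");
            return true;
        }
        return false; // leave the default (java.util.logging) in place
    }

    public static void main(String[] args) {
        boolean selected = preferLog4jIfAvailable();
        System.out.println("log4j selected: " + selected);
    }
}
```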
pdfxs.layout.detectTables
By default, PDFxStream will attempt to detect tabular data on each
extracted page, and infer the structure of each table. This structure is
then materialized as rows of com.snowtide.pdf.layout.Blocks
within higher-level com.snowtide.pdf.layout.Table blocks.
This detection and inference can be disabled globally by setting the
pdfxs.layout.detectTables environment or system property to N.