Configuration options
PDFxStream's configuration can be controlled in any of four ways:
- Globally, by setting environment variables or Java system properties (before starting your application) that com.snowtide.pdf.Configuration uses to initialize its default instance. This is the most common approach.
- Globally, by changing the state of the default instance of Configuration, available via com.snowtide.pdf.Configuration.getDefault().
- Locally, on a per-document (per-com.snowtide.pdf.Document) basis, by providing a separate instance of Configuration, modified as desired, with each invocation of com.snowtide.PDF.open(), e.g. com.snowtide.PDF.open(String, byte[], Configuration).
- On a particular Document, by changing its Configuration via com.snowtide.pdf.Document.setConfig(Configuration). Note that this will not affect configuration options that are only germane when a document is first being opened/initialized.
PDFxStream checks environment variables and Java system properties only once, when com.snowtide.PDF is statically initialized. Therefore, the safest way to set these configuration options is to set the corresponding system properties when starting your application:
java -cp [classpath] -Dpdfxs.config.property=value your.main.classname
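If changing the launch command is not practical, the same properties could in principle be set programmatically, as long as that happens before com.snowtide.PDF is first loaded (since the properties are read only once, at static initialization). The following is a minimal sketch under that assumption; the class name EarlyPdfxsConfig and its apply helper are hypothetical and not part of PDFxStream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: applies PDFxStream-related system properties
// before any com.snowtide class has been touched.
final class EarlyPdfxsConfig {

    /** Sets each entry as a JVM system property and returns how many were applied. */
    static int apply(Map<String, String> options) {
        int applied = 0;
        for (Map.Entry<String, String> e : options.entrySet()) {
            System.setProperty(e.getKey(), e.getValue());
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("pdfxs.cjk.enable", "N");          // option documented below
        options.put("pdfxs.layout.detectTables", "N"); // option documented below
        apply(options);
        // Only after this point should application code reference com.snowtide.PDF.
    }
}
```

Setting the properties on the command line remains the safer route, because it does not depend on class-loading order within your application.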
Alternatively, you can set configuration options as environment variables, using whatever facilities your operating system or shell provides.
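Some POSIX shells reject variable names that contain dots, so one possible workaround is to launch the application from a small Java wrapper that places the variable directly into the child process environment. This sketch assumes the environment variable name is identical to the property name, as the option descriptions below suggest; EnvLauncher is a hypothetical helper, and the command passed to it is a placeholder for your own launch command.

```java
import java.util.List;

// Hypothetical helper: prepares a child process whose environment
// carries one additional configuration variable.
final class EnvLauncher {

    /** Builds (but does not start) a process with the given variable added to its environment. */
    static ProcessBuilder withOption(List<String> command, String name, String value) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put(name, value); // visible only to the child process
        pb.inheritIO();                    // forward the child's stdout/stderr to this process
        return pb;
    }
}
```

Calling start() on the returned ProcessBuilder launches the child JVM; the parent's own environment is left unchanged.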
Available configuration options
line.separator
Set this environment or system property to the string you want PDFxStream to use to separate lines in text extracts. This defaults to your platform's default line separator: \n on Linux/Unix/Mac OS X, and \r\n on Windows platforms.
pdfxs.cjk.enable
Setting this environment or system property to N will disable PDFxStream's ability to extract Chinese, Japanese, or Korean (CJK) text. This may be desirable if memory utilization is a concern -- CJK character maps are very large, and can consume significant amounts of memory. As always, application profiling is recommended to determine the actual source(s) of memory consumption.
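A full profiler gives the most reliable picture, but a rough first check could compare approximate heap usage around an extraction run, once with CJK support enabled and once with pdfxs.cjk.enable set to N. The sketch below uses only the standard Runtime API; HeapSnapshot is a hypothetical helper, and the Runnable stands in for your own extraction code. Garbage collection timing makes these numbers noisy, so treat them as indicative only.

```java
// Hypothetical helper for a crude before/after heap comparison.
final class HeapSnapshot {

    /** Approximate bytes of heap currently in use (total minus free). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task and returns the approximate change in used heap, in bytes. */
    static long deltaAround(Runnable task) {
        System.gc(); // a hint only; the JVM is free to ignore it
        long before = usedBytes();
        task.run();
        long after = usedBytes();
        return after - before;
    }
}
```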
pdfxs.logfactory
PDFxStream defaults to using java.util.logging or Log4J for logging informational and error messages. However, many environments require customized logging frameworks. Therefore, PDFxStream provides a pluggable logging architecture that enables you to hook your custom logging framework into PDFxStream. To do so, simply implement the com.snowtide.util.logging.LogFactory interface, and set the pdfxs.logfactory environment or system property to the full classname of your implementation.
More details about PDFxStream's logging support are available in its logging documentation.
pdfxs.loggingtype
PDFxStream normally defaults to using the java.util.logging logging framework. To force PDFxStream to default to using Log4J, make sure a log4j jar file is present on your application's classpath, and set the pdfxs.loggingtype environment variable or system property to log4j.
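Because this option only makes sense when Log4J is actually on the classpath, an application might check for it before setting the property. The sketch below is one way to do that; LoggingTypeSetup is a hypothetical helper, and org.apache.log4j.Logger is assumed here as the marker class (the Log4J 1.x entry point), which may not match the Log4J version your PDFxStream release expects. As with any system property, this would need to run before com.snowtide.PDF is initialized.

```java
// Hypothetical helper: selects Log4J for PDFxStream only if it can be found.
final class LoggingTypeSetup {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, LoggingTypeSetup.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Sets pdfxs.loggingtype=log4j when the assumed Log4J marker class is present. */
    static boolean preferLog4jIfPresent() {
        // Assumption: Log4J 1.x marker class; adjust for your environment.
        if (isOnClasspath("org.apache.log4j.Logger")) {
            System.setProperty("pdfxs.loggingtype", "log4j");
            return true;
        }
        return false; // leave PDFxStream on its java.util.logging default
    }
}
```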
pdfxs.layout.detectTables
By default, PDFxStream will attempt to detect tabular data on each extracted page, and infer the structure of each table. This structure is then materialized as rows of com.snowtide.pdf.layout.Block objects within higher-level com.snowtide.pdf.layout.Table blocks.
This detection and inference can be disabled globally by setting the pdfxs.layout.detectTables environment or system property to N.