PDFTextStream configuration options
PDFTextStream's configuration can be controlled in three different ways:
- Globally, by changing the state of the default instance of com.snowtide.pdf.PDFTextStreamConfig, available via com.snowtide.pdf.PDFTextStreamConfig.getDefaultConfig().
- Globally, by setting particular system properties that com.snowtide.pdf.PDFTextStreamConfig uses to initialize its default instance.
- Locally, on a per-document / per-PDFTextStream-instance basis, by providing a separate instance of com.snowtide.pdf.PDFTextStreamConfig, modified as desired, to each com.snowtide.pdf.PDFTextStream constructor.
Each of the options available in com.snowtide.pdf.PDFTextStreamConfig is detailed in its API documentation. The rest of this document walks through how to set system properties so that they will be picked up by com.snowtide.pdf.PDFTextStreamConfig, and then lists the available system properties themselves.
Each of the following system properties must be set before referencing PDFTextStream in any way, as the properties are checked and their values (if any) are acted upon when PDFTextStream is statically initialized. Therefore, the safest way to use these configuration-related system properties is to set them when starting your application:
java -cp [classpath] -Dpdfts.config.property=value your.main.classname
You can also set system properties in your code as long as you do so before your first usage of PDFTextStream. Using Java on the JVM:
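import java.io.File;
import com.snowtide.pdf.PDFTextStream;
System.setProperty("pdfts.config.property", "config_value");
PDFTextStream stream = new PDFTextStream(new File("c:\\some\\path.pdf")); // backslashes must be escaped in Java string literals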
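The timing requirement above can be illustrated without PDFTextStream itself. The following is a minimal sketch that assumes a library class reads its property once, during static initialization, as described above. The class LibraryDefaults and the property name demo.config.property are hypothetical stand-ins, not part of PDFTextStream.

```java
final class StaticConfigDemo {

    /** Hypothetical stand-in for a library class; not part of PDFTextStream. */
    static final class LibraryDefaults {
        // Read exactly once, when the JVM initializes this class.
        static final String VALUE =
                System.getProperty("demo.config.property", "default");
    }

    /** Sets the property, then touches the stand-in class. */
    static String firstUse() {
        System.setProperty("demo.config.property", "early");
        return LibraryDefaults.VALUE; // static initialization happens on first access
    }

    /** Changes the property after initialization and reads the cached value. */
    static String laterChange() {
        System.setProperty("demo.config.property", "late");
        return LibraryDefaults.VALUE; // the value captured at initialization
    }

    public static void main(String[] args) {
        System.out.println(firstUse());    // prints "early"
        System.out.println(laterChange()); // still prints "early"
    }
}
```

The second call shows why a property set after the first reference has no effect: the value was already captured when the class was initialized.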
Using C# on .NET:
using com.snowtide.pdf;
java.lang.System.setProperty("pdfts.config.property", "config_value");
PDFTextStream stream = new PDFTextStream(new java.io.File(@"c:\some\path.pdf"));
PDFTextStream.NET users can also set these properties in the app.config file, which is equivalent to the Java convention of specifying system properties on the command line using the -D option (note the ikvm: prefix, which exposes the property to the Java namespaces):
<?xml version="1.0"?>
<configuration>
  <appSettings>
    <add key="ikvm:pdfts.config.property" value="config_value" />
  </appSettings>
</configuration>
Available system properties
line.separator
Set this system property to the string you want PDFTextStream to use to separate lines in text extracts. This defaults to your platform's default line separator ("\n" on Linux/Unix/Mac OS X, and "\r\n" on Windows platforms).
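As a minimal sketch of setting this property in code, the following uses only the standard System.setProperty and System.getProperty calls. The class and helper names are illustrative. Because line.separator is a standard JVM property rather than a PDFTextStream-specific one, other code that reads it may also see the new value.

```java
final class LineSeparatorDemo {

    /** Makes separator characters visible when printed. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    /** Overrides line.separator and returns the previous value. */
    static String overrideSeparator(String separator) {
        return System.setProperty("line.separator", separator);
    }

    public static void main(String[] args) {
        // Do this before PDFTextStream is referenced for the first time.
        String previous = overrideSeparator("\n");
        System.out.println("platform default: " + visible(previous));
        System.out.println("now configured:   "
                + visible(System.getProperty("line.separator")));
    }
}
```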
pdfts.cjk.enable
Setting this system property to "N" will disable PDFTextStream’s ability to extract Chinese, Japanese, or Korean (CJK) text. This may be desirable if memory utilization is a concern, because CJK character maps are very large and can consume significant amounts of memory. As always, application profiling is recommended to determine the actual source(s) of memory consumption.
pdfts.logfactory
PDFTextStream defaults to using java.util.logging or Log4J for logging informational and error messages. However, many environments demand customized logging frameworks. Therefore, PDFTextStream provides a pluggable logging architecture that enables you to hook your custom logging framework into PDFTextStream. To do so, simply implement the com.snowtide.util.logging.LogFactory interface, and set the pdfts.logfactory system property to the full classname of your implementation.
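The general pattern of naming an implementation class in a system property can be sketched as follows. This is a generic illustration only: DemoLogFactory is a hypothetical stand-in and not the real com.snowtide.util.logging.LogFactory interface (see the API documentation for its methods), demo.logfactory is a made-up property name, and the reflective lookup is an assumption about how such plug-in mechanisms commonly work, not a description of PDFTextStream's internals.

```java
final class PluggableFactoryDemo {

    /** Hypothetical interface; NOT com.snowtide.util.logging.LogFactory. */
    interface DemoLogFactory {
        String describe();
    }

    /** Fallback used when the property is not set. */
    static final class DefaultFactory implements DemoLogFactory {
        public String describe() { return "default"; }
    }

    /** A custom implementation that will be named in the property. */
    public static final class CustomFactory implements DemoLogFactory {
        public CustomFactory() { }
        public String describe() { return "custom"; }
    }

    /** Loads the class named by the (made-up) demo.logfactory property. */
    static DemoLogFactory load() {
        String className = System.getProperty("demo.logfactory");
        if (className == null) {
            return new DefaultFactory();
        }
        try {
            Class<?> cls = Class.forName(className);
            return (DemoLogFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // getName() yields the full class name that the property expects.
        System.setProperty("demo.logfactory", CustomFactory.class.getName());
        System.out.println(load().describe()); // prints "custom"
    }
}
```

With PDFTextStream, the property to set is pdfts.logfactory, and the named class must implement the real LogFactory interface.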
pdfts.loggingtype
PDFTextStream normally defaults to using the java.util.logging logging framework. To force PDFTextStream to default to using Log4J, set the pdfts.loggingtype system property to "log4j".
pdfts.mmap.enable (deprecated)
By default, PDFTextStream does not memory-map opened PDF files. This feature can be enabled by setting the pdfts.mmap.enable system property to "Y".
This option is deprecated, and will be removed in future releases of PDFTextStream.
Due to an unfortunate bug in Java’s implementation of memory-mapped files in Windows environments, it is possible that a PDF file opened and processed by PDFTextStream will remain locked even after the PDFTextStream instance’s close() function has been called, and PDFTextStream has released all of the filesystem handles it has allocated. This locking behaviour (which is known to occur only on Windows) will prevent the PDF file from being deleted or moved until Java’s garbage collector eliminates certain JDK-internal objects that are used to track and manage the previously memory-mapped PDF file.
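The underlying JDK behaviour can be observed with plain java.nio, independently of PDFTextStream. The sketch below maps a temporary file, closes the channel, and shows that the mapping is still usable afterwards. The class and file names are illustrative. Whether the final delete succeeds depends on the platform, so no outcome is asserted for it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileLockDemo {

    /** Maps the whole file read-only and closes the channel before returning. */
    static MappedByteBuffer mapAndClose(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Returns the first byte, read through the mapping after the channel is closed. */
    static char firstByteAfterClose() {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            Files.write(file, "%PDF".getBytes(StandardCharsets.US_ASCII));
            MappedByteBuffer buffer = mapAndClose(file);
            char first = (char) buffer.get(0); // mapping outlives the channel
            try {
                Files.delete(file);
                System.out.println("deleted while mapped");
            } catch (IOException e) {
                // Expected on platforms that lock mapped files.
                file.toFile().deleteOnExit();
                System.out.println("still locked: " + e);
            }
            return first;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByteAfterClose()); // prints "%" after the delete status line
    }
}
```

Because the mapping is released only when the buffer object is garbage-collected, closing the channel (or the stream built on it) does not by itself free the file.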