Error handling
PDFxStream is designed to throw only java.io.IOExceptions. In a few
special cases, however, PDFxStream throws more specific exception types in
order to indicate that particular kinds of errors have occurred. Each of
these exception types subclasses IOException, which is helpful in keeping
prototyping code simple and clean.
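Because the more specific exception types are IOException subclasses, a single catch clause is enough while prototyping. The following is a minimal sketch of that idea. To stay runnable without PDFxStream on the classpath it uses two stand-in exception classes, StandInEncryptedException and StandInFaultyException, plus a hypothetical IOTask interface; none of these names are part of PDFxStream.

```java
import java.io.IOException;

class CatchAllSketch {
    // Hypothetical stand-ins for the library's IOException subclasses;
    // they exist only so this sketch compiles on its own.
    static class StandInEncryptedException extends IOException {
        StandInEncryptedException(String msg) { super(msg); }
    }

    static class StandInFaultyException extends IOException {
        StandInFaultyException(String msg) { super(msg); }
    }

    // Hypothetical interface: any task that may fail with an IOException.
    interface IOTask {
        String run() throws IOException;
    }

    // Prototype-style handling: one catch clause covers IOException
    // and every subclass of it.
    static String runOrNull(IOTask task) {
        try {
            return task.run();
        } catch (IOException e) {
            System.err.println("Extraction failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(runOrNull(() -> "some text"));
        System.out.println(runOrNull(() -> {
            throw new StandInEncryptedException("password required");
        }));
        System.out.println(runOrNull(() -> {
            throw new StandInFaultyException("not a PDF");
        }));
    }
}
```

The trade-off is that a catch-all handler cannot tell an encrypted document from a corrupted one, which is why more specific handling is worth adding before production use.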
com.snowtide.pdf.EncryptedPDFException
Thrown when an error occurs while attempting to decrypt encrypted PDF data,
including when one attempts to open an encrypted, password-protected PDF
document but does not provide the password to e.g. com.snowtide.PDF.open(String, byte[]).
See Reading Encrypted PDF Files for details on PDFxStream's PDF decryption support.
com.snowtide.pdf.FaultyPDFException
This exception type is thrown when PDFxStream encounters unexpected and/or out-of-spec PDF data, e.g.:
- The file in question is not a PDF document
- The file is a PDF document, but is corrupted or otherwise unusable, and PDFxStream's "file repair" routines cannot compensate sufficiently
Exception Handling Patterns
In production environments, especially when PDFxStream is used to extract content from PDF documents sourced from untrusted parties (for example, when indexing PDF documents found on the internet), handling these exceptions properly is important for monitoring the results of your PDF content extraction efforts.
Here is a sample that illustrates the pattern for handling each of the three types of exceptions most commonly seen when working with PDFxStream.
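import java.io.File;
import java.io.IOException;
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.EncryptedPDFException;
import com.snowtide.pdf.FaultyPDFException;
import com.snowtide.pdf.OutputTarget;

// Returns the document's text, or null if it could not be extracted.
public static String extractPDFText (File pdfFile) {
    try (Document doc = PDF.open(pdfFile)) {
        StringBuilder out = new StringBuilder();
        doc.pipe(new OutputTarget(out));
        return out.toString();
    } catch (EncryptedPDFException e) {
        // Must precede the IOException clause, since it is a subclass
        System.out.println("PDF document (" + pdfFile + ") is encrypted...");
    } catch (FaultyPDFException e) {
        System.out.println("PDF document (" + pdfFile +
            ") cannot be read because: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() +
            ") caused a general IO error: " + e.getMessage());
    }
    return null;
}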
Obviously, logging these errors to stdout isn't what one would do in
production, but the pattern is the same: just insert the appropriate
logging or other application-specific routines for handling each type of
exception.
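As one possible shape for such routines, the sketch below swaps the stdout calls for java.util.logging and tallies an outcome per document so that extraction results can be monitored. It is an illustration under stated assumptions, not PDFxStream API: the Outcome enum, the Extractor interface, and the two stand-in exception classes are hypothetical names introduced here so the example runs without the library. In real code the catch clauses would name EncryptedPDFException and FaultyPDFException instead.

```java
import java.io.IOException;
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class ExtractionMonitorSketch {
    // Hypothetical stand-ins for the library's IOException subclasses.
    static class StandInEncryptedException extends IOException {
        StandInEncryptedException(String msg) { super(msg); }
    }

    static class StandInFaultyException extends IOException {
        StandInFaultyException(String msg) { super(msg); }
    }

    // Hypothetical classification of each extraction attempt.
    enum Outcome { OK, ENCRYPTED, FAULTY, IO_ERROR }

    // Hypothetical abstraction over "open the document and pipe its text".
    interface Extractor {
        String extract(String source) throws IOException;
    }

    private static final Logger LOG =
        Logger.getLogger(ExtractionMonitorSketch.class.getName());

    // Per-outcome tallies, suitable for feeding a metrics system.
    final Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);

    String extractOrNull(String source, Extractor extractor) {
        try {
            String text = extractor.extract(source);
            record(Outcome.OK);
            return text;
        } catch (StandInEncryptedException e) {
            // Subclass clauses come first; the compiler rejects the reverse order.
            LOG.log(Level.WARNING, "Encrypted document: " + source, e);
            record(Outcome.ENCRYPTED);
        } catch (StandInFaultyException e) {
            LOG.log(Level.WARNING, "Unreadable document: " + source, e);
            record(Outcome.FAULTY);
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "General I/O error: " + source, e);
            record(Outcome.IO_ERROR);
        }
        return null;
    }

    private void record(Outcome outcome) {
        counts.merge(outcome, 1, Integer::sum);
    }
}
```

Keeping the tallies separate from the log messages makes it straightforward to alert on, say, a sudden rise in unreadable documents without parsing log output.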