Error handling
PDFxStream is designed to only throw java.io.IOExceptions.
However, in a few special cases, PDFxStream will throw more specific kinds
of exceptions in order to indicate that particular kinds of errors have
occurred. Each of these exception types subclasses IOException,
which helps keep prototyping code simple and clean.
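The subclassing relationship described above means a single catch clause is enough while prototyping. The following is a minimal sketch of that idea, not PDFxStream code: the two exception classes are hypothetical stand-ins defined locally so the example runs without the library, and `failWith` is an invented helper that simulates the three kinds of failure.

```java
import java.io.IOException;

public class SingleCatchSketch {
    // Hypothetical stand-ins for com.snowtide.pdf.EncryptedPDFException and
    // com.snowtide.pdf.FaultyPDFException. Only "extends IOException" mirrors the text.
    static class StandInEncryptedException extends IOException {
        StandInEncryptedException(String msg) { super(msg); }
    }

    static class StandInFaultyException extends IOException {
        StandInFaultyException(String msg) { super(msg); }
    }

    // Invented helper that always fails, in one of three ways.
    static void failWith(int kind) throws IOException {
        switch (kind) {
            case 0: throw new StandInEncryptedException("password required");
            case 1: throw new StandInFaultyException("not a PDF");
            default: throw new IOException("disk error");
        }
    }

    // One catch clause covers all three failures, because the
    // specific types are subclasses of IOException.
    static String tryOperation(int kind) {
        try {
            failWith(kind);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (int kind = 0; kind < 3; kind++) {
            System.out.println(tryOperation(kind));
        }
    }
}
```

With the real library, the same shape would apply: a prototype can catch IOException alone and defer the finer-grained handling until later.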
com.snowtide.pdf.EncryptedPDFException
Thrown when an error occurs while attempting to decrypt encrypted PDF data,
including when one attempts to open an encrypted, password-protected PDF
document but does not provide the password to
e.g. com.snowtide.PDF.open(String,byte[]).
See Reading Encrypted PDF Files for details on PDFxStream's PDF decryption support.
com.snowtide.pdf.FaultyPDFException
This exception type is thrown when PDFxStream encounters file data that it doesn’t understand, which includes scenarios like:
- The file in question is not a PDF document
- The file is a PDF document, but is corrupted or otherwise unusable, and PDFxStream's "file repair" routines cannot compensate sufficiently
Exception Handling Patterns
In production environments, especially when PDFxStream is used to extract content from PDF documents sourced from untrusted parties (such as when indexing PDF documents found on the internet), handling these exceptions properly is important for monitoring the results of your PDF content extraction efforts.
Below is a typical pattern suited to such environments; it shows how to handle each of the three types of exceptions most commonly seen when working with PDFxStream.
public static String extractPDFText (File pdfFile) {
    try {
        Document doc = PDF.open(pdfFile);
        StringWriter out = new StringWriter();
        doc.pipe(new OutputTarget(out));
        doc.close();
        return out.toString();
    } catch (EncryptedPDFException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") is encrypted...");
    } catch (FaultyPDFException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") cannot be read because: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") caused general IO error: " + e.getMessage());
    }
    return null;
}
Obviously, logging these errors to stdout isn’t what one
would do in production, but the pattern is the same: just insert the
appropriate logging or other application-specific routines for handling each
type of exception.
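As one possible way to replace the stdout calls, the sketch below routes each failure type to java.util.logging at a different level and returns an outcome that a monitoring layer could count. It is an illustration under stated assumptions rather than PDFxStream's recommended implementation: the exception classes are hypothetical local stand-ins for the two com.snowtide.pdf types, and the `Extraction` interface, `Outcome` enum, and chosen log levels are invented for this example.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExtractionLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(ExtractionLoggingSketch.class.getName());

    // Hypothetical stand-ins for com.snowtide.pdf.EncryptedPDFException and
    // com.snowtide.pdf.FaultyPDFException, so the sketch runs without the library.
    static class StandInEncryptedException extends IOException {
        StandInEncryptedException(String msg) { super(msg); }
    }

    static class StandInFaultyException extends IOException {
        StandInFaultyException(String msg) { super(msg); }
    }

    // Invented result type that a monitoring layer could tally.
    enum Outcome { OK, ENCRYPTED, FAULTY, IO_ERROR }

    // Invented callback representing the actual extraction work.
    interface Extraction {
        String run() throws IOException;
    }

    // The specific subclasses are caught before the general IOException,
    // mirroring the catch order in the pattern above.
    static Outcome extractAndLog(String path, Extraction work) {
        try {
            work.run();
            return Outcome.OK;
        } catch (StandInEncryptedException e) {
            LOG.log(Level.INFO, "Skipping encrypted PDF: {0}", path);
            return Outcome.ENCRYPTED;
        } catch (StandInFaultyException e) {
            LOG.log(Level.WARNING, "Unreadable PDF: " + path, e);
            return Outcome.FAULTY;
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "I/O failure reading: " + path, e);
            return Outcome.IO_ERROR;
        }
    }

    public static void main(String[] args) {
        Outcome outcome = extractAndLog("/tmp/example.pdf", () -> {
            throw new StandInFaultyException("simulated corrupt file");
        });
        System.out.println(outcome);
    }
}
```

Returning an outcome instead of null lets callers distinguish "encrypted" from "corrupt" from "I/O failure" without parsing log output.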