Error handling

PDFxStream is designed to throw only java.io.IOExceptions. In a few special cases, it throws more specific exception types to indicate that particular kinds of errors have occurred. Each of these types is a subclass of IOException, which keeps prototyping code simple and clean: a single catch clause for IOException covers them all.
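The effect of that subclass relationship on catch clauses can be sketched as follows. This is a minimal illustration, not PDFxStream code: `DemoEncryptedException`, `DemoFaultyException`, `CatchOrderDemo`, and `IoTask` are hypothetical names, used so the example compiles without PDFxStream on the classpath.

```java
import java.io.IOException;

// Hypothetical stand-ins for a library's specialised exceptions. They are not
// PDFxStream classes; they exist only to keep this sketch self-contained.
class DemoEncryptedException extends IOException {
    DemoEncryptedException(String message) { super(message); }
}

class DemoFaultyException extends IOException {
    DemoFaultyException(String message) { super(message); }
}

class CatchOrderDemo {
    /** An operation that may fail with any IOException subtype. */
    interface IoTask {
        void run() throws IOException;
    }

    // Prototype style: one clause catches every subtype of IOException.
    static String handleBroadly(IoTask task) {
        try {
            task.run();
            return "ok";
        } catch (IOException e) {
            return "failed";
        }
    }

    // Production style: specific clauses must come before the general one,
    // otherwise the compiler rejects them as unreachable.
    static String handleSpecifically(IoTask task) {
        try {
            task.run();
            return "ok";
        } catch (DemoEncryptedException e) {
            return "encrypted";
        } catch (DemoFaultyException e) {
            return "faulty";
        } catch (IOException e) {
            return "io-error";
        }
    }
}
```

The broad handler is enough while prototyping; the specific handler is the shape to move to once each failure type needs its own treatment.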

com.snowtide.pdf.EncryptedPDFException

Thrown when an error occurs while attempting to decrypt encrypted PDF data. This includes attempting to open an encrypted, password-protected PDF document without providing the password to, e.g., com.snowtide.PDF.open(String,byte[]).

See Reading Encrypted PDF Files for details on PDFxStream's PDF decryption support.

com.snowtide.pdf.FaultyPDFException

This exception type is thrown when PDFxStream encounters file data that it doesn’t understand, which includes scenarios like:

  1. The file in question is not a PDF document
  2. The file is a PDF document, but is corrupted or otherwise unusable, and PDFxStream's "file repair" routines cannot compensate sufficiently

Exception Handling Patterns

In production environments, especially when PDFxStream is used to extract content from PDF documents sourced from untrusted parties (such as when indexing PDF documents found on the internet), handling these exceptions correctly is essential for monitoring the results of your PDF content extraction.

Below is a typical pattern suited to such environments. It shows how to handle each of the three exception types most commonly seen when working with PDFxStream: EncryptedPDFException, FaultyPDFException, and the general IOException.

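import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.EncryptedPDFException;
import com.snowtide.pdf.FaultyPDFException;
import com.snowtide.pdf.OutputTarget;
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;

public static String extractPDFText (File pdfFile) {
    try {
        Document doc = PDF.open(pdfFile);
        try {
            StringWriter out = new StringWriter();
            doc.pipe(new OutputTarget(out));
            return out.toString();
        } finally {
            // Close the document even if extraction fails partway through.
            doc.close();
        }
    } catch (EncryptedPDFException e) {
        // Must precede IOException: the more specific type is caught first.
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") is encrypted...");
    } catch (FaultyPDFException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() +
            ") cannot be read because: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() +
            ") caused general IO error: " + e.getMessage());
    }

    // Reached only when extraction failed.
    return null;
}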

Logging these errors to stdout isn’t what one would do in production, but the pattern is the same: insert the appropriate logging or other application-specific handling for each type of exception.
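As one possible shape for such application-specific handling, the sketch below logs each failure through java.util.logging and keeps a per-type failure count for monitoring. `ExtractionMonitor`, `recordFailure`, and `failureCount` are hypothetical names introduced for illustration; they are not part of PDFxStream, and the class works with any IOException subtype.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of PDFxStream: logs extraction failures and
// counts them by exception type so failure rates can be monitored.
class ExtractionMonitor {
    private static final Logger LOG =
        Logger.getLogger(ExtractionMonitor.class.getName());

    // Failure counts keyed by the exception's simple class name.
    private final Map<String, Integer> failuresByType = new TreeMap<>();

    /** Logs the failure with its stack trace and increments the type's count. */
    void recordFailure(File source, IOException e) {
        String type = e.getClass().getSimpleName();
        failuresByType.merge(type, 1, Integer::sum);
        LOG.log(Level.WARNING,
            "Extraction failed for " + source.getAbsolutePath() + " (" + type + ")", e);
    }

    /** Returns how many failures of the given type have been recorded. */
    int failureCount(String type) {
        return failuresByType.getOrDefault(type, 0);
    }
}
```

In the pattern above, each catch clause would call something like `monitor.recordFailure(pdfFile, e)` in place of the System.out.println call, and the counts could be reported periodically. The class is not thread-safe as written; concurrent extraction would need a concurrent map or external synchronization.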