Error handling

PDFTextStream is designed to throw only java.io.IOException, whether from its constructors or from any of its other methods. In the simplest cases, this means you only need to catch IOException.

However, in a few special cases, PDFTextStream throws more specific exceptions to indicate that particular kinds of errors have occurred. Each of these exception types subclasses IOException, which helps keep prototyping code simple and clean.
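Because the more specific exceptions extend IOException, a single catch clause is enough while prototyping. The sketch below illustrates this with local stand-in classes (StandInEncryptedPDFException and StandInFaultyPDFException) and a simulated open method. All of these names are hypothetical placeholders, not the library's types, so the example compiles without PDFTextStream on the classpath.

```java
import java.io.IOException;

public class CatchAllSketch {
    // Hypothetical stand-ins for the library's exception types. They only
    // mirror the documented fact that those types extend IOException.
    static class StandInEncryptedPDFException extends IOException {
        StandInEncryptedPDFException(String msg) { super(msg); }
    }

    static class StandInFaultyPDFException extends IOException {
        StandInFaultyPDFException(String msg) { super(msg); }
    }

    // Simulates an operation that can fail in one of three ways.
    static void open(String kind) throws IOException {
        switch (kind) {
            case "encrypted": throw new StandInEncryptedPDFException("password required");
            case "faulty":    throw new StandInFaultyPDFException("not a PDF");
            case "io":        throw new IOException("disk error");
            default:          return; // any other value counts as success
        }
    }

    // Prototype-style handling: one catch clause covers every failure.
    static String tryOpen(String kind) {
        try {
            open(kind);
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        for (String kind : new String[] {"ok", "encrypted", "faulty", "io"}) {
            System.out.println(kind + " -> " + tryOpen(kind));
        }
    }
}
```

The trade-off is that a catch-all clause cannot tell an encrypted file from a corrupted one without inspecting the exception's class, which is why production code usually catches the specific types first.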

EncryptedPDFException

We saw in the previous section that PDFTextStream’s constructors can throw an EncryptedPDFException when an encryption-related error occurs. Please refer to the examples and explanation in that section for details on this exception type.

FaultyPDFException

PDFTextStream can also throw com.snowtide.pdf.FaultyPDFException from its constructors, as well as from most of its other methods that access PDF data. This exception type is thrown when PDFTextStream encounters file data that it doesn’t understand, which indicates one of the following:

  1. The file in question is not a PDF document
  2. The file is a PDF document, but is corrupted or otherwise unusable, and PDFTextStream cannot repair it

Exception Handling Patterns

In production environments, especially when PDFTextStream is used to extract content from PDF documents sourced from untrusted parties (for example, when indexing PDF documents found on the internet), handling these exceptions properly is important for monitoring the results of your PDF content extraction.

Below is a typical pattern suited to such environments. It shows how to handle each of the three types of exceptions most commonly seen when working with PDFTextStream.

public static String extractPDFText (File pdfFile) {
    try {
        PDFTextStream stream = new PDFTextStream(pdfFile);
        StringBuffer sb = new StringBuffer(1024);
        OutputTarget tgt = new OutputTarget(sb);
        stream.pipe(tgt);
        stream.close();
        return sb.toString();
    } catch (EncryptedPDFException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") is encrypted...");
    } catch (FaultyPDFException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() +
            ") cannot be read because: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("PDF document (" + pdfFile.getAbsolutePath() + ") caused general IO error: " + e.getMessage());
    }
    
    return null;
}

Writing these errors to System.out is not what you would do in production, but the pattern is the same: substitute the appropriate logging or other application-specific handling for each type of exception.
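As one possible shape for such handling, the sketch below swaps System.out for java.util.logging and returns an outcome value that a monitoring layer could count. It is a minimal illustration under stated assumptions: the Outcome enum, the Extraction interface, and the StandIn exception classes are all hypothetical names introduced here, with the stand-ins taking the place of EncryptedPDFException and FaultyPDFException so the example runs without the library.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExtractionMonitorSketch {
    private static final Logger LOG =
        Logger.getLogger(ExtractionMonitorSketch.class.getName());

    // Hypothetical result categories for monitoring extraction runs.
    enum Outcome { OK, ENCRYPTED, FAULTY, IO_ERROR }

    // Hypothetical stand-ins for the library's IOException subclasses.
    static class StandInEncryptedPDFException extends IOException {
        StandInEncryptedPDFException(String msg) { super(msg); }
    }

    static class StandInFaultyPDFException extends IOException {
        StandInFaultyPDFException(String msg) { super(msg); }
    }

    // Hypothetical wrapper for whatever code performs the extraction.
    interface Extraction {
        String run() throws IOException;
    }

    // Specific exception types are caught before the general IOException;
    // the compiler rejects the reverse order as unreachable code.
    static Outcome extract(String path, Extraction step) {
        try {
            step.run();
            return Outcome.OK;
        } catch (StandInEncryptedPDFException e) {
            LOG.log(Level.WARNING, "Encrypted PDF skipped: " + path);
            return Outcome.ENCRYPTED;
        } catch (StandInFaultyPDFException e) {
            LOG.log(Level.WARNING, "Unreadable PDF: " + path + " (" + e.getMessage() + ")");
            return Outcome.FAULTY;
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "I/O error reading " + path, e);
            return Outcome.IO_ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("good.pdf", () -> "some text"));
        System.out.println(extract("locked.pdf", () -> {
            throw new StandInEncryptedPDFException("password required");
        }));
        System.out.println(extract("broken.pdf", () -> {
            throw new StandInFaultyPDFException("bad header");
        }));
    }
}
```

Returning a category rather than null lets the caller distinguish the failure modes, for example to retry I/O errors while permanently skipping faulty files.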