Reading encrypted PDF files

PDFTextStream can decrypt PDF files encrypted with 40-bit, 128-bit, 256-bit, and variable-bit-length RC4 and AES ciphers. Reading such files is as easy as reading unencrypted PDF files.

If it is known ahead of time that a PDF file is encrypted, reading it with PDFTextStream is as simple as providing the file's password (as a byte array) to the appropriate PDFTextStream constructor:

public void readPdfFile (File pdfFile, String passwordStr) throws IOException {
    // convert the password into a byte array
    // (getBytes() with no argument uses the platform default charset)
    byte[] password = passwordStr.getBytes();
    
    // provide the password to PDFTextStream upon creation
    PDFTextStream stream = new PDFTextStream(pdfFile, password);
    
    // [... use PDFTextStream instance as usual ...]
}

Once a PDFTextStream instance has been successfully created using a given password, it can be used normally, without regard to the fact that the file being read is encrypted.

Note that when reading encrypted PDF files, PDFTextStream's constructors can throw an EncryptedPDFException (a subclass of IOException). Most of the reasons for this exception relate to an error in decrypting data contained in the PDF file. However, it is also thrown if an incorrect password (or no password) is provided to a PDFTextStream constructor. In this case, the EncryptedPDFException has an error type of EncryptedPDFException.ERROR_BAD_PASSWORD.

This error type matters most in an interactive environment, where the application doesn't necessarily know that a PDF is encrypted and relies upon a user to enter the password for any encrypted PDF files it encounters. In this case, the application should attempt to open each PDF file assuming it is unencrypted, watch for an EncryptedPDFException with an error type of EncryptedPDFException.ERROR_BAD_PASSWORD, and then prompt the user in an appropriate manner for the password. This code shows an example of this technique:

public String readPdfText (File pdfFile, String password) throws IOException {
    try {
        PDFTextStream stream;
        if (password == null) {
            // no password, assume the file is unencrypted
            stream = new PDFTextStream(pdfFile);
        } else {
            stream = new PDFTextStream(pdfFile, password.getBytes());
        }
        // [... read PDF text, return resulting string ...]
    } catch (EncryptedPDFException e) {
        if (e.getErrorType() == EncryptedPDFException.ERROR_BAD_PASSWORD) {
            // return null to indicate that a different password is needed
            return null;
        } else {
            // some error in the decryption process
            // treat just like a regular IOException
            throw e;
        }
    }
}

Notice that if an EncryptedPDFException with an error type of EncryptedPDFException.ERROR_BAD_PASSWORD is thrown, the method returns null. The module calling this method can then prompt the user for a different password and call the method again with the new password.
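
The calling side of this technique might look something like the following minimal sketch. It is not part of PDFTextStream: the PasswordRetry class, the PdfReader interface, and the maxPrompts limit are hypothetical names introduced here for illustration. The only assumption carried over from the text is that the reading method returns null when a different password is needed.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical helper, not part of PDFTextStream.
final class PasswordRetry {

    /** Stand-in for a method like readPdfText: returns null when a different password is needed. */
    @FunctionalInterface
    public interface PdfReader {
        String read(String password) throws IOException;
    }

    /**
     * Tries to read with no password first, then asks passwordPrompt for a
     * password up to maxPrompts times. Returns the extracted text, or null
     * if no attempt succeeded or the prompt returned null (user cancelled).
     */
    public static String readWithRetry(PdfReader reader,
                                       Supplier<String> passwordPrompt,
                                       int maxPrompts) throws IOException {
        // first attempt: assume the file is unencrypted
        String text = reader.read(null);
        for (int i = 0; text == null && i < maxPrompts; i++) {
            String password = passwordPrompt.get();
            if (password == null) {
                break; // user gave up; stop prompting
            }
            text = reader.read(password);
        }
        return text;
    }
}
```

With the readPdfText method shown above, a call might read PasswordRetry.readWithRetry(pw -> readPdfText(pdfFile, pw), () -> askUserForPassword(), 3), where askUserForPassword is a placeholder for whatever prompt the application provides. Capping the number of prompts keeps the loop from running forever if the user never supplies a working password.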

For other types of EncryptedPDFException, the method just rethrows the exception. Those other error types indicate an encryption problem that cannot readily be solved at runtime, such as a PDF file that uses a corrupted or invalid encryption method, or the failure of one of the security mechanisms in the JRE or CLR environment that PDFTextStream depends upon in its decryption process.