Reading encrypted PDF files
PDFTextStream can decrypt PDF files encrypted with RC4 and AES ciphers using 40-bit, 128-bit, 256-bit, and variable-length keys. Using PDFTextStream with such files is as easy as using it with unencrypted PDF files.
If it is known ahead of time that a PDF file is encrypted, reading it with PDFTextStream is as simple as providing the file's password (as a byte array) to the appropriate PDFTextStream constructor:
    public void readPdfFile (File pdfFile, String passwordStr) throws IOException {
        // convert the password into a byte array
        byte[] password = passwordStr.getBytes();
        // provide the password to PDFTextStream upon creation;
        // the constructor can throw an EncryptedPDFException (an IOException)
        PDFTextStream stream = new PDFTextStream(pdfFile, password);
        // [... use PDFTextStream instance as usual ...]
    }
Once a PDFTextStream instance has been successfully created using a given password, it can be used normally, without regard to the fact that the file being read is encrypted.
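The example above converts the password with String.getBytes(), which uses the platform's default charset, so the same password can produce different bytes on different machines when it contains non-ASCII characters. The following is a minimal sketch of pinning the charset explicitly. The class and method names are hypothetical, and which encoding PDFTextStream expects is not stated here, so the choice of charset is an assumption to check against the password that was used to encrypt the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFTextStream.
class PasswordBytesSketch {

    // Converts a password to bytes using an explicitly chosen charset,
    // instead of relying on the platform default.
    static byte[] toPasswordBytes(String password, Charset charset) {
        return password.getBytes(charset);
    }

    public static void main(String[] args) {
        // "\u00e4" is a-umlaut: one byte in ISO-8859-1, two bytes in UTF-8.
        String password = "p\u00e4ssword";
        byte[] latin1 = toPasswordBytes(password, StandardCharsets.ISO_8859_1);
        byte[] utf8 = toPasswordBytes(password, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes: " + utf8.length);
    }
}
```

The resulting byte array would then be passed to the PDFTextStream constructor in place of passwordStr.getBytes(). For passwords made up only of ASCII characters, both charsets yield identical bytes.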
Note that in the case of encrypted PDF files, PDFTextStream's constructors can throw an EncryptedPDFException (a subclass of IOException).
There are a number of reasons why an EncryptedPDFException might be thrown by a PDFTextStream constructor; most of them are related to some error in decrypting data contained in a PDF file. However, such an exception is also thrown if an incorrect password (or no password) is provided to a PDFTextStream constructor. In this case, an EncryptedPDFException with an error type of EncryptedPDFException.ERROR_BAD_PASSWORD is thrown.
This matters most in an interactive environment, where the application doesn't necessarily know that a PDF is encrypted and relies upon a user to enter the password for any encrypted PDF files it does encounter. In this case, the application should attempt to open each PDF file assuming it is unencrypted, watch for an EncryptedPDFException with an error type of EncryptedPDFException.ERROR_BAD_PASSWORD, and then prompt the user in an appropriate manner for the password. This code shows an example of this technique:
    public String readPdfText (File pdfFile, String password) throws IOException {
        try {
            PDFTextStream stream;
            if (password == null) {
                // no password, assume the file is unencrypted
                stream = new PDFTextStream(pdfFile);
            } else {
                stream = new PDFTextStream(pdfFile, password.getBytes());
            }
            // [... read PDF text, return resulting string ...]
        } catch (EncryptedPDFException e) {
            if (e.getErrorType() == EncryptedPDFException.ERROR_BAD_PASSWORD) {
                // return null to indicate that a different password is needed
                return null;
            } else {
                // some error in the decryption process
                // treat just like a regular IOException
                throw e;
            }
        }
    }
Notice that if an EncryptedPDFException with an error type of EncryptedPDFException.ERROR_BAD_PASSWORD is thrown, then the method returns null. The module calling this method could then appropriately prompt the user for a different password, and then call the method with the new password.
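The calling side of this technique can be sketched as a small retry loop. This is only an illustration under stated assumptions: PdfReader is a hypothetical stand-in for a method like readPdfText (it returns null when a different password is needed), the prompt is modelled as a Supplier, and the limit on the number of prompts is an added design choice. None of these names are part of PDFTextStream.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the caller; not part of PDFTextStream.
class PasswordRetrySketch {

    // Stand-in for a method like readPdfText: returns the extracted text,
    // or null when a different password is needed.
    interface PdfReader {
        String read(String password) throws IOException;
    }

    // Tries first without a password, then prompts up to maxPrompts times.
    // Returns null if no attempt succeeded or the user cancelled the prompt.
    static String readWithRetries(PdfReader reader, Supplier<String> prompt,
                                  int maxPrompts) throws IOException {
        // first attempt: assume the file is unencrypted
        String text = reader.read(null);
        int prompts = 0;
        while (text == null && prompts < maxPrompts) {
            String password = prompt.get();
            if (password == null) {
                break; // the user declined to enter a password
            }
            prompts++;
            text = reader.read(password);
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        // Fake reader that only accepts the password "secret".
        PdfReader fake = pw -> "secret".equals(pw) ? "extracted text" : null;
        // Simulated user input: one wrong password, then the right one.
        Iterator<String> answers = List.of("wrong", "secret").iterator();
        String text = readWithRetries(fake, answers::next, 3);
        System.out.println(text);
    }
}
```

In a real application, the reader would be something like pw -> readPdfText(pdfFile, pw), and the supplier would show a password dialog or read from the console. Other EncryptedPDFException error types still propagate to the caller as IOExceptions, since the loop only reacts to a null result.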
For other types of EncryptedPDFException, the method just rethrows the exception. Those other error types indicate an encryption problem that cannot readily be solved at runtime, such as a corrupted or invalid encryption method being used in a PDF file, or the failure of one of the security mechanisms in the JRE or CLR environment that PDFTextStream depends upon in its decryption process.