Able2Extract is an app for extracting the text and formatting from a PDF file. From the PDF, it produces a formatted document in Word, Excel, PowerPoint, Publisher, OpenOffice, HTML, and even AutoCAD. It is available for Mac, Windows, and Linux. The kind people at Investintech supplied Legal Yankee with a full working copy of version 9 for review, and we have spent two weeks putting it through its paces. This is our review.
Those in the field of law deal with a lot of documents: mostly hard copies, PDFs, and Word files. These documents often need to be edited, or parts of them copied into another document. This is easy if the document is a Word file, but it can be a problem if it is a PDF, because there are two kinds: those with the text embedded in the PDF, and those that are mere graphic scans of a text document. If you have the latter type, you must either retype it manually or use an OCR program (these generally perform well, though results vary with the app and the quality of the document).
Fortunately, most PDFs today are the kind that have the text embedded. This means that you can select some of the text, copy it, and paste it into another document. The problem is that the formatting is lost and, in many cases, the text is garbled. If you select several pages of text, and the PDF includes headers or footers, those headers and footers will appear in the midst of your text. Footnotes, headings, block quotes: all formatting is lost.
Able2Extract 9 is the answer. It will not only extract the text and put it in the file format of your choosing, but it will also format that text as it was in the PDF. This is quite helpful for Word documents, but we found it even more so for Excel documents, and for PDF documents that contained tables or graphics with text flowed around them.
The user interface is straightforward, and we had little trouble using the app without even consulting the documentation. (We found, however, that the documentation clued us in to some tricks and tips that made some of the processes easier.) The interface elements betray the app's roots in Windows, but no one is likely to be bothered by aesthetics in a processing app.
Click on “Open” to open an existing PDF. Once it is open, you can view thumbnails of each page in a left-hand window. From here, you can select pages for processing or search for specific text (we found this quite helpful when we processed an appellate filing that was over 40 pages long). You can view the pages in the main screen in single- or two-page format.
(Note that you can also create a new PDF. This function lets you create a PDF from an existing document, but it does not have as many features as Adobe Acrobat Pro. We would defer to that app if we created PDFs a lot, but if you are already in Able2Extract and need to create a quick PDF from another document, it is serviceable. If you need to rearrange pages and add new ones, you are out of luck.)
Select the text you want to process. If the document has text flowed around graphics, tables, or text boxes and columns, then you can select an entire area by clicking the “Area” button and dragging over it. If you want to process an entire document, click the “All” icon.
To Microsoft Word. Once your selection is made, click one of the icons under “Convert to File Type.” We first tried a basic document: a simple contract. A few clicks, and the PDF was transformed into an editable, formatted Word file of the contract. Able2Extract did not miss a thing. Next, we processed a 10-page expert witness report, which contained some graphs and tables scattered throughout the document with text flowed around them. Again, Able2Extract produced a Word document that looked exactly like the PDF. (One of the graphic charts did come out a bit wonky-looking, but it was still in the right place in the document. We suspect it was a problem with the format of the graphic itself.)
We decided to up the game. We had a PDF of a copy of Articles of Incorporation. It had originally been written on a manual typewriter. The original was long lost, and we had a copy of a copy (how many times over, we do not know). Someone had scanned that copy. Obviously, there was no text embedded within. We ran the PDF through a basic OCR program, saved the PDF, then opened it in Able2Extract without any further processing. Again, Able2Extract performed well, and the only problems with the text were the result of the OCR program's failure to read some letters correctly. (There is a version of Able2Extract that includes an OCR function, Able2Extract Professional, but we did not have this version to test.)
To Microsoft Excel. Though the Word export function performed without a problem for us, the Excel export is what wowed us. How many times have we had to either re-enter a spreadsheet in Excel, or copy and paste the info and then spend hours formatting and moving text into the right cells? We first opened a PDF of a simple, one-page spreadsheet. The resulting document was indistinguishable from the original. We then tried a more complex spreadsheet with the same result.
Able2Extract determines the correct positioning of columns and rows, but it also includes options to tailor the columns and rows in the spreadsheet before processing the PDF. These allow you to specify a page range for the table structure, and to include or exclude pages. You can add or delete a table on any page, or have Able2Extract redo column structures that stretch across pages. All of this allows you to ensure the spreadsheet comes out as you want it to look in Excel, not as a paged PDF. In our tests, however, we found the most useful functions were the ones to add columns and rows, erase column lines, and set the type of each cell, row, or column as we wished (text, number, drop-down menu, etc.). This allowed us to format the spreadsheet exactly as we wished, without having to open the resulting Excel file and do it by hand there. There are also options for adding and editing headers and footers for the final Excel output.
To PowerPoint. This function is not as fully featured as the previous two. Maybe we were expecting too much in thinking we could turn any PDF into a PowerPoint file. After all, the app would have no way to know what information we want on which slide. However, a PDF that was exported from a PowerPoint file worked almost as well as the previous conversions to Word and Excel. Graphics, text, arrows, boxes: all of it came through as expected. PowerPoint is used in some courtroom and other legal proceedings. If you receive PDFs of those presentations, you could turn them back into PowerPoint files. We did encounter problems with transitions, and eventually concluded that the transition data was not retained in the PDF and that we were seeing default transitions. Not surprising.
To OpenOffice. These conversions worked just as well as the Word and Excel conversions above, which is not surprising since OpenOffice is an open source counterpart to those two. If you do not own Word or Excel, this is an excellent option.
To HTML. We are not sure we would ever need this, but we can imagine situations where a legal document might need to be put online, and there might be a preference for an HTML version rather than a downloadable Word file or a browser-viewable PDF. We tested this function using the same documents as above for Word and Excel. Able2Extract performed well; it even placed HTML tables and graphics where they needed to be.
Batch Conversions. This function does just what you might think. We opened seven PDF documents of various types: a contract, a short ex parte motion, a 25-page trust instrument, several one-page affidavits, and a copy of a court judgment. We clicked “Word” and within a short time we had seven formatted Word documents, all almost indistinguishable from the originals.
Other options allow you to edit the PDF, add security, and work across the Mac, Windows, and Linux versions of the app. We did not test the other export options (Publisher and AutoCAD), as they are unlikely to be used by those in the legal profession. However, we have no reason to believe they would perform any differently than the above.
Investintech offers a free two-week trial of the non-OCR version. They offer a full single-user license for $99.95 (USD), or a 30-day single-user license for $34.95. For the OCR version (not tested here), the full price is $129.95 (the 30-day license costs the same as for the non-OCR version). The 30-day licenses might be useful for someone who has a one-time need for such an app. Otherwise, three or four months of use would more than pay for the full version. In our opinion, the price is a bit high. We would expect something around $25 for a 30-day license (and would do away with the non-OCR 30-day license, since the two are the same price), and about $65 and $85 respectively for the full non-OCR and OCR versions. Still, the program performed admirably, and we are not aware of any apps with the same functions and ease of use (especially the Excel processing!) at the current prices.
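For readers who want to check the break-even estimate above, here is a minimal sketch in Python. It uses only the prices quoted in this review; the function name `breakeven_months` is our own illustration, and the calculation assumes you would simply renew the 30-day license each month at the same price.

```python
import math

def breakeven_months(full_price: float, thirty_day_price: float) -> int:
    """Smallest number of 30-day licenses whose combined cost
    meets or exceeds the price of the full license."""
    return math.ceil(full_price / thirty_day_price)

THIRTY_DAY = 34.95  # same 30-day price for the OCR and non-OCR versions

print(breakeven_months(99.95, THIRTY_DAY))   # non-OCR full license: prints 3
print(breakeven_months(129.95, THIRTY_DAY))  # OCR full license: prints 4
```

In other words, the third 30-day renewal already costs more than the full non-OCR license, and the fourth costs more than the full OCR license.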
If you work with a lot of PDFs, and have a need to convert them (or sections of them) into Word or Excel, we recommend Able2Extract without reservation. A few clicks and you have a formatted document ready to edit. I wish I had had this program when I was a beginning law clerk; it would have saved me many hours of formatting!