My New Favorite Way To Convert PDFs Into Microsoft Word Docs…
I reviewed the download version of Abbyy PDF Transformer 3.0 last year and was not thrilled with the results. I took a stab at it again after purchasing a physical copy (on eBay for $18) and am happy to say that I have a new fav.
I am always searching for better ways to convert PDFs to Microsoft Word documents for editing, especially the PDF public domain books found on Google Book Search and Archive.org (including the original Microsoft scans).
As you know, most of these books present a special challenge because they are page-by-page scans of the actual public domain texts. No OCR has been performed, so all you’ve actually got is images of the pages themselves, with no text that can be copied and pasted.
Even IF the content is available as text, the formatting when you copy/paste into Word is downright horrible.
To extract the text from the pages, you need a piece of software that can run OCR on a PDF file (and relatively few programs can do this). Once the OCR has run and the page scans have been converted into text, you’ve got something you can copy and paste into a Word document for editing and subsequent product creation.
Up until now, we had been recommending either Adobe Acrobat Standard or Able2Doc Pro to handle the conversion, but after my experiments with Abbyy PDF Transformer 3.0, I’m ready to say that Abbyy is my fav!
Imagine downloading a 300-page public domain Google book, pressing just a few big shiny buttons, and in less than 30 minutes having a fully editable version of the book (including pictures) pop out the other end. No mess, the book’s original layout preserved in its entirety, and the highest OCR success rate I have seen (which is truly amazing considering the low quality of the scans to begin with). That’s what this thing does!
Yes, you will still have to do some cleanup. You’ll have to remove watermarks and other verbiage and fix some text that the OCR misread (but not much), and you may even have to ditch some of the pics if they didn’t come over right, but…
This is the closest thing I have ever seen to a perfect machine-based conversion of a PDF book to Microsoft Word, hands down, bar none, period. Nothing is perfect, but this is as close as one can reasonably expect.
This software is going to make my life a LOT easier, and if you’re serious about republishing public domain book content for websites or products, it’s absolutely a must-have.
You Will Need To Proofread Carefully
If the book has a ton of measurements (like 1/4 cup), you will need to check how well they converted… this could be a pain with cookbooks.
If the book has tables (like charts), they may not convert well.
Look carefully and you can see that this chart did NOT convert correctly.
The content of the entire 500-page document converted perfectly (except for the charts), and only one image was missed. I checked the misspellings, and for the most part they came from the original author, not from the way the OCR picked up the words.
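For the measurement check mentioned above (things like 1/4 cup), here is a minimal Python sketch that lists the lines worth a second look. It assumes you have saved the converted book as plain text; the function name `find_measurement_lines` and the sample text are made up for illustration and have nothing to do with Abbyy itself. The pattern also catches a few common OCR mix-ups, such as a lowercase "l" standing in for a "1", so expect the odd false positive.

```python
import re

# Fractions written with a slash (1/4, 2 / 3), allowing the characters OCR
# often confuses with digits (l, I, O), plus the common one-character fractions.
FRACTION = re.compile(
    r"(?<!\w)[0-9lIO]+\s*/\s*[0-9lIO]+(?!\w)|[\u00bc\u00bd\u00be]"
)


def find_measurement_lines(text):
    """Return (line_number, line) pairs for lines that contain a fraction."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if FRACTION.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample standing in for text exported from the converted book.
sample = (
    "Add 1/4 cup sugar.\n"
    "Stir well.\n"
    "Bake with \u00bd tsp salt.\n"
    "Use l/4 cup milk."
)

for number, line in find_measurement_lines(sample):
    print(number, line)
```

This only tells you where to look. You still have to compare each flagged line against the original page scan.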
I hope this is helpful to those of you who are converting PDF to Word.
Let me know in the comments below what you use and why you like it.