
Last week, I found this great mini-article about odd motorcycle-related inventions in the February 1921 Public Domain issue of “Popular Mechanics” on Google Books. This content is perfect for a project I’m working on…

I’ve had a few questions recently about how I extract an article’s text from the magazine, since Google isn’t providing PDF download links or easy access to the actual text for these Public Domain magazines ~ I’m about to take you through how I do it step-by-step, using this article as an example.
The particular article I’m working with can be found HERE.
When I’m actually ready to use the article as content somewhere, I have to get it into an editable form, right? After all, all I’ve really got here are images of the scanned pages – images of words rather than actual text I can copy and paste.
So my options are…
- Type every word of the article into my favorite text editor manually myself (maybe, if the article is short enough)
- Pay my 9-year-old daughter to type the article in for me (that bribe rarely ever works)
- Send the link to someone on eLance and have them retype the article for me for a measly 10 bucks
- Source a copy of the original magazine and scan the article myself, but many times that may not be an option
- Use a few free tools to “capture” the text
Let’s talk about option 5 a little bit…
I’m sure there are many ways to do it, but this is how I’ve always gotten the job done…
In a nutshell, what we’re going to do is use OCR (Optical Character Recognition) software to convert chosen sections of our article to editable text (as opposed to an image of text).
The OCR software I will use for this example is called FreeOCR and it can be downloaded for FREE by Clicking Here.
Typically, in order for OCR software to work correctly, your page scans need to be pretty high-res – 300 DPI at minimum.
These page scans available to us through Google Books are (by design) not even close.
But we can compensate for that by enlarging our target text to the point where even a blind man could read it and give our OCR software a fighting chance.
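If you want a rough idea of how much zooming is enough, the 300 DPI rule of thumb can be turned into a little arithmetic: count how many screen pixels one printed column spans, estimate that column’s printed width in inches, and divide. Here’s a minimal Python sketch of that calculation. The column measurements are hypothetical numbers you’d supply yourself, and zooming only helps as far as the detail Google actually serves at each zoom level.

```python
def effective_dpi(column_width_px: float, column_width_in: float) -> float:
    """Pixels per printed inch, as the page appears in your screen capture."""
    if column_width_in <= 0:
        raise ValueError("column width in inches must be positive")
    return column_width_px / column_width_in


def zoom_needed(column_width_px: float, column_width_in: float,
                target_dpi: float = 300.0) -> float:
    """Extra magnification needed to reach target_dpi (1.0 means none)."""
    current = effective_dpi(column_width_px, column_width_in)
    return max(1.0, target_dpi / current)


if __name__ == "__main__":
    # Hypothetical measurement: a 2.5-inch printed column spanning 250 screen pixels.
    print(effective_dpi(250, 2.5))  # 100.0
    print(zoom_needed(250, 2.5))    # 3.0
```

In other words, a column that is only 100 “effective DPI” on screen needs to be blown up about three times before it reaches the 300 DPI ballpark.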
I like to begin by clicking the “Full screen” button as shown in the image below…

Then to enlarge the “text”, just whack the “Zoom in” button as many times as Google will let you – you should be able to blow the page image up to many times its original size…

With the result being something like this…

We need to capture our selected bit of “text” in a format that we can then import into our OCR software.
This can be as easy as pressing the “PrtScn” key on your PC keyboard to copy the screen to your clipboard, then pasting the result into a simple application like Microsoft Paint – crop out the “text” you are interested in and save the image as a bitmap (.bmp) file.
Next, we crank up FreeOCR and import our bitmap file by clicking the “Open” button as shown below…

Once our bitmap file is imported, it’s time for FreeOCR to shine.
To convert the bitmap image to editable text, simply click on the “OCR” button as shown below…

And then, like magic, editable text will “pop out” on the right-hand side, as in the image below…

You should find that the conversion occurred with a very high degree of accuracy, but remember, it’s a “garbage in, garbage out” scenario – in other words, the lower the quality of the page scan to begin with, the lower your conversion accuracy will be.
However, you should find that this method produces very usable converted text – you’ll definitely have to clean up some boo-boos, but not many (and it won’t take nearly as long as retyping the whole thing manually).
At this point, you can copy the converted text out of FreeOCR and paste it into your favorite text editor – repeat this process until you have worked your way through the entire article.
Working through the article one paragraph at a time is usually best, but you can experiment and see what works for you.
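If you want to put a number on “accuracy”, one simple check is to retype one short line by hand and compare it with what the OCR produced. The minimal Python sketch below is my own illustration, not a FreeOCR feature: it counts the character edits between the two strings with the classic Levenshtein edit distance and reports the share of characters that came through correctly. The sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, reference: str) -> float:
    """Share of the hand-typed reference that the OCR got right (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample: the OCR misread the letter "l" as the digit "1".
    print(edit_distance("motorcyc1e", "motorcycle"))            # 1
    print(round(char_accuracy("motorcyc1e", "motorcycle"), 2))  # 0.9
```

Running the same check on a line captured at two different zoom levels is a quick way to see “garbage in, garbage out” for yourself.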
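Most of the boo-boos in a column of old magazine text are predictable: words split with a hyphen at the end of a line, and hard line breaks in the middle of every paragraph. If you’d rather not fix those by hand in your text editor, here’s a minimal Python sketch that tidies one pasted chunk, assuming blank lines separate the paragraphs in the OCR output. The function name `clean_ocr_text` is just a hypothetical helper for illustration.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy a chunk of OCR output.

    - rejoins words hyphenated across a line break ("mo-\\ntorcycle" -> "motorcycle")
    - unwraps the single line breaks inside each paragraph into spaces
    - keeps blank lines as paragraph breaks
    - collapses runs of spaces and tabs
    """
    text = raw.replace("\r\n", "\n")
    # Rejoin a word that was split by a hyphen at the end of a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        # Turn the remaining line breaks inside the paragraph into spaces.
        joined = " ".join(line.strip() for line in para.splitlines())
        joined = re.sub(r"[ \t]+", " ", joined).strip()
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The new mo-\ntorcycle  attachment\nis simple.\n\nA second   para."
    print(clean_ocr_text(sample))
    # The new motorcycle attachment is simple.
    #
    # A second para.
```

One caveat: the hyphen rule also glues together genuinely hyphenated words that happen to break at a line end (a line ending in “motorcycle-” followed by “related” becomes “motorcyclerelated”), so still give the result a quick read.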
Would I use this method for an entire book?
Certainly not ~ that would take forever!
But for a magazine article, it works great.
I’d rather have the original magazine and scan the article myself, but as I said earlier, sometimes that’s not an option due to scarcity.
Another question I’ve been asked is, “What about the copyright notice that appears on every page of these magazines, which are clearly in the Public Domain under U.S. copyright law?”
If you’ve been with me for any length of time at all, you know exactly how I feel about that! If not, read about it RIGHT HERE.
Mechanical Reproduction does NOT a new copyright make.
This particular mag is in the Public Domain because it was published in the U.S. before 1923 ~ it has no copyright protection. Slapping an illegitimate “Copyrighted Material” stamp on it does NOT change that.
Google knows this…and so does the publishing company that currently owns “Popular Mechanics”.
This material is indeed Public Domain, so feel free to extract the text and pictures and use them in any way you see fit. I wouldn’t feel comfortable telling you this if I didn’t know it to be true.
Just be sure to do your “due diligence” and make sure you are only doing this with Public Domain content, not legitimately copyrighted content (if you need help telling the difference, pick up a copy of The Public Domain Treasure Hunter’s Survival Kit).
If you don’t agree, press ALT + F + X because you’re probably in the wrong place anyway ; )
Until next time,

P.S. – Let’s see if Google actually indexes this post. LOL, what do ya wanna bet it’ll never see the light of day?
About The Author:
Logan Andrew is an online entrepreneur, information publisher, and author who has been using Public Domain material to create profitable products and businesses since 2001. He is also co-author of "The Public Domain Treasure Hunter's Survival Kit" available here. For more info on Logan, click here.
9 comments
Hi Logan,
just wanted to add that the same trick works with the full version of Adobe Acrobat. If you right-click somewhere in the document, choose “OCR transfer” and then check “full document”, chances are you can convert a whole file in one go. Typically you will need to proofread the text, but – boy, I’m a lucky guy – I did this with a few PDFs that were just image after image after image, and it worked perfectly. Take this comment just for completeness’ sake; I also tried your freeware and it works great, too! Best, Juergen.
Juergen ~ hi buddy, good to hear from you, thanks for the comment.
I use the OCR function in Adobe Acrobat Pro as well ~ see “The ESSENTIAL Public Domain Treasure Hunter’s Reference Guide & Profiteering Companion”, page 202 (from your “Public Domain Treasure Hunter’s Survival Kit”).
The Adobe Acrobat OCR capture function works great for extracting the text out of a Google Public Domain book that you have downloaded in PDF format.
The article above is for how to extract the content from these Public Domain magazines when Google doesn’t provide a PDF download link.
If you can’t download it, how are you supposed to convert it using Adobe Acrobat Pro?
You need a way to capture the content directly from the images in your browser window ~ that’s what this post is all about.
I love your tips, keep ‘em coming.
Thanks Juergen!
Logan
Hi Everyone
Here is another resource if you want to get stuff like copy typing and simple things like that at a good rate.
http://www.fiverr.com/categories/writing/pages/1
People there post tasks they will do and everything costs $5. I got some graphics done … and discovered that you get what you pay for. But things like copy typing would be great.
Cheers, Erin
Hi Erin,
Thanks for this, looks like a great resource. Like you said, you get what you pay for but for really mundane tasks (like retyping an article), looks like this would be a huge time-saver and well worth the small price.
Excellent, thank you for sharing this.
I am continually amazed at the creative ways you share for producing content. Thanks for all you do in showing us new (or are they old?) methods to enhance our internet businesses.
Thanks Chaplain!
It’s a combination of old and new. I hold no bias as long as I know the methods still work ; )
You can go to Free OCR. There, just click Browse to upload the file, and it will do the OCR for you. The result is not bad.
One easy way I use all the time is to fire up Dragon NaturallySpeaking. I do a split screen with MS Word on one side and the public domain material on the other, then I just read the PD material aloud into Word. It works great for short pieces like the article you describe here.
You can save a hi-res PNG image of the Google Books page you have on screen by saving the whole web page to your hard disk.
Just look in the folder that gets created alongside it, named after whatever you named the page when saving.
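A saved page folder can hold dozens of files (scripts, icons, stylesheets), so the page scan may take a little digging to find. Here’s a minimal Python sketch for that, assuming the scan is the biggest PNG or JPEG in the folder. It recognizes images by their first bytes rather than their names, in case a file was saved without a recognizable extension. The script and folder names in the usage comment are hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Leading bytes ("magic numbers") that identify PNG and JPEG files.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def image_kind(path: Path) -> str | None:
    """Return 'png' or 'jpeg' based on the file's first bytes, else None."""
    with path.open("rb") as fh:
        head = fh.read(8)
    for signature, kind in SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return None


def largest_images(folder: str, top: int = 5) -> list[tuple[Path, str, int]]:
    """List up to `top` PNG/JPEG files under `folder` (recursively), biggest first."""
    found = []
    for path in Path(folder).rglob("*"):
        if path.is_file():
            kind = image_kind(path)
            if kind is not None:
                found.append((path, kind, path.stat().st_size))
    found.sort(key=lambda item: item[2], reverse=True)
    return found[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_scan.py "Popular Mechanics_files"
    for path, kind, size in largest_images(sys.argv[1]):
        print(f"{size:>10,} bytes  {kind:<4}  {path}")
```

The file at the top of the list is the one to open in Paint (or hand straight to your OCR tool).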