How To Scan Public Domain Books – Tips From A Professional Book Scanner

[The following is a special guest article from Timo of BookScanning.com:]

Now that you’ve found a copyright-free article or Public Domain book, how do you get it into your computer for editing?

Well, you have two choices…

Your first choice is to type the whole article or book line by line and page by page until you’re finished. It sounds like a lot of hard work, but it’s one option you can use when you have just one or two articles as the basis for your content. More than a couple of articles will set you up for some long, tedious work.

The second and better way is to scan the content into your computer…

You will need a scanner to bring the article or book into your computer. The cheapest models are flatbed scanners that connect through a USB port. The USB connection transfers the data to your computer and, on many models, also powers the scanner, so no separate power cord is needed and the scanner plugs right into your computer or laptop.

Scanners cost between $50 and $150, depending on the kind of scanner you have in mind and on what comes with it (accessories, software, etc.).

You can buy a good flatbed scanner online at Amazon.com or TigerDirect.com, but you can also find them at nearly any electronics store such as Best Buy or Circuit City.

How to scan:

Simply put the page with the content on the scanner and start scanning. Depending on your scanner’s speed, you can have that page on your computer within a few minutes or less.

Now that you have the article as an image on your computer, the second step is to transform the page with the content into an editable format. This will let you change and update the document easily in your computer’s word processing program, such as Microsoft Word, Corel WordPerfect or Open Office.

But that’s not all… 

Once you get the document scanned, it’s still a photo. 

To make it editable, you will need a program that translates the scanned image of the page into words. This kind of program is called an OCR Program. The term OCR stands for Optical Character Recognition.

Normally, an OCR program comes with the scanner you buy. It is usually a cut-down version of the vendor’s flagship OCR software, but it performs the same basic functions as the expensive software and creates an editable file.

If your scanner didn’t come with an OCR program, there is a free one available called FreeOCR.

The download address for this program is:

http://www.paperfile.net/

This program has a good recognition rate, depending on the quality of the print and scanning. The advantage of this program is that it is free and simple to use.

The steps to scan with this program are easy and straightforward. You simply scan the page or the selected content you are interested in, save it as an image, start the OCR software, load the image and select the OCR function in the program.

With FreeOCR you can also scan directly into the program and run the OCR process as the next step. This saves a step and gets your document pages immediately “translated” into editable characters. From FreeOCR, you can save the document in a number of common word processing formats.

Now, even though using an OCR program like FreeOCR is simple and direct – it’s not without its share of problems.

What are the disadvantages?

  1. You have to correct the content. OCR programs don’t always get the translation right and sometimes misinterpret letters and numbers. For instance, an OCR program can read the letter S as the number 5 or the number 0 as the letter O.
  2. Cheaper programs don’t always follow the flow of the page. That means paragraphs may break differently than in the original content, and pictures may end up under or beside the text instead of in their original place.
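To illustrate the first problem, here is a minimal Python sketch that flags words in OCR output where a letter S or O may have been read as the digit 5 or 0. The function names and the look-alike table are assumptions made for this example; they are not part of FreeOCR or any other OCR program, and the table covers only the two confusions mentioned above.

```python
import re

# Digits mapped to the letters they are often confused with.
# Assumption for this sketch: only the two pairs named in the article.
LOOKALIKES = {"5": "S", "0": "O"}


def flag_suspect_words(text):
    """Return words that mix letters with look-alike digits."""
    suspects = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(ch.isalpha() for ch in word)
        has_lookalike = any(ch in LOOKALIKES for ch in word)
        if has_letter and has_lookalike:
            suspects.append(word)
    return suspects


def suggest_fix(word):
    """Replace look-alike digits with the letters they resemble."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in word)
```

Calling `flag_suspect_words("The 5un rose over B0ston in 1905")` returns `["5un", "B0ston"]`, while the year is left alone because it contains no letters. The suggestions are only hints: `suggest_fix` always substitutes capital letters, so every flagged word still needs a human check.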

When should you consider using a scanning service instead of doing it yourself?

Normally, when you have only a small amount of content to scan, like one or two articles, there is no need to use a scanning service. You’d likely be faster just doing it yourself rather than sending the articles, waiting for them to be scanned and then getting them sent back.

If you have one or two books or more, and you don’t want to go through the long process of scanning them yourself, it’s worth using a scanning service.

In addition, when you do it yourself, you may have to rip the pages from the binding of the book – and many of these free Public Domain articles and content are much more valuable when they’re intact. If you’d rather not destroy the book, you should definitely consider a scanning service as well. 

The reason is the difference between how a book is scanned on a flatbed scanner and how a scanning service does it. Scanning a book on a flatbed scanner means pressing the book’s spine and pages against the glass plate of the scanner. Depending on the age of the book, the spine may already be weak, and if you use too much force, it can break. A scanning service should have a special book cradle, designed for this purpose, that lets it scan the book without damaging it.
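To judge whether a job is large enough to justify a service, you can roughly estimate the time you would spend doing it yourself. The Python sketch below does that arithmetic. The function name and the default figures (one minute to scan a page, two minutes to correct its OCR text) are assumptions for illustration only; measure your own pace on a few pages and substitute those numbers.

```python
def diy_hours(pages, scan_minutes_per_page=1.0, fix_minutes_per_page=2.0):
    """Estimate hours needed to scan and correct a document yourself.

    The default per-page times are placeholder assumptions, not measurements.
    """
    total_minutes = pages * (scan_minutes_per_page + fix_minutes_per_page)
    return total_minutes / 60
```

With these placeholder figures, a 10-page article comes to half an hour, while a 300-page book comes to 15 hours of work.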

What do you have to keep in mind when using a scanning service?

  • Ask if they also do the OCR process. Some services don’t include it at all.
  • If the OCR process is offered, ask how much it costs. Some services include it in the price, others charge an extra fee for it.
  • If you want to keep the book intact, ask if they take the book apart in order to scan it. Doing so lets them run the single pages through feeding machines. It’s faster for them but bad for your book.
  • Ask if they also clean OCR mistakes out of the text files. If not, you will have to correct the misread characters in the file before you can edit the file itself.

In conclusion, scanning is a fast way to get content into your computer.

Getting the content into your computer is done in three easy steps:

Scanning, Recognition and Editing.

Doing it yourself can save you money; using a scanning service can save you a lot of time and trouble.

For any questions regarding the scanning of content or books, you can reach the author of this article on the web at http://www.bookscanning.com/ or via email at timo@bookscanning.com .

{ 6 comments… read them below or add one }

Jim July 19, 2010 at 6:00 am

Since scanning is done by placing the physical product onto the scanner how do you scan an item when it is in the public domain and not in a physical form until it is downloaded from where ever you find it?

Logan July 19, 2010 at 6:01 am

Hi Jim,

I had to read this question a few times but I think I understand what you are asking.

Basically, when it comes to Public Domain content, your objective is to be able to use that content to create a product right? So you need a way to convert the content into an editable format. In other words, you need to be able to get it into your computer so that you can actually do something with it.

There are two sources of Public Domain content available to you (we’ll keep this conversation limited to Public Domain books for now)…

1) Public Domain books in a physical form: the only way to get the content from a physical copy of a book is to have it scanned. Once you have it scanned you’ve got a digital version of the book. You then have something you can edit and convert into a new product. There are millions upon millions of Public Domain books that haven’t been scanned and placed online yet.

2) Public Domain books in digital form: these are books that have already been scanned and placed online for download (which represent only a very small fraction of what’s actually available in the Public Domain). The benefit of these digital versions, of course, is that much of the work has already been done for you in terms of shifting the content into a digital state. The downside is that everybody and their brother has easy access to the same books. That doesn’t mean these books are worthless; you just have to keep that in mind when considering what to do with the content in terms of product creation.

I hope I’ve answered your question.

I got confused upon first reading it because of this part ~ “how do you scan an item when it is in the public domain and not in a physical form until it is downloaded from where ever you find it”.

If an item is downloadable then by its nature, it’s already in a digital state. There’s no need to scan it again. The point of scanning is simply to be able to convert a physical book into digital, editable content so that you can actually use it in some fashion.

Thanks Jim!

Logan

Robb Richardson July 28, 2010 at 10:32 am

We have recently introduced the first high resolution public walk up book scanner for public libraries – the “book2net Spirit”.

Introduced at the ALA in Washington in June. Production units will start delivering in August 2010 – Under $9k or $200/mth.

Look for them at a Library near you.

Robb Richardson
rrichardson@ristech.ca

Logan July 28, 2010 at 12:57 pm

Thanks Mr. Richardson for sharing this news, it’s very exciting. I sooooo want one of these for Christmas!

J Arthur Davis February 1, 2011 at 2:10 pm

If the above price is too much for a scanner take a look at your state surplus store. Here in Pennsylvania we have this kind of store. As items are updated the old ones are sold at the surplus store. You can find good scanners in the 5 to 25 dollar range. I picked up two Dell Optiplex 745 computers for $50.00 each. These are top of the line computers that are 5 years old at best. All I did was add more memory and I now have a screaming machine for less than $100.00.
Check around, you may have to make some calls, but the investment in time will more than pay for itself.

Good Shopping!

andy baird September 20, 2011 at 5:42 am

Hi Debra, I too have been using FreeOCR and have noticed that sometimes it produces a lot of gibberish which makes the OCR’d text difficult to read. I’ve realised that it’s trying to read every mark and blemish on the page and convert it to text, so I’ve taken to using the Crop Image button on the left hand side. (It’s a little dotted square.) If you use this you can highlight just the words on the scanned page. It gives a much cleaner, clearer conversion, and there’s a LOT less gibberish to wade through and correct!
