OCR which actually works

Posted By Ann_Shelbourne
May 22, 2004
Views: 1103, Replies: 40, Status: Closed
Faced with a long legal agreement that needed to be edited (and with no digital original available), I suddenly remembered that Acrobat 6 Pro was supposed to be able to extract editable text from a scan.

Amazingly, it can. And with perfect accuracy — which is a lot more than you can say for the so-called OCR programs that I have used in the past.

But what is even more astonishing is that it was able to do this from 9-point condensed type without getting a single word wrong, and it maintained the formatting.

I scanned to a grayscale TIFF at the scanner’s optical resolution and then downsampled to 400 ppi (which is the most that Acrobat will handle). Then you just go to "Create PDF from File" and Save As to MS Word Document format.
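
For anyone wondering how much that downsampling step actually shrinks the file, here is a minimal Python sketch of the arithmetic. Only the 400 ppi ceiling comes from the post; the US Letter page (8.5 x 11 in), the hypothetical 1200 ppi optical resolution, the 8-bit grayscale (one byte per pixel) assumption and the helper names are all illustrative, and the byte counts ignore TIFF compression.

```python
# Minimal sketch of the scan-then-downsample arithmetic.
# Assumptions: 8-bit grayscale (1 byte per pixel), uncompressed sizes,
# example page/scanner values; only the 400 ppi ceiling is from the post.
ACROBAT_MAX_PPI = 400  # ceiling reported in the thread


def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height of a page of the given size (inches) at `ppi`."""
    return round(width_in * ppi), round(height_in * ppi)


def grayscale_bytes(width_px, height_px):
    """Uncompressed size of an 8-bit grayscale image: one byte per pixel."""
    return width_px * height_px


def downsample_plan(width_in, height_in, optical_ppi, max_ppi=ACROBAT_MAX_PPI):
    """Scan at the optical resolution, then downsample to at most `max_ppi`.

    Returns (scan size in px, target ppi, target size in px, scale factor).
    A scale factor of 1.0 means the scan is already at or below the ceiling.
    """
    target_ppi = min(optical_ppi, max_ppi)
    scan_px = pixel_dimensions(width_in, height_in, optical_ppi)
    target_px = pixel_dimensions(width_in, height_in, target_ppi)
    return scan_px, target_ppi, target_px, target_ppi / optical_ppi


if __name__ == "__main__":
    # Hypothetical example: US Letter page on a 1200 ppi scanner.
    scan_px, ppi, target_px, factor = downsample_plan(8.5, 11, 1200)
    print("scan:  ", scan_px, grayscale_bytes(*scan_px), "bytes")
    print("target:", target_px, "at", ppi, "ppi,", grayscale_bytes(*target_px), "bytes")
    print("scale factor:", round(factor, 3))
```

With those example numbers the scan is 10200 x 13200 pixels (about 135 MB uncompressed) and the 400 ppi version is 3400 x 4400 pixels (about 15 MB), a linear scale factor of one third.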

It seems that you can also use "Create PDF from Scan", but I am still running my UMAX scanner from Mac OS 9.2.2, so I haven’t been able to test that.

Just a useful trick to know, and one which saved a lousy typist like me a great deal of time.

Ram
May 22, 2004
Ann,

Well, OCR programs can be a huge help once you get the hang of them and couple them with spelling checkers and other tools. The high-end versions of the CAERE products yield very good results.

I’ve been using them very intensively and extensively for a couple of decades, sometimes to scan whole books in a number of languages (eight, to be exact).

What you report is very, very encouraging. I’ll have to look into Acrobat 6 Pro.

Thanks for sharing that.
Ann_Shelbourne
May 22, 2004
I was totally amazed by the results. And it does it so quickly too.
Ram
May 22, 2004
Ann,

The Adobe online store is temporarily closed right now. I’ll have to check the price of any possible upgrades to Acrobat 6 Pro tomorrow. Unless it’s pretty attractive, I’ll pass, since I’m happy with the results of my OCR.

BTW, do you know if it works with languages other than English and with scripts other than Roman, specifically Cyrillic?
Ann_Shelbourne
May 22, 2004
I am afraid that I don’t know the answer and I don’t have the knowledge of other languages to test it!
However there is a free Tryout version if you want to give it a whirl: <http://www.adobe.com/products/acrobat/main.html>
LRK
May 22, 2004
Ann,

That is so cool. I didn’t know Acrobat would do this. I’ll have to try it the next time I need to extract text from a scan.

Thanks for sharing this!
Ralph_Scherer
May 22, 2004
I just went to download the 30-day tryout version and found it is for Windoze only.

What’s up with that?

R.
Ann_Shelbourne
May 22, 2004
You are right!

This is unbelievable.

I am astounded that Adobe could be so short-sighted. Do they want to sell this product or not?!

This is almost as brilliant as their decision to not print a Manual for what is now an extremely powerful program so that most users never discover its capabilities.
Larryr544
May 22, 2004
Does the OCR work for the standard version or just the Pro version?
Ann_Shelbourne
May 22, 2004
Larry, I don’t know — I only have the Pro version, which is invaluable for checking pre-press files for errors and for viewing separations.

It is a program that I wouldn’t want to be without.
Zeb
May 22, 2004
Can I use my stupid question coupon now?
Why would you want to view separations?
Mike_Ornellas
May 22, 2004
because it’s cheaper to do the job right than to do it twice.
Todie
May 22, 2004
Zeb, if someone sets a red Illustrator-made logo to overprint and the layout calls for a black background, you waste a proof, or (worse) go to plate, or (more worse:) print a million brochures and the customer’s logo doesn’t show.

You have to look at the separations.
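
To make that concrete for a non-printer, here is a deliberately simplified Python sketch of what the separations show under the logo. It is not how Acrobat or a RIP actually renders anything; it assumes the red is built from 100% magenta plus 100% yellow on a 100% black background, and it models overprint the way nonzero overprint mode treats process colors (a plate on which the logo has no ink keeps the background’s ink). The function and variable names are made up for illustration.

```python
# Simplified model of one spot on the sheet, as CMYK ink percentages per plate.
# Assumptions: "red" = 100% magenta + 100% yellow, background = 100% black,
# and overprint behaves like nonzero overprint mode for process colors.
PLATES = ("C", "M", "Y", "K")


def knockout(background, logo):
    """Knockout: the logo replaces the background on every plate, so a plate
    where the logo has no ink is left blank (a hole) under the logo."""
    return {p: logo[p] for p in PLATES}


def overprint(background, logo):
    """Overprint: plates where the logo has ink print the logo's value;
    plates where the logo has no ink keep the background's ink."""
    return {p: logo[p] if logo[p] > 0 else background[p] for p in PLATES}


black_background = {"C": 0, "M": 0, "Y": 0, "K": 100}
red_logo = {"C": 0, "M": 100, "Y": 100, "K": 0}

if __name__ == "__main__":
    for name, combine in (("knockout", knockout), ("overprint", overprint)):
        plates = combine(black_background, red_logo)
        # The K plate under the logo is what gives the problem away.
        print(f"{name:9} under the logo: {plates}  (K plate = {plates['K']}%)")
```

With knockout, the K plate has a hole under the logo, so the red prints on bare paper. With overprint, the K plate stays at 100% under the logo and the red is buried under solid black, which is exactly the kind of thing you can catch by viewing the separations before going to plate.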
Ann_Shelbourne
May 22, 2004
Some of you might like to add a pithy comment or two here:

Ann Shelbourne "Ludicrous Marketing Decision" 5/22/04 12:44pm </cgi-bin/webx?14>
Buko
May 22, 2004
So you can see if your job is separating correctly.

so you don’t get a phone call from the printer telling you that your job is not separating correctly
Mike_Ornellas
May 22, 2004
why would you want to do that?

Run the job the 3rd time and it’s free!

Ann,

PDF is worldwide.

There are more PCs than Macs.

Acrobat and Distiller, in real commercial environments, run on PCs, and Adobe knows that this is where the money is.

Follow the money train…..
Buko
May 22, 2004
Once again the Mac side loses out to platformist policies.

Aren’t there laws against platformism?
Todie
May 22, 2004
The PDF reader has more market share on the PC.

Grand Total = $0
Mike_Ornellas
May 22, 2004
smart ass comment = priceless.
Ann_Shelbourne
May 22, 2004
Real World Bottom Line = Loss Leader.

But for how long?
Todie
May 22, 2004
Mike, how many Windows users do you know who own Acrobat Pro?
Ram
May 22, 2004
And the good news is that the Acrobat 6 Standard Version <http://www.adobe.com/products/acrobatstd/overview.html> appears to incorporate the same Scan to PDF capabilities. $99 for the upgrade.
Ram
May 22, 2004
Here’s the comparison table <http://www.adobe.com/products/acrobat/matrix.html> of all flavors of Acrobat 6.
Ann_Shelbourne
May 22, 2004
But only Pro has: "Built-in preflighting tools for print production".

That’s the real kicker as far as I am concerned.
Ram
May 22, 2004
That’s one feature with which I’m not even remotely concerned myself, but I understand how that can be of paramount importance to designers and prepress pros.

Incidentally, did you notice that one set of features is only available in the Windows version of Acrobat 6 Pro?
Mike_Ornellas
May 22, 2004
>Mike, how many Windows users do you know who own Acrobat Pro?<

2 shops, many people.

>Incidentally, did you notice that one set of features is only available in the Windows version of Acrobat 6 Pro?<

Yes.
Ann_Shelbourne
May 22, 2004
Yes. But those programs are Windows-only programs, so wouldn’t you expect that?

However, you can open AutoCAD files in Illustrator.
Mike_Ornellas
May 22, 2004
PDF is not platform-specific; it’s file-specific.

It’s just that there are many servers, PDF servers, running everywhere.
Todie
May 22, 2004
I’ve never heard of a Windows print shop (there may be some in the financial field). I’ve seen a few PCs in shops that have dozens of Macs.
I assumed that those shops don’t have Acrobat for the PC, since they most certainly have it for the Mac.
Mike_Ornellas
May 23, 2004
Mac is the workstation.

PCs are the RIPs.

SGI are the servers.
Rene_Garneau
May 26, 2004
OCR capability was introduced in Acrobat 4.

Tool > Paper Capture > Capture Page

Use the Index in Acrobat’s Help menu to learn how to use it.

It will do English US, English UK, Dutch, French, German, Italian, Spanish, Swedish.
Ram
May 26, 2004
Rene,

Thank you for pointing that out. I’ll go try it on my Acrobat 5.x. If it works, I have no need for Acrobat 6.
Rene_Garneau
May 26, 2004
You’re welcome.

I used it once, a couple of years ago, with Acrobat 4, and got very good results.
Zeb
Jun 1, 2004
If they are confidential documents, how secure is this OCR? Does Adobe keep a copy of them?

Thanks for telling me about separations; I’ll have to read more on this subject. Do you know any good links relevant to a non-printer?
Ann_Shelbourne
Jun 1, 2004
Acrobat doesn’t ask whether you were legally entitled to scan a page of type, and your scanner mostly doesn’t care either (unless it is one of the newest ones that refuse to scan $50 bills!).

Acrobat just interprets the gray shapes in your scan and turns them back into editable formatted (mostly) text.
Zeb
Jun 1, 2004
Is this editable text secure, such as bank account details, passwords etc?
Ann_Shelbourne
Jun 1, 2004
It is just a PDF.

You can invoke security settings and encryption in Acrobat Pro if you need to.
Murray_Coppold
Jun 2, 2004
What am I doing wrong? I followed Ann’s instructions, did Save As to a Word document, and when I opened the Word document, the scan was there as a picture. I could not select any text.

I’m probably missing something simple, but …
Ann_Shelbourne
Jun 2, 2004
Did you:
Scan and save as a grayscale TIFF (using your scanner’s optical resolution at 100% size);
Open it in Photoshop and downsample to 400 ppi (which is the most that Acrobat will handle);
Launch Acrobat Pro and choose File menu/Create PDF from File;
Then go to Document menu/Paper Capture;
Choose the pages you want and hit the "Edit" button; choose "Formatted text and graphics";
Let Acrobat complete its work, then Save As to MS Word Document format;
Open the file in MS Word.
———-
Actually, you could have read all of this for yourself in Acrobat’s Help menu/"Converting Scanned pages to Editable Text".

:~Q
Murray_Coppold
Jun 2, 2004
Thanks, Ann
There’s a little more to it than "just go to 'Create PDF from File' and Save As to MS Word Document format".
Since other posters seemed equally surprised, I assumed it was an undocumented trick. Thanks for the info.
Murray
Ann_Shelbourne
Jun 2, 2004
A little more to it perhaps.

Sorry if I didn’t spell it out in baby steps, but did you really expect a totally free ride?

Requiring a little reading homework isn’t really too much of a burden, is it?

:~)
