need help scanning documents

T
Posted By
thedarkman
Jun 1, 2007
Views
330
Replies
9
Status
Closed
I’m engaged in a long term project scanning and annotating an archive. It contains hundreds of photographs and thousands of documents, the latter mostly A4 but including a lot of newspaper articles.

The press reports are no problem by and large; if they take up a few columns the size doesn’t matter, but I’d like the A4s to come out more or less as seen. When I scan the photos I use 96dpi and they are okay, ditto the small reports, but scanning a document at that resolution leaves the result rather poor quality, and increasing the resolution makes them come out BIG.

Any help getting them to come out as seen would be greatly appreciated.

I’m using exclusively jpgs but if pdfs are the way forward I’d do that albeit reluctantly.

Thanks


RP
Richard Polhill
Jun 1, 2007
Are you scanning them in greyscale or lineart mode? If you’re scanning for the screen then 96dpi is probably about right, as most monitors are around that. If you’re printing you’ll need to scan at something much higher, say 300dpi (a typical resolution for a laser printer), which will then print at the same size as the original if printed at 300dpi. You’ll need a resized/resampled version to view on screen, as 10 inches scanned at 300dpi is 3,000 pixels, which takes up 31.25 inches on a 96dpi display.

You have to accept the fact that the screen is a different medium to print. Scan at a quality at least as good as your ultimate target.
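The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and 96dpi is assumed as the screen resolution, as in the post above.

```python
def scan_pixels(inches: float, scan_dpi: int) -> int:
    """Pixels produced along one edge when `inches` are scanned at `scan_dpi`."""
    return round(inches * scan_dpi)


def on_screen_inches(pixels: int, screen_dpi: int = 96) -> float:
    """Physical length those pixels occupy at 100% zoom on a `screen_dpi` display."""
    return pixels / screen_dpi


def downsample_factor(scan_dpi: int, screen_dpi: int = 96) -> float:
    """Scale to apply to a scan so it shows at the original's physical size."""
    return screen_dpi / scan_dpi


pixels = scan_pixels(10, 300)      # 3000 pixels
print(on_screen_inches(pixels))    # 31.25
print(downsample_factor(300))      # 0.32
```

So a 300dpi scan would need shrinking to roughly a third of its pixel dimensions to appear "as seen" on a 96dpi screen, which is why one scan cannot serve both purposes at full size.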
JB
John Boy
Jun 1, 2007
thedarkman wrote:
I’m engaged in a long term project scanning and annotating an archive. It contains hundreds of photographs and thousands of documents, the latter mostly A4 but including a lot of newspaper articles.
[…]
Thanks

See CS2’s "File->Scripts->Image Processor" option. It’s super! You can rip out different size/format files all at once. It creates the folders necessary. Highly recommended. You can use its check-box features and add more batch processing besides.

I had a similar project to do – 3,100 scans of monochrome images (B&W) for a historical effort. Here’s how we worked: first, every print was scanned at what would be 360ppi to TIF files in B&W if the prints had not stained, otherwise in color. (Stained B&W prints are more easily fixed in color, then saved as monochrome.) Big external drives are cheap enough to do that. That was the tedious part.

Then we batched them all at once to small (500 pixels on the long side) JPG files, high quality (8 on Photoshop’s 0-12 scale) for quick review on screen. The "Automate->Fit Image" option is very good for that.
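For anyone doing the resize outside Photoshop, the "500 pixels on the long side" rule comes down to one proportional calculation. A rough Python sketch follows; the function name is hypothetical and this is not how Fit Image itself is implemented, only the same idea.

```python
from __future__ import annotations


def fit_long_side(width: int, height: int, long_side: int = 500) -> tuple[int, int]:
    """Scale (width, height) so the longer edge equals `long_side`.

    The aspect ratio is kept, and images that are already small enough
    are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= long_side:
        return width, height
    new_width = max(1, round(width * long_side / longest))
    new_height = max(1, round(height * long_side / longest))
    return new_width, new_height


print(fit_long_side(3600, 2400))   # (500, 333)
print(fit_long_side(2400, 3600))   # (333, 500)
print(fit_long_side(400, 300))     # (400, 300)
```

The resulting size would then be passed to whatever imaging tool actually resamples the pixels.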

When the review is finished and the finals are selected, I will go back and do the color/stain corrections and so forth, resaving in TIF for archiving and JPEG for web use.
T
Tacit
Jun 3, 2007
thedarkman wrote:

I’m engaged in a long term project scanning and annotating an archive. It contains hundreds of photographs and thousands of documents, the latter mostly A4 but including a lot of newspaper articles.
[snip]
I’m using exclusively jpgs but if pdfs are the way forward I’d do that albeit reluctantly.

You’re already walking down the wrong path. If this is for archival purposes, don’t worry about big files; storage is cheap and will only get cheaper.

Never use JPEG for archival projects. JPG uses "lossy" compression; it degrades the quality of your image. You want to avoid this degradation in an archive.
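To make the lossy/lossless distinction concrete, here is a small Python sketch. It assumes zlib’s DEFLATE as the lossless codec (the same family PNG uses, and one of TIFF’s compression options). The `crude_lossy` function is a toy stand-in that simply discards fine tonal detail; it is NOT JPEG’s actual algorithm, only an illustration that discarded data cannot be recovered.

```python
import zlib


def lossless_round_trip(pixels: bytes) -> bytes:
    """Compress with DEFLATE at the highest level, then decompress again."""
    packed = zlib.compress(pixels, 9)
    return zlib.decompress(packed)


def crude_lossy(pixels: bytes, step: int = 16) -> bytes:
    """Toy lossy step: snap each 8-bit value down to a multiple of `step`."""
    return bytes((value // step) * step for value in pixels)


# Stand-in for 8-bit grayscale pixel data (hypothetical, not a real scan).
scan = bytes(range(256)) * 64

print(lossless_round_trip(scan) == scan)   # True: every byte comes back
print(crude_lossy(scan) == scan)           # False: detail is gone for good
```

A lossless archive copy can always be turned into a JPEG later; the reverse does not restore what was thrown away.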


Photography, kink, polyamory, shareware, and more: all at http://www.xeromag.com/franklin.html
BW
Barry Watzman
Jun 3, 2007
One more comment on something that I overlooked earlier.

You say "I’m using exclusively jpgs but if pdfs are the way forward I’d do that albeit reluctantly"

There is no issue of JPEGs (JPGs) vs. PDFs.

When you save a file in PDF format, it’s an Adobe Acrobat file for viewing (and this IS what I’d recommend), but each page image is stored INTERNALLY within the PDF as compressed image data. That can be JPEG, or one of the lossless encodings that TIFF files also use (CCITT fax, ZIP/Flate), among others, and if there is a reason to do so, the individual pages of the document can be "exported" out of the PDF file back to an ordinary graphics file (in any format which Acrobat supports for export, e.g. you can export a TIFF file even if the internal format is JPEG). Effectively, the PDF file becomes a "wrapper" for the graphics data of the individual pages.

That said, for a lot of reasons, JPEG is the most commonly used format for internal storage. And in my view (I know that many will disagree), JPEG is fine if you don’t compress excessively.
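One rough way to see the "wrapper" idea is to look for the stream filter names inside a PDF’s raw bytes. In the PDF format, `DCTDecode` marks JPEG data, `JPXDecode` JPEG 2000, `CCITTFaxDecode` and `JBIG2Decode` bilevel fax-style data, and `FlateDecode` zlib-compressed data. The Python sketch below just counts those names; it is a crude heuristic, not a real PDF parser (Flate is also used for non-image streams, and encrypted files will not show anything useful), and the helper name is made up for the example.

```python
import re
from collections import Counter

# PDF filter names mapped to a human-readable label.
FILTER_LABELS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"CCITTFaxDecode": "CCITT fax",
    b"JBIG2Decode": "JBIG2",
    b"FlateDecode": "Flate (lossless)",
}

_FILTER_RE = re.compile(
    rb"/(DCTDecode|JPXDecode|CCITTFaxDecode|JBIG2Decode|FlateDecode)\b"
)


def count_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each known filter name appears in the raw PDF bytes."""
    return Counter(
        FILTER_LABELS[match.group(1)] for match in _FILTER_RE.finditer(pdf_bytes)
    )


# Hand-written fragment resembling two PDF stream dictionaries (not a full PDF).
fragment = (
    b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>\n"
    b"<< /Length 42 /Filter/FlateDecode >>\n"
)
print(count_stream_filters(fragment))
```

To try it on a real file, read it with `open(path, "rb").read()` and pass the bytes in; a scanned document saved with JPEG page images should report mostly "JPEG".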

T
thedarkman
Jun 15, 2007
http://www.sendspace.com/file/g5r9wz

Hi All,

I posted here recently; as I said, I’m working on a major archive which involves a lot of scanning but I’m having trouble especially with documents. I received some suggestions, which were helpful. One guy said not to scan in jpg format. I gave that some consideration but decided to use them.

When I’ve scanned A4 documents before I’ve had some problems but the documents on two of my sites

http://www.ismichaelstoneguilty.org/

and http://www.geocities.com/satpalramisguilty/

have come out very well.

I’ve just uploaded the following files in the above archive to SendSpace

m_s_daley_statement_1.jpg
jessie-wey-valley-chess-grading-list-page-1.jpg
jessie-house-of-commons-1.jpg
jessie-lloyds-bank-1996-1.jpg
jessie-surrey-girls-chess-league-january-1996-page-1.jpg

The file m_s_daley_statement_1.jpg displays perfectly on a website, i.e. when it is linked from an HTML file. I’d like the others to look the same way. Some of the photos here are of a high resolution. They also need lightening, but I’m most concerned about the documents; I want them to display as near-perfect A4 reads.

Any help appreciated.
BW
Barry Watzman
Jun 16, 2007
In my opinion, the best way to do this is with Adobe Acrobat, as PDF files; the internal format (which you can export to any desired graphics format) is likely to be JPEG (there may be a way to change that, but I don’t know what it is if so). Unless there is very, very fine print and detail, scan at 300 dpi in 256-level (8-bit) grayscale (I am presuming that these are B&W documents; obviously if there are color documents, that changes things). I’ve done tens of thousands of pages, and they are indistinguishable from the originals on screen, and on paper if printed. Acrobat uses your scanner software to do the actual scanning; in my case it’s HP PrecisionScan Pro, and I go to a lot of effort to get the exposure controls optimized for each document (time consuming, but it assures a perfect output). Acrobat can properly scan and interleave both sides of a double-sided document even when the scanner has a non-duplexing (single-sided) document feeder (I am using an HP 5490C).


BW
Barry Watzman
Jun 16, 2007
Well, I found the source of the problem, although I can’t believe how simple/stupid it is.

There is nothing wrong with the scanner, but the scanner assumes that the image at the end of the film strip starts EXACTLY at the very edge of the strip. I was working with a strip from the end of the film; it had an extra 4 to 8 mm of blank "film" beyond the edge of the image, and that threw off every image on that strip by 4 to 8 mm (quite a bit). Trimming the film to well under 1 mm from the edge of the image fixed the issue.


T
thedarkman
Jun 16, 2007
On 16 Jun, 02:17, Barry Watzman wrote:
In my opinion, the best way to do this is with Adobe Acrobat, as PDF files; the internal format (which you can export to any desired graphics format) is likely to be JPEG (there may be a way to change that, but I don’t know what it is if so). Unless there is very, very fine print and detail, scan at 300 dpi in 256-level (8-bit) grayscale (I am presuming that these are B&W documents; obviously if there are color documents, that changes things). I’ve done tens of thousands of pages, and they are indistinguishable from the originals on screen, and on paper if printed. Acrobat uses your scanner software to do the actual scanning; in my case it’s HP PrecisionScan Pro, and I go to a lot of effort to get the exposure controls optimized for each document (time consuming, but it assures a perfect output). Acrobat can properly scan and interleave both sides of a double-sided document even when the scanner has a non-duplexing (single-sided) document feeder (I am using an HP 5490C).
Hi,

Most are colour documents, A4 size, but I always scan in colour regardless. I have no trouble with small pieces, i.e. newspaper articles, but scanning an entire A4 document causes problems.
BW
Barry Watzman
Jun 16, 2007
I would advise against scanning B&W documents in color. The uncompressed file is 3x larger (24 bits per pixel instead of 8), scanning takes longer, and there is actually some loss of quality relative to an original non-color document.
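The size difference is easy to check with a little arithmetic. The Python sketch below estimates the uncompressed size of one A4 page at a given resolution and bit depth; the names are made up for the example, and it ignores file headers, row padding and any compression.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 width and height in millimetres


def a4_pixels(dpi: int) -> tuple:
    """Pixel dimensions (width, height) of an A4 page scanned at `dpi`."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def uncompressed_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, before any compression."""
    width, height = a4_pixels(dpi)
    return width * height * bits_per_pixel // 8


print(a4_pixels(300))                 # (2480, 3508)
print(uncompressed_bytes(300, 8))     # 8699840  (8-bit grayscale, about 8.7 MB)
print(uncompressed_bytes(300, 24))    # 26099520 (24-bit color, about 26.1 MB)
```

After JPEG compression the ratio will not be exactly 3 to 1, but the color scan still carries three channels where the grayscale one carries a single channel.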

