scanning_ocr:scanning_journals

There are several ways to scan journals into PDFs. Which one to use depends on the physical characteristics of the journal and the desired quality of the finished PDF. There are two types of scanners at the Institut d'Asie Orientale (IAO) and two different digibooks, at ENS de Lyon and MOM respectively (see Scanning Equipment).

Brief directions for auto-feed scanning using IAO XXXXXX flatbed Scanner

The most time-efficient option for scanning is to use the Fujitsu ScanPartner because of its auto-feed feature. However, this can only be used if you have a loose-leaf document, or you are able to cut the binding off the journal. If so, proceed in the following way:

  1. Press the “Send” button.
  2. Select your email address in the address directory.
  3. Change the resolution to “300 x 300” dpi (or higher if necessary). Set the output format to “PDF” unless you want an image format (TIFF). You may also want to change the brightness setting depending on the darkness of the text in your document.
  4. Click “Start” and begin scanning. It may be necessary to place pages one by one into the auto-feed tray to avoid misfeeds if working with thin or low-quality paper.
  5. If scanning a double-sided document, when the front sides finish scanning, flip the document over and place it in the auto-feed tray.
  6. When the scan is complete, your document will be emailed to the address you selected. Open it and scroll through the document to make sure the pages are all there and in the correct order.
  7. Crop pages if necessary by opening the crop tool from the tool menu, or by right-clicking a page thumbnail in the Pages sidebar.
  8. Add metadata in the Document Properties (Control+D). Under Description, enter “full issue” in the Title field, and in the Subject field enter the name of the journal, volume, number, and date. For example: Shilin, Volume 7, Number 2, June-Oct 1983. Click “OK.”
  9. Save the file according to the file naming conventions (see below).

Brief directions for scanning using IAO XXXXXX flatbed A4 Scanner

  1. Open Scanning XXXXXXX assistant.
  2. Select “Create PDF,” and inside this window choose the desired scanner (Fujitsu Limited TWAIN Driver) and the original document's format (Double-sided or Single-sided), and set the option to adapt compression to page content for “Adobe Acrobat 6.0 or later.” Before clicking “OK,” make sure the document, or at least the first page, is loaded into the auto-feed tray face down with the top of the page loading first.
  3. In the scan configuration window, most of the default settings are fine. Change the resolution to “300 x 300” dpi. You may also want to change the brightness setting depending on the darkness of the text in your document.
  4. When the scan is complete, your document will appear in a new window. Scroll through the document to make sure the pages are all there and in the correct order.
  5. Crop pages if necessary by opening the crop tool from the tool menu, or by right-clicking a page thumbnail in the Pages sidebar.
  6. Add metadata in the Document Properties (Control+D). Under Description, enter “full issue” in the Title field, and in the Subject field enter the name of the journal, volume, number, and date. For example: Kailash, Volume 7, Number 2, June-Oct 1983. Click “OK.”
  7. Save the file according to the file naming conventions (see below).

NOTE: Poor-quality or very thin paper (such as that commonly used for journals produced in Asia) may not always auto-feed correctly. If such paper is loaded into the tray all at once, the auto-feed tends to take more than one page at a time. To avoid misfeeds, it may be necessary to place the pages into the auto-feed tray one by one or a few at a time. Misfeeds can ruin the scan job, because the pages will collate incorrectly, which is not easily fixed.

Separating a Journal Issue into Individual Article Files

Once a full issue has been scanned, it needs to be broken down into smaller files containing the front matter, each article, the back matter, and any other sections. While working on these steps, be sure to keep your whole issue file intact.

  1. Open the whole issue file you created from scanning.
  2. Click on the “Pages” side tab of your document's window. The Pages sidebar makes it easy to select the pages of the various sections.
  3. Click on the very first page (or cover, as the case may be) in the sidebar. This selects the page and marks it with a blue highlight ring around the thumbnail.
  4. Scroll down, still in the Pages sidebar, to where the front matter ends. The front matter may include things such as the cover, title page, editorial data, contents, list of illustrations or plates, notes about contributors, and preface or foreword. It is generally everything up to the first page of the first article.
  5. Hold down the Shift key and click on the last thumbnail page of this section. This will select and highlight all the pages in the section.
  6. Right-click on one of the highlighted pages. A menu of tools will pop up. Select “Extract Pages” from the menu. Another window will open verifying the pages to be extracted. Click “OK.” A new window will open with the extracted pages.
  7. Open the Pages sidebar in this window, and scroll through to make sure all your pages are there. At this point, you can delete any blank pages in the section (I have used the convention of leaving blank pages in the whole document file, but deleting them from the separated files). Just select the blank page or pages, right-click on one, and select “Delete Pages.” A window will appear confirming your deletion. Click “OK.”
  8. Now add metadata to this document. Select “Document Properties” from the File menu, or just press Control+D, to bring up the Document Properties window. Select “Description” from the left sidebar, then fill out the fields for Title, Author, and Subject. The Subject field is used for the journal title, volume and number of the issue, and date. For articles, also include page numbers. For example: Bulletin of Tibetology, Volume 3, Number 2, June 1966, pp 8-19. In the Title field, enter the title of the article or a description of the section, like “front matter” or “full issue” (for whole issue files). In the Author field, enter the author(s) of the article, first name first and then last name, with multiple authors separated by a comma or “and.” For example: John Henry and Polly Ann Henry. Or: James Madison (trans.).
  9. When finished adding Document Properties, click “OK.”
  10. Now save your file using the correct file naming conventions (see below).
  11. Go on to the next section and repeat the process. Select the first page of the section by clicking on the thumbnail of that page. Then scroll down to the last page of the section and click on it while holding down the Shift key. Right-click on a page and select “Extract Pages.” Add the necessary metadata to the Document Properties (Control+D) and then save the file with the correct file name.
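The Subject and Author conventions described above can be sketched as a pair of small helpers, which may be handy if you prepare the metadata for many articles in a text file before pasting it into Document Properties. This is only an illustration: the function names `format_subject` and `format_authors` are made up for this page, and joining three or more authors as “A, B and C” is an assumption based on the “comma or and” rule.

```python
def format_subject(journal, volume=None, number=None, date=None, pages=None):
    """Build a Subject string such as
    'Bulletin of Tibetology, Volume 3, Number 2, June 1966, pp 8-19'.

    pages is an optional (first, last) tuple; leave it out for the
    whole issue file, which carries no page numbers.
    """
    parts = [journal]
    if volume is not None:
        parts.append(f"Volume {volume}")
    if number is not None:
        parts.append(f"Number {number}")
    if date:
        parts.append(date)
    if pages:
        first, last = pages
        parts.append(f"pp {first}-{last}")
    return ", ".join(parts)


def format_authors(names):
    """Join author names (first name first) with commas and a final 'and'."""
    names = list(names)
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    # Assumption: three or more authors are written "A, B and C".
    return ", ".join(names[:-1]) + " and " + names[-1]


print(format_subject("Bulletin of Tibetology", 3, 2, "June 1966", (8, 19)))
print(format_authors(["John Henry", "Polly Ann Henry"]))
```

Filling in the whole-issue Subject first and adding only the page range per article mirrors the way the field is carried over when pages are extracted.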

Tip: If you fill out the Subject field within Document Properties for the whole issue first, then whenever you extract pages from it, this field will already be filled out in the extracted pages file and you only need to add the relevant page numbers to the Subject field.

Another tip: I find it helpful to leave the front matter file open and put it down in the corner of the screen with the table of contents page showing as I separate the rest of the issue. This is a nice little reference to guide you as you extract articles from the full issue.
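If you like to plan the extraction before clicking through the Pages sidebar, the table of contents can be turned into a list of page ranges. The sketch below is an illustration only: `section_ranges` is a made-up helper, the page numbers in the example are invented, and it assumes you note each section's first page as its position in the scanned file (thumbnail number, not the printed page number) and that every section starts on a new page.

```python
def section_ranges(starts, total_pages):
    """Turn section start pages into (label, first_page, last_page) ranges.

    starts is a list of (label, first_page) pairs in page order, where
    first_page is the 1-based position of the thumbnail in the whole
    issue file. Each section is assumed to run up to the page just
    before the next section starts; the last one runs to the end.
    """
    ranges = []
    for i, (label, first) in enumerate(starts):
        if i + 1 < len(starts):
            last = starts[i + 1][1] - 1
        else:
            last = total_pages
        if last < first:
            raise ValueError(f"section {label!r} has no pages; check the start pages")
        ranges.append((label, first, last))
    return ranges


# Hypothetical 44-page issue: front matter, two articles, back matter.
plan = section_ranges([("front", 1), ("01", 7), ("02", 20), ("back", 41)], 44)
for label, first, last in plan:
    print(f"{label}: select thumbnail {first}, Shift-click thumbnail {last}")
```

Each printed line corresponds to one pass of the select, Shift-click, and “Extract Pages” steps.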

Optimize the PDF

Optimizing the PDF in most cases will improve the quality and readability of the scan.

  1. Save the PDF under a different name by adding “-opt” before the .pdf extension.
  2. Pull down the Document menu and select Optimize.
  3. After it finishes optimizing, check the quality against the original PDF and use whichever is better.
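For anyone renaming copies in bulk, the “-opt” rule from step 1 amounts to inserting the suffix between the file name and its extension. A minimal sketch follows; the helper name `optimized_name` is made up for this page, and it only builds the name, it does not copy or optimize anything.

```python
from pathlib import PurePath


def optimized_name(filename):
    """Return the file name with '-opt' inserted before the extension."""
    p = PurePath(filename)
    return str(p.with_name(f"{p.stem}-opt{p.suffix}"))


print(optimized_name("ejeas_03_02_01.pdf"))
```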

File Naming Conventions

Files should be given short descriptive names in the following format:

  • JournalName_VolumeNumber_IssueNumber_ArticleNumber(or a descriptive word)

Separate the elements with underscores. If a journal has a long title, it sometimes helps to abbreviate it. For example, suppose you have scanned volume 3, number 2 of EJEAS, which has a cover and contents section, several articles, a notes and topics section, a book review, and then the back material. You would name these sections as follows:

  • ejeas_03_02_front
  • ejeas_03_02_01
  • ejeas_03_02_02
  • ejeas_03_02_03
  • ejeas_03_02_notes
  • ejeas_03_02_reviews
  • ejeas_03_02_back

Sometimes a journal, like Shilin 史林, only has volumes; in that case just put the volume number.

  • shilin_02_front
  • shilin_02_01
  • shilin_02_02

Some journals use the year in place of a volume number and number the issues within each year; others use only a running issue number. For these, put the year instead of the volume number, followed by the issue number, or just the issue number:

  • JournalName_Year_IssueNumber_ArticleNumber, or
  • JournalName_Number_ArticleNumber

If an issue spans more than one volume or number, use a hyphen. For example, an issue of Ancient Nepal is designated as numbers 53-56, so name the files:

  • ancient_nepal_53-56_front
  • ancient_nepal_53-56_01
  • ancient_nepal_53-56_02
  • and so forth
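The naming patterns above can be summed up in one small function, shown here as a sketch rather than an official tool. The name `journal_file_name` is made up for this page, and two details are inferred from the examples rather than stated as rules: numbers are zero-padded to two digits, and journal names are lower-cased with spaces turned into underscores.

```python
def journal_file_name(journal, *numbers, section):
    """Build a base file name such as 'ejeas_03_02_front'.

    journal: short or abbreviated journal name.
    numbers: volume, year and/or issue numbers, in order; give a
             (first, last) tuple for an issue that spans several numbers.
    section: article number (int) or a descriptive word like 'front'.
    """
    def fmt(value):
        if isinstance(value, tuple):
            # A span such as numbers 53-56 is joined with a hyphen.
            return "-".join(fmt(v) for v in value)
        if isinstance(value, int):
            # Assumption from the examples: pad to two digits (3 -> '03').
            return f"{value:02d}"
        return str(value)

    name = "_".join(journal.lower().split())
    parts = [name] + [fmt(n) for n in numbers] + [fmt(section)]
    return "_".join(parts)


print(journal_file_name("EJEAS", 3, 2, section="front"))
print(journal_file_name("Shilin", 2, section=1))
print(journal_file_name("Ancient Nepal", (53, 56), section=2))
```

Add the .pdf extension when saving; a year such as 1983 passes through unchanged because it is already longer than two digits.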

Finally, remember that the scanner is your friend, even when it crumples your document and jams.

scanning_ocr/scanning_journals.txt · Last modified: 2013/04/06 23:14 (external edit)