Following are some answers to common support questions. If you cannot find an answer to your question in these FAQs, please contact us.
Can I download PrimeOCR from the web?
- Yes. Contact your sales rep for access to the latest release (your maintenance must be up to date). Custom solutions or changes may be delivered as email attachments.
What are the limitations of the evaluation software?
- The evaluation software includes all the features of the released software but is limited in page count and has an expiration date. Contact your sales rep for additional pages or an extension of the evaluation period; these license updates can be delivered by email.
When I run PrimeOCR it reports that I am out of pages.
- The evaluation software download includes 150 pages of processing. Contact your sales rep if you need additional pages for extended reliability testing or for testing installations on multiple PCs. These license updates can be delivered by email.
How do I update my license?
- Run the softwarekeyread.exe program that is in the \prdev\bin directory and send the softwarekeyread.log file to your sales rep.
I plan on processing foreign language documents – do I need additional modules to process them?
- An additional download is required to install the foreign language option pack. Contact your sales rep for the correct download.
- An additional download is required for recognition of Asian languages (Chinese, Japanese, Korean).
How do I start PrimeOCR after installation?
- There are several sample images included with PrimeOCR’s Job Server.
- Start the Job Server: click the Windows Start menu, open the Prime Recognition program group under Programs, and select PrimeOCR Job Server.
- Click the “Start” button and watch the Job Server process the test image files in both the “Priority Job Directory” and the “Job Directory.” When all jobs are complete, you can review the OCR results in the “Images” directory. As shipped, the output directory is the same as the input directory: the “Images” directory.
How do I selectively OCR just a portion of my image?
- Manual zoning is done with the PrimeView application. Contact your sales rep if you would like to evaluate PrimeView.
- Start PrimeView by selecting “PrimeView” from the “Prime Recognition” program group.
- Load the image you want to manually zone.
- Select the output format type, then select the portion of the image you would like to OCR by pointing and dragging a zone over that portion of the image.
- Save the zone template, and submit the job to the “Job Server.”
- When the Job Server processes the image, it reads the zone coordinates from the template file (.PTM) and OCRs only that portion of the image. Template files are plain text and can be read and edited directly, so PrimeView is not required to create them; it simply makes it easy to select and define zones graphically.
How do I generate Accessible PDF files?
- Under Setup…/Output/PDF/Details – select “Make Accessible”. There are a number of options available with Section 508 accessible PDF content creation. Contact your sales rep for further details.
- For single-page image PDF files, the default settings can be used. These write the searchable PDF output to the input directory, where it overwrites the image-only PDF file.
- For multi-page PDF files, under Setup\Output select “Save output to a different directory” and select a path. PrimeOCR will read the image-only PDF files from the input directory and write the searchable PDF output to that output directory.
- If you want the input files to be deleted when processing is complete, under Setup\Input\More Settings select “Erase image files after processing”.
- Install PrimeOCR onto each of the four PCs.
- Map the repository file server directory to the same drive letter on each of the four PCs (for example, z:\images).
- Create a job to process the images with a defined template.
- Save the job file in the job directory, then copy it to the job directory on each of the four PCs.
- Under Setup\Output set “Skip files with existing output” to “Yes”. Select Ok to exit Setup and save changes.
- Press the START button on each of the four Job Servers. Observe that each OCR server processes the next available file in the repository.
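Copying the job file to the four job directories by hand works fine; if those directories are shared on the network, a short script can do the same. The Python sketch below is a minimal example, assuming each PC's job directory is reachable as a share; the share names and the job file name in the usage comment are hypothetical placeholders, not PrimeOCR defaults.

```python
import shutil
from pathlib import Path


def distribute_job(job_file, job_dirs):
    """Copy one job file into each OCR server's job directory.

    job_dirs: paths to the job directory of each PrimeOCR PC
    (hypothetical; use the directories configured in each Job Server's Setup).
    Returns the paths of the copies.
    """
    src = Path(job_file)
    copies = []
    for job_dir in job_dirs:
        dest = Path(job_dir) / src.name
        shutil.copy2(src, dest)  # copies file contents and timestamps
        copies.append(dest)
    return copies


# Usage (hypothetical share names for the four PCs):
# distribute_job(r"C:\jobs\repository.job",
#                [rf"\\ocr{n}\jobs" for n in range(1, 5)])
```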
- Yes. Under Setup\OCR Engine\More Settings … in the “Pre OCR String Option 1” type in: “file_output2,13”.
- PrimeOCR will output an additional PDF file during processing with this string option.
- Using the wizard, create a job for each directory of images you want to process. Each job file is a simple text file containing two key lines: the first is the path to the images to process and the second is the path to the template to use during processing.
- PrimeOCR reads the first job and processes its images, then reads the images from the second job, and so on until all jobs are complete.
- Images can be stored either on the local PC or on a remote drive.
- If the images are stored on a remote drive, map the drive to a drive letter (for example, z:\images) before creating a job.
Prime Recognition Job File
- As an alternative to mapped drives, a UNC path to the images can be referenced in the job file. A simple text editor such as notepad.exe can be used to edit the job file and insert the UNC paths to the images and template. For example:
Prime Recognition Job File
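Because a job file is plain text, jobs for many image directories can also be generated by a script instead of the wizard. The Python sketch below assumes a layout inferred from this FAQ: the “Prime Recognition Job File” header line shown above, then the images path, then the template path, with Windows line endings. Files written by the wizard may contain additional lines, and the `job_NNN.job` naming and extension are hypothetical, so compare the output with a wizard-generated job file before relying on it.

```python
from pathlib import Path

# First line copied from the job-file examples in this FAQ; the two lines that
# follow (images path, then template path) are an assumed minimal layout.
JOB_HEADER = "Prime Recognition Job File"


def write_job_file(job_path, images_path, template_path):
    """Write a minimal job file: header, images path, template path."""
    lines = [JOB_HEADER, str(images_path), str(template_path)]
    # newline="\r\n" turns each "\n" into a Windows line ending on any OS
    with open(job_path, "w", newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")


def write_jobs_for_dirs(job_dir, image_dirs, template_path):
    """Create one job file per image directory; returns the job file paths.

    The job_NNN.job naming is hypothetical; match whatever the wizard produces.
    """
    written = []
    for i, images in enumerate(image_dirs, start=1):
        job_path = Path(job_dir) / f"job_{i:03d}.job"
        write_job_file(job_path, images, template_path)
        written.append(job_path)
    return written


# Usage with UNC paths (hypothetical server and share names):
# write_jobs_for_dirs(r"C:\PrimeOCR\jobs",
#                     [r"\\fileserver\scans\box1", r"\\fileserver\scans\box2"],
#                     r"\\fileserver\templates\letters.ptm")
```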
- Under Setup\Input set “Once OCR of job is complete:” to “Do not Erase Job File”.
- Under Setup\Input set “Once all jobs are complete:” to “Continuously poll for next job”.
- Under Setup\Output set “Skip files with existing output?” to “Yes”. Select Ok to exit Setup and save changes.
- With these settings, the PrimeOCR Job Server processes images as they are added to the watched directories. Once output exists for a file, PrimeOCR does not reprocess it.
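The skip-existing-output behavior can be pictured as a simple loop: on each poll, look for images that have no matching output file yet and process only those. The Python sketch below only illustrates that logic; it is not how the Job Server is implemented, and the image extensions, the `.txt` output naming, and the `process` callback are assumptions.

```python
import time
from pathlib import Path

# Assumed input image types and output naming (image name + ".txt").
IMAGE_EXTS = {".tif", ".tiff", ".jpg", ".png"}


def pending_images(input_dir, output_dir, output_ext=".txt"):
    """Return images in input_dir that have no output file in output_dir yet."""
    pending = []
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        output = Path(output_dir) / (image.stem + output_ext)
        if not output.exists():  # mirrors "Skip files with existing output"
            pending.append(image)
    return pending


def watch(input_dir, output_dir, process, poll_seconds=5.0, max_polls=None):
    """Repeatedly poll input_dir and hand images without output to process().

    max_polls=None polls forever, like "Continuously poll for next job".
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for image in pending_images(input_dir, output_dir):
            process(image)
        polls += 1
        time.sleep(poll_seconds)
```

Because an image is skipped as soon as its output exists, restarting the loop (or running it on several machines against a shared directory) does not redo finished work.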
- Under Setup\OCR Engine, deselect both logging settings.
- Logging is useful when first setting up the PrimeOCR Job Server but slows processing during production.
- Errors that occur during processing will still be recorded in the PrimeOCR log even though both logging settings are disabled.
- Under Setup\OCR Engine\More Settings, turn Variable Processing on, then click Configure and select “Low quality images are processed quickly” and “High quality images are processed more quickly”. See the Setup help notes and review the user’s manual before using this setting in production.
- Under Setup\OCR Engine\More Settings, leave “# of CPUs” at its default, Auto. Auto detects how many CPUs are on the PC, including hyperthreaded CPUs, and compares that with the number of CPUs licensed for processing.
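One reasonable reading of the Auto setting is “use every detected CPU, up to the licensed count.” The Python sketch below shows only that arithmetic, as an assumption about the comparison described above; `os.cpu_count()` reports logical CPUs, so hyperthreaded cores are included.

```python
import os


def effective_cpus(licensed, detected=None):
    """Number of CPUs to use: the detected count, capped at the licensed count.

    Assumption: Auto takes the smaller of the two values.
    """
    if detected is None:
        detected = os.cpu_count() or 1  # logical CPUs; cpu_count() may return None
    return max(1, min(detected, licensed))
```

For example, on a PC with 8 logical CPUs and a 2-CPU license, `effective_cpus(2, 8)` returns 2.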
- PrimeOCR has three functions for rotating images. All three can be defined in a template using the first screen of the wizard.
- To rotate each image by a fixed amount (90, 180, or 270 degrees) before OCR, for example because all pages were scanned in landscape orientation, use the rotate function. Do not use the rotate function together with auto-rotate or strong auto-rotate, because the fixed rotation is applied after auto-rotation takes place.
- If the scanned documents include a mix of orientations then use the auto-rotate function. The auto-rotate function is a fast algorithm that attempts to find the correct orientation of the page. It may be useful for many projects but may not provide the accuracy required for all projects.
- If you find that auto-rotate is not accurate enough for your documents, also use strong auto-rotate. Strong auto-rotate is the most accurate way to find the correct orientation of documents, but it can increase processing time. Use it together with auto-rotate when scanned documents have mixed orientations.
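The warning about combining a fixed rotation with auto-rotate comes down to the order of operations. The Python sketch below models orientation as clockwise degrees away from upright and applies auto-rotation before the fixed rotation, as described above; it assumes auto-rotate always finds the correct orientation, which real documents will not guarantee.

```python
def final_orientation(scanned, fixed_rotate=0, auto_rotate=False):
    """Orientation of a page after processing, in clockwise degrees from upright.

    scanned: how far the scanned page is rotated clockwise (0, 90, 180 or 270).
    Auto-rotation (assumed perfectly accurate here) runs first, then the
    fixed rotation is applied on top of its result.
    """
    angle = scanned % 360
    if auto_rotate:
        angle = 0  # auto-rotate brings the page upright
    return (angle + fixed_rotate) % 360


# A batch scanned landscape (270 degrees) is fixed by a 90-degree rotation alone:
# final_orientation(270, fixed_rotate=90) -> 0
# Adding auto-rotate turns the already-upright page sideways again:
# final_orientation(270, fixed_rotate=90, auto_rotate=True) -> 90
```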
- Customers that usually have a mix of high priority and low priority jobs enable both the primary and the low priority job directories. A typical scenario would be that you have a job that can be processed in the background (low priority) and when a job comes up that needs to be completed sooner it can be placed in the Primary (high priority) Job Directory.
- Most customers just use the Primary Job Directory and process their jobs sequentially. Other customers, who usually manage several different kinds of conversion projects, use both job directories to manage their work through PrimeOCR.
- The Primary Job Directory is always enabled. The use of the Low Priority Job Directory can be enabled as an option.
- The PrimeOCR Job Server will look for jobs in the primary job directory first. If a job is not found in the Primary Job Directory or the job has been completed then the PrimeOCR Job Server will look for jobs in the Low Priority Job Directory if the “Enable low priority job directory” checkbox is checked.
- The PrimeOCR Job Server will process a set number of images in a low priority job before returning to check for new jobs in the Primary Job Directory. Under Setup\Input set the “Number of low priority images before poll” to modify the number of images to process before changing over to the high priority directory.
- Under Setup\Input set the “Number of seconds before poll” to the number of seconds that should pass before the PrimeOCR Job Server reads the Primary Job Directory for new jobs.
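The interplay between the two job directories can be simulated in a few lines. The Python sketch below models the behavior described above and is not PrimeOCR code: primary jobs always run first and to completion, and low-priority jobs run `images_before_poll` images at a time before the primary directory is checked again. The “Number of seconds before poll” timer is not modeled, and the job and image names are placeholders.

```python
from collections import deque


def simulate(primary, low, images_before_poll, arrivals=None, low_enabled=True):
    """Return the order in which images are processed.

    primary, low: lists of jobs; each job is a list of image names.
    arrivals: {n: job} drops `job` into the primary directory after the
    n-th image has been processed (a rush job arriving mid-run).
    """
    if images_before_poll < 1:
        raise ValueError("images_before_poll must be at least 1")
    primary = deque(deque(job) for job in primary)
    low = deque(deque(job) for job in low)
    arrivals = dict(arrivals or {})
    order = []

    def process(image):
        order.append(image)
        if len(order) in arrivals:
            primary.append(deque(arrivals.pop(len(order))))

    while primary or (low_enabled and low):
        if primary:
            job = primary.popleft()
            while job:  # a primary job runs to completion
                process(job.popleft())
        else:
            job = low[0]
            for _ in range(images_before_poll):  # then re-check primary
                if not job:
                    break
                process(job.popleft())
            if not job:
                low.popleft()
    return order
```

With a five-image low-priority job, `images_before_poll=2`, and a rush job arriving after the third image, the order is L1, L2, L3, L4, P1, P2, L5: the rush job waits only until the current batch of two low-priority images finishes.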
- Under Setup\Output\More Settings\Change PDF defaults\Details
- Under Setup\OCR Engine\More Settings\Change confidence log attributes\Configure\Save in output directory.
- Lexical check is basic lexical processing within each internal voting OCR engine and lexical plus is a post OCR function that acts as an advanced spell checker.
- Lexical check is functionality that exists within each internal voting OCR engine. Each internal voting OCR engine includes some level of lexical review to see how recognized characters fit into word context. The result of the internal lexical check may improve recognition results.
- Lexical plus is a powerful separate software module that analyzes the OCR results once OCR has been completed by all of the voting OCR engines.
- Lexical plus can auto-correct words that have characters that are not correct (for example: changing misissipi to mississippi) provided most of the characters have high confidence.
- Lexical plus is most useful on documents that contain English text. It does not have any capability to correct numeric data or non-English language words.
- There are a number of advanced settings that can be adjusted for lexical plus.
- Lexical plus can also reduce the number of characters that must be verified by 60-90%.
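To illustrate the idea of auto-correcting a word only when most of its characters were recognized with high confidence, here is a minimal Python sketch. It is not the Lexical Plus algorithm: it swaps in the standard library's `difflib.get_close_matches` as the similarity measure, and the per-character confidence format and all thresholds are hypothetical.

```python
import difflib


def correct_word(word, confidences, dictionary,
                 min_conf=0.9, min_share=0.75, cutoff=0.8):
    """Return the closest dictionary word if most characters are confident.

    confidences: one value in [0, 1] per character (hypothetical format).
    dictionary: lowercase words. Words that are too uncertain are returned
    unchanged so they can go to verification instead.
    """
    if word.lower() in dictionary:
        return word
    confident = sum(c >= min_conf for c in confidences)
    if confident / max(len(confidences), 1) < min_share:
        return word  # too uncertain to auto-correct
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

For example, `correct_word("misissipi", [0.95] * 9, {"mississippi", "missouri"})` returns `"mississippi"`, while the same word with all confidences at 0.5 is left unchanged.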
- Yes. Define a template that uses at least one image enhancement function (for example, deskew on the first screen of the wizard); then, on the next screen of the wizard, choose to save the processed image either under its original name or as a .fix file.
- Under Setup\Input\More Settings, select “Erase image files after processing”.
- The incoming image will then be written to the output directory and the input image will be deleted during processing.
- For RTF output, go to Setup\Output\More Settings\Change RTF defaults\Details and select “Wrapped paragraphs”.
- For ASCII output, go to Setup\OCR Engine\More Settings and in the Pre Recognition string type “ASCII_DEFAULTS, 1”.
- Under Setup\Output select “Save output to a different directory”.
- You can then select the option to preserve the subdirectory structure if you are processing multi-level directories and want to retain the image directory structure.
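Preserving the subdirectory structure means each output file keeps its path relative to the input root. The Python sketch below shows that mapping with `pathlib.PureWindowsPath`; the drive letters, directory names, and the rule that output files reuse the image name with a new extension are illustrative assumptions.

```python
from pathlib import PureWindowsPath


def output_path(image, input_root, output_root, output_ext, preserve_subdirs=True):
    """Map an input image path to its output file path.

    With preserve_subdirs, the image's path relative to input_root is kept
    under output_root; otherwise all output lands flat in output_root.
    """
    image = PureWindowsPath(image)
    if preserve_subdirs:
        relative = image.relative_to(input_root)
    else:
        relative = PureWindowsPath(image.name)
    return PureWindowsPath(output_root) / relative.with_suffix(output_ext)
```

For example, `Z:\images\2023\box1\p001.tif` with input root `Z:\images` and output root `D:\ocr_out` maps to `D:\ocr_out\2023\box1\p001.pdf`, or to `D:\ocr_out\p001.pdf` without the option. The flat layout can also collide when two subdirectories contain images with the same name.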
Does your software require a hardware key?
- PrimeOCR can now be managed with either a software key or USB hardware key. At the time of purchase you can specify which method you want to use for managing the license.
What are the advantages of using a hardware key versus a software license?
- Software licenses are issued for a specific PC. A hardware key can be moved from one PC to another. The software will run if the hardware key is plugged into the computer.
- If the PC ever fails, a software license must be re-issued through a multi-step process with our licensing personnel. A hardware key, on the other hand, can be unplugged from the failed PC and plugged into another PC, and OCR continues without interruption.
- A software license enables you to get started immediately. The evaluation software is managed through a software license and can be used until a hardware key is shipped to you.
How do you license the software – on a per seat basis?
- PrimeOCR is licensed on a per PC basis.
- Customers usually license a copy of PrimeOCR and use it as a centralized OCR server that runs on a dedicated, fast PC. Any number of users that have access to the server on the network can submit jobs to the OCR server over the network.
- If a single license is not enough capacity then additional licenses can be purchased to increase the overall OCR capacity.
- PrimeView and PrimeVerify are separate applications from PrimeOCR. They are licensed on a per PC basis, but unlike PrimeOCR they are used by a single user, so a license for PrimeView or PrimeVerify is essentially a per seat license.
- Contact sales if you have additional questions regarding licensing.
Do you offer a site license?
- A site license is not currently available.
My original purchase did not include PDF output. How easy is it to upgrade?
- Since all products are shipped with each release, upgrading is simply a matter of paying for the new option and then upgrading the license through an email update. Once payment has been made, we will send you a small program that updates your license to enable the newly purchased feature.
How can I get the latest release?
- Contact your sales rep for access to the latest release of the software.
- To receive the latest release of the software all items licensed must be currently on maintenance.
- Minor upgrades and maintenance releases are free while major upgrades may be available at a discount.
Does PrimeOCR perform handprint recognition?
- PrimeOCR is designed to read machine-printed characters. For handprint recognition, try searching the web for ICR (Intelligent Character Recognition) or handprint recognition vendors.
What is the best scanning resolution for OCR?
- Most OCR engines, including the ones used by PrimeOCR, are optimized for 300 dpi images. Scanning at a true 300 dpi optical resolution is very important: scanning at a lower resolution and then using scanner software to increase the dpi afterward does nothing for OCR. Where the characters on an image are very small (4 points or less), scanning at 400 dpi can improve character recognition. This again requires a scanner that supports a true 400 dpi optical resolution.
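The benefit of 400 dpi for small type is simple arithmetic: a typographic point is 1/72 inch, so the number of pixels available for a character grows in proportion to the optical resolution. The short Python calculation below converts a nominal font size to pixels; the point size measures the font's em height, so individual letters are smaller still.

```python
POINTS_PER_INCH = 72  # 1 typographic (DTP) point = 1/72 inch


def em_height_px(point_size, dpi):
    """Nominal font size (em height) in pixels at a given optical resolution."""
    return point_size * dpi / POINTS_PER_INCH


for dpi in (300, 400):
    print(f"4-point text at {dpi} dpi: {em_height_px(4, dpi):.1f} px")
# 4-point text at 300 dpi: 16.7 px
# 4-point text at 400 dpi: 22.2 px
```

By comparison, 10-point body text at 300 dpi has an em height of about 42 pixels.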
What is the difference between Forms-based OCR and Full-Text OCR?
- We are all familiar with standard paper forms. A typical form has a structured page layout that contains both static and variable information. If the variable information on the form has been filled in using machine-printed characters, the form is a candidate for Forms-based OCR. If each page you want to OCR always has the same form (i.e., the layout of text is the same on every page), you can create a zone “template” that OCR can use to extract the data you are looking for. Full-Text OCR just means that you intend to OCR the entire page without prior zoning. In effect, the entire page is treated as a single zone. There are cases, however, when zoning is valuable even in a full-text environment (see below).
Why is Forms-based OCR difficult?
- The difficulty with Forms OCR is always matching the zones in your template with the correct data on the page. Scanner feed problems, image stretch and skew, or even slight variations in the page layout of each form can cause “zone to data” misalignment. Techniques such as Forms ID, Registration, and Image Enhancement are all methods of addressing these problems.
- Prime Recognition provides these techniques in its PrimeZone application on a custom basis. Manual zoning and template creation on a page-by-page basis is also available through the PrimeView application within our PrimeProof software. And Image Enhancement, such as image deskew and despeckling, is a standard option in PrimeOCR.
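Registration, in its simplest form, measures how far the scanned page has shifted from the template and moves every zone by the same amount. The Python sketch below shows only that translation step, using a hypothetical `Zone` structure (not the .PTM template format) and a registration mark whose expected and found positions are already known; it does not correct skew or stretch, which real registration may also need to handle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Zone:
    """A rectangular OCR zone in pixels (hypothetical structure)."""
    name: str
    left: int
    top: int
    width: int
    height: int


def register_zones(zones, expected_mark, found_mark):
    """Shift zones by the offset between a registration mark's position on
    the template (expected_mark) and on the scanned page (found_mark)."""
    dx = found_mark[0] - expected_mark[0]
    dy = found_mark[1] - expected_mark[1]
    return [replace(z, left=z.left + dx, top=z.top + dy) for z in zones]


# The page fed 12 px right and 5 px high, so every zone moves the same way.
template = [Zone("invoice_no", 1200, 150, 400, 60), Zone("total", 1200, 2900, 300, 60)]
shifted = register_zones(template, expected_mark=(100, 100), found_mark=(112, 95))
```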
Why do I need to zone multi-column text before OCR?
- Pages with multiple columns are common. You find them in newspapers, books, trade journals, and reports, to name a few. It is important to identify the columns through zoning if, after OCR, you intend to search the data (e.g., using a fuzzy search engine on the data after it has been stored in a database) or if you need to preserve the look and feel of the original page. If you search without separating the columns, hyphenated words that wrap across lines won’t be found. Similarly, without column separation, two columns of text on the same line will appear as a single line in a word processor.
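The effect of zoning on reading order can be shown with a toy example. The Python sketch below assumes recognized text arrives as `(x, y, text)` fragments (a hypothetical format) on a two-column page. Reading straight across the page merges both columns into each line and separates the two halves of a hyphenated word, while reading each column as its own zone keeps them together. The naive de-hyphenation rule here would also join genuine hyphenated compounds that break at a line end.

```python
def read_lines(fragments):
    """Group fragments into lines by y and read each line left to right."""
    lines = {}
    for _x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        lines.setdefault(y, []).append(text)
    return [" ".join(parts) for parts in lines.values()]


def read_by_columns(fragments, gutter_x):
    """Treat fragments left of gutter_x as zone 1 and the rest as zone 2."""
    left = [f for f in fragments if f[0] < gutter_x]
    right = [f for f in fragments if f[0] >= gutter_x]
    return read_lines(left) + read_lines(right)


def join_lines(lines):
    """Join lines into running text, rejoining words hyphenated at line ends."""
    text = ""
    for line in lines:
        if text.endswith("-"):
            text = text[:-1] + line
        else:
            text = f"{text} {line}" if text else line
    return text


page = [
    (50, 10, "OCR accuracy de-"), (400, 10, "Scanner settings"),
    (50, 20, "pends on resolution."), (400, 20, "matter too."),
]
print(join_lines(read_lines(page)))
# OCR accuracy de- Scanner settings pends on resolution. matter too.
print(join_lines(read_by_columns(page, gutter_x=300)))
# OCR accuracy depends on resolution. Scanner settings matter too.
```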
How do I zone multi-column text?
- If the text layout of the page is always the same, you can treat the page as if it were a form and perform Forms-based OCR (see above). A good example of this would be a book where each page always has the same number of columns.
- If the page layout varies, as with a newspaper or trade journal, then you have several other choices:
- Manual Zoning – This process involves viewing the image prior to OCR and drawing zones over the areas that you want to read. Prime Recognition’s PrimeProof software includes an application that allows you to view images and draw zones using the mouse. The advantage of manual zoning is that you can specify exactly what to OCR and in what order. The disadvantage is that each page must be zoned individually by hand.
- Automatic Zoning – PrimeOCR includes a zoning option that tries to automatically recognize blocks of text such as a paragraph or column. A zone is generated for each block, similar to the manual zoning process. However, since the automatic process has no way to determine the sequence in which text flows on the page, OCR results may not always be presented in the proper reading order, and automatic zoning will never be as good as someone defining zones manually. But where the image shows a clear delineation between columns and manual zoning is not economical, automatic zoning can provide a major improvement over full-page OCR in the searchability of recognized text.