Tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, this engine was among the top 3 evaluated
by UNLV. It was open-sourced by HP and UNLV in 2005.
You will need to download at least one language pack to do anything
useful with Tesseract, and that language pack tarball must be present
in the same directory as the SlackBuild script when the package is created.
See http://code.google.com/p/tesseract-ocr/downloads/list for a list of
all available language packs. Note that you can install more than one
(or even all) of the language packs, as they do not conflict with each
other. The build script defaults to English, but this is easily
changed by setting the LANGNAM variable to another language pack when
running the script.
Here is the relevant code from the build script:
# Language pack(s) to use
# We'll install English by default, but you can pass another one.
# Edit the LANGNAM variable to switch to another language
# Set that variable to the full file name of the pack (including the extension)
# see https://code.google.com/p/tesseract-ocr/downloads/list for the list
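The comments above describe a default that can be overridden. A minimal
sketch of how such an override commonly works in shell build scripts is
shown below. It is not the actual code of this SlackBuild: the helper
resolve_langpack, the DEFAULT_LANGPACK variable, and the tarball name
example-langpack.eng.tar.gz are hypothetical and chosen only for
illustration. Only the LANGNAM variable name comes from the script itself.

```shell
#!/bin/sh
# Hypothetical default file name; the real script's default may differ.
DEFAULT_LANGPACK="example-langpack.eng.tar.gz"

# Hypothetical helper: print the caller's value if non-empty,
# otherwise fall back to the default.
resolve_langpack() {
  echo "${1:-$DEFAULT_LANGPACK}"
}

# Take LANGNAM from the environment if it was set there.
LANGNAM=$(resolve_langpack "${LANGNAM:-}")

# The tarball is expected in the same directory as the build script.
if [ -f "$LANGNAM" ]; then
  echo "Using language pack: $LANGNAM"
else
  echo "Language pack not found in current directory: $LANGNAM" >&2
fi
```

With a script written this way, another language pack would be selected
by setting the variable for a single run, for example
LANGNAM=<tarball file name> sh <script name>, where both placeholders
depend on the files you actually downloaded.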
This requires: leptonica
Maintained by: Pierre Cazenave
Keywords: optical character recognition,ocr,language,google,scan,tiff,office,document,scanner
(the SlackBuild does not include the source)