Gamera Addon: OCR Toolkit

This is a Gamera toolkit for building standard text recognition applications. It is based on the Gamera framework and requires a working Gamera installation.

About the OCR toolkit

The OCR Toolkit is meant to help build optical character recognition (OCR) systems for standard text documents. Although it can be used as is, it is specifically designed to make the individual steps of the recognition system customizable and replaceable. It provides:

Documentation

Detailed documentation is included in the source code package in the subdirectory doc/html. A comprehensive overview of the design, usage and customization of the OCR toolkit can be found in the paper:

C. Dalitz, R. Baston: Optical Character Recognition with the Gamera Framework. In C. Dalitz (Ed.): "Document Image Analysis with the Gamera Framework." Schriftenreihe des Fachbereichs Elektrotechnik und Informatik, Hochschule Niederrhein, vol. 8, pp. 53-65, Shaker Verlag (2009)

Authors and Acknowledgements

The authors of the OCR toolkit are:

Thanks to Jakub Wilk for providing valuable feedback on this toolkit.

Software Download

The source code of the OCR toolkit is freely distributed under the terms of the GNU General Public License. Note that the toolkit requires a working installation of Gamera. Available file releases are:

For release notes, see the file CHANGES. For installation and usage instructions, see the file doc/html/index.html in the source package. Once all prerequisites are installed, installation only requires typing

python setup.py build && sudo python setup.py install
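Before running the build, it can help to confirm that Python can actually find Gamera. The following is a minimal Python 3 sketch, not part of the toolkit. It assumes that Gamera is importable as the package `gamera` and that the installed toolkit shows up as `gamera.toolkits.ocr`; both module paths, as well as the helper names `module_available` and `check_prerequisites`, are assumptions made for illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate the module `name`.

    For dotted names, find_spec imports the parent packages first;
    a missing parent raises ModuleNotFoundError (a subclass of ImportError).
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ValueError: module is in sys.modules but has no __spec__
        return False


def check_prerequisites() -> dict:
    # Assumed module paths: "gamera" for the Gamera framework,
    # "gamera.toolkits.ocr" for the installed OCR toolkit.
    names = ["gamera", "gamera.toolkits.ocr"]
    return {name: module_available(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(name + ": " + ("found" if ok else "MISSING"))
```

If the assumed module paths are correct, running this after the installation should report both entries as found; a missing `gamera` entry means the Gamera prerequisite has to be installed first.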

On Windows, you can alternatively use the binary installer (make sure that Gamera is installed as well, and also download the source package above for the documentation):