Gamera Addon: OCR Toolkit

This is a Gamera toolkit for building standard text recognition applications. It is based on the Gamera framework and requires a working Gamera installation.

This toolkit has already been ported to Gamera 4 and Python 3.x.

About the OCR toolkit

The OCR Toolkit is meant to help in building optical character recognition (OCR) systems for standard text documents. Although it can be used as is, it is specifically designed to make the individual steps of the recognition system customizable and replaceable. It provides:

Documentation

Detailed documentation is included with the source code package in the subdirectory doc/html. A comprehensive overview of the design, usage and customization of the OCR toolkit can be found in the paper:

C. Dalitz, R. Baston: Optical Character Recognition with the Gamera Framework. In C. Dalitz (Ed.): "Document Image Analysis with the Gamera Framework." Schriftenreihe des Fachbereichs Elektrotechnik und Informatik, Hochschule Niederrhein, vol. 8, pp. 53-65, Shaker Verlag (2009)

For testing purposes, we provide a basic demo package ocr-sample.tgz, which includes several test images printed in Fraktur and corresponding training data. See the file README for usage examples.

Authors and Acknowledgements

The authors of the OCR toolkit are:

Thanks to Jakub Wilk, Fabian Schmitt, and Georg Drees for valuable feedback and contributions to this toolkit.

Software Download

The source code of the OCR toolkit is freely distributed under the terms of the GNU General Public License. Note that the toolkit requires a working installation of Gamera.

Available file releases for Gamera 4 and Python 3.x:

Older versions for Gamera 3 and Python 2.x:

For release notes, see the file CHANGES. For installation and usage instructions, see the file doc/html/index.html in the source package. When all prerequisites are installed, installation simply requires typing:

python setup.py build && sudo python setup.py install
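Because the toolkit will not work without a working Gamera installation, it can help to check beforehand that Gamera is importable from the Python interpreter you will use for setup.py. The script below is a minimal sketch using only the standard library. It assumes that Gamera is importable as the package "gamera" and that the installed toolkit lives at "gamera.toolkits.ocr"; both names are assumptions, and the helper module_available() is hypothetical, not part of the toolkit.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module name can be located.

    Hypothetical helper for this check; not part of Gamera or the OCR toolkit.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec() imports the parent packages of a dotted name and
        # raises ModuleNotFoundError (a subclass of ImportError) when one
        # of them is missing; ValueError covers modules without a __spec__.
        return False


if __name__ == "__main__":
    # Assumed package names; adjust them if your installation differs.
    checks = {
        "Gamera framework": "gamera",
        "OCR toolkit": "gamera.toolkits.ocr",
    }
    print("Python", sys.version.split()[0])
    for label, module in checks.items():
        status = "found" if module_available(module) else "missing"
        print(f"{label} ({module}): {status}")
```

Run it with the same interpreter as setup.py before building: the Gamera line should report "found". After installation, the OCR toolkit line should report "found" as well.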