Gamera Addon: GreekOCR Toolkit
This is a Gamera toolkit for building text recognition applications for polytonic (classical) Greek. It is based on the Gamera framework and requires a working installation of both Gamera and the Gamera OCR toolkit.
About the GreekOCR toolkit
The GreekOCR Toolkit is an optical character recognition (OCR) system for polytonic Greek text documents, i.e. Greek texts with a wide variety of accents. It is currently in an experimental stage and still requires extensive testing, but it is already usable. It provides:
- two different approaches for dealing with accents (wholistic versus separatistic)
- two different output formats (Unicode or LaTeX utilizing the Teubner style)
- a ready-to-run Python script greekocr4gamera.py, which acts as a basic GreekOCR system. Note, however, that the user must train the characters beforehand: the toolkit does not include any training data.
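To illustrate how separately recognized accents can end up in Unicode output, here is a minimal sketch. It is not the toolkit's code. It assumes that the separatistic approach classifies the base letter and its diacritics as separate symbols, and it merges them with Python's standard unicodedata module. The mark names and the function compose are hypothetical and chosen only for this example.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Unicode combining marks, keyed by hypothetical class names
# (these names are illustrative, not the toolkit's symbol names).
COMBINING = {
    "smooth": u"\u0313",          # COMBINING COMMA ABOVE (psili)
    "rough": u"\u0314",           # COMBINING REVERSED COMMA ABOVE (dasia)
    "acute": u"\u0301",           # COMBINING ACUTE ACCENT
    "grave": u"\u0300",           # COMBINING GRAVE ACCENT
    "circumflex": u"\u0342",      # COMBINING GREEK PERISPOMENI
    "iota_subscript": u"\u0345",  # COMBINING GREEK YPOGEGRAMMENI
}

# Breathing first, then accent, then iota subscript, so that
# NFC normalization can find the precomposed character.
ORDER = ["smooth", "rough", "acute", "grave", "circumflex", "iota_subscript"]


def compose(base, marks):
    """Merge a base letter and a list of mark names into one NFC string."""
    ordered = sorted(marks, key=ORDER.index)
    decomposed = base + u"".join(COMBINING[m] for m in ordered)
    return unicodedata.normalize("NFC", decomposed)


# Alpha with smooth breathing and acute accent, marks given in any order.
alpha = compose(u"\u03b1", ["acute", "smooth"])
```

Sorting the marks before normalization matters because breathings and accents share the same canonical combining class, so NFC keeps them in the order given and only composes them when the breathing precedes the accent.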
Further improvements to this toolkit, complementary tools, and Greek OCR results can be found on Bruce Robertson's website on Greek OCR.
Detailed documentation is included with the source code package in the subdirectory doc/html.
For testing purposes, we provide a basic demo package greekocr-demo.tar.gz, which includes a small test image, corresponding training data and symbol tables that can be useful for avoiding class name typos during training. See the file README for usage examples.
Authors and Acknowledgements
The authors of the GreekOCR toolkit are:
We are grateful to Georgios K. Michalakis for initiating this project and to the Association Stoudion for financial support of parts of the development.
The source code of the GreekOCR toolkit is freely distributed under the terms of the GNU General Public License. Note that the toolkit relies on both Gamera and the OCR toolkit and therefore requires both software packages to be installed. Available file releases are:
- greekocr-1.0.1.tar.gz (Sep 19, 2011)
For release notes, see the file CHANGES. For installation and usage instructions, see the file doc/html/index.html in the source package. Once all prerequisites are installed, installation only requires typing:
python setup.py build && sudo python setup.py install
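Before running the command above, it can help to check that the prerequisites are importable. The following is a minimal sketch, not part of the toolkit. The module names gamera and gamera.toolkits.ocr are assumptions about how Gamera and the OCR toolkit are installed; adjust them if your installation differs. The helper missing_modules is hypothetical.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Assumed import names for Gamera and the Gamera OCR toolkit.
PREREQUISITES = ["gamera", "gamera.toolkits.ocr"]

if __name__ == "__main__":
    missing = missing_modules(PREREQUISITES)
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run the script with the same Python interpreter that you use for setup.py, since the check only covers that interpreter's module path.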
On Windows, you can use the binary installer instead. Make sure that Gamera and the OCR toolkit are installed as well, and also download the above source package for the documentation:
- greekocr-1.0.1.win-amd64.exe for 64-bit Python 2.7 (Sep 19, 2011)