Some frequently asked questions and (hopefully) some answers.
What is Gamera?
Gamera is a toolkit for building document image recognition systems. It consists of a programming library and a set of GUI tools for experimentation and training. Gamera aims to reduce the development time of document recognition applications by including a number of commonly used components, to prevent "reinvention of wheels" whenever possible. Please see the Gamera overview for more information.
The term "document" is used loosely, and can include many kinds of information presented in two-dimensional form. Gamera has been used to build recognizers for common music notation, medieval manuscripts and other things.
What is Gamera not?
Gamera is not a packaged document recognition system, such as Tesseract OCR or Audiveris. It is a tool with which one can develop document recognition applications, but it is not one itself. Developing a recognizer with Gamera is designed to be as easy as possible, but it still requires a considerable time commitment.
Gamera's focus is somewhat biased towards document types that are not well supported by existing, off-the-shelf software. Certain document types, such as medieval manuscripts, are unlikely to provide the financial incentive to support the development of a commercial application.
Why the name "Gamera"?
Gamera is an acronym for "Generalized Algorithms and Methods for Enhancement and Restoration of Archives". The software, which grew out of our research on a system called AOMR (Adaptive Optical Music Recognition), was christened Gamera on 1 April 2001.
Gamera is also the name of an overgrown turtle in a series of Japanese monster movies. There is some hope that the software, like the tortoise in the Tortoise and the Hare story, will eventually be triumphant.
What sorts of scripts will Gamera work with?
This is "script" in the sense of "writing system", not "scripting language".
- For scripts with small character sets and (mostly) well-segmented characters (e.g. Latin, Greek, Hebrew, Cyrillic), Gamera performs very well.
- For large character sets (e.g. Kanji), some sort of syntactical or structural analysis of the character is necessary. This sort of thing is not implemented in Gamera at present, but there is nothing stopping an interested researcher from adding these features.
- I'm sure there are other categories of which I'm completely ignorant.
And don't forget that Gamera has been used to develop systems for other non-text structured documents such as common music notation and lute tablature.
Why can't I put my image in and get text out?
See the question "What is Gamera not?". There is a rudimentary framework for text extraction in roman_text.py, but expect that a lot of customization will be necessary for each document domain. For ordinary text documents with isolated characters, the OCR toolkit is a good starting point.
How should I get started?
It is helpful to have a background in programming. A basic knowledge of Python is required, but most people who have experience in another mainstream language generally find Python easy to learn. The recommended reading for starters is:
How can I get help?
The gamera-devel mailing list on Yahoo! Groups is the best way to contact the authors and other members of the community. If you are running into a bug, please be sure to include the following information:
- The versions of Gamera, Python and wxPython you are using
- Your platform
- Any output or backtraces that are being produced
- A short script that produces the error
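Most of this information can be collected with a short script. The following is only a minimal sketch and is not part of Gamera: the helper names are made up for illustration, and it assumes that the wx and gamera modules expose a __version__ attribute (it reports "unknown" if they do not).

```python
import platform
import sys


def module_version(name):
    """Return a module's version string, or a note if it cannot be imported."""
    try:
        module = __import__(name)
    except ImportError:
        return "not installed"
    # Assumption: the module exposes __version__; fall back if it does not.
    return str(getattr(module, "__version__", "unknown"))


def environment_report():
    """Collect the version and platform details asked for in a bug report."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "wxPython": module_version("wx"),
        "gamera": module_version("gamera"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```

Paste the printed lines into your message together with the backtrace and the script that triggers the error.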
How should I cite Gamera (in an academic paper etc.)?
The canonical URL for the Gamera website is http://gamera.sourceforge.net/. That URL will always contain the most up-to-date information on Gamera with links to the official documentation and published papers.
If you need a more formal citation for an academic paper, please use the following citation:
M. Droettboom, K. MacMillan, and I. Fujinaga: The Gamera framework for building custom recognition systems. Symposium on Document Image Understanding Technologies, pp. 275-286 (2003) (see also http://gamera.sourceforge.net/)
I can't get Gamera to run.
First check the following:
- Make sure you have the correct version of Python installed.
(This is 2.3.0 or greater on Linux and OS X, and 2.3.1 or greater
on Windows). Note that most Linux distributions have the development
files for Python in a different package python-dev or
python-devel. Verify that it is installed by typing
>>> import distutils
- Make sure you have the correct version of wxPython installed.
Versions with an odd second digit (like 2.5.x) are unstable development
releases and may not work with Gamera. You will need to visit the
complete list of wxPython releases to download a 2.6.x or 2.8.x
version. You can check your wxPython version with
>>> import wx
>>> wx.__version__
- If you are running Gamera on the command line, try running the gamera_gui script from a directory other than the Gamera source directory. Make sure that there is no file named gamera.py in that directory.
If these things fail, please send a message to the mailing list. Include the Python backtrace, the versions of Gamera, Python and wxPython you are using, and your platform.
When running Gamera, I obtain error messages that certain third party modules cannot be loaded.
When loading some plugins for import/export to third party libraries, Gamera tries to load the respective third party modules. When these are not installed, you might obtain the following messages:
Python Imaging Library module Image could not be imported
numpy.numarray could not be imported
numpy could not be imported
numpy.oldnumeric could not be imported
These messages can safely be ignored. Gamera is fully functional without these modules.
How do I write a Gamera script?
Gamera scripts are just Python scripts that import Gamera's modules. It is definitely a good idea to familiarise yourself with the basics of Python before diving in. There are a number of really basic scripts to help get you started in the documentation.
How can I read JPEG images?
The Gamera function load_image can only open TIFF and PNG images. The easiest workaround is to convert the image file to PNG, e.g. with ImageMagick (shipped with most Linux distributions):
$ convert file.jpg file.png
or with sips (shipped with Mac OS X):
$ sips -s format png infile.jpg --out outfile.png
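For more than a handful of files, the same conversion can be scripted. The sketch below only wraps the two commands shown above. The function names are hypothetical, and the tool is passed in explicitly rather than detected.

```python
import subprocess
from pathlib import Path


def conversion_command(src, dst, tool="convert"):
    """Build the argument list for one JPEG-to-PNG conversion."""
    if tool == "convert":  # ImageMagick
        return ["convert", str(src), str(dst)]
    if tool == "sips":  # Mac OS X
        return ["sips", "-s", "format", "png", str(src), "--out", str(dst)]
    raise ValueError("unknown tool: %r" % tool)


def convert_directory(directory, tool="convert"):
    """Convert every .jpg file in directory to a .png file next to it."""
    converted = []
    for src in sorted(Path(directory).glob("*.jpg")):
        dst = src.with_suffix(".png")
        # check=True raises CalledProcessError if the tool reports a failure.
        subprocess.run(conversion_command(src, dst, tool), check=True)
        converted.append(dst)
    return converted
```

The resulting PNG files can then be opened with load_image.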
After classification, how do I get the results?
The classifier stores its classifications in the id_name member variable of images. This id_name member is actually a list of possible classifications. See the id_name documentation for more information.
When you pass a list into classifier.classify_list_automatic or classifier.group_list_automatic, the list itself is not modified. Instead, any glyphs that should be added or removed are returned in a tuple of lists (added, removed). Therefore, to get any glyphs that were newly created by either splitting or grouping, you have to do the following:
added, removed = classifier.group_list_automatic(glyphs)
glyphs += added
There is also a convenience function classifier.classify_and_update_list_automatic which handles this for you.
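The (added, removed) convention can be tried out without Gamera installed. The sketch below uses a hypothetical stand-in classifier that mimics only the return value described above; the class, the helper update_glyph_list and the placeholder glyph names are invented for illustration, and which glyphs a real classifier reports as removed depends on Gamera itself.

```python
class StubClassifier:
    """Hypothetical stand-in; it only imitates the (added, removed) tuple."""

    def group_list_automatic(self, glyphs):
        # Pretend one new glyph was created and one old glyph is no longer valid.
        added = ["g4"]
        removed = [g for g in glyphs if g == "g2"]
        return added, removed  # the input list itself is left untouched


def update_glyph_list(glyphs, added, removed):
    """Return a new list without the removed glyphs and with the added ones."""
    kept = [g for g in glyphs if g not in removed]
    return kept + added


glyphs = ["g1", "g2", "g3"]
added, removed = StubClassifier().group_list_automatic(glyphs)
updated = update_glyph_list(glyphs, added, removed)
print(updated)
```

Building a new list rather than editing in place keeps the original glyph list available for comparison.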
When should I use C++ and when should I use Python?
There's no straight answer here. This should be considered as a trade-off between run time (always let benchmarking on real-world data determine which is better) and development time. That said, you usually won't want to go to the trouble of implementing something twice, so here is a useful rule of thumb:
- Algorithms that need access to individual pixels should be implemented in C++
- Algorithms that drive other long-running, low-level processes should be implemented in Python
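Before porting anything to C++, it is worth measuring the Python version first. The following is a minimal sketch using the standard timeit module. The synthetic pixel list and the function names are stand-ins invented for illustration, not Gamera images or Gamera API calls, and real measurements should be made on real data.

```python
import timeit

# Synthetic 200x200 greyscale "image" as a flat list (not a Gamera image).
WIDTH, HEIGHT = 200, 200
pixels = [(x * y) % 256 for y in range(HEIGHT) for x in range(WIDTH)]


def count_dark_pixels(data, threshold=128):
    """Visit every pixel in an explicit Python loop."""
    count = 0
    for value in data:
        if value < threshold:
            count += 1
    return count


def best_time(func, *args, repeat=5):
    """Return the fastest of several single runs, in seconds."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))


if __name__ == "__main__":
    print("dark pixels: %d" % count_dark_pixels(pixels))
    print("best of 5: %.6f s" % best_time(count_dark_pixels, pixels))
```

If a per-pixel loop like this dominates the total run time on representative pages, it is a candidate for C++; if it does not, the Python version may be good enough.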
What's the deal with page glyphs and classifier glyphs?
The page glyphs are simply the set of connected components on the page you are currently training. The classifier glyphs are the connected components that the classifier uses to make its classifications (i.e. the training data). They are documented here.
The classifier GUI provides some flexibility as to how these two databases are saved, loaded and merged.
How do I train the classifier to group connected components together (such as for lower case i's)?
The classifier can be used to both repair broken characters and recognize "legitimately broken" characters. To train broken characters, select all parts of a single character and give the symbol name the prefix _group.
For example, to train lower case i's, select both the stem and dot of a single lower case i and train it as _group.lower.i.
What's with id names?
Training is basically the act of assigning symbol names to characters so that the classifier can learn what things are. Symbol names in Gamera may contain Unicode characters, and can be delimited into categories using periods. There is deliberately no standard naming convention in Gamera: that will depend entirely on the type of document being trained. However, if your document type fits neatly into the textual types of documents supported by Unicode, you may want to use standard Unicode character names, if only to avoid reinventing the wheel.
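Because symbol names are plain period-delimited strings, they are easy to handle in scripts. The helpers below are a small sketch with hypothetical names (they are not part of Gamera's API); they only show how a name such as _group.lower.i breaks down into its parts.

```python
GROUP_PREFIX = "_group"


def split_symbol_name(name):
    """Split a period-delimited symbol name into its category parts."""
    return name.split(".")


def is_group_name(name):
    """True if the name carries the _group prefix used for grouped glyphs."""
    return split_symbol_name(name)[0] == GROUP_PREFIX


def group_name(name):
    """Prefix a symbol name so that it denotes a group of components."""
    return GROUP_PREFIX + "." + name


print(split_symbol_name("_group.lower.i"))
print(group_name("lower.i"))
```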
How can I make classification faster?
The first thing to look at is the set of features you're using. Gamera provides a large number of feature generation routines, some of which are rather computationally intensive. Try limiting the set of features to ones you think you'll really need.
You can decrease the time spent loading the training data into the classifier dramatically by using classifier.serialize() to save it in a high-speed but non-portable binary format.
Moreover, you can automatically reduce the training data size with editing algorithms. The edit_cnn algorithm reduces the training set without much change to the classification decisions, and edit_mnn_cnn can (in some cases) not only reduce the training data size, but also improve recognition rates by removing outliers in the training data.
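To make the idea of editing concrete, here is a minimal pure-Python sketch of Hart's condensed nearest neighbour rule, assuming that this is what "CNN" in edit_cnn stands for. It illustrates the general technique only and is not Gamera's implementation; the function names and the toy data are invented, and math.dist requires Python 3.8 or later.

```python
import math


def nearest_label(store, point):
    """Return the label of the stored sample closest to point (1-NN)."""
    features, label = min(store, key=lambda item: math.dist(item[0], point))
    return label


def condense(samples):
    """Keep a subset that still classifies every original sample correctly."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for sample in samples:
            if sample in store:
                continue
            features, label = sample
            # Only samples that the current subset gets wrong are kept.
            if nearest_label(store, features) != label:
                store.append(sample)
                changed = True
    return store


training = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"),
]
reduced = condense(training)
print(len(training), "->", len(reduced))
```

On well-separated classes like these, most samples are redundant for a nearest-neighbour decision, which is why the reduced set classifies faster with little change in the result.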