Getting started with Gamera scripts

Last modified: September 16, 2022

Contents

Overview

While the GUI is great for experimentation, the ultimate goal of any Gamera application is usually an automated application, or script. This provides, among other things, the ability to run unattended in batch mode.

Gamera scripts are merely Python scripts (modules) that use Gamera objects and call Gamera functions. The Python language, being a feature-rich general purpose language, provides a number of ways to structure your application. The high-level structure of your application will largely depend on its size and complexity and your own programming preferences and style, and is out of the scope of this document. Obtaining some level of familiarity with Python through any of the numerous Python books and tutorials available is recommended.

The examples below demonstrate how to start creating Gamera-based applications. The source code of these examples are available in the examples directory of the Gamera distribution. Each example builds on the previous one.

Hello, world!

To use Gamera in a Python script, you simply need to import the gamera.core module and then initialize it:

from gamera.core import *
init_gamera()

After that, you can use Gamera's functions and methods. The following code loads a given file specified on the command line, converts it to a one-bit (binary) image, and then saves the result to "output.png".

import sys

# Import the Gamera core and initialize it
from gamera.core import *
init_gamera()

# Load filename specified on the command line.
# load_image is a Gamera function to load a TIFF or PNG file.
# sys.argv is the Pythonic way to access commandline arguments.
# (A better way to deal with commandline arguments in Python
# is the optparse module, but that's too much detail for this
# example...)
image = load_image(sys.argv[-1])

# The variable 'image' now is a reference to the image

# Convert the image to onebit using the default Otsu method
onebit = image.to_onebit()

# Save the result to a PNG file
onebit.save_PNG("output.png")

That's about as simple as it gets with Gamera, but clearly only about as useful as the ubiquitous "Hello, World!" you see in programming texts.

Using the GUI from your scripts

For interactive use, it may be desirable to have the results of a script displayed in a window, rather than saved to a file. (If you do not wish to do this, you can safely skip this section.) This image window will have all of the panning and zooming functionality of the regular Gamera image window, but it will not be possible to use the right-click menu to perform further updates to the image.

Using the GUI for a script is convenient, though slightly tricky for two reasons:

  1. Any code that displays images or otherwise interacts with the GUI must run inside the GUI thread. The simplest way to do this is to do all work inside of a function, and then pass this function as an argument to gui.run(). This will have the GUI start the application once it has been initialized.
  2. One must be careful about the lifetime of image objects. Since Python uses reference counting for memory management, whenever there are no longer any references pointing to an image object, it is automatically deleted. Whenever an image object is deleted, the window is destroyed as well. So to keep windows open for viewing after processing is finished, there needs to be at least one reference remaining. The simplest way to do this is to assign the image to a global variable.

The following is the same as the previous example, except rather than saving the result to disk, it is displayed in a Gamera display window.

import sys

def my_application():
   # Make the image variable a global variable
   # IF YOU DON'T DO THIS, THE WINDOW WILL DISAPPEAR!
   global image

   # Load the image
   image = load_image(sys.argv[-1])

   # Display the image in a window
   image.display()

# Import the Gamera core and initialize it
from gamera.core import *
init_gamera()

# Import the Gamera GUI and start it
from gamera.gui import gui
gui.run(my_application)

# The GUI thread will automatically stop when all windows have
# been closed.
print "Goodbye!"

Using the classifier

One of Gamera's core features is the classifier, which allows individual images to be classified based on some training. The classifier is described in detail in the classifier documentation.

The following is an example that loads an image and classifies its connected components based on a given training set. Some training data for what is expected in the image is required in order for this example to work correctly.

import sys

def my_application():
   global ccs

   # Load the image, and convert it to onebit
   image = load_image(sys.argv[-1])
   onebit = image.to_onebit()

   # Get the connected components from the image
   ccs = onebit.cc_analysis()

   # Classify the cc's
   classifier.classify_list_automatic(ccs)

   # Display the ccs to show their classification
   display_multi(ccs)

# Import the Gamera core and initialize it
from gamera.core import *
init_gamera()

# Import the classifier module
from gamera import knn
# Create a new classifier
classifier = knn.kNNInteractive()
# Load some training data
classifier.from_xml_filename("training.xml")

# Import the Gamera GUI and start it
from gamera.gui import gui
gui.run(my_application)

Advanced use of the classifier

There are a couple of downsides to using the classifier as described in the previous section:

If one is willing to give up the ability to add more elements to the training set at runtime, both of these problems can be ameliorated.

Once you have the serialized version of the classifier, it can be loaded easily from your production script:

from gamera.core import *
init_gamera()

from gamera import knn

# Load the classifier from the binary format
classifier = knn.kNNNonInteractive("training.knn")

# Then, we can use the classifier as we otherwise would...

# Load the image, and convert it to onebit
image = load_image(sys.argv[-1])
onebit = image.to_onebit()

# Get the connected components from the image
ccs = onebit.cc_analysis()

# Classify the cc's
classifier.classify_list_automatic(ccs)

...

Dealing with command line options

Third party scripts are free to deal with command line arguments as they wish. For instance, the Python standard library has two modules for parsing command line arguments, getopt (deprecated) and optparse.

Of course, this freedom means that Gamera will ignore its own command line arguments when run from scripts. There are two ways to deal with this problem:

  1. explicitly tell Gamera to parse the command line arguments
  2. programmatically set Gamera options

Having Gamera parse the command line

Generally, if your script would rather ignore the command line and pass all command line arguments verbatim to Gamera, simply do the following near the top of the script, but after importing Gamera:

from gamera.core import *
import sys
from gamera.config import config
config.parse_args(sys.argv[1:])

Alternatively, you can send "fake" command line arguments for Gamera to parse:

from gamera.core import *
from gamera.config import config
config.parse_args(["--progress-bar"])

Programmatically setting options

Gamera also provides an API to set its options that normally would come from the command line. Simply use the set method on the config object.

For example,

from gamera.config import config
config.set("progress_bar", True)

is directly equivalent to the following command line:

gamera_gui --progress-bar

Note that the name of the command line argument is changed to be a valid Python identifier name when used with the .set method: the hyphens (-) have been replaced by underscores (_). This is the standard behavior of optparse, the command line parsing module that Gamera uses.

Where to go from here

Obviously, there's a lot more to Gamera that isn't covered in this chapter. Where you go from here is largely dependent on the particular document domain. Most of the work in many Gamera applications involves analysing the positional information, symbolic classification and features of the connected components to recognize structure and semantics of the image.

A good example of how to put these things together is in roman_text.py, which puts simple left-to-right, top-to-bottom printed text into its reading order.