Overview of the MIS format support toolkit

Last modified: June 01, 2012

Author: Colin Baumgarten, Tobias Bolten, and Christoph Dalitz
Version: 1.0

File releases of this toolkit are available from the Addons section of the Gamera home page.

About this Toolkit

This toolkit provides reading support for the Multiple Image Set (MIS) image format, which was developed by the US National Institute of Standards and Technology (NIST). Although the format is rarely used elsewhere, it is the format of NIST Special Database 19, a handwritten character database that is widely used as a reference data set for OCR evaluation.

Prerequisites and Installation

Prerequisites

First you will need a working installation of Gamera. See the Gamera website for details.

To build this toolkit, you will need the GNU autotools. On Debian- or Ubuntu-based systems, you can install them with the following command:

sudo apt-get install autoconf automake libtool
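Before running the build, it can help to check that the autotools are actually on your PATH. The following is a minimal sketch, assuming Python 3.3 or later (for shutil.which). The list of executables (autoconf, automake, libtoolize) and the helper name missing_tools are our own assumptions, not something taken from the toolkit's build script.

```python
import shutil

# Assumption: the executables the build script relies on. Adjust this list
# if build_lib.sh turns out to call other tools.
REQUIRED_TOOLS = ["autoconf", "automake", "libtoolize"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All build tools found.")
```

If anything is reported missing, install the corresponding package before running the build steps.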

Installation

This toolkit has only been tested on Linux and Mac OS X. We don't know whether and how this toolkit can be compiled and installed on Windows.

The following installation instructions should work on all Unix-like operating systems:

  1. Build the MIS C++ library:

    sh build_lib.sh
    
  2. Build the toolkit:

    python setup.py build
    
  3. Install the toolkit:

    sudo python setup.py install
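After step 3, one quick way to confirm that the installation succeeded is to try importing the plugin module. This sketch uses only the standard library (importlib.import_module, available in Python 2.7 and 3). The helper name toolkit_installed is hypothetical, and the module path is the one imported in the usage example of the User's Manual.

```python
import importlib

# Module path taken from the import line of the usage example.
PLUGIN_MODULE = "gamera.toolkits.mis_support.plugins.mis_support"


def toolkit_installed(module_name=PLUGIN_MODULE):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Raised when the package (or one of its parents, e.g. gamera)
        # is not installed, or when its compiled extension fails to load.
        return False
    return True


if __name__ == "__main__":
    print("MIS toolkit installed: %s" % toolkit_installed())
```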
    

User's Manual

This documentation is for users who want to use the toolkit to read MIS-encoded files.

Usage Example

The following code shows how training and test data can be loaded from NIST Special Database 19 and used to train and evaluate a kNN classifier.

from gamera.core import *
from gamera import knn
from gamera.toolkits.mis_support.plugins import mis_support

init_gamera()

# load samples from partition zero as training data
train_a = mis_support.load_MIS("nist/DATA/BY_CLASS/61/HSF_0.MIS", "a")
train_c = mis_support.load_MIS("nist/DATA/BY_CLASS/63/HSF_0.MIS", "c")
train_a = train_a[:50]
train_c = train_c[:50]
train_a[0].save_PNG("train_a.png")
train_c[0].save_PNG("train_c.png")

# create kNN classifier with this training data
classifier = knn.kNNInteractive(database=train_a + train_c, features=["moments"])
classifier.num_k = 3

# load test data from partition four
test_a = mis_support.load_MIS("nist/DATA/BY_CLASS/61/HSF_4.MIS", "a")

# measure recognition rate
classifier.classify_list_automatic(test_a)
errors = 0
for glyph in test_a:
    if glyph.get_main_id() != "a":
        errors += 1
# a single formatted string prints identically under Python 2 and Python 3
print("%d test images with error rate %f" % (len(test_a), float(errors) / len(test_a)))
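The hard-coded paths in the example follow a pattern: the directory name under BY_CLASS is the hexadecimal ASCII code of the character (0x61 is "a", 0x63 is "c"), and the file name carries the partition number. Assuming this pattern holds for the other classes and partitions in your copy of the database, the hypothetical helper mis_path below builds such paths. A second hypothetical helper, error_rate, generalizes the error count at the end of the example to any expected class. Neither function is part of the toolkit.

```python
def mis_path(root, char, partition):
    """Build the path of an MIS file in the BY_CLASS hierarchy.

    Assumption: the class directory is the two-digit lowercase hexadecimal
    ASCII code of `char`, as in the example paths ("61" for "a", "63" for "c").
    """
    return "%s/DATA/BY_CLASS/%02x/HSF_%d.MIS" % (root, ord(char), partition)


def error_rate(predicted_ids, expected_id):
    """Return the fraction of predicted class ids that differ from expected_id."""
    if not predicted_ids:
        raise ValueError("predicted_ids must not be empty")
    errors = sum(1 for pid in predicted_ids if pid != expected_id)
    return float(errors) / len(predicted_ids)
```

For example, mis_support.load_MIS(mis_path("nist", "a", 4), "a") loads the same file as the test data above. After classifier.classify_list_automatic(test_a), calling error_rate([g.get_main_id() for g in test_a], "a") yields the same value as the loop in the example.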