The Gamera C++ Image API

Last modified: September 16, 2022

Contents

Introduction

This document describes how to manipulate Gamera images from C++. This deals primarily with low-level (pixel-level) operations.

Data model

Gamera uses a "shared" data model. This means that different "views" are applied to the same "data." Views may only look at a subset of the data, or they may change how the data is presented to the programmer. The goal of decoupling the "view" from the "data" is to allow a "view" on the data to be a very lightweight object. This allows it to be passed by value with little worry about performance. This also allows the processing of a portion of an image as if it were an entire image.

For example, to create a new image, the programmer must first create the data:

OneBitImageData image_data(Dim(50, 50));

This creates data that is 50 pixels width and 50 pixels high, and has a logical origin of (0, 0).

Then you can create a view on all of the data:

OneBitImageView image_view(image_data);

Later, if you want to view only a subset of the data, you can use a different overloaded version of the view constructor:

OneBitImageView subimage_view(image_data, Point(25, 25), Dim(10, 10));

This will create a view on the image from (25, 25) that is 10 pixels tall and wide, i.e. the lower-right corner will be at (35, 35). An alternate way to do this would be to specify two points:

OneBitImageView subimage_view(image_data, Point(25, 25), Point(35, 35));

The different constructor forms are discussed in more detail in the ImageView<T> objects section.

Memory management issues

Since multiple "views" can use the same "data," one must be careful about prematurely deallocating data objects, or there may be views left around that "point" to deallocated data. Accessing the data through the view is unchecked and could result in a segmentation fault.

Another common mistake is to deallocate all views on a given data object, forgetting to deallocate the data itself, resulting in a memory leak. Since there is no reference counting on the C++ side, deallocation of all views does not automatically trigger deallocation of the underlying data. For example:

OneBitImageData image_data = new OneBitImageData(Dim(50, 50), Point(0, 0));
OneBitImageView image_view = new OneBitImageView(*image_data);

...

delete image_view->data(); // Don't forget to delete the data
delete image_view;

Of course, if the new image data is the result of a plugin function, it will be returned to the caller. When a pointer to an image view (ImageView* or one of its subclasses) is returned to Python, it will participate in Python memory management, and both the view and data will eventually be deallocated automatically. When an ImageView* is returned to another C++ function, that function is responsible for deallocating the view and data, unless it in turn is passing it back to Python.

Fortunately, things are much easier on the Python side, since the number of views that point to a data object are reference counted, so the data object is kept around only as long as there is at least one view looking at it.

Take for example the following interactive Python session:

Python 2.3.4 (#1, Oct 26 2004, 16:42:40)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from gamera.core import *
>>> init_gamera()
Loading plugins: ----------------------------------------
arithmetic, color, convolution, corelation, deformation, draw,
edgedetect, features, gui_support, id_name_matching,
image_conversion, image_utilities, listutilities, logical,
misc_filters, morphology, png_support, projections, runlength,
segmentation, structural, thinning, threshold, tiff_support, contour
>>> orig = load_image("test.tiff")
>>> del orig
# ImageData dealloc, because the only view pointing to it (orig) has been deleted.
>>> orig = load_image("test.tiff")
>>> orig = orig.otsu_threshold()
# ImageData dealloc, because the pointer stored in orig has been replaced
# with a new onebit image.  This effect is fairly subtle.
>>> ccs = orig.cc_analysis()
>>> del orig
# No ImageData dealloc, since the ccs list contains views on the data.
>>> del ccs
# ImageData dealloc, because the remaining views on the data have been destroyed.
>>>

Image types

You should probably familiarize yourself with different Gamera image types before reading further.

ImageData<T> objects

Data objects are instances of the ImageData<T> class (in include/image_data.hpp) where T is the pixel type. If you wish to use run-length encoding to store the data, you can use the RleImageData<T> class. These classes both have the same interface and are completely interchangeable, but will exhibit different algorithmic complexities.

There are some typedefs in image_types.hpp to make creating different kinds of image data objects more convenient:

/*
  Image Data
 */
typedef ImageData<GreyScalePixel> GreyScaleImageData;
typedef ImageData<Grey16Pixel> Grey16ImageData;
typedef ImageData<FloatPixel> FloatImageData;
typedef ImageData<RGBPixel> RGBImageData;
typedef ImageData<OneBitPixel> OneBitImageData;
typedef RleImageData<OneBitPixel> OneBitRleImageData;

There are a few different constructors available for data objects:

ImageData(const Dim& dim, const Point& offset);
ImageData(const Dim& dim); // offset == (0, 0)
ImageData(const Size& size, const Point& offset);
ImageData(const Size& size); // offset == (0, 0)
ImageData(const Rect& rect);

There are also some deprecated forms that you may see in legacy code, but which are no longer supported since they are not consistent about specifying x before y:

ImageData(size_t nrows = 1, size_t ncols = 1, size_t page_offset_y = 0,
          size_t page_offset_x = 0);
ImageData(const Size& size, size_t page_offset_y = 0,
          size_t page_offset_x = 0);
ImageData(const Dimensions& dim, size_t page_offset_y = 0,
          size_t page_offset_x = 0);

The offset arguments are used to specify a logical offset of the data. For instance, if the image data is from part of the page, the upper left corner may not be logically (0, 0). This is purely for logical purposes when your system needs to know the relative positions of image bounding boxes, and does not affect the size of the image data created in any way.

The Size and Dim classes are simple ways of storing sizes and dimensions defined in gamera/dimensions.hpp. (See the Dimension types chapter for more information). Size(w, h) is always equal to Dim(w + 1, h + 1). Both forms are supported to make it easier to adapt existing algorithms that use one or the other standard.

While there are other public members and methods to the ImageData class, we do not recommend using any of them unless you really know what you're doing. All of the image data can be accessed much more flexibly and conveniently through ImageView objects.

ImageView<T> objects

Views objects are instances of the ImageView<T> class (in include/image_view.hpp), where T is the data object type (a templatization of ImageData<T>). There is also a special version of ImageView<T> for connected components, ConnectedComponent<T> defined in include/connected_components.hpp. Connected components are discussed below.

As with the data objects, there are some typedefs for convenience:

/*
  ImageView
 */
typedef ImageView<GreyScaleImageData> GreyScaleImageView;
typedef ImageView<Grey16ImageData> Grey16ImageView;
typedef ImageView<FloatImageData> FloatImageView;
typedef ImageView<RGBImageData> RGBImageView;
typedef ImageView<OneBitImageData> OneBitImageView;
typedef ImageView<OneBitRleImageData> OneBitRleImageView;

/*
  Connected-components
 */
typedef ConnectedComponent<OneBitImageData> Cc;
typedef ConnectedComponent<OneBitRleImageData> RleCc;

There are a number of different ways to construct image views:

// Creates a view covering all of the data
ImageView(T& image_data);

ImageView(T& image_data, const Rect& rect,
          bool do_range_check = true);
ImageView(T& image_data, const Point& upper_left,
          const Point& lower_right, bool do_range_check = true)
ImageView(T& image_data, const Point& upper_left,
          const Size& size, bool do_range_check = true)
ImageView(T& image_data, const Point& upper_left,
          const Dim& dim, bool do_range_check = true)

There are also some deprecated forms that you may see in legacy code, but which are no longer supported since they are not consistent about specifying x before y:

// Creates a view with a specified bounding box
ImageView(T& image_data, size_t offset_y,
          size_t offset_x, size_t nrows, size_t ncols,
          bool do_range_check = true);
ImageView(T& image_data, const Point& upper_left,
          const Dimensions& dim, bool do_range_check = true)

The Size, Point and Dim classes are simple ways of storing sizes and dimensions defined in gamera/dimensions.hpp. (See the Dimension types chapter for more information).

Coordinates for any given bounding boxes are relative to the offset of the underlying data. Therefore:

// Create data with a size of (64, 64) and an offset of (32, 32);
OneBitImageData image_data(Size(64, 64), Point(32, 32));
// This is a view over all of the data
OneBitImageView image_view(image_data, Point(32, 32), Size(64, 64));
// This raises an exception, since it's out of range for the data
OneBitImageView image_view(image_data, Point(0, 0), Size(64, 64));

You can get the type of the pixels referenced by an ImageView using the expression typename T::value_type.

Image factories

There are other convenience classes for creating image types.

TypeIdImageFactory

TypeIdImageFactory is useful if you want to create an image of a specific type based on a pair of integer constants. For example, the following code snippit creates a DENSE or RLE image based on an integer constant passed in as a parameter to the function. Note that the types of each of the TypeIdImageFactory s are different and not polymorphic, so the interior function (threshold_fill) must be called from two locations.

template<class T>
Image* threshold(const T &m, int threshold, int storage_format) {
  if (storage_format == DENSE) {
    typedef TypeIdImageFactory<ONEBIT, DENSE> fact_type;
    typename fact_type::image_type* view = fact_type::create(m.origin(), m.dim());
    threshold_fill(m, *view, threshold);
    return view;
  } else {
    typedef TypeIdImageFactory<ONEBIT, RLE> fact_type;
    typename fact_type::image_type* view = fact_type::create(m.origin(), m.dim());
    threshold_fill(m, *view, threshold);
    return view;
  }
}

ImageFactory

It is a common use case to create a result image whose type is related to the type of an input image. In many cases, one could just use the same type as the result type. However, if a Cc image type is passed into a function, you would most likely want to return a regular OneBit image as a result. ImageFactory is designed to get around this problem. It will create an appropriate result type from a given input type. For example, see the following code snippet:

template<class T>
Image* abutaleb_threshold(const T &m, int storage_format) {
  typedef typename ImageFactory<T>::view_type view_type;
  view_type* average = mean(m);
...

Accessing pixels

One of the goals of the Gamera framework is to make it easy to incorporate code from other image processing frameworks as painlessly as possible. Therefore, there are a number of interfaces that can be used to access and process the underlying pixel data of an image.

Note

Whenever accessing individual pixels, the row and col given are relative to the offsets of the view or the underlying data. In other words .get(0,0) will always return the pixel in the upper-left hand corner of the view. This makes it easier for algorithms that work on the pixels of a view to ignore the complexity of the "shared" data model.

Different interfaces

The different interfaces for accessing pixels are discussed below:

get and set methods

This is perhaps the most straightforward way to access the pixels of an image. There are two public methods of ImageView:

  • value_type get (const Point& point)
  • void set (const Point& point, value_type value)

There are also deprecated forms that you may see in legacy code:

  • value_type get (size_t row, size_t col)
  • void set (size_t row, size_t col, value_type value)

get returns the value of the pixel at the given row and column. set changes the pixel at the given row and column to the given value.

An example using get / set over an entire image:

template<class T>
typename ImageFactory<T>::view_type* test_get_set(const T& image) {
  typedef typename ImageFactory<T>::data_type data_type;
  typedef typename ImageFactory<T>::view_type view_type;
  data_type* new_data = new data_type(image);
  view_type* new_view = new view_type(*new_data);

  for (size_t r = 0; r < in.nrows(); ++r) {
    for (size_t c = 0; c < in.ncols(); ++c) {
      new_view.set(Point(c, r), image.get(Point(c, r)) / 2);
    }
  }

  return new_view;
}

C-style 2-dimensional array

An alternative is to use the notation of C-style 2-dimensional arrays:

value = image[row][col];
image[row][col] = value;

This interface is provided as a convenience to support the large body of legacy code written in this style. Note that [row][col] is the reverse of get and set's Point(x, y). Keep in mind, it is not really a 2-dimensional array underneath -- this view is "faked."

An example using the C-style 2-dimensional array interface:

template<class T>
typename ImageFactory<T>::view_type* test_c_2d(const T& image) {
  typedef typename ImageFactory<T>::data_type data_type;
  typedef typename ImageFactory<T>::view_type view_type;
  data_type* new_data = new data_type(image);
  view_type* new_view = new view_type(*new_data);

  for (size_t r = 0; r < image.nrows(); ++r) {
    for (size_t c = 0; c < image.ncols(); ++c) {
      (*new_view)[r][c] = image[r][c] / 2;
    }
  }

  return new_view;
}

Iterators

Iterators provide a lot of readability and convenience advantages over the other approaches.

Gamera has three kinds of iterators:

  1. Vector iterator: a one-dimensional iterator that iterates from the upper left hand corner, left-to-right, top-to-bottom.
  2. Row/column iterators: Iterates along either rows or columns. Incrementing the iterator moves one row down or one column to the right. Calling begin() on a row_iterator, returns a col_iterator and vice versa.
  3. Two-dimensional iterators: Provide free movement of the iterator in any direction.

The first two kinds of iterators follow the conventions of the C++ STL enough that they can be used with STL algorithms. All iterators are available in const and non-const versions. const iterators cannot change the underlying data. If you have an image passed into your function as a const ImageView<T> &, you will not be able to change the pixels in the image, and thus will only be able to get const iterators from it.

Note that you can call row( ) or col( ) on any iterator to obtain the current row and column position of the iterator.

All Gamera iterators can get/set their values in two ways, both of which boiling down to exactly the same machine code, and therefore having the same efficiency:

  1. The standard "C pointer-like" way:
// i is any iterator type
value = *i;
*i = value;
  1. Using get and set methods:
// i is any iterator type
value = i.get();
i.set(value);

Vector iterator

Vector iterators (ImageView<T>::vec_iterator) can be convenient when the operation works one pixel at a time and does not need to be aware of any spatial relationships.

An example using vec_iterators:

template<class T>
typename ImageFactory<T>::view_type* test_vec_iterator(const T& image) {
  typedef typename ImageFactory<T>::data_type data_type;
  typedef typename ImageFactory<T>::view_type view_type;
  data_type* new_data = new data_type(image);
  view_type* new_view = new view_type(*new_data);

  typename T::const_vec_iterator i = image.vec_begin();
  typename view_type::vec_iterator j = new_view->vec_begin();

  for ( ; i != image.vec_end(); ++i, ++j) {
    *j = *i / 2;  // or  j.set(i.get() / 2);
  }

  return new_view;
}

Since the Gamera image iterators follow the STL iterator interface, they can be used with builtin STL algorithms. For instance, to fill an entire image with white using the STL std::fill algorithm:

#include <algorithm>

template<class T>
void fill_white(T& image) {
  std::fill(image.vec_begin(), image.vec_end(), white(image));
}

Row/column iterators

Sometimes it is necessary to have nested loops, one for rows and one for columns.

The following is an example using row iterators (ImageView<T>::row_iterators) and column iterators (ImageView<T>::col_iterators):

template<class T>
typename ImageFactory<T>::view_type* test_row_col_iterator(const T& image) {
  typedef typename ImageFactory<T>::data_type data_type;
  typedef typename ImageFactory<T>::view_type view_type;
  data_type* new_data = new data_type(image);
  view_type* new_view = new view_type(*new_data);

  typedef typename T::const_row_iterator IteratorI;
  IteratorI ir = image.row_begin();
  typedef typename view_type::row_iterator IteratorJ;
  IteratorJ jr = new_view->row_begin();
  for ( ; ir != image.row_end(); ++ir, ++jr) {
    typename IteratorI::iterator ic = ir.begin();
    typename IteratorJ::iterator jc = jr.begin();
    for ( ; ic != ir.end(); ++ic, ++jc)
      *jc = *ic / 2;
  }

  return new_view;
}

The fun thing about row and column iterators is that they are interchangeable. If you wish to iterate through the image in column major order instead, you could write:

template<class T>
typename ImageFactory<T>::view_type* test_col_row_iterator(const T& image) {
  typedef typename ImageFactory<T>::data_type data_type;
  typedef typename ImageFactory<T>::view_type view_type;
  data_type* new_data = new data_type(image);
  view_type* new_view = new view_type(*new_data);

  typedef typename T::const_col_iterator IteratorI;
  IteratorI ir = image.col_begin();
  typedef typename view_type::col_iterator IteratorJ;
  IteratorJ jr = new_view->col_begin();
  for ( ; ir != image.col_end(); ++ir, ++jr) {
    typename IteratorI::iterator ic = ir.begin();
    typename IteratorJ::iterator jc = jr.begin();
    for ( ; ic != ir.end(); ++ic, ++jc)
      *jc = *ic / 2;
  }

  return new_view;
}

Of course, that really should be templatized on the iterator type. For an example of this, see include/plugins/projections.hpp.

Row and column iterators also have methods that return the current row and column (relative to the underlying data). i.row() returns the current row and i.col() returns the current column.

Two-dimensional iterators

Unlike the other kinds of iterators, two-dimensional iterators are a long way from C++ STL convention. The idea of these iterators was borrowed from the VIGRA library. More documentation is available there, but I've provided a short summary below.

You can obtain a two-dimensional iterator from an image using of the following methods:

Iterator upperLeft();
Iterator lowerRight();

Once you have a two-dimensional iterator, you can move it using the following:

typename T::Iterator i;

++i.x; // move right
--i.x; // move left
++i.y; // move down
--i.y; // move up

You can see that this gives more freedom and expressivity than the other iterators.

When you assign one two-dimensional iterator to another, it is copied, so there are no persistent connections between them.

typename T::Iterator center, below;

...

below = center;
++below.y;   // 'below' points to the pixel below 'center'
++center.x;  // 'center' moved.  'below' did not.

You can also perform arithmetic on these iterators on the fly:

typename T::Iterator center, below;

...

below = center.y + 1; // 'center' is not moved

If you need to move in two dimensions at once, you can either add or subtract a Diff2D object from the iterator, or use the operator() method.

typename T::Iterator i, relative;

...

relative = i + Diff2D(1, 1)   // down and to the right
relative = i[Diff2D(1, 1)]    // equivalent to above
relative = i + Diff2D(-1, -1) // up and to the left

relative = i - Diff2D(1, 1)   // up and to the left
relative = i - Diff2D(-1, -1) // down and to the right

relative = i(1, 1)            // down and to the right
relative = i(-1, -1)          // up and to the left

For an example of how two-dimensional iterators can be used in a more real-world situation, see the cc_analysis method in include/plugins/segmentation.py.

Speed tests

The different interfaces are provided primarily to support different programming styles. However, they do not run at the same speed.

The following graph shows the relative speeds of different pixel access methods (using the examples above). These results are largely processor and architecture dependent, so take them with a truckload of salt.

images/pixel_access_runtimes.png

The performance of the 1-dimensional iterator is perhaps surprising, since it appears to be the simplest code. In fact, however, vec_iterators must support the image view model, and therefore do basic "range checking" upon each increment. This introduces some overhead.

The row/column iterators in row major order (rows in the outer loop) is the clear winner. Since the underlying data is stored row major order, accessing it in column major order is a major performance hit, in part due to the extra pointer arithmetic and in part due to worse cache performance.

Note

Any performance improvement should be justified only by profiling on real-world data

Making generic programming work

Everything discussed so far has been completely polymorphic: these techniques should work identically on all types of images. However, where the images vary is in the pixel types themselves. Careful programming can keep your algorithms generic across all pixel types, but when that is no longer possible, it is also possible to specialize (i.e. write a special version of an algorithm for a particular pixel type.)

First, let's cover some more details of the various pixel types available in Gamera. They are all defined in include/pixel.hpp.

RGB
RGB pixels are represented by instances of the Rgb class. By default, each plane is represented by an 8-bit unsigned char The individual RGB planes can be get/set using the red, green and blue functions. The values follow the standard hardware conventions: larger values are higher intensity. There are also many utility methods for converting the RGB pixel into other kinds of values. See include/pixel.hpp for more information. Rgb instances are small enough (3 bytes) that you can safely pass them by value without performance concerns.
GREYSCALE
GreyScale pixels are 8-bit unsigned char values in the range 0 - 255. These also follow the hardware convention that larger values are higher intensity (white).
FLOAT
Float pixels are 32-bit floating-point float values. They follows the hardware convention that larger values are higher intensity (white). Unlike the integral pixel types, there is no set range for the values (they can even be negative). For this reason, the Gamera display will always find the lowest and highest values and then normalize the display to match the dynamic range of the display hardware. Floating point images are most useful to represent non-image data, such as the results of the convolution of two images.
ONEBIT
OneBit pixels are 16-bit unsigned char values. Perhaps confusingly, a value of 0 is white (which is usually background in most document images) and all other values are black. The extra bits of range are used when a connected component analysis is performed. Each connected component in an image is assigned a label value, and its pixels are labeled using the value. Since the directionality of OneBit pixels is different from all other pixel types, care must be taken when writing algorithms that accept OneBit images and other images.

Black and white

Convenience functions are provided to get and test values for white and black pixels that work on all pixel types.

T pixel_traits<T>::white();
T pixel_traits<T>::black();

Return the values for white and black, respectively, for the given pixel type T. Often, it is more convenient to get the pixel value from the ImageView type, and not the pixel type. Therefore, the functions

T::value_type white(T& image_type);
T::value_type black(T& image_type);

are also provided.

You can also test whether a pixel is white or black using:

bool is_white(T pixel_value);
bool is_black(T pixel_value);

Writing a specialized function

As specified in Writing Gamera Plugins, each plugin function should be templatized to accept multiple image types. Sometimes, though, a single code base will not work for all images types, or there are performance gains to be had by writing for a particular image type.

The wrinkle that makes this less than straightforward is that you often want to specialize on the pixel type, not the image type. For instance, there are multiple image types where the pixel type is ONEBIT (OneBitImageView, Cc, OneBitRleImageView etc.). Therefore, you will normally want to have a helper function which is templatized on the pixel type, which is called from the main function. See for example, the invert plugin method, which needs to do something different for FLOAT images (since the range of FLOAT pixels is not fixed.):

/* Invert an image */

// This is the generic version
template<class Pixel>
struct invert_specialized {
  template<class T>
  void operator()(T& image) {
    ImageAccessor<typename T::value_type> acc;
    typename T::vec_iterator in = image.vec_begin();
    for (; in != image.vec_end(); ++in)
      acc.set(invert(acc(in)), in);
  }
};

// This is specialized for FLOAT pixels
template<>
struct invert_specialized<FloatPixel> {
  template<class T>
  void operator()(T& image) {
    FloatPixel max = 0;
    max = find_max(image.parent());
    ImageAccessor<FloatPixel> acc;
    typename T::vec_iterator in = image.vec_begin();
    for (; in != image.vec_end(); ++in)
      acc.set(max - acc(in), in);
  }
};

// This is the top level function that calls the correct
// specialization
template<class T>
void invert(T& image) {
  invert_specialized<typename T::value_type> invert_special;
  invert_special(image);
}

Connected components

On the C++ side, the Python image type Cc corresponds to the image type ConnectedComponent. You can us it just like in Python:

ImageList* ccs = cc_analysis(image);
ImageList::iterator ccs_it;
size_t x,y;

// loop over all ccs
for (ccs_it = ccs->begin(); ccs_it != ccs->end(); ++ccs_it) {

  Cc* cc = static_cast<Cc*>(*ccs_it);

  // query cc label
  int label = cc->label();

  // Loop over individual pixels
  for (y=0; y < cc->nrows(); y++)
    for (x=0; x < cc->ncols(); x++)
      if is_black(cc->get(Point(x,y))) { // .... }
}

For creating a list of CCs from C++, two steps are necessary:

  1. Setting all pixels in the image to its label value

  2. Call the constructor for each ConnectedComponent:

    // Assumptions:
    //   the CC labels have been stored in the vector "labelvec"
    //   the CC dimensions have been stored in the vector "rectvec"
    ccs = new ImageList();
    for (size_t i = 0; i < labelvec.size(); ++i) {
      ccs->push_back(new ConnectedComponent<typename T::data_type>(
                       *((typename T::data_type*)image.data()),
                       labelvec[i],
                       Point(rectvec[i]->offset_x() + image.offset_x(),
                             rectvec[i]->offset_y() + image.offset_y()),
                       rectvec[i]->dim()
                      ));
    }
    

Multi-label connected components

On the C++ side, the Python image type MlCc corresponds to the image type MultiLabelCC (two upper case C's at the end). Its methods are the same as on the Python side; for details about its C++ constructors and properties (m_labels etc.) see the source file include/connected_component.hpp of the Gamera source code.