Last modified: February 14, 2023
[object] bbox_merging (int Ex = -1, int Ey = -1, int iterations = 2)
Operates on: | Image [OneBit] |
---|---|
Returns: | [object] |
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Rene Baston, Karl MacMillan, and Christoph Dalitz |
Segments a page by extending and merging the bounding boxes of the connected components on the page.
How much the segments are extended is controlled by the arguments Ex and Ey. Depending on their value, the returned segments can be lines or paragraphs or something else.
The return value is a list of 'CCs' where each 'CC' represents a found segment. Note that the input image is changed such that each pixel is set to its segment label.
Arguments:
Ey: How much each CC is extended to the top and bottom before merging. When -1, it is set to the median height of all CCs. This will typically segment into paragraphs.
If you want to segment into lines, set Ey to something small like one sixth of the median symbol height.
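The effect of the extension on the merge result can be illustrated with a minimal pure-Python sketch that does not use Gamera. It assumes that boxes are (x0, y0, x1, y1) tuples with inclusive coordinates and that Ex is passed explicitly, and it ignores the iterations argument. The helper names extend, overlaps and merge_boxes are hypothetical, and the plugin's actual implementation may differ.

```python
from statistics import median

def extend(box, ex, ey):
    """Grow a box by ex pixels left/right and ey pixels top/bottom."""
    x0, y0, x1, y1 = box
    return (x0 - ex, y0 - ey, x1 + ex, y1 + ey)

def overlaps(a, b):
    """True if two boxes with inclusive coordinates share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_boxes(boxes, ex, ey=-1):
    """Extend all boxes, then merge overlapping ones until nothing changes."""
    if ey == -1:
        # Assumption taken from the text above: default to the median height.
        ey = median(y1 - y0 + 1 for _, y0, _, y1 in boxes)
    segments = [extend(box, ex, ey) for box in boxes]
    changed = True
    while changed:
        changed = False
        result = []
        for box in segments:
            for i, other in enumerate(result):
                if overlaps(box, other):
                    # Replace the existing segment by the union of both boxes.
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        segments = result
    return segments

# Two glyphs on one line and a third glyph on the line below.
boxes = [(0, 0, 10, 10), (12, 0, 20, 10), (0, 30, 10, 40)]
lines = merge_boxes(boxes, ex=2, ey=0)   # small Ey keeps the lines apart
blocks = merge_boxes(boxes, ex=2)        # median height merges both lines
```

In this toy input a vertical extension of zero keeps the two text lines as separate segments, while the median-height default joins them into one block.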
[object] kise_block_extraction (float Ta = 40.00, float fr = 0.34)
Operates on: | Image [OneBit] |
---|---|
Returns: | [object] |
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Christoph Dalitz |
Segments a page into blocks by Kise's method based on the area Voronoi diagram as described in
K. Kise, A. Sato, M. Iwata: Segmentation of Page Images Using the Area Voronoi Diagram. Computer Vision and Image Understanding 70, pp. 370-382, 1998
The return value is a list of 'CCs' where each 'CC' represents a found segment. Note that the input image is changed such that each pixel is set to its segment label.
The algorithm first builds a CC neighborhood graph and then removes edges from this graph based upon the area ratio and distance between adjacent segments. The criterion is
d/Td1 <= 1 OR d/Td2 + A/Ta <= 1
where d is the distance between the adjacent CCs, Td1 < Td2 are the two largest peaks in the CC distance distribution, and A is the area ratio of the adjacent CCs.
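The criterion can be written as a small predicate. This is only a sketch of the formula as stated above: the function name kise_criterion is hypothetical, and the thresholds Td1 and Td2 are taken as given inputs rather than estimated from a distance histogram.

```python
def kise_criterion(d, td1, td2, area_ratio, ta=40.0):
    """Evaluate d/Td1 <= 1 OR d/Td2 + A/Ta <= 1 for one pair of adjacent CCs."""
    return d / td1 <= 1 or d / td2 + area_ratio / ta <= 1

# Close neighbours satisfy the first term regardless of the area ratio.
close = kise_criterion(d=5, td1=10, td2=30, area_ratio=1.0)
# Farther neighbours only pass when their areas are similar enough.
far_similar = kise_criterion(d=20, td1=10, td2=30, area_ratio=1.0)
far_different = kise_criterion(d=20, td1=10, td2=30, area_ratio=20.0)
```

A larger Ta makes the second term more tolerant of differing areas, so more pairs satisfy the criterion.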
Arguments:
[object] projection_cutting (int Tx = 0, int Ty = 0, int noise = 0, Choice [cut|ignore] gap_treatment = cut)
Operates on: | Image [OneBit] |
---|---|
Returns: | [object] |
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Maria Elhachimi and Robert Butz |
Segments a page with the Iterative Projection Profile Cuttings method.
The image is split recursively in the horizontal and vertical direction by looking for 'gaps' in the projection profile. A 'gap' is a contiguous sequence of projection values smaller than noise pixels. The splitting is done for each gap wider or higher than given thresholds Tx or Ty. When no further split points are found, the recursion stops.
Whether the resulting segments represent lines, columns or paragraphs depends on the values for Tx and Ty. The return value is a list of 'CCs' where each 'CC' represents a found segment. Note that the input image is changed such that each pixel is set to its CC label.
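The gap search on a single projection profile might look like the following pure-Python sketch. It makes several assumptions: a projection value counts as empty when it is at most noise (with the default noise = 0 a strict comparison could never match), a gap qualifies when it is longer than the threshold, and gaps touching the border are reported like any other. The function name find_gaps is hypothetical, and the recursion over sub-images is omitted.

```python
def find_gaps(profile, noise=0, min_length=0):
    """Return (start, end) index pairs of gaps longer than min_length.

    A gap is a contiguous run of projection values <= noise; end is exclusive.
    """
    gaps = []
    start = None
    # A sentinel above the noise level closes a gap that runs to the border.
    for i, value in enumerate(list(profile) + [noise + 1]):
        if value <= noise:
            if start is None:
                start = i
        elif start is not None:
            if i - start > min_length:
                gaps.append((start, i))
            start = None
    return gaps

image = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
row_profile = [sum(row) for row in image]          # black pixels per row
row_gaps = find_gaps(row_profile, noise=0, min_length=1)
```

Each reported gap is a candidate split position; applying the same search to the column profile of every resulting part gives the recursive behaviour described above.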
[object] runlength_smearing (int Cx = -1, int Cy = -1, int Csm = -1)
Operates on: | Image [OneBit] |
---|---|
Returns: | [object] |
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Christoph Dalitz and Iliya Stoyanov |
Segments a page with the Run Length Smearing algorithm.
The algorithm converts white horizontal and vertical runs shorter than given thresholds Cx and Cy to black pixels (this is the so-called 'run length smearing').
The intersection of both smeared images yields the page segments as black regions. As this result typically still contains small white horizontal gaps, gaps narrower than Csm are filled in a final step.
The return value is a list of 'CCs' where each 'CC' represents a found segment. Note that the input image is changed such that each pixel is set to its CC label.
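The three smearing steps can be sketched in pure Python on a small binary image (1 = black, 0 = white). This is an illustration rather than the plugin's implementation: the function names are hypothetical, white runs touching the image border are left untouched (an assumption), and the final labeling of the segments is omitted.

```python
def smear_row(row, threshold):
    """Blacken white runs shorter than threshold that lie between black pixels."""
    out = list(row)
    run_start = None      # index where the current white run began
    seen_black = False    # white runs before the first black pixel are skipped
    for i, pixel in enumerate(row):
        if pixel:
            if run_start is not None and seen_black and i - run_start < threshold:
                for j in range(run_start, i):
                    out[j] = 1
            run_start = None
            seen_black = True
        elif run_start is None:
            run_start = i
    return out

def transpose(image):
    return [list(column) for column in zip(*image)]

def runlength_smear(image, cx, cy, csm):
    """Horizontal and vertical smearing, intersection, then a final horizontal pass."""
    horizontal = [smear_row(row, cx) for row in image]
    vertical = transpose([smear_row(col, cy) for col in transpose(image)])
    both = [[h & v for h, v in zip(hrow, vrow)]
            for hrow, vrow in zip(horizontal, vertical)]
    return [smear_row(row, csm) for row in both]

image = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
smeared = runlength_smear(image, cx=2, cy=2, csm=2)
```

In the toy image the intersection alone reproduces the input, and only the final Csm pass closes the one-pixel horizontal gaps in the first and last rows.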
Arguments:
IntVector segmentation_error (Image [OneBit] Gseg, Image [OneBit] Sseg)
Returns: | IntVector |
---|---|
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Christoph Dalitz |
Compares a ground truth segmentation Gseg with a segmentation Sseg and returns error count numbers.
The input images must be given in such a way that each segment is uniquely labeled, similar to the output of a page segmentation algorithm like runlength_smearing. For ground truth data, such a labeled image can be obtained from an external color image with colors_to_labels.
The two segmentations are compared by building equivalence classes of overlapping segments as described in
M. Thulke, V. Märgner, A. Dengel: A general approach to quality evaluation of document segmentation results. Lecture Notes in Computer Science 1655, pp. 43-57 (1999)
Each class is assigned an error type depending on how many ground truth and test segments it contains. The return value is a tuple (n1,n2,n3,n4,n5,n6) where each value is the total number of classes with the corresponding error type:
Nr | Ground truth segments | Test segments | Error type |
---|---|---|---|
n1 | 1 | 1 | correct |
n2 | 1 | 0 | missed segment |
n3 | 0 | 1 | false positive |
n4 | 1 | > 1 | split |
n5 | > 1 | 1 | merge |
n6 | > 1 | > 1 | splits and merges |
The total segmentation error can be computed from these numbers as 1 - n1 / (n1 + n2 + n3 + n4 + n5 + n6). The individual numbers can be of use to determine what exactly is wrong with the segmentation.
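The following pure-Python sketch illustrates the comparison on tiny label images given as nested lists. It is not the plugin's code: it assumes that label 0 is background, that any shared pixel counts as an overlap, and it uses a simple union-find to build the equivalence classes. The names segmentation_error_counts and total_error are hypothetical.

```python
from collections import defaultdict

def segmentation_error_counts(gseg, sseg):
    """Return (n1, ..., n6) for two equally sized label images (0 = background)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    # Join every ground truth segment with each test segment it overlaps.
    for grow, srow in zip(gseg, sseg):
        for g, s in zip(grow, srow):
            if g:
                find(("G", g))
            if s:
                find(("S", s))
            if g and s:
                parent[find(("G", g))] = find(("S", s))

    # Count ground truth and test segments per equivalence class.
    classes = defaultdict(lambda: [0, 0])
    for node in list(parent):
        classes[find(node)][0 if node[0] == "G" else 1] += 1

    counts = [0] * 6
    for n_gt, n_test in classes.values():
        if n_gt == 1 and n_test == 1:
            index = 0          # correct
        elif n_gt == 1 and n_test == 0:
            index = 1          # missed segment
        elif n_gt == 0 and n_test == 1:
            index = 2          # false positive
        elif n_gt == 1:
            index = 3          # split
        elif n_test == 1:
            index = 4          # merge
        else:
            index = 5          # splits and merges
        counts[index] += 1
    return tuple(counts)

def total_error(counts):
    """1 - n1 / (n1 + ... + n6); defined as 0.0 when there are no classes."""
    total = sum(counts)
    return 1 - counts[0] / total if total else 0.0

gseg = [[1, 1, 0, 2, 2], [1, 1, 0, 2, 2], [0, 0, 0, 0, 0], [3, 3, 0, 0, 0]]
sseg = [[1, 1, 0, 2, 3], [1, 1, 0, 2, 3], [0, 0, 4, 0, 0], [0, 0, 0, 0, 0]]
counts = segmentation_error_counts(gseg, sseg)
error = total_error(counts)
```

In this example one segment is matched correctly, one is missed, one test segment is a false positive, and one ground truth segment is split in two, so only one of the four classes is correct.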
Because this is a free function rather than an image method, it is not automatically imported with the other plugins. Import it explicitly with:
from gamera.plugins.pagesegmentation import segmentation_error
tuple sub_cc_analysis ([object cclist])
Operates on: | Image [OneBit] |
---|---|
Returns: | tuple |
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Stephan Ruloff and Christoph Dalitz |
Further subsegments the result of a page segmentation algorithm into groups of actual connected components.
The result of a page segmentation plugin is a list of 'CCs' where each 'CC' does not represent a 'connected component', but a page segment (typically a line of text). In a practical OCR application you will however need the actual connected components (which should roughly correspond to the glyphs) in groups of lines. That is what this plugin is meant for.
The input image must be an image that has been processed with a page segmentation plugin, i.e. all pixels in the image must be labeled with a segment label. The input parameter cclist is the list of segments returned by the page segmentation algorithm.
The return value is a tuple with two entries:
Note
The groups will be returned in the same order as given in cclist. This means that you can sort the page segments by reading order before passing them to sub_cc_analysis.
Note that the order of the returned CCs within each group is not well defined. Hence you will generally need to sort each subgroup by reading order.
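For a left-to-right script, one simple way to order the CCs within each group is by the left edge of their bounding boxes. The sketch below assumes the groups are available as a list of lists and uses a hypothetical stand-in class with an ul_x attribute instead of real Gamera CCs; the function name is also hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for a CC: only the upper-left corner is modelled.
Glyph = namedtuple("Glyph", "name ul_x ul_y")

def sort_groups_left_to_right(groups):
    """Sort the CCs of every group by the x-position of their left edge."""
    return [sorted(group, key=lambda cc: cc.ul_x) for group in groups]

groups = [
    [Glyph("c", 30, 0), Glyph("a", 0, 2), Glyph("b", 14, 1)],
    [Glyph("e", 12, 20), Glyph("d", 1, 21)],
]
ordered = sort_groups_left_to_right(groups)
names = [[cc.name for cc in group] for group in ordered]
```

Sorting by the left edge alone ignores stacked glyphs such as accents, so a real application may need a more careful ordering.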
[object] textline_reading_order ([object lineccs])
Returns: | [object] |
---|---|
Category: | PageSegmentation |
Defined in: | pagesegmentation.py |
Author: | Christoph Dalitz |
Sorts a list of Images (CCs) representing textlines by reading order and returns the sorted list. The function also works on paragraphs, but not on actual connected components.
The algorithm sorts all lines in topological order, based on the following criteria for the pairwise order of two lines:
In the reference "High Performance Document Analysis" by T.M. Breuel (Symposium on Document Image Understanding, USA, pp. 209-218, 2003), the first criterion carries an additional constraint: no other segment that overlaps horizontally with both a and b may lie between them. That constraint, which accounts for multi-column headings that interrupt columns, is replaced in this implementation by an a priori sort of all textlines by y-position. In case of ambiguity, the depth-first search used for the topological sorting therefore prefers rows over columns.
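The combination of an a priori sort by y-position with a depth-first topological sort can be sketched as follows. The pairwise criteria in comes_before are assumptions made for this illustration (a line comes first when it lies entirely to the left of the other, or when both overlap horizontally and it is the upper one); they are not taken from the plugin. Line is a hypothetical stand-in for a textline CC.

```python
from collections import namedtuple

# Hypothetical stand-in for a textline CC with its bounding box corners.
Line = namedtuple("Line", "name ul_x ul_y lr_x lr_y")

def x_overlap(a, b):
    return a.ul_x <= b.lr_x and b.ul_x <= a.lr_x

def comes_before(a, b):
    # Assumed criterion: a lies entirely to the left of b.
    if a.lr_x < b.ul_x:
        return True
    # Assumed criterion: a and b overlap horizontally and a is above b.
    return x_overlap(a, b) and a.ul_y < b.ul_y

def reading_order(lines):
    """Topological sort by depth-first search after sorting by y-position."""
    ordered = sorted(lines, key=lambda line: (line.ul_y, line.ul_x))
    visited = set()
    postorder = []

    def visit(i):
        visited.add(i)
        for j, other in enumerate(ordered):
            if j not in visited and comes_before(ordered[i], other):
                visit(j)
        postorder.append(i)   # all unvisited successors are placed first

    for i in range(len(ordered)):
        if i not in visited:
            visit(i)
    return [ordered[i] for i in reversed(postorder)]

# Two columns with two lines each.
page = [
    Line("R2", 60, 20, 100, 30),
    Line("L1", 0, 0, 40, 10),
    Line("R1", 60, 0, 100, 10),
    Line("L2", 0, 20, 40, 30),
]
order = [line.name for line in reading_order(page)]
```

On this two-column page the left column is read completely before the right one. The visited set also keeps the search from looping if the assumed criteria produce a cycle.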
Because this is a free function rather than an image method, it is not automatically imported with the other plugins. Import it explicitly with:
from gamera.plugins.pagesegmentation import textline_reading_order