Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms

Files in this item

Files Size Format
Pop2001Jan5AddingLing.PDF 147.7Kb application/pdf
Pop2001Jan5AddingLing.PS 291.8Kb application/postscript

Item Metadata

dc.contributor.author Popat, Kris; Greene, Dan; Romberg, Justin; Bloomberg, Dan
dc.creator Popat, Kris; Greene, Dan; Romberg, Justin; Bloomberg, Dan
dc.date.accessioned 2007-10-31T00:58:06Z
dc.date.available 2007-10-31T00:58:06Z
dc.date.issued 2001-01-20
dc.date.submitted 2001-01-20
dc.identifier.uri http://hdl.handle.net/1911/20201
dc.description Conference paper
dc.description.abstract Beginning with an observed document image and a model of how the image has been degraded, Document Image Decoding recognizes printed text by attempting to find a most probable path through a hypothesized Markov source. The incorporation of linguistic constraints, which are expressed by a sequential predictive probabilistic language model, can improve recognition accuracy significantly in the case of moderately to severely corrupted documents. Two methods of incorporating linguistic constraints in the best-path search are described, analyzed, and compared. The first, called the iterated complete path (ICP) algorithm, involves iteratively rescoring complete paths using conditional language model probability distributions of increasing order, expanding state only as necessary with each iteration. A property of this approach is that it results in a solution that is exactly optimal with respect to the specified source, degradation, and language models; no approximation is necessary. The second approach considered is the Stack algorithm, which is often used in speech recognition and in the decoding of convolutional codes. Experimental results are presented in which text line images that have been corrupted in a known way are recognized using both the ICP and Stack algorithms. This controlled experimental setting preserves many of the essential features and challenges of real text line decoding, while highlighting the important algorithmic issues.
dc.language.iso eng
dc.subject document image decoding; optical character recognition; convolutional decoding
dc.title Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms
dc.type Conference paper
dc.date.note 2002-07-10
dc.citation.bibtexName inproceedings
dc.date.modified 2002-07-10
dc.subject.keyword document image decoding; optical character recognition; convolutional decoding
dc.citation.conferenceName Proceedings of IS&T/SPIE Electronic Imaging
dc.type.dcmi Text
dc.identifier.citation K. Popat, D. Greene, J. Romberg and D. Bloomberg, "Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms," 2001.

This item appears in the following Collection(s)

  • ECE Publications [1048 items]
    Publications by Rice University Electrical and Computer Engineering faculty and graduate students
  • DSP Publications [508 items]
    Publications by Rice Faculty and graduate students in digital signal processing.