OcrApi Class

Base class for all Tesseract APIs.
Inheritance Hierarchy
System.Object
  Patagames.Ocr.OcrApi

Namespace:  Patagames.Ocr
Assembly:  Patagames.Ocr (in Patagames.Ocr.dll) Version: 4.2.411
Syntax
public class OcrApi : IDisposable

The OcrApi type exposes the following members.

Properties
  Name  Description
Public propertyAllWordConfidences
Returns all word confidences (between 0 and 100) in an array.
Public propertyAvailableLanguages
Gets the available languages.
Public propertyDataPath
Gets the path to the tessdata folder
Public propertyEngineMode
Gets the current OCR engine mode (OEM).
Public propertyHandle
Gets the handle to the tesseract API object
Public propertyInitLanguages
Gets the languages string used in the last valid initialization.
Public propertyInputFilters
Gets the collection of filters that are applied to the input image.
Public propertyInputImage
Gets or sets the input image
Public propertyInputName
Gets or sets the name of the input file. Needed for training and reading a UNLV zone file, and for searchable PDF output.
Public propertyIterator
Get a reading-order iterator to the results of LayoutAnalysis and/or Recognize.
Public propertyStatic memberLicenseKey
Gets or sets license key. Null for trial mode.
Public propertyLoadedLanguages
Gets the loaded languages. Includes all languages loaded by the last Init, including those loaded as dependencies of other loaded languages
Public propertyMutableIterator
Get a mutable iterator to the results of LayoutAnalysis and/or Recognize.
Public propertyOutputName
Gets or sets the name of the output files. Needed only for debugging.
Public propertyPageSegmentationMode
Gets or sets the current page segmentation mode.
Public propertyStatic memberPathToEngine
Gets or sets the path to tesseract.dll. Null for automatic detection. See the Remarks section for details.
Public propertyRectangle
Restrict recognition to a sub-rectangle of the image.
Public propertySourceResolution
Gets or sets the resolution of the source image in pixels per inch. Set this right after calling SetImage so that appropriate font sizes can be returned for the text.
Public propertyTextConfidences
Gets the (average) confidence value between 0 and 100.
Public propertyThresholdedImage
Get a copy of the internal thresholded image from Tesseract.
Public propertyThresholdedImageScaleFactor
Gets the scale factor of the thresholded image that would be returned by ThresholdedImage and by the various methods that call GetComponentImages. Equals 0 if no thresholder has been set.
Public propertyVersion
Gets the version identifier as a static string.
Methods
  Name  Description
Public methodAdaptToWordStr
Applies the given word to the adaptive classifier if possible
Public methodAnalyseLayout
Runs page layout analysis in the mode set by PageSegmentationMode.
Public methodClear
Free up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImage or GetTextFromImage before doing any Recognize or Get* operation.
Public methodClearAdaptiveClassifier
Call between pages or documents etc to free up memory and forget adaptive data.
Public methodClearPersistentCache
Clear any library-level memory caches.
Public methodStatic memberCreate
Create handle to base APIs interface
Public methodDispose
Releases all resources used by this OcrApi
Public methodDumpToPGM Obsolete.
Dump the internal binary image to a PGM file.
Public methodGetAltoText
Make an XML-formatted string with Alto markup from the internal data structures.
Public methodGetBoolVariable
Get the value of an internal "parameter."
Public methodGetBoxText
The recognized text is returned as a char* which is coded in the same format as a box file used in training.
Public methodGetComponentImages(PageIteratorLevel, Boolean, OcrBoxa, OcrPixa, Int32)
Get the given level kind of components (block, textline, word etc.) as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetComponentImages(PageIteratorLevel, Boolean, Boolean, Int32, OcrBoxa, OcrPixa, Int32, Int32)
Get the given level kind of components (block, textline, word etc.) as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetDoubleVariable
Get the value of an internal "parameter."
Public methodGetHOCRText
Make a HTML-formatted string with hOCR markup from the internal data structures.
Public methodGetIntVariable
Get the value of an internal "parameter."
Public methodGetLSTMBoxText
Make a box file for LSTM training from the internal data structures.
Public methodGetRegions
Get the result of page layout analysis as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetStringVariable
Get the value of an internal "parameter."
Public methodGetStrips
Get textlines and strips of image regions as a leptonica-style Boxa, Pixa pair, in reading order. Enables downstream handling of non-rectangular regions.
Public methodGetSymbols
Get the symbols as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetTextFromImage(Bitmap)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(String)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(OcrPix)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(Bitmap, Rectangle)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(String, Rectangle)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(OcrPix, Rectangle)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(Bitmap, Point, Size)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(String, Point, Size)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(OcrPix, Point, Size)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(Bitmap, Int32, Int32, Int32, Int32)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(String, Int32, Int32, Int32, Int32)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextFromImage(OcrPix, Int32, Int32, Int32, Int32)
Recognize a rectangle from an image and return the result as a string.
Public methodGetTextlines(OcrBoxa, OcrPixa, Int32)
Get the textlines as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetTextlines(Boolean, Int32, OcrBoxa, OcrPixa, Int32, Int32)
Get the textlines as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetTsvText
Make a TSV-formatted string from the internal data structures.
Public methodGetUNLVText
The recognized text is returned as a char* which is coded as UNLV format Latin-1 with specific reject and suspect codes
Public methodGetUtf8Text
The recognized text is returned as a string which is coded as UTF8
Public methodGetWords
Get the words as a leptonica-style Boxa, Pixa pair, in reading order.
Public methodGetWordStrBoxText
The recognized text is returned as a char* which is coded in the same format as a WordStr box file used in training.
Public methodInit(Languages, String, OcrEngineMode, String, String, String, Boolean)
Initialize the OCR SDK library
Public methodInit(String, String, OcrEngineMode, String, String, String, Boolean, Boolean)
Initialize the OCR SDK library
Public methodInitForAnalysePage
Init only for page layout analysis.
Public methodInitLang
Init only the lang model component of Tesseract.
Public methodIsValidWord
Check whether a word is valid according to Tesseract's language model
Public methodPrintVariablesToFile
Print Tesseract parameters to the given file.
Public methodProcessPage
Turn a single image into symbolic text.
Public methodProcessPages
Turns images into symbolic text.
Public methodReadConfigFiles
Read a "config" file containing a set of parameter name, value pairs.
Public methodReadDebugConfigFiles
Same as ReadConfigFiles(String), but only set debug params from the given config file.
Public methodRecognize
Recognize the image from SetImage, generating Tesseract internal structures.
Public methodRecognizeForChopTest
Variant on Recognize used for testing chopper
Public methodRelease
Close down Tesseract and free up all memory. Once Release() has been called, no other API function may be used except Init.
Public methodSetImage(Bitmap)
Provide an image for Tesseract to recognize.
Public methodSetImage(OcrPix)
Provide an image for Tesseract to recognize.
Public methodSetVariable
Set the value of an internal "parameter."
Thread Safety
Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
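Since instance members are not guaranteed to be thread safe, a simple pattern is to create, use, and dispose an OcrApi instance on a single thread. The following is a minimal sketch of that flow using the Create, Init, and GetTextFromImage(String) members listed above. It rests on several assumptions not stated on this page: that the Languages enum lives in a Patagames.Ocr.Enums namespace and has an English member, that the parameters of Init after the language are optional, and that the image path (a hypothetical "page.png" here) points to a readable file.

```csharp
using System;
using Patagames.Ocr;
using Patagames.Ocr.Enums; // assumption: namespace that contains the Languages enum

class OcrApiUsageSketch
{
    static void Main(string[] args)
    {
        // Hypothetical input path; pass a real image file as the first argument.
        string imagePath = args.Length > 0 ? args[0] : "page.png";

        // OcrApi implements IDisposable, so the using block calls Dispose on exit.
        using (var api = OcrApi.Create())
        {
            // Assumption: every Init parameter after the language is optional.
            api.Init(Languages.English);

            // GetTextFromImage(String) overload from the Methods table.
            string text = api.GetTextFromImage(imagePath);
            Console.WriteLine(text);
        }
    }
}
```

If several threads need OCR at the same time, giving each thread its own OcrApi instance avoids sharing instance members across threads.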
See Also