NuGet can be used to automatically add files and references to your Visual Studio projects. You can use the Patagames NuGet packages to develop with the Tesseract.Net SDK without installing the ZIP package. All the Patagames components are available as NuGet packages at nuget.org.

PM> Install-Package Tesseract.Net.SDK

To install the package, enter the command above in the Package Manager Console and press Enter, or search for Tesseract.Net.SDK in the NuGet Package Manager.

Alternative downloads

Tesseract.Net SDK is also available as 7-Zip and ZIP archives for manual installation. If you don’t specifically require one of these archives, we recommend using the NuGet package.

Download .zip version: 4.6.411 | file size: 23.9 MB
Download .7z version: 4.6.411 | file size: 16.1 MB

All of the above packages include the following:

  • tessdata
    • configs
    • eng.traineddata English language data (tessdata_main)
    • osd.traineddata Orientation and Script Detection Data (tessdata_main)
    • equ.traineddata Math / equation detection module (tessdata_main)
    • pdf.ttf Custom font used for PDF generation
  • net20
    • Patagames.Ocr.dll Main assembly targeting .NET Framework 2.0
    • Patagames.Ocr.xml XML documentation comments for the main assembly
  • net30
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net35
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net40
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net45
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net451
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net452
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net46
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net461
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net462
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net47
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net471
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net472
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net48
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
  • net50
    • Patagames.Ocr.dll Main assembly targeting .NET 5.0
    • Patagames.Ocr.xml
    • Patagames.Ocr.deps.json
  • netstandard20
    • Patagames.Ocr.dll Main assembly targeting .NET Standard 2.0
    • Patagames.Ocr.xml
    • Patagames.Ocr.deps.json
  • netstandard21
    • Patagames.Ocr.dll
    • Patagames.Ocr.xml
    • Patagames.Ocr.deps.json
  • x64
    • tesseract.dll 64-bit version of the tesseract library for Windows
  • x86
    • tesseract.dll 32-bit version of the tesseract library for Windows
  • readme.txt
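When you reference the assemblies from the ZIP or 7z archive manually, you need to choose one managed folder yourself. The native tesseract.dll you deploy must also match the bitness of the process that loads it. The Python sketch below shows one way to make that choice, under stated assumptions. It uses the folder names from the listing above and a simplified rule: take the newest .NET Framework folder that is not newer than the target. This is not NuGet's full framework-compatibility algorithm. The helpers `pick_managed_folder` and `pick_native_folder` are hypothetical and are not part of the SDK.

```python
import struct

# Managed .NET Framework folders, as listed in the archive layout above.
FRAMEWORK_FOLDERS = [
    "net20", "net30", "net35", "net40", "net45", "net451", "net452",
    "net46", "net461", "net462", "net47", "net471", "net472", "net48",
]

# .NET Standard targets map directly to their own folders.
STANDARD_FOLDERS = {"netstandard2.0": "netstandard20",
                    "netstandard2.1": "netstandard21"}


def _framework_key(moniker):
    """Turn a .NET Framework moniker such as 'net462' into (4, 6, 2)."""
    digits = moniker[len("net"):]
    if not moniker.startswith("net") or not digits.isdigit():
        raise ValueError(f"not a .NET Framework moniker: {moniker!r}")
    return tuple(int(d) for d in digits)


def pick_managed_folder(target):
    """Hypothetical helper: choose the archive folder for a target framework.

    Simplified rule (an assumption, not NuGet's algorithm):
    .NET 5 and later -> 'net50'; .NET Standard 2.x -> matching folder;
    .NET Framework -> newest folder not newer than the target.
    """
    if target.startswith("netstandard"):
        if target in STANDARD_FOLDERS:
            return STANDARD_FOLDERS[target]
        raise ValueError(f"unsupported target: {target!r}")
    if target.startswith("net") and "." in target:
        major = int(target[len("net"):].split(".")[0])
        if major >= 5:
            return "net50"
        raise ValueError(f"unsupported target: {target!r}")
    wanted = _framework_key(target)
    candidates = [f for f in FRAMEWORK_FOLDERS if _framework_key(f) <= wanted]
    if not candidates:
        raise ValueError(f"no compatible folder for {target!r}")
    return max(candidates, key=_framework_key)


def pick_native_folder(is_64bit_process):
    """The native tesseract.dll must match the bitness of the host process."""
    return "x64" if is_64bit_process else "x86"


if __name__ == "__main__":
    print(pick_managed_folder("net472"))  # net472
    print(pick_managed_folder("net481"))  # net48
    print(pick_managed_folder("net8.0"))  # net50
    # Pointer size of the current Python process, used here only as a demo.
    print(pick_native_folder(struct.calcsize("P") == 8))
```

For a .NET Framework 4.8.1 application, for example, this rule selects the net48 folder, because the archive has no newer .NET Framework build.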

Language packs

English language data files are supplied in the standard package. If you need other languages, download them separately from this page and put them into the tessdata folder.

amh
Amharic language data (A language of Ethiopia) *
asm
Assamese language data (A language of India) *
aze_cyrl
Azerbaijani cyrillic language data
bod
Tibetan (Central) language data (A language of China) *
bos
Bosnian language data (A language of Bosnia and Herzegovina) *
ceb
Cebuano language data (A language of Philippines) *
chi_sim
Chinese (Simplified) language data
chi_tra
Chinese (Traditional) language data
cym
Welsh language data (A language of United Kingdom) *
dan_frak
Danish (Fraktur) language data
deu_frak
German (Fraktur) language data
dzo
Dzongkha language data (A language of Bhutan) *
enm
Middle English (1100-1500) language data
equ
Math / equation detection module
fas
Persian language data (A macrolanguage of Iran) *
frm
Middle French (ca. 1400-1600) language data
gle
Irish language data (A language of Ireland) *
guj
Gujarati language data (A language of India) *
hat
Haitian language data (A language of Haiti) *
iku
Inuktitut language data (A macrolanguage of Canada) *
ita_old
Italian (Old) language data
jav
Javanese language data (A language of Indonesia) *
kat
Georgian language data (A language of Georgia) *
kat_old
Georgian (Old) language data (A language of Georgia) *
kaz
Kazakh language data (A language of Kazakhstan) *
khm
Khmer (Central) language data (A language of Cambodia) *
kir
Kyrgyz language data (A language of Kyrgyzstan) *
kur
Kurdish language data (A macrolanguage of Iraq) *
lao
Laotian language data (A language of Laos) *
lat
Latin language data (A language of Vatican State) *
mar
Marathi language data (A language of India) *
mya
Burmese language data (A language of Myanmar) *
nep
Nepali language data (A macrolanguage of Nepal) *
ori
Oriya language data (A macrolanguage of India) *
osd
Orientation and Script Detection Data
pan
Panjabi (Eastern) language data (A language of India) *
pus
Pushto language data (A macrolanguage of Pakistan) *
san
Sanskrit language data (A language of India) *
sin
Singhalese language data (A language of Sri Lanka) *
slk_frak
Slovakian (Fraktur) language data
spa_old
Spanish (Old) language data
srp_latn
Serbian (Latin) language data
syr
Syriac script language data (A macrolanguage of Iraq) *
tgk
Tajik (ISO 639-3) language data (A language of Tajikistan) *
tir
Tigrinya language data (A language of Ethiopia) *
uig
Uyghur (Uighur) language data (A language of China) *
urd
Urdu language data (A language of Pakistan) *
uzb
Uzbek language data (A macrolanguage of Uzbekistan) *
uzb_cyrl
Uzbek (Cyrillic) language data (A macrolanguage of Uzbekistan) *
yid
Yiddish language data (A macrolanguage of Israel) *
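After copying the language files you need into the tessdata folder, it can help to confirm that every requested language actually has a matching .traineddata file before starting recognition. The Python sketch below makes two assumptions. First, it follows Tesseract's usual convention of joining several language codes with "+" (for example eng+chi_sim). Second, it assumes each code maps to a file named <code>.traineddata in the tessdata folder. `missing_languages` is a hypothetical helper and is not part of the Tesseract.Net SDK.

```python
import sys
from pathlib import Path


def missing_languages(tessdata_dir, languages):
    """Return the codes in `languages` (e.g. 'eng+chi_sim') that have
    no <code>.traineddata file in `tessdata_dir`."""
    folder = Path(tessdata_dir)
    codes = [code for code in languages.split("+") if code]
    return [code for code in codes
            if not (folder / f"{code}.traineddata").is_file()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_tessdata.py <path-to-tessdata> eng+chi_sim
    missing = missing_languages(sys.argv[1], sys.argv[2])
    if missing:
        print("Download and copy into tessdata:",
              ", ".join(f"{code}.traineddata" for code in missing))
    else:
        print("All requested language files are present.")
```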

All language files come from the official repositories of the Tesseract Open Source OCR Engine.

tessdata_best – Best (most accurate) trained models for the Tesseract.Net SDK.

Best results on Google’s eval data; slower; floating-point models.
These are the only models that can be used as a base for fine-tuning.
tessdata_best is for people willing to trade a lot of speed for slightly better accuracy.

These models work only with the LSTM OCR engine of Tesseract.Net SDK ver. 2.x.

tessdata_fast – Fast integer versions of trained models for the Tesseract.Net SDK.

Best “value for money” in speed vs. accuracy; integer models.
Provides an alternative set of integerized LSTM models built with a smaller network.

  • These models are a speed/accuracy compromise: the network configuration that offered the best "value for money" was chosen.
  • For some languages this is still the best choice, but for most it is not.
  • The "best value for money" network configuration was then integerized for further speed.
  • These models support only the new LSTM-based OCR engine. The legacy Tesseract engine cannot use them, so Tesseract OEM modes 0 and 2 do not work with these files.

These models work only with the LSTM OCR engine of Tesseract.Net SDK ver. 2.x.

tessdata_main – Trained models for both the legacy Tesseract engine and the new LSTM neural-network-based engine.

The LSTM models in these files have been updated to the integerized versions of tessdata_best, so they should be faster but probably slightly less accurate than tessdata_best.

The legacy Tesseract models have been removed from the Indic and Arabic script language files.

These models work only with Tesseract.Net SDK ver. 2.x.

tessdata_v3 – Trained models for Tesseract 3.04 and 3.05.

These models work only with Tesseract.Net SDK ver. 1.x.
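The compatibility rules for the four model sets can be summarized as a small lookup. The Python sketch below encodes only what this page states:

  • tessdata_best and tessdata_fast are LSTM-only.
  • tessdata_main contains both engines, except that the legacy models were removed for Indic and Arabic script languages.
  • tessdata_v3 targets SDK 1.x.

The OEM values are Tesseract's engine-mode numbers: 0 legacy only, 1 LSTM only, 2 legacy plus LSTM, 3 default. The names `oem_supported`, `REQUIRED_SDK` and `has_legacy_model` are hypothetical and are not part of the SDK API.

```python
from enum import IntEnum


class Oem(IntEnum):
    """Tesseract engine modes (OcrEngineMode values, Tesseract 4 and later)."""
    TESSERACT_ONLY = 0           # legacy engine only
    LSTM_ONLY = 1                # LSTM neural-net engine only
    TESSERACT_LSTM_COMBINED = 2  # legacy + LSTM
    DEFAULT = 3                  # based on what the data file provides


LEGACY_MODES = {Oem.TESSERACT_ONLY, Oem.TESSERACT_LSTM_COMBINED}

# SDK generation required by each model set, as stated on this page.
REQUIRED_SDK = {
    "tessdata_best": "2.x",
    "tessdata_fast": "2.x",
    "tessdata_main": "2.x",
    "tessdata_v3": "1.x",
}


def oem_supported(model_set, oem, has_legacy_model=True):
    """Return True if `oem` can be used with a file from `model_set`.

    Pass has_legacy_model=False for tessdata_main files of Indic and
    Arabic script languages, whose legacy models were removed.
    """
    oem = Oem(oem)
    if model_set in ("tessdata_best", "tessdata_fast"):
        return oem not in LEGACY_MODES  # LSTM-only files
    if model_set == "tessdata_main":
        return has_legacy_model or oem not in LEGACY_MODES
    if model_set == "tessdata_v3":
        raise ValueError("tessdata_v3 targets SDK 1.x (Tesseract 3.0x); "
                         "these OEM values do not apply")
    raise ValueError(f"unknown model set: {model_set!r}")
```

For example, oem_supported("tessdata_fast", 2) returns False, which matches the note above that OEM modes 0 and 2 do not work with tessdata_fast.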

Uninstall instructions, release logs, EULA

The release logs for this download can be found here.
The uninstall instructions can be found here.

By downloading software of Patagames or its subsidiaries from this site, you agree to the Tesseract.Net SDK End User License Agreement (EULA) for the trial software. If you do not agree with this EULA, do not download the software. The terms of an end user license agreement accompanying a particular software file upon installation or download of the software shall supersede the terms presented below.