The C# OCR Library

  • Convert scanned PDF to searchable document
  • Fast and Precise Neural Net Based Engine
  • Correct Low Quality Scans
  • 120+ languages
  • .Net 2.0+, .Net 5, Standard, Core

Turn your scanned PDF into a searchable PDF

 

4 Lines of Code and Nothing More

var api = OcrApi.Create();                        // create the OCR engine
api.Init(Languages.English);                      // initialize it for English
using (var renderer = OcrPdfRenderer.Create("searchable.pdf"))
    api.ProcessPages(@"scanned.pdf", renderer);   // OCR the scanned PDF into the output PDF

Just Like Magic!

Thanks to the straightforward API, you can turn a scanned PDF into a searchable document with just a few lines of code.

And it is not limited to PDF input. Adding JPEG, multipage TIFF, or PNG conversion to a searchable PDF in your app now takes minutes, not hours or days.
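As a rough sketch of the non-PDF case, the snippet below feeds a multipage TIFF through the same four calls used in the PDF example. It assumes that ProcessPages accepts an image path the same way it accepts a PDF path, that the types live in the Patagames.Ocr and Patagames.Ocr.Enums namespaces, and that OcrApi is disposable. The file names are placeholders. Check the documentation before relying on any of these.

```csharp
// Assumed namespaces; adjust if your SDK version differs.
using Patagames.Ocr;
using Patagames.Ocr.Enums;

class TiffToSearchablePdf
{
    static void Main()
    {
        // Assumption: OcrApi implements IDisposable, so 'using' releases the engine.
        using (var api = OcrApi.Create())
        {
            api.Init(Languages.English);

            // "scanned.tif" and "searchable.pdf" are placeholder file names.
            using (var renderer = OcrPdfRenderer.Create("searchable.pdf"))
            {
                // Same call as for a PDF input; passing a TIFF path is an assumption.
                api.ProcessPages(@"scanned.tif", renderer);
            }
        }
    }
}
```

Wrapping the engine in its own using block keeps native resources from lingering in a long-running app, at the cost of a few more lines than the minimal version.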

Please refer to the documentation for more information about the API.

Read numerous image formats in 120+ languages

Tesseract.NET SDK accurately recognizes text in more than 120 languages, supports multi-language documents, and can be trained to work with previously unknown languages. The languages supported as standard include English, French, Italian, German, Spanish, Arabic, Chinese, Hebrew, Japanese, Russian, Thai, and others.

Correct Low Quality Scans

Patagames OCR SDK has built-in input filters that enhance OCR performance on poor scans: Binarize, Contrast and Contrast Normalization, Deskew, Enhance Resolution, Erode and Dilate, Inflate and Deflate, Invert, Remove Border, Rotate, ToGray, and White Background.

For example, the Deskew filter automatically rotates an image so that the page is upright and its text lines are level. The quality of Tesseract’s line segmentation drops significantly if a page is too skewed, which severely degrades the OCR result.

The best way to equip your .Net app with OCR capabilities

Tesseract is widely regarded as one of the best open-source OCR engines available, and Tesseract.NET SDK is one of the best ways to equip your application with its text recognition capabilities.

Combining easy deployment, exceptional recognition accuracy, lightning-fast OCR, and a variety of output options including PDF, hOCR, UNLV, and plain text, Tesseract.Net SDK offers a flexible and simple API with many high- and low-level text recognition procedures.

Thanks to the straightforward API, you can turn a given image into searchable text with a few lines of code. And if you need more detailed insight into the components of the text, the Tesseract.NET SDK API provides a number of classes for retrieving individual letters, words, paragraphs, and even font parameters.

You can try Tesseract.NET SDK for free now and experience fast, highly accurate optical character recognition for .Net applications.

Enjoy robust development of OCR-capable .Net applications!

Tesseract.Net SDK

Download

A NuGet package is also available from the official repository at nuget.org:

PM> Install-Package Tesseract.Net.SDK

Designed for

  • Microsoft .Net Framework 2.0+
  • Microsoft .Net Standard 2.0+
  • Microsoft Visual Studio
  • NuGet
  • Microsoft Azure