4 Lines of Code and Nothing More
var api = OcrApi.Create();
api.Init(Languages.English);
using (var renderer = OcrPdfRenderer.Create("searchable.pdf"))
    api.ProcessPages(@"scanned.pdf", renderer);
Just like magic!
Thanks to the straightforward API, you can transform a scanned PDF into a searchable document with literally a few lines of code.
And it is not limited to PDF input. Adding conversion of JPEG, multipage TIFF or PNG images to searchable PDF to your app now takes minutes, not hours or days.
Tesseract.NET SDK accurately recognizes text in more than 120 languages, supports multi-language text and can be trained to work with previously unknown languages. Languages supported out of the box include English, French, Italian, German, Spanish, Arabic, Chinese, Hebrew, Japanese, Russian, Thai and others.
Input filters built into Patagames OCR SDK to enhance OCR performance include: Binarize, Contrast and Contrast Normalization, Deskew, Enhance Resolution, Erode and Dilate, Inflate and Deflate, Invert, Remove Border, Rotate, ToGray, and White Background.
For example, the Deskew input filter automatically rotates an image so that it is the right way up and its text lines run horizontally. The quality of Tesseract’s line segmentation drops significantly if a page is too skewed, which severely impacts the quality of the OCR.
While Tesseract is widely regarded as one of the best OCR libraries available, Tesseract.NET SDK is one of the best ways to equip your application with text recognition capabilities.
Combining easy deployment, exceptional recognition accuracy, lightning-fast OCR and a variety of output options including PDF, hOCR, UNLV and plain text, Tesseract.NET SDK offers a flexible and simple API with many high- and low-level text recognition procedures.
The same straightforward API lets you transform a given image into searchable text with a few lines of code. And if you need more detailed insight into the components of the text, the Tesseract.NET SDK API provides a number of classes to retrieve individual letters, words, paragraphs and even font parameters.
You can try Tesseract.NET SDK for free now and experience some of the fastest and most accurate optical character recognition available for .NET applications.
Enjoy robust development of OCR-capable .NET applications!
A NuGet package is also available in the official repository at nuget.org.