Free C# OCR library

sheafox
Posts: 26
Joined: Mon Jul 09, 2018 3:54 am

Post by sheafox » Mon Oct 22, 2018 4:10 am

Does anyone know a good free C# OCR library?

odklizec
Ranorex Guru
Posts: 5184
Joined: Mon Aug 13, 2012 9:54 am
Location: Zilina, Slovakia

Re: Free C# OCR library

Post by odklizec » Tue Oct 23, 2018 11:16 am

Hi,

I don't have a use for an OCR library myself, but a quick Google search returned this:
http://www.pixel-technology.com/freeware/tessnet2/
Pavel Kudrys
Ranorex explorer at Descartes Systems

Please add these details to your questions:
  • Ranorex Snapshot
  • Ranorex xPath of problematic element(s)
  • Ranorex version
  • OS version
  • HW configuration

semate
Posts: 16
Joined: Tue Jul 03, 2018 7:42 am

Re: Free C# OCR library

Post by semate » Thu Dec 13, 2018 10:12 am

Hi!

I have the Tesseract OCR library running with Ranorex.
I ended up using the package below:
[Attachment no longer available: Tesseract2.PNG]

Make sure the libs are included in the Ranorex project.
[Attachment: Tesseract2.PNG (46.87 KiB)]

My code looks like this:

Code:

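        //---------------------------------------------------------------------
        // Needs: using System; using System.Diagnostics; using System.Drawing; using Tesseract;
        // (plus the Ranorex namespaces from the user code template for [UserCodeMethod])
        /// <summary>
        /// Read graphical text with the Tesseract OCR module.
        /// </summary>
        [UserCodeMethod]
        public static string OCRRead(Bitmap bmp, string whitelist, string enginePath)
        {
            try
            {
                // Engine, Pix and Page wrap native Tesseract/Leptonica objects,
                // so dispose them to avoid leaking native memory.
                using (TesseractEngine engine = new TesseractEngine(enginePath, "eng", Tesseract.EngineMode.Default))
                using (Tesseract.Pix px = PixConverter.ToPix(bmp))
                {
                    engine.DefaultPageSegMode = Tesseract.PageSegMode.Auto;
                    //engine.SetVariable("classify_bln_numeric_mode", 0);
                    if (!string.IsNullOrEmpty(whitelist))
                    {
                        // Only recognize the characters listed in the whitelist.
                        engine.SetVariable("tessedit_char_whitelist", whitelist);
                    }
                    using (Tesseract.Page pg = engine.Process(px))
                    {
                        return pg.GetText();
                    }
                }
            }
            catch (Exception ex)
            {
                Debug.WriteLine("EnginePath: " + enginePath);
                Debug.WriteLine("Whitelist: " + whitelist);
                // ExceptionOcrImage is not part of Tesseract; its definition is not shown here.
                throw new ExceptionOcrImage(ex.ToString(), bmp);
            }
        }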
And an example call:

Code:

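Bitmap bmp = new Bitmap(@"C:\temp\screenshot.png");   // example path; any bitmap works, e.g. from a screenshot
string whitelist = "0123456789:._-/| ";
string tesseractFile = @"D:\tesseract\DataFiles\tessdata";   // folder containing eng.traineddata
string ocrDatetime = OCRRead(bmp, whitelist, tesseractFile);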
Make sure the training data file is available in the tessdata folder. If I remember correctly, the tessdata folder was mandatory.
I downloaded the files eng.traineddata and deu.traineddata from https://github.com/tesseract-ocr/tessdata. Make sure you use the matching version of the training data (3.0.4 in my case).
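
A missing or misplaced training data file is easier to diagnose if you check for it before creating the engine. The following is a minimal sketch of a hypothetical helper (`TessdataCheck.MissingTrainedData` is not part of the Tesseract wrapper). It assumes, as in the example call, that the engine path points at the tessdata folder itself and that each language `xyz` needs a file named `xyz.traineddata` there. Tesseract accepts several languages joined with `+` (e.g. `eng+deu`), so the helper splits on that character.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TessdataCheck
{
    // Hypothetical helper: returns the full paths of the .traineddata files
    // that are missing for a language string such as "eng" or "eng+deu".
    public static List<string> MissingTrainedData(string tessdataPath, string languages)
    {
        var missing = new List<string>();
        foreach (string lang in languages.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string name = lang.Trim();
            if (name.Length == 0)
            {
                continue;
            }
            string file = Path.Combine(tessdataPath, name + ".traineddata");
            if (!File.Exists(file))
            {
                missing.Add(file);
            }
        }
        return missing;
    }
}
```

Calling it with `enginePath` and the language string before `new TesseractEngine(...)`, and throwing when the returned list is not empty, should give a clearer error message than a failed engine initialization.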

As for the accuracy of the text detection, I have to say that it works best with large text. Small text can be challenging, and some characters and spaces are not always detected correctly, even when I filter all colors so that only white text on a black background remains. That may differ from case to case, though. It should also be possible to train Tesseract yourself, but I haven't looked into that yet.
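
Since small text is the weak spot, one common trick is to upscale the image and reduce it to clean two-color text before passing it to `OCRRead`. The following is a minimal sketch using only System.Drawing; the class name `OcrPreprocess`, the scale factor and the brightness threshold are hypothetical values to tune per case, and `GetPixel`/`SetPixel` is slow but keeps the example short. It always writes black text on a white background; `lightTextOnDark` tells it which polarity the source image has (true for the white-on-black images described above).

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

public static class OcrPreprocess
{
    // Perceived brightness (0..255) of a pixel, using the Rec. 601 weights.
    public static int Luminance(Color c)
    {
        return (int)Math.Round(0.299 * c.R + 0.587 * c.G + 0.114 * c.B);
    }

    // Hypothetical helper: upscales 'src' by 'scale' and returns a new bitmap
    // with black text on white. Pixels at or above 'threshold' count as bright.
    // The caller is responsible for disposing the returned bitmap.
    public static Bitmap Prepare(Bitmap src, int scale, int threshold, bool lightTextOnDark)
    {
        if (scale < 1) throw new ArgumentOutOfRangeException("scale");
        var result = new Bitmap(src.Width * scale, src.Height * scale);
        using (Graphics g = Graphics.FromImage(result))
        using (var attrs = new ImageAttributes())
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            attrs.SetWrapMode(WrapMode.TileFlipXY); // avoids dark seams at the image border
            g.DrawImage(src, new Rectangle(0, 0, result.Width, result.Height),
                        0, 0, src.Width, src.Height, GraphicsUnit.Pixel, attrs);
        }
        for (int y = 0; y < result.Height; y++)
        {
            for (int x = 0; x < result.Width; x++)
            {
                bool isBright = Luminance(result.GetPixel(x, y)) >= threshold;
                // Text pixels become black, everything else becomes white.
                bool isText = lightTextOnDark ? isBright : !isBright;
                result.SetPixel(x, y, isText ? Color.Black : Color.White);
            }
        }
        return result;
    }
}
```

Usage would then be something like `OCRRead(OcrPreprocess.Prepare(bmp, 3, 128, true), whitelist, tesseractFile)`; whether upscaling actually improves the result for a given screen font has to be tried out.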

Hope that helps!

semate
Posts: 16
Joined: Tue Jul 03, 2018 7:42 am

Re: Free C# OCR library

Post by semate » Thu Dec 13, 2018 10:14 am

The forum messed up the pictures in my earlier post.

The libs picture should be:
[Attachment: TesseractLibs.PNG (3.54 KiB)]