Part 3 – Train your own model (using TensorFlow)

Let’s come to the trickiest (and most challenging) part – the training itself.

I’m using TensorFlow with Keras (for reasons that will become clear below).

What do we need for training?

First: a lot (and I mean: a lot) of images for training and validation. In this example I’m using about 5,500 images, 28×28 pixels, grayscale. I have 36 classes (characters), so I have about 150 images of each character (letter or number). How can you generate such an amount of images? I’ll tell you later…
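To make the 36 classes concrete, here is a minimal sketch (my own illustration, not the original training script) of how such character classes are typically mapped to one-hot label vectors; the ordering of digits before letters is an assumption:

```python
import string
import numpy as np

# 36 classes: digits "0"-"9" followed by uppercase letters "A"-"Z"
# (the ordering is an assumption -- the dataset's CSV defines the real one)
CLASSES = list(string.digits + string.ascii_uppercase)

def one_hot(char):
    """Return a one-hot vector of length 36 for a single character."""
    vec = np.zeros(len(CLASSES), dtype=np.float32)
    vec[CLASSES.index(char)] = 1.0
    return vec
```

For example, `one_hot("A")` sets index 10 (after the ten digits) and leaves the rest at zero.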

Second, we need our (macOS) Terminal and a lot of Python modules.

To make life easier, I’m using Anaconda to get a virtual environment for Python.

Let’s start:

Download and install Anaconda.

After that you have to open a new Terminal shell…


Stay in your home directory and type (you can paste the line):


This will create a fresh Python (2.7) environment named “ocrTraining“.

Let’s go to this environment:


Now we have to activate this environment:


Now let’s install all the necessary modules, one after another…

Some modules won’t install via ‘conda‘ – so I’m using ‘pip‘ instead.

We’re using Keras v2.0.6 because of Apple’s CoreMLTools (that’s also why we’re using Python 2.7).
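Put together, the terminal session sketched above might look roughly like this. Only the environment name, the Python version and the Keras version pin come from the text – the exact package list is my assumption:

```shell
# create a fresh Python 2.7 environment named "ocrTraining"
conda create -n ocrTraining python=2.7

# activate it (older conda syntax)
source activate ocrTraining

# install the modules; some only install via pip, not conda
conda install numpy
pip install tensorflow
pip install keras==2.0.6   # pinned because of Apple's CoreMLTools
pip install coremltools
```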

Let’s go ahead:

As I mentioned, you need a lot of training images; you can download my dataset from my repository at GitHub


Extract the content (a lot of PNGs and one CSV) into

(this folder should contain ONLY PNG files and one CSV file)
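As a sketch of what reading such a folder could look like – the CSV layout (“filename,label” rows) and the function name are my assumptions, so check the real dataset and script:

```python
import csv
import os
import numpy as np
from PIL import Image

def load_dataset(folder, csv_name):
    """Load 28x28 grayscale PNGs listed in a CSV of 'filename,label' rows."""
    images, labels = [], []
    with open(os.path.join(folder, csv_name)) as f:
        for row in csv.reader(f):
            if len(row) != 2:          # skip blank/malformed lines
                continue
            filename, label = row
            img = Image.open(os.path.join(folder, filename)).convert("L")
            images.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(label)
    # shape (N, 28, 28, 1), as Keras expects for grayscale input
    return np.asarray(images)[..., np.newaxis], labels
```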

Now download the Python script from my repository at GitHub

and put it into

Great!

Let’s start the training:


On my MacBook Pro (late 2016) one epoch (and we have five) takes only about 5 seconds – OK, it’s not such a huge amount of images, BUT:

the accuracy is 0.999272726622, which means 99.93 %!

When it’s finished (and it should finish without big errors – maybe some syntax thingies), you’ll have a file OCR.mlmodel in your folder, which you can use directly in your project (as in Part 1 and Part 2).
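For reference, here is a minimal sketch of what such a training script typically contains. It is written against the modern tf.keras API, not the exact Keras 2.0.6 / Python 2.7 code from my repository, and the layer sizes are my assumptions:

```python
import numpy as np
from tensorflow import keras

def build_model(num_classes=36):
    """A small CNN for 28x28 grayscale character images."""
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation="relu",
                            input_shape=(28, 28, 1)),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# training would then be something like:
#   model.fit(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
# and afterwards coremltools converts the trained model to OCR.mlmodel
```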


Part 2 – TextDetection as source for OCR

In my last post you saw how easily OCR (of course: simple classification) can be done.

But what about the images for classification? Where do they come from?

Fortunately, Apple provides reliable TextDetection in its Vision.framework.

Let’s give it a try!

First of all – start with a SingleView App in Xcode (you know how, right?!)

The “heart” of our simple app will be the TextDetection request (VNDetectTextRectanglesRequest).

We need a request

and a completion-handler:
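Such a request can be created roughly like this (a sketch – the property and handler names are mine, the project’s real names may differ):

```swift
import Vision

// text-detection request; the completion handler receives the observations
lazy var textRequest: VNDetectTextRectanglesRequest = {
    let request = VNDetectTextRectanglesRequest(
        completionHandler: self.handleTextDetection)
    // we also want the bounding boxes of the individual characters
    request.reportCharacterBoxes = true
    return request
}()
```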


But that’s not enough – we have to do something with request.results.

Let’s take a look at the complete handler:


Let me explain how it works:

  • let’s get the observations – if any (line 4)
  • we have to apply a simple CGAffineTransform because of the different coordinate systems (line 11)
  • loop through all our observations (line 14)
  • every detected character has a “box” – a boundingBox (CGRect) and four CGPoints (topLeft, topRight, bottomLeft and bottomRight) – the boundingBox is always upright, but the four points may form a trapezoid (“quadrangle”)
  • we apply the transform to the boundingBox and make sure we stay inside our (source) image (line 27)
  • now we transform the four points (lines 34–37)
  • the next step is to crop the detected region from our (source) image (line 41) and rectify it using a CIFilter (lines 42–47)
  • the last step is to create a handler for the classification and feed the cropped and rectified image (charImage) to the ocrRequest (lines 50–58), as in my last post…
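The steps in the list above can be sketched like this – heavily simplified: the real handler in the project does more, and names like handleTextDetection and imageSize are my own:

```swift
import UIKit
import Vision

func handleTextDetection(request: VNRequest, error: Error?) {
    // get the observations -- if any
    guard let observations = request.results as? [VNTextObservation] else { return }

    // Vision delivers normalized coordinates with a bottom-left origin;
    // this transform maps them into top-left-origin image coordinates
    // (imageSize is assumed to be the size of the source image)
    let transform = CGAffineTransform.identity
        .scaledBy(x: imageSize.width, y: -imageSize.height)
        .translatedBy(x: 0, y: -1)

    for observation in observations {                  // loop over regions
        for box in observation.characterBoxes ?? [] {  // loop over characters
            // the upright bounding box, clamped to the source image
            let rect = box.boundingBox.applying(transform)
                .intersection(CGRect(origin: .zero, size: imageSize))
            // the four corner points may form a quadrangle (trapezoid)
            let corners = [box.topLeft, box.topRight,
                           box.bottomLeft, box.bottomRight]
                .map { $0.applying(transform) }
            // next: crop `rect` from the source image, rectify it with a
            // CIFilter using `corners`, and feed the result to the ocrRequest
            _ = (rect, corners)
        }
    }
}
```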

Alongside, we build an empty String (recognizedRegion), to which we append our recognized characters, and a String array (recognizedWords), to which we append each recognizedRegion.

As a result we (hopefully) have a String array with our detected and recognized text.

You can download the complete project (with some sample images) from my repository at GitHub


Part 1 – How simple is it? OCR without Tesseract? On iOS? Yeah!!

Let’s start with a simple design:

Open Xcode and create a new SingleView App. (If you don’t know this… hmm.)

You’ll see this (depending on the name of your app):


Not so much …

Let’s add some functions:

First we need some imports:
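Presumably these are just the two frameworks mentioned below, plus UIKit from the app template:

```swift
import UIKit
import CoreML
import Vision
```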


That’s because we are using CoreML and Vision.

Now we have to create a request:


… and a completion handler:
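A sketch of both, assuming the class generated from OCR.mlmodel is called OCR and with handler and property names of my own choosing:

```swift
import Vision
import CoreML

// wrap the CoreML model (the class `OCR` is generated from OCR.mlmodel)
lazy var ocrRequest: VNCoreMLRequest = {
    let model = try! VNCoreMLModel(for: OCR().model)
    return VNCoreMLRequest(model: model,
                           completionHandler: self.handleClassification)
}()

func handleClassification(request: VNRequest, error: Error?) {
    // the best guess is the first (highest-confidence) classification
    guard let best = (request.results as? [VNClassificationObservation])?.first
        else { return }
    print("recognized: \(best.identifier) (\(best.confidence))")
}
```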


If you have some compile errors: you don’t have the (trained) model – you’ll find it in my repository

Now let’s load an image (from the resources) and feed it to our request – we do this in viewDidLoad():
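Feeding the image could look like this – a sketch, where the image name “test” is a placeholder and the request property is assumed to be called ocrRequest:

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    // load a bundled test image ("test" is a placeholder name)
    guard let image = UIImage(named: "test"),
          let cgImage = image.cgImage else { return }

    // hand the image to Vision and perform our classification request
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([self.ocrRequest])
}
```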


If you run this (the Simulator is enough), you will see the recognized character in your debug view:

Simple – Uhhh !!


Some remarks:

The trained model (by me) is ONLY for the font “Inconsolata” – it’s a free font from Google (downloadable here or here)

I trained the characters “A”…“Z” (only uppercase) and the numbers “0”…“9”.

Later I’ll show you how to train it yourself (but that’s not that simple – for several reasons)

You can download the project from my repository at GitHub

The next part is about TextDetection as a source for OCR.

Stay tuned !


I’m not a native (English) speaker – please forgive me some errors, misspellings and so on.

I’ll show you how easily you can create a simple OCR using only the Vision.framework and CoreML on iOS.

What you need:

  • a Mac (iMac, MacBook, MacBookPro – whatever)
  • Xcode – latest release (in my case: Version 9.0 (9A235))

The first two parts will run in the iOS Simulator in Xcode – so you don’t need a real device!