Part 2 – TextDetection as source for OCR

In my last post you saw how easily OCR (or rather: simple classification) can be done.

But what about the images for the classification? Where do they come from?

Fortunately, Apple provides reliable text detection in its Vision framework.

Let’s give it a try!

First of all – start with a Single View App in Xcode (you know the drill?!)

The “heart” of our simple app will be the text detection (VNDetectTextRectanglesRequest).

We need a request

and a completion handler:
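The original code screenshot isn’t reproduced here, so here is a minimal sketch of how such a request could be set up. The property and function names (`textDetectionRequest`, `setupTextDetection`, `handleDetectedText`) are my assumptions, not necessarily those of the original project:

```swift
import UIKit
import Vision

class ViewController: UIViewController {

    var textDetectionRequest: VNDetectTextRectanglesRequest?

    func setupTextDetection() {
        let request = VNDetectTextRectanglesRequest(completionHandler: handleDetectedText)
        // We want the bounding box of every single character,
        // not just the text regions.
        request.reportCharacterBoxes = true
        textDetectionRequest = request
    }

    func handleDetectedText(request: VNRequest, error: Error?) {
        // The interesting work happens here.
    }
}
```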


But that’s not enough – we still have to do something with request.results.

Let’s take a look at the complete handler:
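Since the screenshot is missing, the following is only a sketch of what such a handler could look like. The line numbers referenced in the explanation below belong to the original listing and will not match this sketch. `inputImage`, `ocrRequest`, `recognizedWords` and `lastRecognizedCharacter` are assumed properties; and unlike the original (which crops with the bounding box first and then rectifies), this sketch lets CIPerspectiveCorrection do crop and rectification in one step, since the filter crops to the quadrangle anyway:

```swift
func handleDetectedText(request: VNRequest, error: Error?) {
    // Get the observations – if any.
    guard let observations = request.results as? [VNTextObservation],
          let cgImage = inputImage?.cgImage else { return }

    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)

    // Vision delivers normalized coordinates with a bottom-left
    // origin; this transform maps them to top-left image coordinates.
    let transform = CGAffineTransform.identity
        .scaledBy(x: width, y: -height)
        .translatedBy(x: 0, y: -1)

    for observation in observations {
        guard let charBoxes = observation.characterBoxes else { continue }
        var recognizedRegion = ""

        for charBox in charBoxes {
            // The upright bounding box, clamped to the source image.
            let box = charBox.boundingBox
                .applying(transform)
                .intersection(CGRect(x: 0, y: 0, width: width, height: height))
            guard !box.isNull else { continue }

            // Core Image shares Vision's bottom-left origin, so the
            // four corner points only need to be scaled up.
            func scaled(_ p: CGPoint) -> CGPoint {
                CGPoint(x: p.x * width, y: p.y * height)
            }

            // Rectify the (possibly trapezoid) character region; the
            // filter also crops the image to the quadrangle.
            let charImage = CIImage(cgImage: cgImage)
                .applyingFilter("CIPerspectiveCorrection", parameters: [
                    "inputTopLeft": CIVector(cgPoint: scaled(charBox.topLeft)),
                    "inputTopRight": CIVector(cgPoint: scaled(charBox.topRight)),
                    "inputBottomLeft": CIVector(cgPoint: scaled(charBox.bottomLeft)),
                    "inputBottomRight": CIVector(cgPoint: scaled(charBox.bottomRight))
                ])

            // Feed the rectified image to the classification request
            // from Part 1. perform(_:) runs synchronously, so the
            // ocrRequest's completion handler has already stored its
            // result (assumed property lastRecognizedCharacter).
            if let ocrRequest = ocrRequest {
                let handler = VNImageRequestHandler(ciImage: charImage, options: [:])
                try? handler.perform([ocrRequest])
                recognizedRegion += lastRecognizedCharacter
            }
        }
        recognizedWords.append(recognizedRegion)
    }
}
```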


Let me explain how it works:

  • let’s get the observations – if any (line 4)
  • we have to make a simple CGAffineTransform because of the different coordinate systems (line 11)
  • loop through all our observations (line 14)
  • every detected character has a “box” – a boundingBox (CGRect) and four CGPoints (topLeft, topRight, bottomLeft and bottomRight) – the boundingBox is always upright, but the four points may form a trapezoid (a “quadrangle”)
  • we apply a transform to the boundingBox and make sure we stay inside our (source) image (line 27)
  • now we transform the four points (lines 34–37)
  • the next step is to crop the detected region from our (source) image (line 41) and rectify it using a CIFilter (lines 42–47)
  • the last step is to make a handler for the classification and feed the cropped and rectified image (charImage) to the ocrRequest (lines 50–58), as in my last post

Along the way we build an empty String (recognizedRegion), to which we append our recognized characters, and a String array (recognizedWords), to which we append each recognizedRegion.

As a result we (hopefully) have a string array with our detected and recognized text.

You can download the complete project (with some sample images) from my repository at GitHub.


Part 1 – How simple is it? OCR without Tesseract? On iOS? Yeah!!

Let’s start with a simple design:

Open Xcode and create a new Single View App. (If you don’t know this … hmm)

You’ll see this (depending on the name of your app):


Not so much …

Let’s add some functions:

First we need some imports:
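The screenshot is not reproduced here, but the imports are simply:

```swift
import UIKit
import CoreML
import Vision
```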


That’s because we are using CoreML and Vision.

Now we have to create a request:
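The screenshot is missing, so here is a sketch of how the request could be set up. “OCR” is a placeholder for the model class that Xcode generates from the trained .mlmodel file – your class name may differ:

```swift
var ocrRequest: VNCoreMLRequest?

func setupOCR() {
    // Wrap the generated Core ML model for use with Vision.
    guard let model = try? VNCoreMLModel(for: OCR().model) else {
        fatalError("Could not load the trained model")
    }
    ocrRequest = VNCoreMLRequest(model: model, completionHandler: handleClassification)
}
```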


… and a completion handler:
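Again only a sketch, since the screenshot is missing – the handler simply takes the best classification result and prints it:

```swift
func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation],
          let best = observations.first else { return }
    // identifier is the recognized character, confidence its score.
    print("Recognized: \(best.identifier) (confidence: \(best.confidence))")
}
```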


If you get some compile errors: you don’t have the (trained) model – you’ll find it in my repository.

Now let’s load an image (from the app’s resources) and feed it to our request – we do this in viewDidLoad().
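A sketch of what viewDidLoad() could look like – “sample” is an assumed name for one of the bundled sample images, and setupOCR() is an assumed helper that creates the ocrRequest:

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    setupOCR()

    // Load a sample image from the app bundle and run the request.
    guard let image = UIImage(named: "sample"),
          let cgImage = image.cgImage,
          let request = ocrRequest else { return }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```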


If you run this (the simulator is enough), you will see the recognized character in your debug view:

Simple – Uhhh !!


Some remarks:

The trained model (by me) is ONLY for the font “Inconsolata” – it’s a free font from Google (downloadable here  or here).

I trained the characters “A”..“Z” (uppercase only) and the numbers “0”..“9”.

Later I’ll show you how to train it yourself (but that’s not so simple – for several reasons).

You can download the project from my repository at GitHub

The next part is about TextDetection as a source for OCR.

Stay tuned !