Part 2 – TextDetection as source for OCR

In my last post you saw how easily OCR (which is really just simple classification) can be done.

But what about the images for classification? Where do they come from?

Fortunately, Apple provides reliable text detection in its Vision framework.

Let’s give it a try!

First of all – start with a Single View App in Xcode (you know how).

The “heart” of our simple app will be the text detection (VNDetectTextRectanglesRequest).

We need a request and a completion handler.
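As a rough idea of what such a request and completion handler can look like, here is a minimal sketch. The function names makeTextDetectionRequest and detectText and the onResults callback are made up for this illustration and are not taken from the project; only the Vision types and properties are real API.

```swift
import Vision

// Hypothetical helper: builds the text-detection request and forwards
// the observations to a callback once Vision has finished.
func makeTextDetectionRequest(
    onResults: @escaping ([VNTextObservation]) -> Void
) -> VNDetectTextRectanglesRequest {
    let request = VNDetectTextRectanglesRequest(completionHandler: { finishedRequest, error in
        // results is optional and untyped, so cast it to text observations
        guard let observations = finishedRequest.results as? [VNTextObservation] else {
            print("Text detection failed: \(error?.localizedDescription ?? "no results")")
            return
        }
        onResults(observations)
    })
    // Without this flag Vision reports only whole text regions,
    // not the individual character boxes we need for classification.
    request.reportCharacterBoxes = true
    return request
}

// Hypothetical helper: runs the request on a single image.
func detectText(in image: CGImage, using request: VNDetectTextRectanglesRequest) throws {
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```

Setting reportCharacterBoxes to true is the important part here: each VNTextObservation then carries a characterBoxes array with one rectangle observation per detected character.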


But that’s not enough – we still have to do something with request.results.

Let’s take a look at the complete handler:


Let me explain how it works:

  • get the observations – if any (line 4)
  • we need a simple CGAffineTransform because Vision and our image use different coordinate systems (line 11)
  • loop through all our observations (line 14)
  • every detected character has a “box”: a boundingBox (CGRect) and four CGPoints (topLeft, topRight, bottomLeft and bottomRight); the boundingBox is always upright, but the four points may form a trapezoid (“quadrangle”)
  • we transform the boundingBox and make sure it stays inside our (source) image (line 27)
  • now we transform the four points (lines 34–37)
  • the next step is to crop the detected region from our (source) image (line 41) and rectify it using a CIFilter (lines 42–47)
  • the last step is to create a handler for the classification and feed the cropped and rectified image (charImage) to the ocrRequest (lines 50–58), as in my last post.
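The geometry in these steps can be sketched roughly as follows. This is not the original handler (the line numbers in the list refer to the original listing, not to this sketch), and the function names visionToImageTransform, imageRect and rectify are invented for this example. It assumes that Vision delivers normalized coordinates with the origin at the lower left, and that we want pixel coordinates with the origin at the top left.

```swift
import CoreGraphics
import CoreImage

// Hypothetical helper: maps Vision's normalized, lower-left-origin coordinates
// to pixel coordinates with a top-left origin for an image of the given size.
func visionToImageTransform(imageSize: CGSize) -> CGAffineTransform {
    // Applied to a point, the translation runs first, then the flipped scale:
    // (x, y) -> (x * width, (1 - y) * height)
    return CGAffineTransform.identity
        .scaledBy(x: imageSize.width, y: -imageSize.height)
        .translatedBy(x: 0, y: -1)
}

// Hypothetical helper: transforms a normalized bounding box and clips it
// so that it never reaches outside the source image.
func imageRect(forNormalized box: CGRect, imageSize: CGSize) -> CGRect {
    let bounds = CGRect(origin: .zero, size: imageSize)
    return box.applying(visionToImageTransform(imageSize: imageSize))
        .intersection(bounds)
}

// Hypothetical helper: straightens a (possibly skewed) quadrangle with the
// built-in CIPerspectiveCorrection filter. The four corners are assumed to be
// given in the coordinate space of `image`.
func rectify(_ image: CIImage,
             topLeft: CGPoint, topRight: CGPoint,
             bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage {
    return image.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": CIVector(cgPoint: topLeft),
        "inputTopRight": CIVector(cgPoint: topRight),
        "inputBottomLeft": CIVector(cgPoint: bottomLeft),
        "inputBottomRight": CIVector(cgPoint: bottomRight)
    ])
}

// Usage: a box covering the middle of a 100 x 200 image
let size = CGSize(width: 100, height: 200)
let rect = imageRect(
    forNormalized: CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5),
    imageSize: size
)
// rect is x: 25, y: 50, width: 50, height: 100
```

The intersection with the image bounds matters because a detected box that touches the edge can end up slightly outside the image after the transform, and cropping with such a rectangle would give an unexpected result.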

Along the way we build an empty String (recognizedRegion), to which we append our recognized characters, and a String array (recognizedWords), to which we append each recognizedRegion.
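Stripped of all the Vision parts, this bookkeeping looks roughly like the sketch below. The function collectWords and its input are made up for illustration: each inner array stands in for the characters the classifier returned for one detected region, in reading order.

```swift
// Hypothetical stand-in for the classification results: one inner array
// per detected region, one String per recognized character.
func collectWords(from regions: [[String]]) -> [String] {
    var recognizedWords = [String]()
    for characters in regions {
        // Start with an empty string for every region ...
        var recognizedRegion = ""
        for character in characters {
            // ... and append each recognized character to it
            recognizedRegion.append(character)
        }
        // The finished region goes into the result array
        recognizedWords.append(recognizedRegion)
    }
    return recognizedWords
}

let words = collectWords(from: [["H", "e", "l", "l", "o"], ["O", "C", "R"]])
print(words) // ["Hello", "OCR"]
```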

As a result we have (hopefully) a string array with our detected and recognized text.

You can download the complete project (with some sample images) from my repository on GitHub.

