Brief about Text Detection using the Vision Framework

With the help of the Vision framework, a developer can detect text in an image. Detection works on both static images and live camera frames. Once text is identified and recognized, the developer can draw a rectangular box around it in the UI so that it is clearly visible to the user. Vision offers many capabilities from both the developer's and the user's point of view. Other technologies and libraries support similar features, but they are not native to iOS and can be difficult to integrate into an iPhone app. For that reason, Apple introduced the Vision framework, which integrates cleanly with iOS apps and provides these features directly.

Before implementing text detection with Vision, the developer should have a good understanding of image capture and related frameworks such as AVFoundation, Core Image, and the image picker. Other open-source libraries are available, but their accuracy is often poor, so they are not used as frequently for iPhone app development. With Vision, text can be detected both in static images and through the live camera. When the user captures an image, the framework detects the text and a rectangular box is drawn around it so that it can be identified easily.

If the quality of the image is poor or the text is out of focus, detection accuracy suffers. If the results are not acceptable, the user should be asked to recapture the image. For real-time detection, the overall dimensions of the input images also matter: choosing smaller images makes processing faster and reduces latency. Once the image is captured, the developer should ensure that the text occupies as much of the frame as possible. Vision can process a live video feed, detect text in each frame as a VNTextObservation, and the developer can then draw bounding boxes using the associated VNRectangleObservation values.

Implementation of Text Detection:-

Step 1:-

The developer needs to import AVFoundation and make the view controller conform to the AVCaptureVideoDataOutputSampleBufferDelegate protocol so that it can receive captured frames.
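
As a minimal sketch, the conformance might look like the following. The class name is illustrative; the CameraOverlayView outlet is assumed here so it matches the property used in the later steps.

```swift
import UIKit
import AVFoundation
import Vision

// The controller receives live frames through the sample-buffer delegate.
class TextDetectionViewController: UIViewController,
                                   AVCaptureVideoDataOutputSampleBufferDelegate {

    // Image view layered above the camera preview; the detected-text
    // rectangles are drawn into its image (name assumed from later steps).
    @IBOutlet weak var CameraOverlayView: UIImageView!
}
```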

Step 2:-

The developer needs to set up a camera session and capture the frames whose text will be recognized. The preview and overlay layers are added to the view controller.

fileprivate func capturingSessionForTextDetection() {
    let captureSession = AVCaptureSession()
    captureSession.sessionPreset = .high

    // Use the back wide-angle camera; bail out gracefully if unavailable.
    guard let backCamera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video,
                                                   position: .back),
          let input = try? AVCaptureDeviceInput(device: backCamera) else {
        return
    }
    captureSession.addInput(input)

    let cameraPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    setLayerAsBackground(layer: cameraPreviewLayer)

    // Deliver frames to this controller on a dedicated serial queue.
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.setSampleBufferDelegate(self,
                                        queue: DispatchQueue(label: "sample buffer delegate"))
    captureSession.addOutput(videoOutput)

    captureSession.startRunning()
}
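
Note that camera access requires a usage description in the app's Info.plist; without the NSCameraUsageDescription key, the app will crash when the session starts. A minimal entry (the description string is illustrative):

```
<key>NSCameraUsageDescription</key>
<string>The camera is used to detect text in real time.</string>
```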

Step 3:-

A developer needs to capture the images using the camera. Each captured frame is delivered as a sample buffer through the delegate method.

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    connection.videoOrientation = .portrait

    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    // Convert the pixel buffer to a UIImage so Vision and drawing can use it.
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext(options: nil)
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return
    }
    let uiImage = UIImage(cgImage: cgImage, scale: 1.0, orientation: .leftMirrored)

    let overlayImage = self.textDetectionRecognize(dectect_image: uiImage,
                                                   display_image_view: self.CameraOverlayView)
    DispatchQueue.main.sync {
        self.CameraOverlayView.image = overlayImage
    }
}

Step 4:-

The developer needs to send a request to the framework to detect text rectangles within the captured frame. VNImageRequestHandler accepts only specific image data types such as CVPixelBuffer, CGImage, and Data. The developer needs to convert the CMSampleBuffer to a CGImage via its CVImageBuffer, and then pass the data to a VNDetectTextRectanglesRequest.

func textDetectionRecognize(dectect_image: UIImage,
                            display_image_view: UIImageView) -> UIImage {
    guard let cgImage = dectect_image.cgImage else { return dectect_image }
    let handler = VNImageRequestHandler(cgImage: cgImage)

    var imageResult = UIImage()
    let vnrequest = VNDetectTextRectanglesRequest { (request, error) in
        if error != nil {
            print("Got Error In Run Text Detect Request")
        } else if let observations = request.results as? [VNTextObservation] {
            imageResult = self.drawingRectangleFromTextDectectRecognize(image: dectect_image,
                                                                        results: observations)
        }
    }
    vnrequest.reportCharacterBoxes = true

    do {
        // perform(_:) runs synchronously, so the completion handler has
        // already filled imageResult by the time we return.
        try handler.perform([vnrequest])
    } catch {
        print(error)
    }
    return imageResult
}

Step 5:-

A developer needs to draw the rectangles for the detected text. VNTextObservation carries an array of characterBoxes, each of type VNRectangleObservation, which represent the individual character bounding boxes found within the observation's bounding area. Once the user captures the image, the text is detected and rectangles are drawn across the bounding areas.

func drawingRectangleFromTextDectectRecognize(image: UIImage,
                                              results: [VNTextObservation]) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: image.size)

    // Vision reports normalized coordinates with the origin at the bottom
    // left, so scale up to image size and flip the y-axis before drawing.
    var t = CGAffineTransform.identity
    t = t.scaledBy(x: image.size.width, y: -image.size.height)
    t = t.translatedBy(x: 0, y: -1)

    let img = renderer.image { ctx in
        for textObservation in results {
            // Green rectangle around the whole text observation.
            ctx.cgContext.setFillColor(UIColor.clear.cgColor)
            ctx.cgContext.setStrokeColor(UIColor.green.cgColor)
            ctx.cgContext.setLineWidth(1)
            ctx.cgContext.addRect(textObservation.boundingBox.applying(t))
            ctx.cgContext.drawPath(using: .fillStroke)

            // Red rectangle around each individual character box.
            for rectangleObservation in textObservation.characterBoxes ?? [] {
                ctx.cgContext.setFillColor(UIColor.clear.cgColor)
                ctx.cgContext.setStrokeColor(UIColor.red.cgColor)
                ctx.cgContext.setLineWidth(1)
                ctx.cgContext.addRect(rectangleObservation.boundingBox.applying(t))
                ctx.cgContext.drawPath(using: .fillStroke)
            }
        }
    }
    return img
}

Step 6:-

After implementation, the application detects text as shown in the image below.

[Image: text detection in iOS]