
Identify text

Amplify iOS v1 is now in Maintenance Mode until May 31st, 2024. This means that we will continue to ship updates to ensure compatibility with backend services, as well as security updates. No new features will be introduced in v1.

Please use the latest version (v2) of Amplify Library for Swift to get started.

If you are currently using v1, follow these instructions to upgrade to v2.

Amplify libraries should be used for all new cloud-connected applications. If you are currently using the AWS Mobile SDK for iOS, you can access the documentation here.

The following APIs will allow you to identify text (words, tables, pages from a book) from an image.

For identifying text on iOS, we use both AWS backend services and Apple's on-device Core ML Vision framework to provide you with the most accurate results. If your device is offline, we will return results only from Core ML. If you are able to connect to AWS services, we will return a combined result from both the service and Core ML. Switching between backend services and Core ML is done automatically, without any additional configuration required.

Set up the backend

If you haven't already done so, run amplify init inside your project and then amplify add auth (we recommend selecting the default configuration).

Run amplify add predictions, then use the following answers:

? Please select from one of the categories below
❯ Identify
  Convert
  Interpret
  Infer
  Learn More

? What would you like to identify? (Use arrow keys)
❯ Identify Text
  Identify Entities
  Identify Labels

? Provide a friendly name for your resource
  <Enter a friendly name here>

? Would you also like to identify documents?
  <Enter 'y'>

? Who should have access?
  Auth users only
❯ Auth and Guest users

Run amplify push to create the resources in the cloud.

Identify text from image

Amplify makes calls to Amazon Textract and Amazon Rekognition, depending on the type of text you are looking to identify (i.e., plain text or document text).

If you are detecting plain text from an image, send in .plain as your text format, as shown below. Using .plain with the default PredictionsIdentifyRequest.Options() combines on-device results from Core ML with results from AWS services to yield more accurate results.

func detectText(_ image: URL, completion: @escaping ([IdentifiedWord]) -> Void) {
    Amplify.Predictions.identify(type: .detectText(.plain), image: image) { event in
        switch event {
        case let .success(result):
            let data = result as! IdentifyTextResult
            completion(data.words!)
        case let .failure(error):
            print(error)
        }
    }
}
If you are using Combine:

func detectText(_ image: URL) -> AnyCancellable {
    Amplify.Predictions.identify(type: .detectText(.plain), image: image)
        .resultPublisher
        .sink {
            if case let .failure(error) = $0 {
                print(error)
            }
        }
        receiveValue: { result in
            let data = result as! IdentifyTextResult
            print(data.words)
        }
}

Note: Bounding boxes in IdentifyTextResult are returned as ratios. If you would like to place bounding boxes on individual recognized words that appear in the image, use the following method to calculate a frame for a single bounding box.

@IBAction func didTapButton(_ sender: Any) {
    let imageURL = URL(string: "https://imageWithText")
    let data = try? Data(contentsOf: imageURL!)
    let image = UIImage(data: data!)
    let imageView = UIImageView(image: image)
    self.view.addSubview(imageView)

    detectText(imageURL!, completion: { words in
        let word = words.first!
        DispatchQueue.main.async {
            let transform = CGAffineTransform(scaleX: imageView.frame.size.width,
                                              y: imageView.frame.size.height)
            let boundingBox = UIView(frame: word.boundingBox.applying(transform))
            boundingBox.backgroundColor = .red
            imageView.addSubview(boundingBox)
        }
    })
}

Additionally, it's important to note that Amazon Rekognition places (0,0) at the top left, while Core ML places (0,0) at the bottom left. To handle this difference, Amplify flips the y-axis of the Core ML bounding box for you, since iOS places (0,0) at the top left.

To get results that use only on-device capabilities (Core ML), without combining results from the backend, create the following options and pass them into the options argument of the Amplify.Predictions.identify function.

let options = PredictionsIdentifyRequest.Options(defaultNetworkPolicy: .offline, pluginOptions: nil)
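
As a rough end-to-end sketch, the options value is passed through the options argument of the same listener-based identify call shown earlier. The helper name detectTextOffline below is hypothetical and used only for illustration:

func detectTextOffline(_ image: URL, completion: @escaping ([IdentifiedWord]) -> Void) {
    // Use only on-device Core ML results; skip the AWS backend call
    let options = PredictionsIdentifyRequest.Options(defaultNetworkPolicy: .offline, pluginOptions: nil)

    Amplify.Predictions.identify(type: .detectText(.plain), image: image, options: options) { event in
        switch event {
        case let .success(result):
            let data = result as! IdentifyTextResult
            completion(data.words ?? [])
        case let .failure(error):
            print(error)
        }
    }
}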

Identify text in a document

Sending in .form, .table, or .all performs document analysis in addition to text detection, so that tables and forms in a document can be detected. See below for an example with .form.

func detectText(_ image: URL) {
    Amplify.Predictions.identify(type: .detectText(.form), image: image) { event in
        switch event {
        case let .success(result):
            let data = result as! IdentifyDocumentTextResult
            print(data)
        case let .failure(error):
            print(error)
        }
    }
}
If you are using Combine:

func detectText(_ image: URL) -> AnyCancellable {
    Amplify.Predictions.identify(type: .detectText(.form), image: image)
        .resultPublisher
        .sink {
            if case let .failure(error) = $0 {
                print(error)
            }
        }
        receiveValue: { result in
            let data = result as! IdentifyDocumentTextResult
            print(data)
        }
}
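
As a rough sketch of what you might do with the result, the helper below prints a few pieces of the document-analysis output. The property names used here (fullText, tables, keyValues, words) are assumptions based on the v1 IdentifyDocumentTextResult type and should be verified against your library version:

func printDocumentText(_ data: IdentifyDocumentTextResult) {
    // Full plain-text transcription of the document (assumed property name)
    print(data.fullText)

    // Summary of the document-analysis portion of the result (assumed property names)
    print("Detected \(data.tables.count) table(s) and \(data.keyValues.count) key-value pair(s)")

    // Individual words, with bounding boxes returned as ratios (see the note above)
    for word in data.words {
        print("\(word.text) at \(word.boundingBox)")
    }
}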