Incorporate machine learning

Amplify allows you to identify text on an image, identify labels on an image, translate text, and synthesize speech from text with the @predictions directive.

Note: The @predictions directive requires an S3 storage bucket, configured either via amplify add storage or by setting the predictionsBucket property when using CDK.
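
If you define your API with the AmplifyGraphqlApi CDK construct from @aws-amplify/graphql-api-construct, the sketch below shows one way to wire up the bucket. This is a minimal illustration rather than the definitive setup: the stack, construct IDs, and bucket are placeholders, and the authorization mode is assumed.

import { App, Duration, Stack } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import {
  AmplifyGraphqlApi,
  AmplifyGraphqlDefinition
} from '@aws-amplify/graphql-api-construct';

const app = new App();
const stack = new Stack(app, 'PredictionsStack'); // placeholder stack

// Bucket that holds the input images under the public/ prefix.
const predictionsBucket = new Bucket(stack, 'PredictionsBucket');

new AmplifyGraphqlApi(stack, 'PredictionsApi', {
  // schema.graphql contains the @predictions fields shown on this page.
  definition: AmplifyGraphqlDefinition.fromFiles('schema.graphql'),
  // Assumed auth setup for this sketch; use your project's modes.
  authorizationModes: {
    defaultAuthorizationMode: 'API_KEY',
    apiKeyConfig: { expires: Duration.days(30) }
  },
  predictionsBucket
});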

Identify text on an image

To configure text recognition on an image, use the identifyText action in the @predictions directive.

type Query {
  recognizeTextFromImage: String @predictions(actions: [identifyText])
}

In your GraphQL query, you can pass in an S3 key for the image. At the moment, this directive works only with objects located in the public/ folder of your S3 bucket. The public/ prefix is automatically added to the key input; for instance, in the example below, public/myimage.jpg will be used as the input.

query RecognizeTextFromImage {
  recognizeTextFromImage(input: { identifyText: { key: "myimage.jpg" } })
}
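
To call the query from application code, you can use the Amplify JS API client, as in the full example later on this page. The sketch below assumes Amplify codegen produced a recognizeTextFromImage query in ./graphql/queries; the generated name may differ in your project.

import { generateClient } from 'aws-amplify/api';
// Assumed codegen output; adjust the import to match your project.
import { recognizeTextFromImage } from './graphql/queries';

const client = generateClient();

// Detects the text in public/myimage.jpg and returns it as a string.
const response = await client.graphql({
  query: recognizeTextFromImage,
  variables: { input: { identifyText: { key: 'myimage.jpg' } } }
});
console.log(response.data.recognizeTextFromImage);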

Identify labels on an image

To configure label recognition on an image, use the identifyLabels action in the @predictions directive.

type Query {
  recognizeLabelsFromImage: [String] @predictions(actions: [identifyLabels])
}

In your GraphQL query, you can pass in an S3 key for the image. As above, this directive works only with objects located in the public/ folder of your S3 bucket; the public/ prefix is automatically added to the key input.

The query below will return a list of identified labels. Review Detecting Labels in the Amazon Rekognition documentation for the full list of supported labels.

query RecognizeLabelsFromImage {
  recognizeLabelsFromImage(input: { identifyLabels: { key: "myimage.jpg" } })
}

Translate text

To configure text translation, use the translateText action in the @predictions directive.

type Query {
  translate: String @predictions(actions: [translateText])
}

The query below will return the translated string. Populate the sourceLanguage and targetLanguage parameters with one of the Supported Language Codes. Pass in the text to translate via the text parameter.

query TranslateText {
  translate(
    input: {
      translateText: {
        sourceLanguage: "en"
        targetLanguage: "de"
        text: "Translate me"
      }
    }
  )
}

Synthesize speech from text

To configure text-to-speech synthesis, use the convertTextToSpeech action in the @predictions directive.

type Query {
  textToSpeech: String @predictions(actions: [convertTextToSpeech])
}

The query below will return a presigned URL with the synthesized speech. Populate the voiceID parameter with one of the Supported Voice IDs. Pass in the text to synthesize via the text parameter.

query ConvertTextToSpeech {
  textToSpeech(
    input: {
      convertTextToSpeech: {
        voiceID: "Nicole"
        text: "Hello from AWS Amplify!"
      }
    }
  )
}

Combining Predictions actions

You can also combine multiple Predictions actions together into a sequence. The following action sequences are supported:

  • identifyText -> translateText -> convertTextToSpeech
  • identifyLabels -> translateText -> convertTextToSpeech
  • translateText -> convertTextToSpeech

In the example below, speakTranslatedImageText identifies text from an image, then translates it into another language, and finally converts the translated text to speech.

type Query {
  speakTranslatedImageText: String
    @predictions(actions: [identifyText, translateText, convertTextToSpeech])
}

An example query looks like this:

query SpeakTranslatedImageText {
  speakTranslatedImageText(
    input: {
      identifyText: { key: "myimage.jpg" }
      translateText: { sourceLanguage: "en", targetLanguage: "es" }
      convertTextToSpeech: { voiceID: "Conchita" }
    }
  )
}

A code example using the Amplify JS library is shown below:

import React, { useState } from 'react';
import { Amplify } from 'aws-amplify';
import { uploadData, getUrl } from 'aws-amplify/storage';
import { generateClient } from 'aws-amplify/api';
import config from './amplifyconfiguration.json';
import { speakTranslatedImageText } from './graphql/queries';

/* Configure Amplify */
Amplify.configure(config);

const client = generateClient();

function SpeakTranslatedImage() {
  const [src, setSrc] = useState('');
  const [img, setImg] = useState('');

  // Upload the selected image to S3, then request the synthesized speech
  // and a viewable URL for the uploaded image.
  function putS3Image(event) {
    const file = event.target.files[0];
    uploadData({
      key: file.name,
      data: file
    })
      .result.then(async (result) => {
        setSrc(await speakTranslatedImageTextOP(result.key));
        setImg((await getUrl({ key: result.key })).url.toString());
      })
      .catch((err) => console.log(err));
  }

  return (
    <div className="Text">
      <div>
        <h3>Upload Image</h3>
        <input
          type="file"
          accept="image/jpeg"
          onChange={(event) => {
            putS3Image(event);
          }}
        />
        <br />
        {img && <img src={img} alt="uploaded" />}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

// Runs the speakTranslatedImageText pipeline and returns a presigned URL
// to the synthesized speech.
async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: {
      sourceLanguage: 'en',
      targetLanguage: 'es'
    },
    identifyText: { key },
    convertTextToSpeech: { voiceID: 'Conchita' }
  };
  const response = await client.graphql({
    query: speakTranslatedImageText,
    variables: { input: inputObj }
  });
  return response.data.speakTranslatedImageText;
}

function App() {
  return (
    <div className="App">
      <h1>Speak Translated Image</h1>
      <SpeakTranslatedImage />
    </div>
  );
}

export default App;

How it works

Definition of the @predictions directive:

directive @predictions(actions: [PredictionsActions!]!) on FIELD_DEFINITION
enum PredictionsActions {
  identifyText # uses Amazon Rekognition to detect text
  identifyLabels # uses Amazon Rekognition to detect labels
  convertTextToSpeech # uses Amazon Polly in a Lambda function to output a presigned URL to synthesized speech
  translateText # uses Amazon Translate to translate text from source to target language
}

@predictions creates the resources needed to communicate with Amazon Rekognition, Amazon Translate, and Amazon Polly. For each action, the following is created:

  • An IAM policy for each service (e.g., an Amazon Rekognition detectText policy)
  • An AppSync VTL function
  • An AppSync DataSource

Finally, a pipeline resolver is created for the query or field. The pipeline resolver is composed of AppSync functions which are defined by the action list provided in the directive.
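
As an illustration of the generated API surface, the speakTranslatedImageText field above gets an input type whose fields configure each action in the pipeline. Based on the queries shown earlier, its shape is roughly the sketch below; the exact SDL the transformer generates may differ.

# Rough sketch of the generated input types; names and nullability may differ.
input SpeakTranslatedImageTextInput {
  identifyText: IdentifyTextInput
  translateText: TranslateTextInput
  convertTextToSpeech: ConvertTextToSpeechInput
}

input IdentifyTextInput {
  key: String! # S3 key under the public/ prefix
}

input TranslateTextInput {
  sourceLanguage: String!
  targetLanguage: String!
  text: String # omitted when the text comes from a previous action
}

input ConvertTextToSpeechInput {
  voiceID: String!
  text: String # omitted when the text comes from a previous action
}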