Incorporate machine learning

Amplify allows you to identify text on an image, identify labels on an image, translate text, and synthesize speech from text with the @predictions directive.

Note: The @predictions directive requires an S3 storage bucket, configured either with amplify add storage or by setting the predictionsBucket property when using CDK.
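
If you deploy the API with the AmplifyGraphqlApi CDK construct rather than the CLI, a minimal sketch of wiring up the bucket could look like the following. It assumes the construct accepts the bucket through the predictionsBucket property mentioned above; the stack and construct names are placeholders, so verify the exact prop shape against the construct documentation for your version.

import { App, Stack, Duration } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import {
  AmplifyGraphqlApi,
  AmplifyGraphqlDefinition
} from '@aws-amplify/graphql-api-construct';

class PredictionsApiStack extends Stack {
  constructor(scope, id, props) {
    super(scope, id, props);

    // Bucket whose public/ prefix holds the images read by @predictions
    const predictionsBucket = new Bucket(this, 'PredictionsBucket');

    new AmplifyGraphqlApi(this, 'PredictionsApi', {
      definition: AmplifyGraphqlDefinition.fromFiles('schema.graphql'),
      authorizationModes: {
        apiKeyConfig: { expires: Duration.days(30) }
      },
      // Assumed prop: the storage bucket used by the @predictions resolvers
      predictionsBucket
    });
  }
}

const app = new App();
new PredictionsApiStack(app, 'PredictionsApiStack');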

Identify text on an image

To configure text recognition on an image, use the identifyText action in the @predictions directive.

type Query {
  recognizeTextFromImage: String @predictions(actions: [identifyText])
}

In your GraphQL query, you can pass in an S3 key for the image. At the moment, this directive works only with objects located within the public/ folder of your S3 bucket. The public/ prefix is automatically added to the key input. For instance, in the example below, public/myimage.jpg will be used as the input.

query RecognizeTextFromImage($input: RecognizeTextFromImageInput!) {
  recognizeTextFromImage(input: { identifyText: { key: "myimage.jpg" } })
}
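
To invoke this query from the Amplify JS library, a minimal sketch could look like the one below. The helper name and the inline query document are illustrative; you can equally use the query that Amplify codegen generates for your schema.

import { generateClient } from 'aws-amplify/api';

const client = generateClient();

// Illustrative helper: detect text in public/<key> via the
// recognizeTextFromImage field defined in the schema above.
async function detectTextFromImage(key) {
  const response = await client.graphql({
    query: /* GraphQL */ `
      query RecognizeTextFromImage($input: RecognizeTextFromImageInput!) {
        recognizeTextFromImage(input: $input)
      }
    `,
    variables: { input: { identifyText: { key } } }
  });
  return response.data.recognizeTextFromImage;
}

Calling detectTextFromImage('myimage.jpg') reads public/myimage.jpg and resolves to the detected text.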

Identify labels on an image

To configure label recognition on an image, use the identifyLabels action in the @predictions directive.

type Query {
  recognizeLabelsFromImage: [String] @predictions(actions: [identifyLabels])
}

In your GraphQL query, you can pass in an S3 key for the image. At the moment, this directive works only with objects located within the public/ folder of your S3 bucket. The public/ prefix is automatically added to the key input. For instance, in the example below, public/myimage.jpg will be used as the input.

The query below will return a list of identified labels. Review Detecting Labels in the Amazon Rekognition documentation for the full list of supported labels.

query RecognizeLabelsFromImage($input: RecognizeLabelsFromImageInput!) {
  recognizeLabelsFromImage(input: { identifyLabels: { key: "myimage.jpg" } })
}
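
As with text detection, the query can be called through the Amplify JS library. The sketch below is illustrative; the helper name is not part of any generated code.

import { generateClient } from 'aws-amplify/api';

const client = generateClient();

// Illustrative helper: return the labels detected in public/<key>.
async function detectLabelsFromImage(key) {
  const response = await client.graphql({
    query: /* GraphQL */ `
      query RecognizeLabelsFromImage($input: RecognizeLabelsFromImageInput!) {
        recognizeLabelsFromImage(input: $input)
      }
    `,
    variables: { input: { identifyLabels: { key } } }
  });
  return response.data.recognizeLabelsFromImage; // list of label names
}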

Translate text

To configure text translation, use the translateText action in the @predictions directive.

type Query {
  translate: String @predictions(actions: [translateText])
}

The query below will return the translated string. Populate the sourceLanguage and targetLanguage parameters with one of the Supported Language Codes. Pass in the text to translate via the text parameter.

query TranslateText($input: TranslateTextInput!) {
  translate(
    input: {
      translateText: {
        sourceLanguage: "en"
        targetLanguage: "de"
        text: "Translate me"
      }
    }
  )
}
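
If you call this query from the Amplify JS library, a minimal sketch could look like the following. The helper name is illustrative, and the language codes are the same ones shown in the query above.

import { generateClient } from 'aws-amplify/api';

const client = generateClient();

// Illustrative helper: translate text from English to German.
async function translateToGerman(text) {
  const response = await client.graphql({
    query: /* GraphQL */ `
      query TranslateText($input: TranslateTextInput!) {
        translate(input: $input)
      }
    `,
    variables: {
      input: {
        translateText: { sourceLanguage: 'en', targetLanguage: 'de', text }
      }
    }
  });
  return response.data.translate;
}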

Synthesize speech from text

To configure text-to-speech synthesis, use the convertTextToSpeech action in the @predictions directive.

type Query {
  textToSpeech: String @predictions(actions: [convertTextToSpeech])
}

The query below will return a presigned URL with the synthesized speech. Populate the voiceID parameter with one of the Supported Voice IDs. Pass in the text to synthesize via the text parameter.

query ConvertTextToSpeech($input: ConvertTextToSpeechInput!) {
  textToSpeech(
    input: {
      convertTextToSpeech: {
        voiceID: "Nicole"
        text: "Hello from AWS Amplify!"
      }
    }
  )
}
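
From the Amplify JS library, the presigned URL returned by this query can be used directly as the source of an audio element, as in the combined example later on this page. The sketch below is illustrative; the helper name is not generated code.

import { generateClient } from 'aws-amplify/api';

const client = generateClient();

// Illustrative helper: synthesize speech and return the presigned audio URL.
async function synthesizeSpeech(text) {
  const response = await client.graphql({
    query: /* GraphQL */ `
      query ConvertTextToSpeech($input: ConvertTextToSpeechInput!) {
        textToSpeech(input: $input)
      }
    `,
    variables: { input: { convertTextToSpeech: { voiceID: 'Nicole', text } } }
  });
  return response.data.textToSpeech; // presigned URL to the audio file
}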

Combining Predictions actions

You can also combine multiple Predictions actions together into a sequence. The following action sequences are supported:

  • identifyText -> translateText -> convertTextToSpeech
  • identifyLabels -> translateText -> convertTextToSpeech
  • translateText -> convertTextToSpeech

In the example below, speakTranslatedImageText identifies text from an image, then translates it into another language, and finally converts the translated text to speech.

type Query {
  speakTranslatedImageText: String
    @predictions(actions: [identifyText, translateText, convertTextToSpeech])
}

An example of this query looks like the following:

query SpeakTranslatedImageText($input: SpeakTranslatedImageTextInput!) {
  speakTranslatedImageText(
    input: {
      identifyText: { key: "myimage.jpg" }
      translateText: { sourceLanguage: "en", targetLanguage: "es" }
      convertTextToSpeech: { voiceID: "Conchita" }
    }
  )
}

A code example using the Amplify JS library is shown below:

import React, { useState } from 'react';
import { Amplify } from 'aws-amplify';
import { uploadData, getUrl } from 'aws-amplify/storage';
import { generateClient } from 'aws-amplify/api';
import config from './amplifyconfiguration.json';

import { speakTranslatedImageText } from './graphql/queries';

/* Configure Amplify with the generated configuration */
Amplify.configure(config);

const client = generateClient();

function SpeakTranslatedImage() {
  const [src, setSrc] = useState('');
  const [img, setImg] = useState('');

  // Upload the selected file to the bucket's public/ prefix, then run the
  // speakTranslatedImageText pipeline query against the uploaded key.
  function putS3Image(event) {
    const file = event.target.files[0];
    uploadData({
      key: file.name,
      data: file
    })
      .result.then(async (result) => {
        setSrc(await speakTranslatedImageTextOP(result.key));
        setImg((await getUrl({ key: result.key })).url.toString());
      })
      .catch((err) => console.log(err));
  }

  return (
    <div className="Text">
      <div>
        <h3>Upload Image</h3>
        <input
          type="file"
          accept="image/jpeg"
          onChange={(event) => {
            putS3Image(event);
          }}
        />
        <br />
        {img && <img src={img}></img>}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

// Runs the identifyText -> translateText -> convertTextToSpeech pipeline and
// returns the presigned URL of the synthesized audio.
async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: {
      sourceLanguage: 'en',
      targetLanguage: 'es'
    },
    identifyText: { key },
    convertTextToSpeech: { voiceID: 'Conchita' }
  };
  const response = await client.graphql({
    query: speakTranslatedImageText,
    variables: { input: inputObj }
  });
  return response.data.speakTranslatedImageText;
}

function App() {
  return (
    <div className="App">
      <h1>Speak Translated Image</h1>
      <SpeakTranslatedImage />
    </div>
  );
}

export default App;

How it works

Definition of the @predictions directive:

directive @predictions(actions: [PredictionsActions!]!) on FIELD_DEFINITION

enum PredictionsActions {
  identifyText # uses Amazon Rekognition to detect text
  identifyLabels # uses Amazon Rekognition to detect labels
  convertTextToSpeech # uses Amazon Polly in a Lambda function to return a presigned URL to synthesized speech
  translateText # uses Amazon Translate to translate text from source to target language
}

@predictions creates the resources needed to communicate with Amazon Rekognition, Amazon Translate, and Amazon Polly. For each action, the following is created:

  • An IAM policy for each service (for example, an Amazon Rekognition detectText policy)
  • An AppSync VTL function
  • An AppSync DataSource

Finally, a pipeline resolver is created for the query or field. The pipeline resolver is composed of AppSync functions which are defined by the action list provided in the directive.