Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Name:
interface
Value:
Amplify has re-imagined the way frontend developers build fullstack applications. Develop and deploy without the hassle.

Page updated May 8, 2024

Connect to Amazon Polly for Text-To-Speech APIs

Amazon Polly is a text-to-speech (TTS) service offered by Amazon Web Services (AWS). It uses advanced deep learning technologies to convert written text into lifelike speech, enabling you to create applications with speech capabilities in various languages and voices.

With Amazon Polly, you can easily add voice interactions and accessibility features to your applications. The service supports a wide range of use cases, such as providing audio content for the visually impaired, enhancing e-learning experiences, creating interactive voice response (IVR) systems, and more.

Key features of Amazon Polly include:

  • Multiple Voices and Languages: Amazon Polly supports dozens of voices across various languages and dialects, giving you the flexibility to choose the most appropriate voice for your use case.

  • High-Quality Speech: Amazon Polly's neural and standard voices offer natural and realistic speech quality.

  • Speech Marks and Speech Synthesis Markup Language: Customize your speech output with Speech Synthesis Markup Language tags and obtain speech timing information with speech marks.

  • Scalable and Cost-Effective: Amazon Polly's pay-as-you-go pricing model makes it a cost-effective solution for adding speech capabilities to your applications.

In this section, you'll learn how to integrate Amazon Polly into your application using AWS Amplify, enabling you to leverage its powerful text-to-speech capabilities seamlessly.

Step 1 - Setup the project

Set up your project by following the instructions in the Quickstart guide.

Step 2 - Install Polly Libraries

We'll create a new API endpoint that'll use the the AWS SDK to call the Amazon Polly service. To install the Amazon Polly SDK, run the following command in your project's root folder:

Terminal
npm add @aws-sdk/client-polly

Step 3 - Setup Storage

Create a file named amplify/storage/resource.ts and add the following content to configure a storage resource:

amplify/storage/resource.ts
import { defineStorage } from '@aws-amplify/backend';
export const storage = defineStorage({
name: 'predictions_gen2'
});

Step 4 - Configure IAM Roles

To access Amazon Polly service, you need to configure the proper IAM policy for Lambda to utilize the desired feature effectively. Update the amplify/backend.ts file with the following code to add the necessary permissions to a lambda's Role policy.

amplify/backend.ts
import { defineBackend } from "@aws-amplify/backend";
import { auth } from "./auth/resource";
import { data, convertTextToSpeech } from "./data/resource";
import { Stack } from "aws-cdk-lib";
import { PolicyStatement } from "aws-cdk-lib/aws-iam";
import { storage } from "./storage/resource";
const backend = defineBackend({
auth,
data,
storage,
convertTextToSpeech,
});
backend.convertTextToSpeech.resources.lambda.addToRolePolicy(
new PolicyStatement({
actions: ["polly:StartSpeechSynthesisTask"],
resources: ["*"],
})
);

Step 5 - Configure the function handler

Define the function handler by creating a new file, amplify/data/convertTextToSpeech.ts. This function converts text into speech using Amazon Polly and stores the synthesized speech as an MP3 file in an S3 bucket.

amplify/data/convertTextToSpeech.ts
import { Schema } from "./resource";
import {
PollyClient,
StartSpeechSynthesisTaskCommand,
} from "@aws-sdk/client-polly";
import { env } from "$amplify/env/convertTextToSpeech";
export const handler: Schema["convertTextToSpeech"]["functionHandler"] = async (
event
) => {
const client = new PollyClient();
const task = new StartSpeechSynthesisTaskCommand({
OutputFormat: "mp3",
SampleRate: "8000",
Text: event.arguments.text,
TextType: "text",
VoiceId: "Amy",
OutputS3BucketName: env.PREDICTIONS_GEN_2_BUCKET_NAME,
OutputS3KeyPrefix: "public/",
});
const result = await client.send(task);
return (
result.SynthesisTask?.OutputUri?.replace(
"https://s3.us-east-1.amazonaws.com/" +
env.PREDICTIONS_GEN_2_BUCKET_NAME +
"/public/",
""
) ?? ""
);
};

Step 6 - Define the custom mutation and function

In your amplify/data/resource.ts file, define the function using defineFunction and then reference the function with your mutation using a.handler.function() as a handler.

amplify/data/resource.ts
import {
type ClientSchema,
a,
defineData,
defineFunction,
} from "@aws-amplify/backend";
export const convertTextToSpeech = defineFunction({
entry: "./convertTextToSpeech.ts",
});
const schema = a.schema({
Todo: a
.model({
content: a.string(),
})
.authorization(allow => [allow.publicApiKey()]),
convertTextToSpeech: a
.mutation()
.arguments({
text: a.string().required(),
})
.returns(a.string().required())
.authorization(allow => [allow.publicApiKey()])
.handler(a.handler.function(convertTextToSpeech)),
});
export type Schema = ClientSchema<typeof schema>;
export const data = defineData({
schema,
authorizationModes: {
defaultAuthorizationMode: "apiKey",
// API Key is used for allow.publicApiKey() rules
apiKeyAuthorizationMode: {
expiresInDays: 30,
},
},
});

NOTE: At least one query is required for a schema to be valid. Otherwise, deployments will fail a schema error. The Amplify Data schema is auto-generated with a Todo model and corresponding queries under the hood. You can leave the Todo model in the schema until you add the first custom query to the schema in the next steps.

Step 7 - Update Storage permissions

Customize your storage settings to manage access to various paths within your storage bucket. It's necessary to update the Storage resource to provide access to the convertTextToSpeech resource. Modify the file amplify/storage/resource.ts as shown below.

amplify/storage/resource.ts
import { defineStorage } from "@aws-amplify/backend";
import { convertTextToSpeech } from "../data/resource";
export const storage = defineStorage({
name: "predictions_gen2",
access: (allow) => ({
"public/*": [
allow.resource(convertTextToSpeech).to(["write"]),
allow.guest.to(["read", "write"]),
],
}),
});

Step 8 - Configure the frontend

Import and load the configuration file in your app. It's recommended you add the Amplify configuration step to your app's root entry point.

main.tsx
import { Amplify } from "aws-amplify";
import outputs from "../amplify_outputs.json";
Amplify.configure(outputs);

Invoke the API

Example frontend code to create an audio buffer for playback using a text input.

App.tsx
import "./App.css";
import { generateClient } from "aws-amplify/api";
import type { Schema } from "../amplify/data/resource";
import { getUrl } from "aws-amplify/storage";
import { useState } from "react";
const client = generateClient<Schema>();
type PollyReturnType = Schema["convertTextToSpeech"]["returnType"];
function App() {
const [src, setSrc] = useState("");
const [file, setFile] = useState<PollyReturnType>("");
return (
<div className="flex flex-col">
<button
onClick={async () => {
const { data, errors } = await client.mutations.convertTextToSpeech({
text: "Hello World!",
});
if (!errors && data) {
setFile(data);
} else {
console.log(errors);
}
}}
>
Synth
</button>
<button
onClick={async () => {
const res = await getUrl({
path: "public/" + file,
});
setSrc(res.url.toString());
}}
>
Fetch audio
</button>
<a href={src}>Get audio file</a>
</div>
);
}
export default App;