Connect to Amazon Polly for Text-To-Speech APIs
Amazon Polly is a text-to-speech (TTS) service offered by Amazon Web Services (AWS). It uses advanced deep learning technologies to convert written text into lifelike speech, enabling you to create applications with speech capabilities in various languages and voices.
With Amazon Polly, you can easily add voice interactions and accessibility features to your applications. The service supports a wide range of use cases, such as providing audio content for the visually impaired, enhancing e-learning experiences, creating interactive voice response (IVR) systems, and more.
Key features of Amazon Polly include:
-
Multiple Voices and Languages: Amazon Polly supports dozens of voices across various languages and dialects, giving you the flexibility to choose the most appropriate voice for your use case.
-
High-Quality Speech: Amazon Polly's neural and standard voices offer natural and realistic speech quality.
-
Speech Marks and Speech Synthesis Markup Language: Customize your speech output with Speech Synthesis Markup Language tags and obtain speech timing information with speech marks.
-
Scalable and Cost-Effective: Amazon Polly's pay-as-you-go pricing model makes it a cost-effective solution for adding speech capabilities to your applications.
In this section, you'll learn how to integrate Amazon Polly into your application using AWS Amplify, enabling you to leverage its powerful text-to-speech capabilities seamlessly.
Step 1 - Setup the project
Set up your project by following the instructions in the Quickstart guide.
Step 2 - Install Polly Libraries
We'll create a new API endpoint that'll use the the AWS SDK to call the Amazon Polly service. To install the Amazon Polly SDK, run the following command in your project's root folder:
npm add @aws-sdk/client-polly
Step 3 - Setup Storage
Create a file named amplify/storage/resource.ts
and add the following content to configure a storage resource:
import { defineStorage } from '@aws-amplify/backend';
export const storage = defineStorage({ name: 'predictions_gen2'});
Step 4 - Configure IAM Roles
To access Amazon Polly service, you need to configure the proper IAM policy for Lambda to utilize the desired feature effectively. Update the amplify/backend.ts
file with the following code to add the necessary permissions to a lambda's Role policy.
import { defineBackend } from "@aws-amplify/backend";import { auth } from "./auth/resource";import { data, convertTextToSpeech } from "./data/resource";import { Stack } from "aws-cdk-lib";import { PolicyStatement } from "aws-cdk-lib/aws-iam";import { storage } from "./storage/resource";
const backend = defineBackend({ auth, data, storage, convertTextToSpeech,});
backend.convertTextToSpeech.resources.lambda.addToRolePolicy( new PolicyStatement({ actions: ["polly:StartSpeechSynthesisTask"], resources: ["*"], }));
Step 5 - Configure the function handler
Define the function handler by creating a new file, amplify/data/convertTextToSpeech.ts
. This function converts text into speech using Amazon Polly and stores the synthesized speech as an MP3 file in an S3 bucket.
import { Schema } from "./resource";import { PollyClient, StartSpeechSynthesisTaskCommand,} from "@aws-sdk/client-polly";import { env } from "$amplify/env/convertTextToSpeech";
export const handler: Schema["convertTextToSpeech"]["functionHandler"] = async ( event) => { const client = new PollyClient(); const task = new StartSpeechSynthesisTaskCommand({ OutputFormat: "mp3", SampleRate: "8000", Text: event.arguments.text, TextType: "text", VoiceId: "Amy", OutputS3BucketName: env.PREDICTIONS_GEN_2_BUCKET_NAME, OutputS3KeyPrefix: "public/", }); const result = await client.send(task);
return ( result.SynthesisTask?.OutputUri?.replace( "https://s3.us-east-1.amazonaws.com/" + env.PREDICTIONS_GEN_2_BUCKET_NAME + "/public/", "" ) ?? "" );};
Step 6 - Define the custom mutation and function
In your amplify/data/resource.ts
file, define the function using defineFunction and then reference the function with your mutation using a.handler.function() as a handler.
import { type ClientSchema, a, defineData, defineFunction,} from "@aws-amplify/backend";
export const convertTextToSpeech = defineFunction({ entry: "./convertTextToSpeech.ts",});
const schema = a.schema({ Todo: a .model({ content: a.string(), }) .authorization(allow => [allow.publicApiKey()]), convertTextToSpeech: a .mutation() .arguments({ text: a.string().required(), }) .returns(a.string().required()) .authorization(allow => [allow.publicApiKey()]) .handler(a.handler.function(convertTextToSpeech)),});
export type Schema = ClientSchema<typeof schema>;
export const data = defineData({ schema, authorizationModes: { defaultAuthorizationMode: "apiKey", // API Key is used for allow.publicApiKey() rules apiKeyAuthorizationMode: { expiresInDays: 30, }, },});
Step 7 - Update Storage permissions
Customize your storage settings to manage access to various paths within your storage bucket. It's necessary to update the Storage resource to provide access to the convertTextToSpeech
resource. Modify the file amplify/storage/resource.ts
as shown below.
import { defineStorage } from "@aws-amplify/backend";import { convertTextToSpeech } from "../data/resource";
export const storage = defineStorage({ name: "predictions_gen2", access: (allow) => ({ "public/*": [ allow.resource(convertTextToSpeech).to(["write"]), allow.guest.to(["read", "write"]), ], }),});
Step 8 - Configure the frontend
Import and load the configuration file in your app. It's recommended you add the Amplify configuration step to your app's root entry point.
import { Amplify } from "aws-amplify";import outputs from "../amplify_outputs.json";
Amplify.configure(outputs);
Invoke the API
Example frontend code to create an audio buffer for playback using a text input.
import "./App.css";import { generateClient } from "aws-amplify/api";import type { Schema } from "../amplify/data/resource";import { getUrl } from "aws-amplify/storage";import { useState } from "react";
const client = generateClient<Schema>();
type PollyReturnType = Schema["convertTextToSpeech"]["returnType"];
function App() { const [src, setSrc] = useState(""); const [file, setFile] = useState<PollyReturnType>(""); return ( <div className="flex flex-col"> <button onClick={async () => { const { data, errors } = await client.mutations.convertTextToSpeech({ text: "Hello World!", });
if (!errors && data) { setFile(data); } else { console.log(errors); } }} > Synth </button> <button onClick={async () => { const res = await getUrl({ path: "public/" + file, });
setSrc(res.url.toString()); }} > Fetch audio </button> <a href={src}>Get audio file</a> </div> );}
export default App;