Conversion of Text-to-Speech & Speech-to-Text Using AWS Cloud Services in Python

Today I was stuck on a very specific problem: finding a way to convert text to speech and speech back to text in one flow, and to store the resulting output in an S3 bucket.

As you probably already know, Amazon Polly converts text to speech and Amazon Transcribe converts speech to text, and after each conversion the resulting output is stored in its own S3 bucket. Let's find a solution using these AWS services!

AMAZON POLLY: Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility. Amazon Polly supports multiple languages.

Features:

High quality

Low latency

Support for a large portfolio of languages and voices

Cost-effective
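
As a quick illustration of what Polly does, here is a minimal sketch that turns a string into MP3 bytes with the boto3 `synthesize_speech` call. The function name `text_to_mp3_bytes` and the default voice `Joanna` are illustrative choices, not something the rest of this post depends on, and the client is passed in so the function can be tried without touching AWS.

```python
def text_to_mp3_bytes(polly_client, text, voice_id="Joanna"):
    """Synthesize `text` with Amazon Polly and return the MP3 audio as bytes.

    `polly_client` is expected to be a boto3 Polly client,
    e.g. boto3.client("polly"). The voice is an illustrative default.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the raw audio bytes.
    return response["AudioStream"].read()
```

With AWS credentials and a region configured, calling `text_to_mp3_bytes(boto3.client("polly"), "Hello")` and writing the returned bytes to a file opened in `"wb"` mode should give a playable .mp3.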

AMAZON TRANSCRIBE: Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Features:

Easy-to-Read Transcriptions

Timestamp Generation

Recognize Multiple Speakers

Improving Customer Service
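
To make the Transcribe API a little more concrete, the sketch below starts a transcription job for an MP3 in S3 and pulls the plain text out of the result. Transcribe writes its result as a JSON document, so the second helper reads the transcript string from it. The helper names, the `en-US` default and the bucket arguments are assumptions made for this example.

```python
import json


def start_transcription(transcribe_client, job_name, media_uri,
                        output_bucket, language_code="en-US"):
    """Start an asynchronous transcription job for an MP3 stored in S3.

    `transcribe_client` is expected to be boto3.client("transcribe").
    The job runs in the background; its JSON result lands in `output_bucket`.
    """
    return transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat="mp3",
        LanguageCode=language_code,
        OutputBucketName=output_bucket,
    )


def transcript_text(result_json):
    """Extract the plain transcript from the JSON document Transcribe writes."""
    document = json.loads(result_json)
    return document["results"]["transcripts"][0]["transcript"]
```

If a plain .txt file is wanted, the string returned by `transcript_text` can be written back to S3 as a separate object.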

Conversion of Text-to-Speech using Amazon Polly and Speech-to-Text using Amazon Transcribe:

Architecture:

The user sends text from a Lambda function that is integrated with Amazon Polly, so the text is converted to speech (an .mp3 file) and stored in an S3 bucket. Amazon Polly then generates an ID and a URL for the audio, and both are sent to SQS. A second Lambda function pulls the ID and URL from the queue, takes the audio file from the S3 bucket and sends it to Amazon Transcribe, which converts the speech to text and stores the .txt file in another S3 bucket.
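
The first half of this architecture could look roughly like the following Lambda handler. This is only a sketch under assumptions: the environment variable names, the placeholder bucket and queue URL, the `text` key in the event and the `Joanna` voice are all hypothetical. The optional `polly` and `sqs` parameters exist only so the handler can be exercised with stand-in clients.

```python
import json
import os

# Hypothetical configuration: these names and defaults are placeholders.
AUDIO_BUCKET = os.environ.get("AUDIO_BUCKET", "my-polly-audio-bucket")
QUEUE_URL = os.environ.get(
    "QUEUE_URL", "https://sqs.us-east-1.amazonaws.com/123456789012/polly-tasks"
)


def _client(service):
    import boto3  # imported lazily so the module loads without AWS access
    return boto3.client(service)


def lambda_handler(event, context, polly=None, sqs=None):
    """Convert event["text"] to an MP3 in S3 and queue the task ID and URL."""
    polly = polly if polly is not None else _client("polly")
    sqs = sqs if sqs is not None else _client("sqs")

    # Asynchronous synthesis: Polly writes the .mp3 to the bucket itself
    # and immediately returns a task ID plus the URI the file will have.
    task = polly.start_speech_synthesis_task(
        Text=event["text"],
        OutputFormat="mp3",
        VoiceId="Joanna",  # illustrative voice
        OutputS3BucketName=AUDIO_BUCKET,
    )["SynthesisTask"]

    message = {"task_id": task["TaskId"], "output_uri": task["OutputUri"]}
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))
    return message
```

Because `start_speech_synthesis_task` returns before the audio exists, the handler only records where the file will be; something else has to pick the message up later.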

NOTE: We use a scheduled trigger to check for messages in SQS because the generated S3 object is not immediately available to Amazon Transcribe. So we push the message to SQS first, and in the meantime the S3 object becomes available. A second Lambda function, scheduled with a cron expression, checks for message availability.
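
A scheduled Lambda of this kind might be sketched as below. The readiness check through `get_speech_synthesis_task` is one possible way to confirm the audio is in place, not necessarily the approach used in the steps that follow. The environment variable names, placeholder values and message fields (`task_id`, `output_uri`) are assumptions, and the sketch also assumes the URI reported by Polly is accepted by Transcribe as the media location.

```python
import json
import os

# Hypothetical configuration: these names and defaults are placeholders.
QUEUE_URL = os.environ.get(
    "QUEUE_URL", "https://sqs.us-east-1.amazonaws.com/123456789012/polly-tasks"
)
TRANSCRIPT_BUCKET = os.environ.get("TRANSCRIPT_BUCKET", "my-transcript-bucket")


def _client(service):
    import boto3  # imported lazily so the module loads without AWS access
    return boto3.client(service)


def lambda_handler(event, context, sqs=None, polly=None, transcribe=None):
    """Poll SQS and start a transcription job for each finished Polly task."""
    sqs = sqs if sqs is not None else _client("sqs")
    polly = polly if polly is not None else _client("polly")
    transcribe = transcribe if transcribe is not None else _client("transcribe")

    started = []
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        task = polly.get_speech_synthesis_task(TaskId=body["task_id"])["SynthesisTask"]
        status = task["TaskStatus"]

        if status in ("scheduled", "inProgress"):
            # Audio not ready yet: leave the message on the queue so a
            # later scheduled run sees it again after the visibility timeout.
            continue

        if status == "completed":
            transcribe.start_transcription_job(
                TranscriptionJobName=body["task_id"],
                Media={"MediaFileUri": body["output_uri"]},
                MediaFormat="mp3",
                LanguageCode="en-US",  # illustrative language
                OutputBucketName=TRANSCRIPT_BUCKET,
            )
            started.append(body["task_id"])

        # Completed or failed: either way this message needs no more work.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
    return started
```

Messages for unfinished tasks are deliberately not deleted, so the next scheduled run picks them up again.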

Steps for Conversion of Text-to-Speech using Amazon Polly and Speech-to-Text using Amazon Transcribe: