“I Heard it Through the Grape Van”

With the background set out in the previous post, it’s time to get things into gear.
Today’s post covers how to upload the media file to S3, create the Transcribe job to process it, and finally download the results locally.

Quick Recap

This project demonstrates the use of the AWS Transcribe service and PowerShell to create an SRT (subtitle) file from a media file.

Our project makes use of:

PowerShell Core

AWS PowerShell Core Cmdlets

AWS S3

AWS Transcribe Service

An MP4 video file

Prerequisites

Before going into the nitty gritty, you need to ensure all of the following are in place:

You have cloned the repo for the project from either its source or your own fork

Sequence of Events to Transcribe the File

The order of events that need to happen is relatively straightforward:

Upload the file to S3

Create a Transcribe job

Wait for job to finish

Download the JSON file

Upload the file to S3

We’ll start by defining some variables and defaults to make things a bit easier; after that, the Write-S3Object cmdlet takes care of the upload itself:

Uploading the file to S3

PowerShell

$AWSDefaultParameters = @{
    ProfileName = 'development'
    Region      = 'eu-west-1'
}

# Set parameters as required
$Bucket = 'tim-training-thing'
$Path = "~/Desktop/videoplayback.mp4"

# Let's get the file item so we can use some of its properties
$fileitem = Get-Item -Path $Path

# Set the S3 uri prefix and uri for the S3 object's key
$prefix = 'https://s3-eu-west-1.amazonaws.com'
$s3uri = "$prefix/$Bucket/$($fileitem.Name)"

# Upload it to S3
Write-S3Object -BucketName $Bucket -File $Path @AWSDefaultParameters

Create a Transcribe job

Every Transcribe job has a job name associated with it. For this script, I’ve used the GUID class to create a unique one. We define this along with the name of the results file that will be used when the output is downloaded from a completed job. Then the Start-TRSTranscriptionJob cmdlet is used to initiate the task. The $s3uri variable tells Transcribe where to find the file it is to process.

Creating the Job

PowerShell

#Define a unique guid to be used as the job name and the output results file.
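
Here is a minimal sketch of how this step might look, based on the description above rather than on the repo's exact code. The .json file name, the en-US language code, and the $jobParameters variable are assumptions for illustration; the first two assignments simply repeat values from the upload step so the snippet stands alone.

```powershell
# Values carried over from the upload step (repeated so this sketch stands alone)
$AWSDefaultParameters = @{
    ProfileName = 'development'
    Region      = 'eu-west-1'
}
$s3uri = 'https://s3-eu-west-1.amazonaws.com/tim-training-thing/videoplayback.mp4'

# A GUID gives us a job name that will not clash with earlier jobs
$jobname = [guid]::NewGuid().ToString()

# Assumption: the results file is named after the job
$resultsfile = "$jobname.json"

# Assumption: the audio is US English; change LanguageCode to suit your media
$jobParameters = @{
    TranscriptionJobName = $jobname
    Media_MediaFileUri   = $s3uri
    MediaFormat          = 'mp4'
    LanguageCode         = 'en-US'
}

# Kick off the job; it runs asynchronously on the AWS side
$job = Start-TRSTranscriptionJob @jobParameters @AWSDefaultParameters
```

Splatting two hashtables keeps the job-specific settings separate from the profile and region defaults, which are reused by every AWS cmdlet in the script.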

Wait for job to finish

A basic loop is put in place which checks the status of the Transcribe job every ten seconds. The loop continues until the job status changes from IN_PROGRESS, indicating that the job has either failed or completed.

Wait for the Job to Finish

PowerShell

#Job processing will run async, so it's up to you how you deal with this.

#For this one we'll take ten second naps in between checks of the status
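
A polling loop along these lines could be sketched as follows. Wait-TranscriptionJob and its parameters are hypothetical names, not part of the AWS module; only Get-TRSTranscriptionJob is the real cmdlet. The sketch also treats QUEUED as "not finished yet", which goes slightly beyond the IN_PROGRESS check described above.

```powershell
# Hypothetical helper: polls a Transcribe job until it is no longer pending
function Wait-TranscriptionJob {
    param(
        [Parameter(Mandatory)] [string] $JobName,
        [int] $IntervalSeconds = 10,
        [hashtable] $AwsParameters = @{}
    )

    do {
        # Nap between checks so we are not hammering the API
        Start-Sleep -Seconds $IntervalSeconds
        $job = Get-TRSTranscriptionJob -TranscriptionJobName $JobName @AwsParameters

        # Cast to string so the comparison does not depend on the SDK's status type
        $status = [string]$job.TranscriptionJobStatus
    } while ($status -in 'QUEUED', 'IN_PROGRESS')

    # The caller decides what to do with COMPLETED or FAILED
    return $job
}

# Usage, assuming $jobname and $AWSDefaultParameters from the earlier steps:
# $results = Wait-TranscriptionJob -JobName $jobname -AwsParameters $AWSDefaultParameters
```

Returning the whole job object rather than just the status means the next step can read both TranscriptionJobStatus and the transcript location from the same $results variable.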

Download the JSON file

When a job has executed successfully, indicated by its COMPLETED status, it stores the result in an S3 bucket of its own choosing. The location is not one of your own buckets, and it has a limited lifetime. By querying the Transcript.TranscriptFileUri property of the job result, we can get the location where it is stored. You then have the choice of downloading the file with the S3 cmdlets or, as in this case, simply with Invoke-WebRequest.

Download the JSON file

PowerShell

if ($results.TranscriptionJobStatus -eq 'COMPLETED') {
    $transcripturi = $results.Transcript.TranscriptFileUri
    Invoke-WebRequest -Uri $transcripturi -OutFile $resultsfile
}
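
Once the file is on disk, a quick sanity check is to read it back and print the transcript text. The sketch below assumes the standard Transcribe output layout, where the full text sits under results.transcripts; Get-TranscriptText is a hypothetical helper name.

```powershell
# Hypothetical helper: returns the full transcript text from a downloaded results file
function Get-TranscriptText {
    param(
        [Parameter(Mandatory)] [string] $Path
    )

    # -Raw reads the file as one string, which is what ConvertFrom-Json expects
    $document = Get-Content -Path $Path -Raw | ConvertFrom-Json

    # Assumption: a single transcript entry, as produced by a standard job
    return $document.results.transcripts[0].transcript
}

# Usage, assuming $resultsfile from the download step:
# Get-TranscriptText -Path $resultsfile
```

If this prints the spoken text of the video, the download worked and the file is ready for the conversion step.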

Part III will cover the largest part of the process, converting the job results into the SRT file we’ll use with the original video.
Thanks for reading!