We provide a browser-based editor that can be used to quickly correct automated transcripts. Click the Edit Transcript button to launch it.

The first thing you will notice is the audio waveform at the top. That is the audio player. Clicking anywhere on it will take you to the corresponding word in the transcript.

The first row of buttons contains the playback controls. Each button has a corresponding keyboard shortcut, so you don’t have to reach for the mouse, which saves a lot of time. The most important shortcuts to remember are CTRL+P to play/pause and CTRL+O to rewind (CMD+P and CMD+O on a Mac).

The second row of buttons contains controls for the text editor. Hover the mouse over a button to see a description of what it does; most of them are self-explanatory.

You will also notice some text underlined in blue and red. Red underlines mark spelling mistakes; run the spell check to correct them. Blue underlines mark words where our speech recognition engine was not confident, so they may be mistakes. You can right-click on a blue word and choose Play Word to hear the corresponding audio.
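The blue underlining can be sketched in Python, assuming the engine attaches a confidence score between 0 and 1 and start/end times to every word. The `Word` type, the `flag_uncertain` helper, and the 0.6 threshold are hypothetical; the post does not say how Scribie decides which words to underline.

```python
from dataclasses import dataclass

# Hypothetical cut-off; Scribie does not publish the value it uses.
LOW_CONFIDENCE = 0.6


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the audio
    end: float
    confidence: float  # assumed to be in [0, 1]


def flag_uncertain(words, threshold=LOW_CONFIDENCE):
    """Return the words an editor would underline for review."""
    return [w for w in words if w.confidence < threshold]


transcript = [
    Word("the", 0.00, 0.12, 0.98),
    Word("quick", 0.12, 0.40, 0.91),
    Word("brawn", 0.40, 0.71, 0.42),
    Word("fox", 0.71, 0.95, 0.88),
]

for w in flag_uncertain(transcript):
    # The start/end times are what a "Play Word" action would seek to.
    print(f"{w.text!r} at {w.start:.2f}-{w.end:.2f}s (confidence {w.confidence:.2f})")
```

Keeping the timestamps on each word is what makes it cheap to jump straight to the audio for a flagged word instead of scrubbing through the file.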

The following corrections are typically required in the automated transcripts:

Mistakes: Words that were incorrectly transcribed. Most of these will have blue underlines.

Speaker Turns: Our speech recognition engine misses around 40% of speaker turns, so some paragraphs may actually contain two speakers (we are working to improve this).

Punctuation: Some periods may be missing. Commas and other punctuation marks are mostly correct, although we only provide the opening quote; the closing quote has to be inserted manually.

Capitalization: Some capitalized words may be wrong, and some other words may need to be capitalized.

We recommend a two-pass approach to making the corrections. In the first pass, play and check the words with blue underlines. These are the low-hanging fruit, and you can get them out of the way quickly.

In the second pass, play the audio from the beginning and make corrections as you go. Whenever you notice a mistake, pause, make the correction, and resume playback. Repeat until you reach the end of the file. Increasing the playback speed can also help when the transcript accuracy is above 80%.

Once you are done with the edits, click the Download button at the bottom to get a Word document or other formats.

Overall, correcting an automated transcript takes around 3-4 times the duration of the file, including replays. Without the automated transcript, you may have to spend 8-10 times the duration of the file. It is also easy to lose focus on long files, so remember to take breaks.

Of course, if you do not have the time, our transcribers will be happy to make the corrections for you. We guarantee 99% accuracy for our manual transcripts. Please do try it out.

Our latest speech and language models have been released. This release includes several new features:

Acoustic Model: This is our fourth acoustic model trained on our data. The dataset contained mostly accented speakers (e.g., Indian, African, and Irish) as well as some noisy files, so automated transcripts of accented audio should now be more accurate.

Language Model: We have added more data to our language model and doubled its size. The model has now been trained on around 46 million lines, which improved the WER by around 2%.

Punctuation: The biggest feature of this release is expanded punctuation. We now support all types of punctuation, including quotes and hyphens. To our knowledge, no one else, including Google Web Speech, AWS Transcribe, and Speechmatics, supports quotes.

Speaker Turns: We have also updated our speaker turns model. Its accuracy is around 80% on long paragraphs, so automated transcripts will now be better segmented. We are currently working on adding speaker diarization to the automated transcript, and it should be out soon. We handle speaker turns a bit differently: we do not require the number of speakers as an input, which is another of our unique features. Google Web Speech does not support multi-speaker files, and AWS Transcribe and Speechmatics require the number of speakers as an input for diarization.
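Scribie does not describe how its speaker turns model works. As a generic illustration of how a system can avoid asking for the number of speakers, here is a Python sketch of greedy clustering with a similarity threshold: each segment joins the most similar speaker seen so far, or starts a new speaker if nothing is similar enough, so the speaker count falls out of the data. The two-dimensional "voice embeddings", the `assign_speakers` and `speaker_turns` helpers, and the 0.8 threshold are all hypothetical; real systems use learned speaker embeddings with many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.8):
    """Label each segment with a speaker index without knowing the speaker count."""
    # Running sum of the embeddings assigned to each speaker; it points in the
    # same direction as the mean, which is all cosine similarity looks at.
    centroids = []
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            # Nothing is similar enough: treat this segment as a new speaker.
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels


def speaker_turns(labels):
    """Indices of segments where the speaker differs from the previous segment."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical per-segment voice embeddings for a two-person conversation.
segments = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.95, 0.05), (0.2, 0.9)]
labels = assign_speakers(segments)
print(labels)                 # [0, 0, 1, 0, 1]
print(speaker_turns(labels))  # [2, 3, 4]
print(len(set(labels)), "speakers found")
```

The threshold replaces the speaker count as the tuning knob: set it too low and different voices merge, too high and one voice splits into several speakers.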

This release also fixes the issue of missing predictions, where some words, especially near speaker turns, were not being transcribed. The automated transcripts should now capture all utterances except filler words. We also benchmarked our model on LibriSpeech Clean and our internal dataset. The following are our numbers.

As you can see, for conversational audio, our models outperform PaddlePaddle by a wide margin. We are working on improving our models for non-conversational audio as well. Our ASR is a DeepSpeech-based system and therefore a comparison with PaddlePaddle is a good benchmark for us. The Continual Learning blog post has some more details on how we trained our DeepSpeech models.

We are back with our spring special promotion. Get a 10% discount on all orders at Scribie with the SPRING18 discount code, valid until May 20th, 2018. Don’t forget to apply the code before placing your order!

Flawless transcripts and fast turnaround times are the hallmarks of Scribie. Our transcripts are not only highly accurate but also reasonably priced. But have you ever wondered what makes that possible? The answer lies in how we constantly improve our speech-to-text engine, which assists our transcribers. We provide automatic word completion to our transcribers, and the better those autocompletions are, the less they have to type.
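The post does not explain how the autocompletions are generated. One plausible scheme, sketched below in Python under that assumption, is to look up the word the engine predicted at the transcriber's current position and offer it once the typed prefix matches, checking a neighbouring word or two in case the typed text has drifted from the prediction. The `suggest` function and its `slack` parameter are hypothetical.

```python
def suggest(hypothesis, position, typed_prefix, slack=1):
    """Suggest a completion for the word being typed, or None.

    hypothesis: words predicted by the speech engine (assumed aligned with
        the transcriber's text, give or take `slack` words).
    position: index of the word the transcriber is currently typing.
    typed_prefix: characters typed so far for that word.
    """
    prefix = typed_prefix.lower()
    if not prefix:
        return None
    # Try the predicted word at this position first, then its neighbours.
    order = [position]
    for d in range(1, slack + 1):
        order += [position + d, position - d]
    for i in order:
        if 0 <= i < len(hypothesis) and hypothesis[i].lower().startswith(prefix):
            return hypothesis[i]
    return None


hyp = "we recommend a two pass approach to make the corrections".split()
print(suggest(hyp, 1, "rec"))   # recommend
print(suggest(hyp, 4, "ap"))    # approach (found one word ahead)
print(suggest(hyp, 2, "xyz"))   # None
```

Each accepted suggestion saves len(word) - len(prefix) keystrokes, which is why a more accurate engine translates directly into less typing.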

Our speech recognition engine is a Deep Learning system. For the uninitiated, Deep Learning is a subdomain of Machine Learning. It makes use of Artificial Neural Networks that, in a way, mimic the structure and function of the human brain. Our speech recognition engine is based on the DeepSpeech 2 network from Baidu, and written in PyTorch.

Scribie has a large dataset of audio and transcripts: over 100,000 hours at last count. Training Deep Learning models on such a large dataset is very expensive in practice, as it requires a large number of GPUs and SSDs. For comparison, when Baidu developed the DeepSpeech architecture, they trained their models with 256 GPUs on custom hardware. We don’t have the time or money to do that, so we developed an approach we call Continual Learning.

Continual Learning

We first built and trained a large model on a 3,000-hour dataset, which took around three weeks on our rig. Since then, every month we have built a ‘corrections dataset’ of around 1,000 hours, made up of predictions from the previous model that were wrong and were then manually corrected by our transcribers. In each iteration, we remove an equal amount of data from the previous training set and fine-tune the model on the newly combined data. This keeps the training set at a constant size while the model keeps improving over time.
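The data rotation in Continual Learning can be sketched in Python. The training set is modelled as hypothetical 100-hour shards, `fine_tune` is a placeholder rather than real training code, and dropping the oldest shards first is an assumption, since the post does not say which data is removed.

```python
from collections import deque

SHARD_HOURS = 100  # hypothetical: training data grouped into 100-hour shards


def rotate(train, corrections):
    """Drop as many shards as the corrections add, then append the corrections.

    Assumption: the oldest shards are dropped first; the post does not say
    which data Scribie removes.
    """
    for _ in range(min(len(corrections), len(train))):
        train.popleft()
    train.extend(corrections)
    return train


def fine_tune(model_version, shards):
    """Hypothetical stand-in for fine-tuning the previous model on `shards`.

    A real implementation would resume from the last checkpoint and train;
    here we only bump the version number.
    """
    return model_version + 1


# Initial 3,000-hour training set, as 30 shards of 100 hours each.
train = deque(f"base-{i}" for i in range(30))
model = 1  # the large model trained from scratch on the base set

for month in range(1, 4):
    # Around 1,000 hours of wrong predictions, corrected by transcribers.
    corrections = [f"corr{month}-{i}" for i in range(10)]
    train = rotate(train, corrections)
    model = fine_tune(model, list(train))
    corrected = sum(s.startswith("corr") for s in train) * SHARD_HOURS
    print(f"month {month}: model v{model}, {len(train) * SHARD_HOURS} hours "
          f"({corrected} from corrections)")
```

Because each fine-tuning run sees a fixed-size dataset, the cost per iteration stays flat instead of growing with the full 100,000-hour archive. Dropping the oldest data is only one possible policy; a random selection would fit the same description.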

Results

We have completed three such iterations, and the results are promising. We have consistently decreased the Word Error Rate (WER), a common metric for automated transcription accuracy. The following is a chart of our WER.
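WER counts the substitutions (S), deletions (D), and insertions (I) needed to turn the automated transcript into the reference transcript, divided by the number of words N in the reference: WER = (S + D + I) / N. Below is a minimal Python sketch using word-level edit distance. It only lowercases and splits on whitespace, a simplification, since benchmark scoring usually also normalizes punctuation and numbers; it is a generic implementation, not Scribie's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] = edits between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion (reference word missing)
                          curr[j - 1] + 1,     # insertion (extra hypothesis word)
                          prev[j - 1] + cost)  # substitution, or a match if cost is 0
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deletion ("a") and one substitution ("them" -> "him") over 7 words.
print(round(word_error_rate("i have a great relationship with them",
                            "i have great relationship with him"), 3))  # 0.286
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a percentage of words wrong.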

We are providing free automated transcripts for a limited time, so please try out our online speech recognition system! Note that we currently support only English, and the system works best for files with North American speakers and clean audio.

Deep Learning and AI have been in the news a lot lately, and there are concerns that Machine Learning will end up taking our jobs and replacing humans. We have taken a different approach and built a system to assist our transcribers instead. Eventually, we want to reach a point where a human needs to spend just 10 minutes on a one-hour file and still produce a highly accurate transcript. We still have a long way to go, and we are working hard at it!

But with that being said, a president has been extremely generous with what he said. I like him a lot. I have a great relationship with them, as you know, have a great relationship with prime minister abe in japan, and I probably have a very good relationship with m gun f not care. I have relationships with people to surprise.

So our AI agrees with the WSJ: President Trump did say ‘I’. So there you go!

The transcript is missing a few words towards the end, and we are working to fix that. However, if you have a clean audio file, head here to get a free automated transcript!

We are happy to announce that we now support billing accounts on Scribie.com. With a billing account, you can order your transcripts online and pay on a Net 15 or Net 30 basis. We send you a bill at the end of the month or whenever you request it. The volume requirement for billing accounts is higher, though: we can only consider it for order amounts of more than $1,000. We also require a contract to be signed before the billing account can be set up.

We were previously unable to support billing accounts because we followed a different model, in which we paid our transcribers as soon as their work was reviewed. That meant we had to charge our customers upfront. This was radically different from other freelance marketplaces, which restrict the withdrawal of earnings. We chose not to impose such restrictions, as our aim was to build the best place for audio/video transcription.

But Net billing is an important requirement for enterprises and SMEs, and many of our customers requested it. Our solution was to get a line of credit from our bank, and we finally got the approval for it last week. A big shout-out to our bankers for this!

So if you are looking for a billing account, just get in touch and we will start the process.

We are pleased to announce an across-the-board 20% drop in our pricing effective today. Our new transcription rates are as follows:

Scribie New Transcription Rates

Plan       Old rate    New rate    Savings
Budget     $0.75/min   $0.60/min   20%
Regular    $1.50/min   $1.20/min   20%
Express    $3.00/min   $2.40/min   20%
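As a quick check of the arithmetic, here is a small Python sketch that derives the new per-minute rates from the old ones and prices a one-hour file. It assumes an order costs simply rate times minutes and ignores any rounding of partial minutes, discount codes, or add-ons, none of which this post describes; `new_rate` and `order_cost` are hypothetical helpers.

```python
from decimal import Decimal

# Old per-minute rates in USD, as listed in the pricing table.
OLD_RATES = {
    "Budget": Decimal("0.75"),
    "Regular": Decimal("1.50"),
    "Express": Decimal("3.00"),
}
PRICE_DROP = Decimal("0.20")  # the across-the-board 20% reduction


def new_rate(plan: str) -> Decimal:
    """New per-minute rate after the 20% drop, rounded to cents."""
    return (OLD_RATES[plan] * (1 - PRICE_DROP)).quantize(Decimal("0.01"))


def order_cost(plan: str, minutes: int) -> Decimal:
    """Simplified order cost: new rate times minutes (assumption)."""
    return new_rate(plan) * minutes


for plan, old in OLD_RATES.items():
    saving = old * 60 - order_cost(plan, 60)
    print(f"{plan}: ${new_rate(plan)}/min, one-hour file ${order_cost(plan, 60)} "
          f"(saves ${saving})")
```

`Decimal` is used instead of floats so that rates such as 0.75 × 0.80 come out as exact cents.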

We started with the mission to build the best place for transcription, for both transcribers and customers. We have been relentlessly pursuing that goal and have recently built technology that reduces the time and effort transcribers need, without compromising accuracy in any way. We are happy to pass the savings on to our customers.

We have always stood for accuracy, and our goal has been to provide the highest-quality transcripts at the lowest possible cost. At the same time, we want to compensate our transcribers fairly, and the only way to solve this problem was with technology. Our tech has now been rolled out in production, and we are happy to reach this milestone. This is the real test of whether our tech is good enough!

We will be talking more about our tech here in the coming days, so check back if you’re interested in the details. In the meantime, upload your files and order transcripts online to enjoy the benefits of our tech at the reduced prices.

If you have used Scribie’s service before, you probably know about our high standards in terms of the quality and the accuracy of the transcripts.

So an important question comes to mind: how is Scribie able to churn out 99.9% accurate transcripts all the time, while some industry players are afraid to even claim that benchmark? Continue reading “No Room for Errors”→