As speech is central to human interaction, artificial intelligence research has long focused on speech recognition, the first step in designing and building systems allowing humans to interact intuitively with machines. The diversity in languages, accents and voices makes this an incredibly difficult problem, requiring expert skills, extremely large data sets, and vast amounts of computing power to train efficient models.
To help organizations and developers use speech recognition in their applications, we launched Amazon Transcribe, an automatic speech recognition service, at AWS re:Invent 2017. Thanks to Amazon Transcribe, customers such as VideoPeel, Echo360, and GE Appliances have been able to quickly and easily add speech recognition capabilities to their applications and devices.
A single API call is all that it takes… and you don’t need to know the first thing about machine learning. You can analyze audio files stored in Amazon Simple Storage Service (S3) and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.
Since launch, the team has constantly added new languages, and today we are happy to announce support for Mandarin and Russian, bringing the total number of supported languages to 16.
Working with Amazon Transcribe is extremely simple: let me show you how to get started in just a few minutes.
Let’s try Mandarin first. Starting from this Little Red Riding Hood video, I extracted the audio track, saved it in MP3 format, and uploaded it to one of my Amazon Simple Storage Service (S3) buckets. Here’s the actual file.
$ aws transcribe start-transcription-job --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/little_red_riding_hood-mandarin.mp3 --media-format mp3 --language-code zh-CN --transcription-job-name little_red_riding_hood-mandarin
After a few minutes, the job is complete. In the AWS console, I can either download the transcript using the URL provided by Amazon Transcribe, or read it directly.
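If you would rather stay in the terminal than watch the console, a small polling loop can do the waiting. The following is only a minimal sketch, assuming the AWS CLI is configured with credentials that can call Amazon Transcribe. `wait_for_job` is a hypothetical helper name (not part of the AWS CLI), and the 15-second default interval is an arbitrary choice.

```shell
# Sketch: wait until a transcription job finishes, then print the transcript URL.
# wait_for_job is a hypothetical helper, not an AWS CLI command.
wait_for_job() {
  local job_name
  local delay
  local status
  job_name="$1"
  delay="${2:-15}"   # seconds between polls (arbitrary default)

  while true; do
    # --query extracts a single field from the JSON response (JMESPath).
    status=$(aws transcribe get-transcription-job \
      --transcription-job-name "$job_name" \
      --query 'TranscriptionJob.TranscriptionJobStatus' \
      --output text) || return 1

    case "$status" in
      COMPLETED) break ;;
      FAILED) echo "Job $job_name failed" >&2; return 1 ;;
      *) sleep "$delay" ;;   # QUEUED or IN_PROGRESS: keep waiting
    esac
  done

  # Print the location of the transcript file for the completed job.
  aws transcribe get-transcription-job \
    --transcription-job-name "$job_name" \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' \
    --output text
}
```

For example, `url=$(wait_for_job little_red_riding_hood-mandarin)` followed by `curl -s "$url" -o transcript.json` should save the transcript locally, assuming the job wrote its output to the default service-managed location.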
Let’s try Russian now, using the dialogue in this short video.
|Russian|English|
|---|---|
|Добрый день!|Good day!|
|Давайте познакомимся. Меня зовут Слава.|Let’s introduce ourselves. My name is Slava.|
|Очень приятно, а меня – Наташа.|Nice to meet you, and mine – Natasha.|
|Наташа, кто вы по профессии?|Natasha, what is your profession?|
|Я врач. А вы?|I (am a) doctor. And you?|
|Я инженер.|I (am an) engineer.|
This time, I will ask Amazon Transcribe to perform speaker identification too.
$ aws transcribe start-transcription-job --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/russian-dialogue.mp3 --media-format mp3 --language-code ru-RU --transcription-job-name russian_dialogue --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2
Here is the result.
As you can see, not only has Amazon Transcribe faithfully converted speech to text, it has also assigned each sentence to the correct speaker.
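The speaker assignments are also available in the transcript file itself, so they can be extracted from the command line. Here is a rough sketch, assuming `jq` is installed and that the transcript JSON exposes `results.speaker_labels.segments` (one entry per speaker turn, with start and end times) alongside `results.items` (one entry per word or punctuation mark). `speakers_by_segment` is a hypothetical helper name, and matching words to turns by time range is just one possible approach.

```shell
# Sketch: print one line per speaker turn, e.g. "spk_0: <words>".
# speakers_by_segment is a hypothetical helper; it expects the path to a
# transcript JSON file produced with ShowSpeakerLabels=true.
speakers_by_segment() {
  jq -r '
    .results as $r
    | $r.speaker_labels.segments[]
    | . as $seg
    # Collect the spoken words whose timestamps fall inside this segment.
    # Punctuation items carry no timestamps, so they are filtered out first.
    | [ $r.items[]
        | select(.type == "pronunciation")
        | select((.start_time | tonumber) >= ($seg.start_time | tonumber)
             and (.end_time | tonumber) <= ($seg.end_time | tonumber))
        | .alternatives[0].content ]
    | join(" ")
    | "\($seg.speaker_label): \(.)"
  ' "$1"
}
```

Calling `speakers_by_segment transcript.json` on a downloaded transcript (the file name is illustrative) would print each turn prefixed with its speaker label. Punctuation is dropped in this sketch to keep the filter short.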
You can start using these two new languages today in the following regions:
- Americas: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), AWS GovCloud (US-West), Canada (Central), South America (São Paulo).
- Europe: EU (Frankfurt), EU (Ireland), EU (London), EU (Paris).
- Asia Pacific: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).
The free tier covers 60 minutes per month for the first 12 months, starting from your first transcription request.