Amazon Transcribe is a speech recognition service that transcribes audio files into text.
The service, which uses machine learning, also enables a developer to add speech-to-text capabilities to an application. A developer, for example, could build an application that uses Amazon Transcribe to create transcriptions of customer service calls in a contact center, or to generate subtitles for audio or video content in real time.
How to use Amazon Transcribe
To use Amazon Transcribe, a developer must first have an AWS account and create an AWS Identity and Access Management (IAM) user. The developer can then access the service through the AWS Management Console, AWS Command Line Interface (CLI) or Transcribe API.
Audio files for Transcribe -- which a developer uploads and stores in Amazon S3 -- must be in MP3, MP4, WAV or FLAC format and can be no longer than two hours. The service supports both 16-kilohertz (kHz) and 8-kHz audio streams.
A developer must specify the language and format of each audio file submitted for transcription. As of mid-2018, Transcribe supports only US English and Spanish.
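The constraints above can be checked before a job is submitted. The following is a minimal sketch, not an official client: the helper name `build_transcription_request`, the bucket and file names, and the `es-US` code for Spanish are assumptions made for illustration. The returned keys follow the parameters of the `start_transcription_job` call in boto3, the AWS SDK for Python.

```python
# Limits as stated in this article (mid-2018); the live service may differ.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac"}
SUPPORTED_LANGUAGES = {"en-US", "es-US"}  # "es-US" for Spanish is an assumption
MAX_DURATION_SECONDS = 2 * 60 * 60        # two-hour limit per audio file


def build_transcription_request(job_name, s3_uri, language_code, duration_seconds=None):
    """Validate the inputs and return keyword arguments for a transcription job."""
    # Infer the media format from the file extension of the S3 object.
    media_format = s3_uri.rsplit(".", 1)[-1].lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {media_format}")
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language_code}")
    # The duration is optional because it may not be known up front.
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError("audio is longer than two hours")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": s3_uri},
    }


if __name__ == "__main__":
    # Hypothetical bucket and object key.
    request = build_transcription_request(
        "demo-job", "s3://example-bucket/calls/call-001.mp3", "en-US"
    )
    print(request)
```

With boto3 installed and IAM credentials configured, the resulting dictionary could be passed along as `boto3.client("transcribe").start_transcription_job(**request)`.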
Other Amazon Transcribe features
Transcribe uses deep learning to incorporate punctuation and formatting into each text output, and to limit the amount of editing required after it completes a transcription. During each transcription, the service will also generate a timestamp for each word, in case a user needs to return to a point in time in the original audio file for clarification.
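As a rough illustration of how word-level timestamps can be used, the sketch below looks up the words spoken within a time window. It assumes the transcript is JSON with a `results.items` list in which each spoken word carries a `start_time` string and an `alternatives` list, while punctuation items carry no timestamp. The sample data is hand-written for this example rather than real service output, and `words_between` is a hypothetical helper.

```python
def words_between(transcript, start_seconds, end_seconds):
    """Return (start_time, word) pairs for words starting inside the window."""
    words = []
    for item in transcript["results"]["items"]:
        # Punctuation items have no timestamps, so skip them.
        if item.get("type") != "pronunciation":
            continue
        start = float(item["start_time"])
        if start_seconds <= start < end_seconds:
            words.append((start, item["alternatives"][0]["content"]))
    return words


# Hand-written sample in the assumed shape, not real service output.
sample_transcript = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.04", "end_time": "0.45",
             "alternatives": [{"content": "Hello"}]},
            {"type": "punctuation", "alternatives": [{"content": ","}]},
            {"type": "pronunciation", "start_time": "0.52", "end_time": "0.90",
             "alternatives": [{"content": "thanks"}]},
            {"type": "pronunciation", "start_time": "0.90", "end_time": "1.05",
             "alternatives": [{"content": "for"}]},
            {"type": "pronunciation", "start_time": "1.05", "end_time": "1.60",
             "alternatives": [{"content": "calling"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
        ]
    }
}

if __name__ == "__main__":
    print(words_between(sample_transcript, 0.5, 1.0))
```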
Transcribe can identify between two and 10 different speakers in an audio file, and then label segments of its text file to indicate which speaker spoke which words. Transcribe also enables a developer to input files with custom vocabulary -- such as jargon or proper names that are relevant to a particular industry or use case -- to ensure a more accurate text output.
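Speaker identification and custom vocabularies are requested per job. The sketch below assembles those options, using the `ShowSpeakerLabels`, `MaxSpeakerLabels` and `VocabularyName` keys that boto3 accepts in the `Settings` argument of `start_transcription_job`. The helper `build_job_settings` and the vocabulary name are hypothetical, and the range of two to 10 speakers is taken from this article rather than from current service limits.

```python
MIN_SPEAKERS = 2
MAX_SPEAKERS = 10  # upper bound as stated in this article


def build_job_settings(max_speakers=None, vocabulary_name=None):
    """Return a Settings dictionary for a transcription job."""
    settings = {}
    if max_speakers is not None:
        if not MIN_SPEAKERS <= max_speakers <= MAX_SPEAKERS:
            raise ValueError("speaker count must be between 2 and 10")
        # Speaker labels are enabled together with the maximum speaker count.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name is not None:
        # Name of a custom vocabulary that was created beforehand.
        settings["VocabularyName"] = vocabulary_name
    return settings


if __name__ == "__main__":
    # "contact-center-terms" is a hypothetical vocabulary name.
    print(build_job_settings(max_speakers=2, vocabulary_name="contact-center-terms"))
```

A two-party customer service call, for instance, would use `max_speakers=2`; the result is passed as the `Settings` argument when the job is started.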
Transcribe integrates with a range of other Amazon services, including Amazon Comprehend, a natural language processing (NLP) service; Amazon Translate, a language translation service; and Amazon Polly, a service that converts text files into speech.
Amazon Transcribe pricing and availability
Amazon charges for Transcribe using a pay-as-you-go model, based on the seconds of audio transcribed per month. There is a free tier of service that lets a developer transcribe up to 60 minutes of audio every month for a year. When the free tier expires, or if a developer exceeds the free tier limit, Amazon bills Transcribe at $0.0004 per second.
Amazon also imposes a minimum of 15 seconds per Transcribe API request; any audio file shorter than 15 seconds is billed as 15 seconds.
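The pricing rules above translate into simple arithmetic. The following sketch estimates a monthly bill under two assumptions that the article does not state: partial seconds are rounded up, and the 15-second minimum is applied to each request before free-tier seconds are deducted. The function names are hypothetical.

```python
import math
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0004")  # USD per second, from this article
MIN_BILLED_SECONDS = 15              # minimum charge per request
FREE_TIER_SECONDS = 60 * 60          # 60 free minutes per month


def billed_seconds(duration_seconds):
    """Seconds billed for one request (rounding up is an assumption)."""
    return max(math.ceil(duration_seconds), MIN_BILLED_SECONDS)


def monthly_cost(durations, free_tier_active=True):
    """Estimated monthly cost in USD for a list of audio durations in seconds."""
    total = sum(billed_seconds(d) for d in durations)
    if free_tier_active:
        # Assumption: free seconds are deducted after the per-request minimum.
        total = max(total - FREE_TIER_SECONDS, 0)
    return total * RATE_PER_SECOND


if __name__ == "__main__":
    # One 10-second clip with no free tier is billed as 15 seconds.
    print(monthly_cost([10], free_tier_active=False))
    # 90 minutes of audio with the free tier leaves 30 billable minutes.
    print(monthly_cost([3600, 1800]))
```

At the stated rate, one hour of audio outside the free tier costs 3,600 x $0.0004 = $1.44.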
As of mid-2018, Amazon Transcribe is available in six AWS regions:
- US-East-1 (Northern Virginia)
- US-East-2 (Ohio)
- US-West-2 (Oregon)
- CA-Central-1 (Montreal)
- EU-West-1 (Ireland)
- AP-Southeast-2 (Sydney)