Two main factors affect the cost of using
Speech-to-Text:

The type of recognition model that you use, either standard or premium

Whether you opt in to data logging

Speech-to-Text can use several types of machine learning models for
speech recognition. Two of these models, the enhanced phone call and
video models, provide improved recognition performance. Each of these
models is tailored to a specific use case and produces higher-quality
results when used for that use case.

With data logging, customers can allow
Google to record audio data sent to Speech-to-Text. This data helps
Google improve the machine learning models used for speech transcription.
Customers who opt in to data logging benefit from lower Speech-to-Text
pricing.

This pricing applies to applications on personal systems (e.g., phones,
tablets, laptops, desktops). Please contact us for approval and pricing
to use the Speech-to-Text API on embedded devices (e.g., cars, TVs,
appliances, or speakers).

The audio duration of each request is rounded up to the next increment of
15 seconds. For example, if you make three separate requests, each
containing 7 seconds of audio, you are billed $0.018 USD for 45 seconds
(3 × 15 seconds) of audio. Fractions of a second count when rounding up:
15.14 seconds of audio is rounded up and billed as 30 seconds.
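
The rounding rule above can be sketched in a few lines of Python. This is only an illustration, not an official billing calculator. The function names are hypothetical, and the rate of $0.006 per 15-second increment is an assumption inferred from the example above ($0.018 for three increments); actual rates depend on the model and data logging choices described earlier. Treating zero-length audio as zero billed seconds is also an assumption.

```python
import math
from decimal import Decimal

INCREMENT_SECONDS = 15

# Assumption: inferred from the example above ($0.018 USD / 3 increments).
# Substitute the rate that applies to your model and data logging choice.
ASSUMED_RATE_PER_INCREMENT_USD = Decimal("0.006")


def billed_seconds(duration_seconds: float) -> int:
    """Round one request's audio duration up to the next 15-second increment."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Fractions of a second count: 15.14 s becomes 2 increments (30 s).
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS


def estimate_cost(durations, rate=ASSUMED_RATE_PER_INCREMENT_USD) -> Decimal:
    """Estimate the cost of several requests; each is rounded up separately."""
    increments = sum(billed_seconds(d) // INCREMENT_SECONDS for d in durations)
    return increments * rate


if __name__ == "__main__":
    print(billed_seconds(7))         # one request of 7 s
    print(billed_seconds(15.14))     # just over one increment
    print(estimate_cost([7, 7, 7]))  # three separate 7 s requests
```

Because each request is rounded separately, three 7-second requests are billed as 45 seconds, whereas the same 21 seconds of audio sent in a single request would be billed as 30 seconds.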

Usage is capped at 1 million minutes of audio per month. If you need more
than that, we would like to understand more about your needs. Please submit
a Cloud Speech-to-Text Quota Request for your project.

Google Cloud Platform Costs

If you store audio files to be recognized in Google Cloud Storage, or use other
Google Cloud Platform resources in tandem with Speech-to-Text,
such as Google App Engine instances, then
you will also be billed for the use of those services. See the
Google Cloud Platform Pricing Calculator
to determine other costs based on current rates.