I recently came across the IBM Watson Text to Speech service. This service, which is available under standard, premium and lite (restricted) plans on IBM Cloud, makes it quick and easy to synthesize speech from text in any application. It can be accessed using a standard HTTP REST API.

With the IBM Cloud CLI, launching a new Text to Speech service on IBM Cloud is a simple two-step process:

  1. Create the service instance (read more about service instance creation).

    ibmcloud resource service-instance-create myt2s text-to-speech lite us-east
    
  2. Create a service key associated with that service instance (read more about service credential creation). This example assigns the Manager role to the key:

    ibmcloud resource service-key-create myt2s-key Manager --instance-name myt2s
    

    Note the apikey and host fields in the response, as you will need them to use the service.

Once instantiated, the Text to Speech service becomes visible in the IBM Cloud dashboard:

[Image: the Text to Speech service in the IBM Cloud dashboard]

Once the service is active, you can send a POST request with a JSON body to the host endpoint, with the text to be synthesized in the request body. The apikey must be included to authorize the request. Here’s an example:

curl -X POST -u "apikey:YOUR-API-KEY-HERE" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/ogg" \
    --output hello.ogg \
    --data '{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." }' \
"YOUR-HOST-HERE/v1/synthesize"

By default, the service returns audio in Ogg format, but a number of other audio formats are also supported, including FLAC, MP3 and WAV. To request one of them, change the value of the Accept header. More information is available in the API documentation.
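
To make the format choice concrete, here is a minimal sketch that picks the Accept header from the output file's extension and then issues the same kind of request as the curl example above. The function names and the TTS_APIKEY and TTS_HOST variables are my own placeholders, not part of the service, and the MIME types listed are the ones I would expect the service to accept for these four formats; check the API documentation before relying on them.

```shell
# Sketch only: TTS_APIKEY and TTS_HOST are hypothetical variable names that
# hold the apikey and host values noted when the service key was created.

# Map an output file's extension to the Accept header value to request.
accept_for_file() {
  case "${1##*.}" in
    ogg)  echo "audio/ogg" ;;
    wav)  echo "audio/wav" ;;
    mp3)  echo "audio/mp3" ;;
    flac) echo "audio/flac" ;;
    *)    echo "unsupported extension: ${1##*.}" >&2; return 1 ;;
  esac
}

# Synthesize TEXT into OUTFILE, choosing the format from the file extension.
# The text is inserted into the JSON body unescaped, so this sketch assumes
# it contains no double quotes or backslashes.
synthesize() {
  local text="$1" outfile="$2" accept
  accept="$(accept_for_file "$outfile")" || return 1
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Content-Type: application/json" \
    --header "Accept: ${accept}" \
    --output "$outfile" \
    --data "{\"text\": \"${text}\"}" \
    "${TTS_HOST}/v1/synthesize"
}
```

With both variables exported, a call such as `synthesize "Hello world" hello.wav` should write a WAV file, while an unrecognized extension fails before any request is sent.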