UCONN


Cloud Speech and Text APIs



The Speech-to-Text and Text-to-Speech APIs are Google tools that let developers convert audio streams or files to text, and text to audio streams or files. The Google Cloud Speech-to-Text API (often simply called the Cloud Speech API) uses Google’s machine learning models to convert spoken language into written text.

Let's look at Cloud Speech first. It takes audio files or audio streams and converts them to text that can be stored or processed.

The API handles three modes of processing.

Synchronous: for smaller files. You send the file and receive the full transcription in a single response.

Asynchronous: for larger files. Google processes the audio in the background and can deliver the result to a storage bucket when complete.

Real-time streaming: you send audio to Google as it is captured, and transcriptions come back in chunks almost immediately.
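As a rough illustration of how the three modes map onto the Python client, the sketch below picks a SpeechClient method name from the audio's length. The helper name and the 60 second cutoff are assumptions for illustration (synchronous requests are generally limited to about one minute of audio); check the current limits before relying on it.

```python
# Hypothetical helper, not part of the Google library.
# Assumption: synchronous requests handle roughly one minute of audio.
SYNC_MAX_SECONDS = 60


def choose_recognition_method(duration_seconds=None, is_live_stream=False):
    """Return the name of the SpeechClient method that fits the audio."""
    if is_live_stream:
        # Audio arrives continuously, e.g. from a microphone.
        return "streaming_recognize"
    if duration_seconds is None:
        raise ValueError("duration_seconds is required for recorded audio")
    if duration_seconds <= SYNC_MAX_SECONDS:
        # Short file: send it and get the transcript back in one response.
        return "recognize"
    # Long file: Google processes it in the background.
    return "long_running_recognize"


print(choose_recognition_method(30))
print(choose_recognition_method(1800))
print(choose_recognition_method(is_live_stream=True))
```

The three returned names are real methods on speech.SpeechClient; only the selection logic is made up here.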


The API supports over 125 languages.


Examples

Call centers: transcribing customer calls.

Media: closed captioning.

Voice commands: assistants like Siri.


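Before sending audio, tell the Cloud Speech API the format (encoding) of the audio.

The API also needs to know the sample rate of the file.

The sample rate tells the audio processor how much clock time each data point covers. At 16,000 Hz, each sample covers 1/16,000 of a second.

Finally, the API needs to know the language spoken in the audio.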
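For a WAV file, the values listed above can be read from the file header before calling the API. The sketch below uses only Python's standard wave module; the function names are made up for this example, and the silent test file exists only so the snippet runs on its own.

```python
import os
import tempfile
import wave


def write_silent_wav(path, sample_rate=16000, seconds=1):
    """Create a mono, 16-bit WAV file of silence (demo data only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)


def describe_wav(path):
    """Read the header fields a speech request needs from a WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate_hertz": sample_rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            # Clock time covered by one data point (sample).
            "seconds_per_sample": 1.0 / sample_rate,
            "duration_seconds": wav.getnframes() / sample_rate,
        }


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silent_wav(demo_path)
    print(describe_wav(demo_path))
```

The language cannot be read from the file; it still has to be supplied by whoever knows what was recorded.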

Speech to Text

Cloud Speech

Humana 


Add the Python library

pip install google-cloud-speech


Enable the API


gcloud services enable speech.googleapis.com



Parameters


language_code: the language of the audio, such as en-US or es-ES.

encoding: the format of the audio, such as MP3.

sample_rate_hertz: the sample rate of the audio, such as 16000 (16,000 Hz).

enable_automatic_punctuation: set to True to automatically add punctuation.


A service account JSON key is needed.

export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
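A quick way to confirm the key is set up before running the program is to check the environment variable from Python. This is a minimal sketch; check_credentials is a hypothetical helper, and it only inspects the "type" and "project_id" fields that a service account key file contains. It does not contact Google.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the key file's project_id, or raise with a readable message."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise RuntimeError("Key file is not a service account key")
    return key.get("project_id")


if __name__ == "__main__":
    try:
        print("Using project:", check_credentials())
    except RuntimeError as err:
        print("Credential problem:", err)
```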

Execute program

python3 speech-to-text.py

======================================================

from google.cloud import speech

# Create the client once; it reads GOOGLE_APPLICATION_CREDENTIALS.
client = speech.SpeechClient()


def transcribe_gcs_to_file(gcs_uri, convert_filename):
    # Reference the audio file stored in a Cloud Storage bucket.
    audio = speech.RecognitionAudio(uri=gcs_uri)

    # Describe the audio: format, sample rate, and spoken language.
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )

    print("Waiting for API")

    # Synchronous request: blocks until the full transcription is returned.
    response = client.recognize(config=config, audio=audio)

    if not response.results:
        print("No transcription results returned")
        return

    # Write the top alternative of each result to the output file.
    with open(convert_filename, "w", encoding="utf-8") as f:
        for result in response.results:
            transcript = result.alternatives[0].transcript
            f.write(f"{transcript}\n")
            print(f"Converted: {transcript}")

    print(f"\nFile Converted to: {convert_filename}")


if __name__ == "__main__":
    file_uri = "gs://cloud-storage-exam/test-audio.mp3"
    convert_file = "transcription_output.txt"

    transcribe_gcs_to_file(file_uri, convert_file)

=========================================================

Results:

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ cat transcription_output.txt 

Convert audio file to text.

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ 
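The program above uses the synchronous mode. For larger files, the asynchronous mode might look like the sketch below, which calls the client's long_running_recognize method and waits for the background job to finish. The function name and the 600 second timeout are assumptions for this example; the client, config, and audio objects are passed in so they can be built the same way as in the program above.

```python
def transcribe_long_audio(client, config, audio, timeout_seconds=600):
    """Start an asynchronous job and block until the transcript is ready."""
    # Returns immediately with a handle to the background operation.
    operation = client.long_running_recognize(config=config, audio=audio)

    # Wait for Google to finish processing (raises if the timeout passes).
    response = operation.result(timeout=timeout_seconds)

    # Keep the top alternative of each result, skipping empty results.
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


# Example usage (needs google-cloud-speech and valid credentials):
#   from google.cloud import speech
#   client = speech.SpeechClient()
#   audio = speech.RecognitionAudio(uri="gs://cloud-storage-exam/test-audio.mp3")
#   config = speech.RecognitionConfig(
#       encoding=speech.RecognitionConfig.AudioEncoding.MP3,
#       sample_rate_hertz=16000,
#       language_code="en-US",
#   )
#   for line in transcribe_long_audio(client, config, audio):
#       print(line)
```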



The Google Cloud Text-to-Speech API takes text input and creates an audio file, e.g. an MP3.

Google has a large voice library covering various accents, such as British or American English.

You can also build custom voices based on voice recordings.
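To browse the voice library from Python, the client offers a list_voices method. The sketch below wraps it in a hypothetical helper, voice_names, that returns the voice names available for one language; the client is passed in as a parameter.

```python
def voice_names(client, language_code="en-US"):
    """Return the sorted names of the voices offered for a language."""
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices)


# Example usage (needs google-cloud-text-to-speech and valid credentials):
#   from google.cloud import texttospeech
#   client = texttospeech.TextToSpeechClient()
#   for name in voice_names(client, "en-GB"):
#       print(name)
```

Each returned name can be used as the name parameter when selecting a voice.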


Add the Python library

pip install google-cloud-text-to-speech

Enable the API


gcloud services enable texttospeech.googleapis.com



Parameters

language_code: use en-US for English.

name: use en-US-Neural2-F for a generic voice.

ssml_gender: MALE, FEMALE or NEUTRAL.

audio_encoding: use MP3.


A service account JSON key is needed.

export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"

Execute program

python3 text-to-speech.py


====================================================

from google.cloud import texttospeech
from google.cloud import storage

# Create the clients once; both read GOOGLE_APPLICATION_CREDENTIALS.
tts_client = texttospeech.TextToSpeechClient()
storage_client = storage.Client()


def synthesize_to_gcs(text, bucket_name, destination_blob_name):
    # The text to be spoken.
    synthesis_input = texttospeech.SynthesisInput(text=text)

    # Choose the language and a specific voice from the voice library.
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-F"
    )

    # Ask for MP3 audio back.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    print(f"speech to text for bucket: {bucket_name}...")
    response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Upload the returned audio bytes directly to Cloud Storage.
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(response.audio_content, content_type="audio/mpeg")

    print(f"file uploaded '{destination_blob_name}' to storage '{bucket_name}'.")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    FILENAME = "test_message.mp3"
    TEXT_TO_SAY = "file was created by Google text to speech API and moved to Cloud bucket."

    synthesize_to_gcs(TEXT_TO_SAY, BUCKET, FILENAME)

=======================================================

Results:

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ python3 text-to-speech.py

speech to text for bucket: cloud-storage-exam...

file uploaded 'test_message.mp3' to storage 'cloud-storage-exam'.

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ 
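The example above sends one short sentence. Longer text may need to be split into several requests, because a single synthesis request only accepts a limited amount of input. The sketch below splits text at sentence boundaries; the split_text helper and the 5000 byte default are assumptions for illustration, so check the current Text-to-Speech limits before relying on them.

```python
import re

# Assumed per-request input limit in bytes; verify against current quotas.
MAX_BYTES = 5000


def split_text(text, max_bytes=MAX_BYTES):
    """Group sentences into chunks that aim to stay within max_bytes (UTF-8).

    A single sentence longer than max_bytes is passed through unchanged.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            # Still fits: keep growing the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_bytes=9))
```

Each chunk could then be sent as its own synthesis request and the resulting audio stored as separate objects in the bucket.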

