UCONN Stamford Google Cloud Development Platform: Cloud Speech and Text APIs

Cloud Speech and Text APIs

The Speech-to-Text and Text-to-Speech APIs are Google tools that let developers convert audio streams or files to text, and text to audio streams or files. The Google Cloud Speech-to-Text API (often simply called the Cloud Speech API) uses Google’s machine learning models to convert spoken language into written text.

Let's look at Cloud Speech first. It takes audio files or audio streams and converts them to text that can be stored or processed.

The API handles three modes of processing.

Synchronous: for smaller files. You send the file and receive the full transcription in a single response.

Asynchronous: for larger files. Google processes the audio in the background and can deliver the result to a storage bucket when complete.

Real-time streaming: you send audio to Google as it is captured, and transcriptions come back in chunks as the audio is processed.
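The choice between the three modes can be sketched as a small helper. This is only an illustration: choose_mode and the 60-second threshold are assumptions made for this example (the synchronous limit should be checked against the current quota documentation), while the three method names are the ones exposed by the Python SpeechClient.

```python
from enum import Enum
from typing import Optional

# Assumed threshold: synchronous recognition is meant for short audio,
# roughly one minute or less. Confirm against the current quota page.
SYNC_MAX_SECONDS = 60


class Mode(Enum):
    # Each value is the SpeechClient method that implements the mode.
    SYNCHRONOUS = "recognize"
    ASYNCHRONOUS = "long_running_recognize"
    STREAMING = "streaming_recognize"


def choose_mode(duration_seconds: Optional[float], live: bool = False) -> Mode:
    """Pick a processing mode from the audio length and whether it is live."""
    if live or duration_seconds is None:
        # Live audio has no known length, so it has to be streamed.
        return Mode.STREAMING
    if duration_seconds <= SYNC_MAX_SECONDS:
        return Mode.SYNCHRONOUS
    return Mode.ASYNCHRONOUS


if __name__ == "__main__":
    print(choose_mode(12).value)
    print(choose_mode(1800).value)
    print(choose_mode(None, live=True).value)
```

A 12-second clip maps to the synchronous call, a 30-minute recording to the asynchronous one, and a live microphone feed to streaming.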

The API supports over 125 languages.

Examples

Call centers for transcribing customer calls.

Media for closed captioning.

Voice commands, as in assistants like Siri.

Tell the Cloud Speech API the format (encoding) of the audio.

The API needs to know the sample rate of the file.

The sample rate tells the audio processor how much clock time each data point covers.

The API also needs to know the language spoken in the audio.
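To make the sample rate concrete: at 16,000 Hz each data point covers 1/16,000 of a second (62.5 microseconds). The sketch below reads these values from a WAV header using only the Python standard library. It is an illustration under assumptions: describe_wav and make_silent_wav are hypothetical helpers, and the wave module reads WAV files only, so it does not apply to the MP3 file used later in this post.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read a WAV header and report the values the API asks about."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hertz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            # Clock time covered by one data point.
            "seconds_per_sample": 1 / rate,
            "duration_seconds": frames / rate,
        }


def make_silent_wav(rate: int = 16000, seconds: int = 1) -> bytes:
    """Build a short silent 16-bit mono WAV in memory (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```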

Speech to Text


Add the Python library

pip install google-cloud-speech

Enable the API

gcloud services enable speech.googleapis.com

Parameters

language_code: the language of the audio, like en-US or es-ES.

encoding: the format of the audio, here MP3.

sample_rate_hertz: the sampling frequency of the audio, here 16,000 Hz.

enable_automatic_punctuation: set to True to automatically add punctuation.

A service account JSON key is needed

export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
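A quick check before running the program can catch a missing or mistyped key path. The following is a minimal sketch, not part of the Google client library: check_credentials is a hypothetical helper, and it relies only on the "type" and "client_email" fields found in a service account key file.

```python
import json
import os
import tempfile


def check_credentials(env=os.environ) -> str:
    """Return the service account email from the key file, or raise ValueError."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise ValueError("file is not a service account key")
    return key.get("client_email", "")


if __name__ == "__main__":
    # Demo with a throwaway key file; the email below is made up.
    with tempfile.TemporaryDirectory() as folder:
        demo_path = os.path.join(folder, "key.json")
        with open(demo_path, "w", encoding="utf-8") as f:
            json.dump({"type": "service_account",
                       "client_email": "demo@example.iam.gserviceaccount.com"}, f)
        print(check_credentials({"GOOGLE_APPLICATION_CREDENTIALS": demo_path}))
```

Calling check_credentials() with no argument inspects the real environment, which is how it would be used before python3 speech-to-text.py.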

Execute program

python3 speech-to-text.py

======================================================

from google.cloud import speech

# Create the Speech-to-Text client once and reuse it.
client = speech.SpeechClient()


def transcribe_gcs_to_file(gcs_uri, convert_filename):
    """Transcribe an audio file in Cloud Storage and write the text to a local file."""
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )

    print("Waiting for API")
    # Synchronous request: suitable for short audio only.
    response = client.recognize(config=config, audio=audio)

    if not response.results:
        print("No transcription results returned")
        return

    with open(convert_filename, "w", encoding="utf-8") as f:
        for result in response.results:
            # The first alternative is the most likely transcription.
            transcript = result.alternatives[0].transcript
            f.write(f"{transcript}\n")
            print(f"Converted: {transcript}")

    print(f"\nFile Converted to: {convert_filename}")


if __name__ == "__main__":
    file_uri = "gs://cloud-storage-exam/test-audio.mp3"
    convert_file = "transcription_output.txt"
    transcribe_gcs_to_file(file_uri, convert_file)

=========================================================

Results:

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ cat transcription_output.txt

Convert audio file to text.

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$
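The program above passes a gs:// URI straight to the API. A small check before the call can catch a malformed URI early instead of waiting for the request to fail. This is a minimal sketch; parse_gcs_uri is a hypothetical helper written for this example, not a function of the client library.

```python
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri}")
    # Everything up to the first slash is the bucket; the rest is the object.
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"URI must name both a bucket and an object: {uri}")
    return bucket, blob


if __name__ == "__main__":
    print(parse_gcs_uri("gs://cloud-storage-exam/test-audio.mp3"))
```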

The Google Cloud Text-to-Speech API takes text and creates audio, e.g. an MP3 file.

Google has a large voice library covering various accents, such as British or American English.

You can also build custom voices based on voice recordings.

Add the Python libraries (the program below also uses the Cloud Storage client)

pip install google-cloud-text-to-speech google-cloud-storage

Enable the API

gcloud services enable texttospeech.googleapis.com

Parameters

language_code: use en-US for English.

name: use en-US-Neural2-F for a generic voice.

ssml_gender: MALE, FEMALE or NEUTRAL.

audio_encoding: use MP3.
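The voice name and the language code have to agree. In names like en-US-Neural2-F the first two parts are the language and region, so a mismatch can be spotted before calling the API. The sketch below assumes that naming pattern holds for the voice you pick; voice_language and voice_matches_language are hypothetical helpers, not part of the client library.

```python
def voice_language(voice_name: str) -> str:
    """Return the language code at the start of a voice name."""
    parts = voice_name.split("-")
    # Assumed pattern: <language>-<region>-<voice type>-<variant>
    if len(parts) < 4:
        raise ValueError(f"unexpected voice name: {voice_name}")
    return "-".join(parts[:2])


def voice_matches_language(voice_name: str, language_code: str) -> bool:
    """Check that a voice name belongs to the given language code."""
    return voice_language(voice_name).lower() == language_code.lower()


if __name__ == "__main__":
    print(voice_language("en-US-Neural2-F"))
    print(voice_matches_language("en-US-Neural2-F", "en-GB"))
```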

A service account JSON key is needed

export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"

Execute program

python3 text-to-speech.py

====================================================

from google.cloud import texttospeech
from google.cloud import storage

# Create the clients once and reuse them.
tts_client = texttospeech.TextToSpeechClient()
storage_client = storage.Client()


def synthesize_to_gcs(text, bucket_name, destination_blob_name):
    """Convert text to MP3 audio and upload it to a Cloud Storage bucket."""
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-F",
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    print(f"speech to text for bucket: {bucket_name}...")
    response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Upload the returned audio bytes directly, without a local file.
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(response.audio_content, content_type="audio/mpeg")
    print(f"file uploaded '{destination_blob_name}' to storage '{bucket_name}'.")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    FILENAME = "test_message.mp3"
    TEXT_TO_SAY = "file was created by Google text to speech API and moved to Cloud bucket."
    synthesize_to_gcs(TEXT_TO_SAY, BUCKET, FILENAME)

=======================================================

Results:

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ python3 text-to-speech.py

speech to text for bucket: cloud-storage-exam...

file uploaded 'test_message.mp3' to storage 'cloud-storage-exam'.

john_iacovacci1@cloudshell:~/API (cloud-project-examples)$
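The example above sends one short sentence. Longer text may need to be split into several requests, because each request has an input size limit. The sketch below splits text at sentence boundaries; the 5,000-byte figure is an assumption to confirm against the current quota documentation, and split_text is a hypothetical helper, not part of the client library.

```python
import re
from typing import List

# Assumed per-request input limit; confirm against the current quotas.
MAX_BYTES = 5000


def split_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text at sentence ends into chunks of at most max_bytes (UTF-8)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds the byte limit")
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            # The sentence still fits in the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("One. Two. Three.", max_bytes=9):
        print(chunk)
```

Each chunk could then be passed to synthesize_to_gcs with its own destination file name.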
