Cloud Speech and Text APIs
The Speech-to-Text and Text-to-Speech APIs are Google tools that let developers convert audio streams or files to text, and text to audio streams or files. The Google Cloud Speech-to-Text API (often simply called the Cloud Speech API) uses Google's machine learning models to convert spoken language into written text.
Let's look at Cloud Speech first. It takes audio files or audio streams and converts them to text that can be stored or processed.
The API handles three modes of processing.
Synchronous: for smaller files. You send the file and receive the transcribed text all at once.
Asynchronous: for larger files. Google processes the audio in the background and can deliver the transcript to a storage bucket when complete.
Real-time streaming: you send audio to Google as it is captured, and transcriptions come back in chunks almost immediately.
The API supports over 125 languages.
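Choosing among the three modes mostly comes down to whether the audio is live and how long it is. Here is a minimal sketch, assuming the roughly one-minute limit that Google documents for synchronous requests. The function `choose_recognition_mode` and the constant `SYNC_LIMIT_SECONDS` are hypothetical helpers for illustration, not part of the client library.

```python
# Hypothetical helper: picks one of the three processing modes described above.
# Assumption: synchronous requests accept about one minute of audio.
SYNC_LIMIT_SECONDS = 60


def choose_recognition_mode(duration_seconds, is_live=False):
    """Return 'streaming', 'synchronous', or 'asynchronous'."""
    if is_live:
        # Live audio has no known length, so it has to be streamed.
        return "streaming"
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    return "asynchronous"


print(choose_recognition_mode(25))
print(choose_recognition_mode(1800))
print(choose_recognition_mode(0, is_live=True))
```

A 25-second voicemail would go through the synchronous path, while a 30-minute recording would be sent as an asynchronous job.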
Examples
Call centers: transcribing customer calls.
Media: closed captioning.
Voice commands, similar to Siri.
Before sending audio, you need to tell the Cloud Speech API a few things:
The format (encoding) of the audio.
The sample rate of the file, which tells the audio processor how much clock time each data point covers.
The language spoken in the audio.
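If you are not sure what sample rate a recording uses, you can often read it from the file itself. The sketch below uses Python's standard `wave` module, which only handles uncompressed WAV files (not MP3). The helpers `describe_wav` and `make_silent_wav` are made up for this illustration, and the silent clip stands in for a real recording.

```python
import io
import wave


def describe_wav(source):
    """Read format details from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hertz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            # Clock time covered by one data point (sample).
            "seconds_per_sample": 1.0 / rate,
            "duration_seconds": wav.getnframes() / rate,
        }


def make_silent_wav(rate=16000, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence to use as a stand-in."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav())
print(info)
```

At 16,000 Hz each sample covers 1/16000 of a second, which is the "clock time per data point" the API needs in order to interpret the audio correctly.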
Add the Python library
pip install google-cloud-speech
Enable the API
gcloud services enable speech.googleapis.com
Parameters
language_code: the language of the audio, such as en-US or es-ES.
encoding: the format of the audio, here MP3.
sample_rate_hertz: the sampling frequency, here 16,000 Hz.
enable_automatic_punctuation: set to True to add punctuation automatically.
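The encoding value has to match the actual audio format. As a rough sketch, you could guess it from the file extension. The mapping below is an assumption made for illustration: an extension does not guarantee the codec inside the file (a .wav file is not always LINEAR16, and an .ogg file is not always Opus). The values are names of members of `speech.RecognitionConfig.AudioEncoding`, and `guess_encoding_name` is a hypothetical helper.

```python
import os

# Assumption: the extension reflects the real codec. Values are the names of
# speech.RecognitionConfig.AudioEncoding members.
ENCODING_BY_EXTENSION = {
    ".wav": "LINEAR16",
    ".flac": "FLAC",
    ".mp3": "MP3",
    ".ogg": "OGG_OPUS",
}


def guess_encoding_name(filename):
    """Return an AudioEncoding member name for a file name or gs:// URI."""
    extension = os.path.splitext(filename)[1].lower()
    return ENCODING_BY_EXTENSION.get(extension, "ENCODING_UNSPECIFIED")


print(guess_encoding_name("gs://cloud-storage-exam/test-audio.mp3"))
```

With the library installed, the name can be turned into the enum value with `getattr(speech.RecognitionConfig.AudioEncoding, name)`.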
You need a service account JSON key
export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
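A missing or mistyped key path is a common reason the client fails to start, so it can help to check the variable before running the program. This is only a sketch: `check_credentials_env` is a hypothetical helper, and it confirms that the file exists, not that the key is valid.

```python
import os


def check_credentials_env(environ=None):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    return "ok"


print(check_credentials_env())
```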
Execute program
python3 speech-to-text.py
======================================================
from google.cloud import speech

# Create the client once; credentials come from GOOGLE_APPLICATION_CREDENTIALS.
client = speech.SpeechClient()


def transcribe_gcs_to_file(gcs_uri, convert_filename):
    """Transcribe a short audio file in Cloud Storage and save the text locally."""
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )

    print("Waiting for API")
    # Synchronous request: blocks until the whole transcript is returned.
    response = client.recognize(config=config, audio=audio)

    if not response.results:
        print("No transcription results returned")
        return

    with open(convert_filename, "w", encoding="utf-8") as f:
        for result in response.results:
            # alternatives[0] is the most likely transcription.
            transcript = result.alternatives[0].transcript
            f.write(f"{transcript}\n")
            print(f"Converted: {transcript}")

    print(f"\nFile Converted to: {convert_filename}")


if __name__ == "__main__":
    file_uri = "gs://cloud-storage-exam/test-audio.mp3"
    convert_file = "transcription_output.txt"
    transcribe_gcs_to_file(file_uri, convert_file)
=========================================================
Results:
john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ cat transcription_output.txt
Convert audio file to text.
john_iacovacci1@cloudshell:~/API (cloud-project-examples)$
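The program above uses the synchronous recognize call, which suits short clips. For longer recordings, the asynchronous mode could look roughly like the sketch below, which uses the client's `long_running_recognize` method and waits for the operation to finish. The helper names `transcribe_long_audio` and `run_example` and the 600-second timeout are assumptions for illustration. The client is passed in as an argument so the helper can be tried without credentials.

```python
def transcribe_long_audio(client, config, audio, timeout_seconds=600):
    """Start an asynchronous recognition job and wait for its transcript."""
    operation = client.long_running_recognize(config=config, audio=audio)
    # Blocks until Google finishes processing or the timeout expires.
    response = operation.result(timeout=timeout_seconds)
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    )


def run_example():
    # Imported here so the helper above can be exercised without the library.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://cloud-storage-exam/test-audio.mp3")
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )
    print(transcribe_long_audio(client, config, audio))
```

Calling `run_example()` with the credentials variable set should print the transcript of the same test file, this time through the background-processing path.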
The Google Cloud Text-to-Speech API takes plain text and creates an audio file, e.g. an MP3.
Google has a large voice library covering various accents, such as British or American English.
You can also build custom voices based on voice recordings.
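To see which voices are available for a given accent, the client offers a `list_voices` call. A minimal sketch follows. The helpers `voice_names` and `run_example` are made up for this illustration, and the client is passed in so the function can be tried without credentials.

```python
def voice_names(client, language_code="en-US"):
    """Return the sorted names of the voices offered for one language code."""
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices)


def run_example():
    # Imported here so the helper above can be exercised without the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    # en-GB lists the British voices; en-US lists the American ones.
    for name in voice_names(client, "en-GB"):
        print(name)
```

Any name printed by `run_example()` can be used as the voice name when building a request.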
Add the Python library
pip install google-cloud-text-to-speech
Enable the API
gcloud services enable texttospeech.googleapis.com
Parameters
language_code: use en-US for English.
name: the voice to use, for example en-US-Neural2-F.
ssml_gender: MALE, FEMALE or NEUTRAL.
audio_encoding: use MP3.
You need a service account JSON key
export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
Execute program
python3 text-to-speech.py
====================================================
from google.cloud import texttospeech
from google.cloud import storage

# Create both clients once; credentials come from GOOGLE_APPLICATION_CREDENTIALS.
tts_client = texttospeech.TextToSpeechClient()
storage_client = storage.Client()


def synthesize_to_gcs(text, bucket_name, destination_blob_name):
    """Convert text to MP3 audio and upload it to a Cloud Storage bucket."""
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-F"
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    print(f"speech to text for bucket: {bucket_name}...")
    response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # response.audio_content holds the MP3 bytes; upload them directly.
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(response.audio_content, content_type="audio/mpeg")
    print(f"file uploaded '{destination_blob_name}' to storage '{bucket_name}'.")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    FILENAME = "test_message.mp3"
    TEXT_TO_SAY = "file was created by Google text to speech API and moved to Cloud bucket."
    synthesize_to_gcs(TEXT_TO_SAY, BUCKET, FILENAME)
=======================================================
Results:
john_iacovacci1@cloudshell:~/API (cloud-project-examples)$ python3 text-to-speech.py
speech to text for bucket: cloud-storage-exam...
file uploaded 'test_message.mp3' to storage 'cloud-storage-exam'.
john_iacovacci1@cloudshell:~/API (cloud-project-examples)$