Google API Assignment
Assignment:
Take a picture that has text on it.
Upload the picture to your Cloud Storage bucket.
Use the Vision API's OCR functionality to extract the text from the image.
Translate the text to Spanish using the Google Cloud Translation API.
Create an MP3 file from the translated text using the Text-to-Speech API.
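The upload step can also be done from Python instead of the console. A minimal sketch, assuming the photo is saved locally as lunch.jpeg and the bucket already exists; the helper names upload_image and upload_lunch_photo are made up for illustration.

```python
import os


def upload_image(bucket, local_path, blob_name=None):
    """Upload a local image to the bucket and return its gs:// URI."""
    # Default the object name to the local file name.
    blob_name = blob_name or os.path.basename(local_path)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket.name}/{blob_name}"


def upload_lunch_photo():
    """Example call; needs google-cloud-storage and valid credentials."""
    # Imported here so upload_image can be tried without the library installed.
    from google.cloud import storage

    # Assumption: the bucket exists and lunch.jpeg is in the current directory.
    bucket = storage.Client().bucket("cloud-storage-exam")
    return upload_image(bucket, "lunch.jpeg")
```

Calling upload_lunch_photo() should return the gs:// URI that the OCR step later reads from.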
Install all Python libraries
pip install --upgrade google-cloud-vision
pip install google-cloud-storage
pip install google-cloud-translate
pip install google-cloud-text-to-speech
Enable all APIs needed
gcloud services enable vision.googleapis.com
gcloud services enable translate.googleapis.com
gcloud services enable texttospeech.googleapis.com
Point GOOGLE_APPLICATION_CREDENTIALS at your service account JSON key
export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
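Before running the program it can help to confirm that the variable really points at a service account key. A small sketch using only the standard library; the helper name check_credentials and the list of checked fields are illustrative choices, not part of any Google library.

```python
import json
import os

# Fields normally present in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_credentials(env=None):
    """Return the key's project ID, or raise ValueError describing the problem."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"Key file not found: {path}")
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {key['type']}")
    return key["project_id"]
```

Running print(check_credentials()) in the same shell as the export should print the project ID stored in the key file.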
Execute the program
python3 ocr_to_speech.py
======================================================
import posixpath

from google.cloud import vision
from google.cloud import storage
from google.cloud import translate_v2 as translate
from google.cloud import texttospeech

# One client per API; each authenticates with the service account key
# referenced by GOOGLE_APPLICATION_CREDENTIALS.
vision_client = vision.ImageAnnotatorClient()
storage_client = storage.Client()
translate_client = translate.Client()
tts_client = texttospeech.TextToSpeechClient()


def process_image_to_audio(bucket_name, image_blob_name):
    # Step 1: run OCR on the image directly from Cloud Storage.
    print(f"Extract text gs://{bucket_name}/{image_blob_name} ---")
    image_uri = f"gs://{bucket_name}/{image_blob_name}"
    image = vision.Image()
    image.source.image_uri = image_uri
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(f"Vision Error: {response.error.message}")
    texts = response.text_annotations
    if not texts:
        print("No text.")
        return
    # The first annotation holds the full block of detected text.
    original_text = texts[0].description
    print(f"Extracted :\n{original_text}\n")

    # Step 2: translate to Spanish. format_='text' requests plain text;
    # the default HTML format escapes characters such as apostrophes.
    print("Translate to Spanish")
    translation = translate_client.translate(
        original_text, target_language='es', format_='text'
    )
    translated_text = translation['translatedText']

    # Step 3: synthesize the Spanish text as MP3 audio.
    print("Generate Audio")
    synthesis_input = texttospeech.SynthesisInput(text=translated_text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="es-ES",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    tts_response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Step 4: save the translation and the audio next to the image.
    bucket = storage_client.bucket(bucket_name)
    # Drop only the file extension, so dots elsewhere in the name survive.
    base_name = posixpath.splitext(image_blob_name)[0]
    text_blob = bucket.blob(f"{base_name}_es.txt")
    text_blob.upload_from_string(translated_text)
    audio_blob = bucket.blob(f"{base_name}_es.mp3")
    audio_blob.upload_from_string(tts_response.audio_content, content_type="audio/mpeg")
    print("Completed")
    print(f"Text: gs://{bucket_name}/{base_name}_es.txt")
    print(f"Audio: gs://{bucket_name}/{base_name}_es.mp3")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    IMAGE_FILE = "lunch.jpeg"
    process_image_to_audio(BUCKET, IMAGE_FILE)
===================================================
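The output objects are named after the image by dropping its extension and appending _es. A short sketch of that naming step on its own, assuming posixpath.splitext is an acceptable way to strip the extension; the helper name output_names is hypothetical. Splitting on the first dot instead would cut off names that contain dots in a folder or file name.

```python
import posixpath


def output_names(image_blob_name, suffix="_es"):
    """Return (text_name, audio_name) for a given image object name."""
    # splitext removes only the final extension; object names always use "/".
    base_name, _ext = posixpath.splitext(image_blob_name)
    return f"{base_name}{suffix}.txt", f"{base_name}{suffix}.mp3"


print(output_names("lunch.jpeg"))
print(output_names("photos/v1.2/lunch.jpeg"))
```

For lunch.jpeg this gives lunch_es.txt and lunch_es.mp3, the same names shown in the results.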
Results:
john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ python3 ocr_to_speech.py
Extract text gs://cloud-storage-exam/lunch.jpeg ---
Extracted :
Easy and affordable
lunches you'll love
Hot summer days call for laid-back sandwiches just the
way you like them. Pick your favorite meats from our deli
and pair them with fresh, seasonal produce for your
perfect combo.
Translate to Spanish
Generate Audio
Completed
Text: gs://cloud-storage-exam/lunch_es.txt
Audio: gs://cloud-storage-exam/lunch_es.mp3
john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$
Link to audio file