UCONN


Google API Assignment



Assignment: 


Take a picture that has text in it.



Upload the picture to your storage bucket.



Use the Vision API's OCR functionality to extract the text from the image.


Translate the text to Spanish using the Google Translate API.


Create an MP3 file from the translated text using the Text-to-Speech API.
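
The upload step can also be done from Python instead of the console. The following is a minimal sketch, assuming the google-cloud-storage library is installed and credentials are already configured. The helper names gcs_uri and upload_image are hypothetical, and the bucket and file names in the usage comment are the ones used in the program later in this post.

```python
def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that the Vision API accepts as an image source."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_image(local_path, bucket_name, blob_name):
    """Upload a local picture to the bucket and return its gs:// URI."""
    # Imported inside the function so gcs_uri() can be used
    # even when the client library is not installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)


# Example call (needs network access and valid credentials):
# upload_image("lunch.jpeg", "cloud-storage-exam", "lunch.jpeg")
```

The returned URI is the same string the OCR step passes to the Vision API as image.source.image_uri.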


Install the required Python libraries


pip install --upgrade google-cloud-vision

pip install google-cloud-storage

pip install google-cloud-translate

pip install google-cloud-text-to-speech
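
Before running the program it can help to confirm that every module it imports is actually available. This is a rough sketch using only the standard library. The function name missing_modules is hypothetical, and the module list mirrors the imports in the program below.

```python
import importlib.util

# Modules imported by the program in this post.
REQUIRED_MODULES = [
    "google.cloud.vision",
    "google.cloud.storage",
    "google.cloud.translate_v2",
    "google.cloud.texttospeech",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found on this system."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package such as `google.cloud` is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list means all four client libraries can be imported.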


Enable the required APIs


gcloud services enable vision.googleapis.com

gcloud services enable translate.googleapis.com

gcloud services enable texttospeech.googleapis.com


Point the client libraries at your service account JSON key

export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
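
A mistyped path or a missing export is a common cause of authentication errors, so a quick local check can save time. The sketch below assumes only that a service account key is a JSON object whose "type" field is "service_account". The function name check_credentials is hypothetical, and the check does not contact Google at all.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return a problem description, or None if the key file looks usable."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        return f"key file is not readable JSON: {error}"
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return "key file is not a service account key"
    return None


print(check_credentials() or "Credentials look usable")
```

This only validates the file on disk. Whether the service account has permission to call each API is still decided by the project's IAM settings.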

Run the program

python3 ocr_to_speech.py

======================================================

import os

from google.cloud import storage
from google.cloud import texttospeech
from google.cloud import translate_v2 as translate
from google.cloud import vision

# One client per API; each reads GOOGLE_APPLICATION_CREDENTIALS automatically.
vision_client = vision.ImageAnnotatorClient()
storage_client = storage.Client()
translate_client = translate.Client()
tts_client = texttospeech.TextToSpeechClient()


def process_image_to_audio(bucket_name, image_blob_name):
    # Step 1: run OCR on the image directly from its Cloud Storage URI.
    print(f"Extract text gs://{bucket_name}/{image_blob_name} ---")
    image_uri = f"gs://{bucket_name}/{image_blob_name}"
    image = vision.Image()
    image.source.image_uri = image_uri
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(f"Vision Error: {response.error.message}")

    texts = response.text_annotations
    if not texts:
        print("No text.")
        return

    # The first annotation holds the full block of detected text.
    original_text = texts[0].description
    print(f"Extracted :\n{original_text}\n")

    # Step 2: translate to Spanish. format_='text' asks for plain text,
    # so apostrophes and similar characters are not returned HTML-escaped.
    print("Translate to Spanish")
    translation = translate_client.translate(
        original_text, target_language='es', format_='text'
    )
    translated_text = translation['translatedText']

    # Step 3: synthesize the Spanish text as MP3 audio.
    print("Generate Audio")
    synthesis_input = texttospeech.SynthesisInput(text=translated_text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="es-ES",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    tts_response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Step 4: store the translation and the audio next to the image.
    bucket = storage_client.bucket(bucket_name)
    # splitext drops only the final extension, so "menu.v2.jpeg" -> "menu.v2".
    base_name = os.path.splitext(image_blob_name)[0]
    text_blob = bucket.blob(f"{base_name}_es.txt")
    text_blob.upload_from_string(translated_text)
    audio_blob = bucket.blob(f"{base_name}_es.mp3")
    audio_blob.upload_from_string(tts_response.audio_content, content_type="audio/mpeg")

    print("Completed")
    print(f"Text: gs://{bucket_name}/{base_name}_es.txt")
    print(f"Audio: gs://{bucket_name}/{base_name}_es.mp3")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    IMAGE_FILE = "lunch.jpeg"
    process_image_to_audio(BUCKET, IMAGE_FILE)


===================================================
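
The program sends all of the translated text to Text-to-Speech in a single request, which is fine for a short caption like the one in this example. Longer text may exceed the per-request input limit, assumed here to be roughly 5,000 bytes (check the current quota documentation before relying on that number). The sketch below shows one possible way to split text on word boundaries; the function name split_for_tts is hypothetical and it is not part of the program above.

```python
def split_for_tts(text, max_bytes=5000):
    """Split text into chunks whose UTF-8 size stays within max_bytes.

    Splits on whitespace, so line breaks are collapsed into single spaces.
    A single word longer than max_bytes is kept whole and still too large.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then need its own synthesize_speech call, and the resulting audio segments would have to be stored or joined separately.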

Results:

john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ python3 ocr_to_speech.py

Extract text gs://cloud-storage-exam/lunch.jpeg ---

Extracted :

Easy and affordable

lunches you'll love

Hot summer days call for laid-back sandwiches just the

way you like them. Pick your favorite meats from our deli

and pair them with fresh, seasonal produce for your

perfect combo.


Translate to Spanish

Generate Audio

Completed

Text: gs://cloud-storage-exam/lunch_es.txt

Audio: gs://cloud-storage-exam/lunch_es.mp3

john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ 


Link to audio file


Lunch
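
A link like the one above can be built from the bucket and object names. This is a small sketch, assuming the object has been made publicly readable; without that, the URL returns an access error and the file has to be downloaded with credentials instead. The function name public_url is hypothetical.

```python
from urllib.parse import quote


def public_url(bucket_name, blob_name):
    """Build the HTTPS URL of a publicly readable Cloud Storage object."""
    # quote() percent-encodes spaces and other special characters
    # while leaving "/" separators in the object name intact.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name)}"


print(public_url("cloud-storage-exam", "lunch_es.mp3"))
```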

