UCONN


Google API Assignment



Assignment: 


Take a picture that contains text.



Upload the picture to your Cloud Storage bucket.



Use the Vision API's OCR functionality to extract the text from the image.


Translate the text to Spanish using the Google Translate API.


Create an MP3 file from the translated text using the Text-to-Speech API.
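
The upload step is not covered by the script later in this post, which expects the image to be in the bucket already. A minimal sketch of one way to do it from Python with the google-cloud-storage library, assuming a hypothetical helper name `upload_image` and placeholder bucket and file names; the library import is deferred so the function can also be tried with a stand-in client:

```python
from pathlib import Path


def upload_image(bucket_name, local_path, client=None):
    """Upload a local image to a bucket and return its gs:// URI.

    `upload_image` is an illustrative helper, not part of any Google library.
    """
    if client is None:
        # Imported lazily so the helper can be exercised without the library
        # installed, by passing in any object with the same bucket/blob methods.
        from google.cloud import storage
        client = storage.Client()

    # Use the file name (without its local directory) as the object name.
    blob_name = Path(local_path).name
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(str(local_path))
    return f"gs://{bucket_name}/{blob_name}"
```

With valid credentials, `upload_image("your-bucket", "lunch.jpeg")` would return `gs://your-bucket/lunch.jpeg`. The same upload can also be done from the Cloud Console or with `gcloud storage cp`.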


Install the Python libraries


pip install --upgrade google-cloud-vision

pip install --upgrade google-cloud-storage

pip install --upgrade google-cloud-translate

pip install --upgrade google-cloud-texttospeech




Enable the required APIs


gcloud services enable vision.googleapis.com

gcloud services enable translate.googleapis.com

gcloud services enable texttospeech.googleapis.com


Create a service account JSON key


Generating a JSON key for GOOGLE_APPLICATION_CREDENTIALS is a straightforward process, but it requires navigating the Google Cloud Console. This key acts as a "passport" for your local environment or server to talk to Google APIs securely.

Here is the step-by-step guide to getting it done.

1. Navigate to the IAM & Admin Console

In the Google Cloud Console, open IAM & Admin and select Service Accounts.

2. Create a Service Account (If you don't have one)

If you already have a service account you want to use, skip to step 3. Otherwise:

  • Click + CREATE SERVICE ACCOUNT at the top.

  • Give it a name (e.g., my-app-service-account).

  • Click Create and Continue.

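  • Grant access: Assign a "Role" that has the permissions your app needs. For this assignment the script reads the image from your bucket and writes the text and MP3 files back to it, so a role such as Storage Object Admin on that bucket would cover the storage side.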

  • Click Done.

3. Generate the JSON Key

Now, you’ll extract the actual credential file:

  • In the list of service accounts, click on the Email address of the account you just created (or an existing one).

  • Click on the Keys tab at the top.

  • Click the ADD KEY dropdown and select Create new key.

  • Choose JSON as the key type and click Create.

Your browser will automatically download a .json file. Keep this file safe! It contains private keys that grant access to your cloud resources; never commit it to public GitHub repositories.


4. Set the Environment Variable

Once you have the file (let's say it's named service-account-file.json), you need to point your system to it so your code can find it.

On macOS / Linux (Terminal)

Add this to your .bashrc, .zshrc, or run it in your current session:


export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"



Upload the key file to Cloud Shell, then point the variable at the uploaded path:


export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
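
Before running the script it can help to confirm that the variable really points at a service account key. A minimal sketch using only the standard library, assuming a hypothetical helper name `check_credentials`; it checks that the file exists, parses as JSON, and carries the `type`, `client_email`, and `private_key` fields found in downloaded service account keys. It does not contact Google, so it cannot tell whether the key is still valid:

```python
import json
import os


def check_credentials(env=None):
    """Return 'ok' or a short description of what looks wrong.

    `check_credentials` is an illustrative helper, not a Google API.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"no file found at {path}"
    try:
        with open(path, encoding="utf-8") as fh:
            key = json.load(fh)
    except ValueError:
        # Covers both malformed JSON and undecodable bytes.
        return "file is not valid JSON"
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return "file is not a service account key"
    missing = [f for f in ("client_email", "private_key") if f not in key]
    if missing:
        return "key is missing fields: " + ", ".join(missing)
    return "ok"
```

Running `print(check_credentials())` in the same shell session where the `export` was issued reports the first problem it finds, or `ok` if the file looks like a usable key.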

ocr_to_speech.py

======================================================

from google.cloud import vision

from google.cloud import storage

from google.cloud import translate_v2 as translate

from google.cloud import texttospeech

vision_client = vision.ImageAnnotatorClient()

storage_client = storage.Client()

translate_client = translate.Client()

tts_client = texttospeech.TextToSpeechClient()


def process_image_to_audio(bucket_name, image_blob_name):

    print(f"Extract text gs://{bucket_name}/{image_blob_name} ---")

    image_uri = f"gs://{bucket_name}/{image_blob_name}"

    image = vision.Image()

    image.source.image_uri = image_uri

    response = vision_client.text_detection(image=image)

    if response.error.message:

        raise Exception(f"Vision Error: {response.error.message}")


    texts = response.text_annotations

    if not texts:

        print("No text.")

        return


    original_text = texts[0].description

    print(f"Extracted :\n{original_text}\n")


    print("Translate to Spanish")

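    # format_='text' keeps apostrophes and accents as plain characters;
    # the default HTML format can return entities such as &#39; instead.
    translation = translate_client.translate(original_text, target_language='es', format_='text')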

    translated_text = translation['translatedText']


    print("Generate Audio")

    synthesis_input = texttospeech.SynthesisInput(text=translated_text)

    voice = texttospeech.VoiceSelectionParams(

        language_code="es-ES",

        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL

    )

    audio_config = texttospeech.AudioConfig(

        audio_encoding=texttospeech.AudioEncoding.MP3

    )

    tts_response = tts_client.synthesize_speech(

        input=synthesis_input, voice=voice, audio_config=audio_config

    )


    bucket = storage_client.bucket(bucket_name)

    # Strip only the final extension so names that contain extra dots keep their full stem.
    base_name = image_blob_name.rsplit('.', 1)[0]

    text_blob = bucket.blob(f"{base_name}_es.txt")

    text_blob.upload_from_string(translated_text)

    audio_blob = bucket.blob(f"{base_name}_es.mp3")

    audio_blob.upload_from_string(tts_response.audio_content, content_type="audio/mpeg")


    print("Completed")

    print(f"Text: gs://{bucket_name}/{base_name}_es.txt")

    print(f"Audio: gs://{bucket_name}/{base_name}_es.mp3")


if __name__ == "__main__":

    BUCKET = "cloud-storage-exam"

    IMAGE_FILE = "lunch.jpeg"

    process_image_to_audio(BUCKET, IMAGE_FILE)


===================================================
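
The output objects in the script take their names from the image name with the extension removed and `_es.txt` or `_es.mp3` appended. A minimal sketch of that naming step as a standalone function, assuming a hypothetical helper name `output_names`; it uses `posixpath.splitext` so that only the final extension is dropped, which matters for names that contain more than one dot or a folder prefix:

```python
import posixpath


def output_names(image_blob_name):
    """Return (text_blob_name, audio_blob_name) for an image object name.

    `output_names` is an illustrative helper, not part of the original script.
    """
    # Object names use "/" as a separator, so posixpath treats any folder
    # prefix correctly and ignores dots that appear before the last "/".
    stem, _ext = posixpath.splitext(image_blob_name)
    return f"{stem}_es.txt", f"{stem}_es.mp3"
```

For the image used in this post, `output_names("lunch.jpeg")` gives `("lunch_es.txt", "lunch_es.mp3")`, matching the object names shown in the results.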

Results:

john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ python3 ocr_to_speech.py

Extract text gs://cloud-storage-exam/lunch.jpeg ---

Extracted :

Easy and affordable

lunches you'll love

Hot summer days call for laid-back sandwiches just the

way you like them. Pick your favorite meats from our deli

and pair them with fresh, seasonal produce for your

perfect combo.


Translate to Spanish

Generate Audio

Completed

Text: gs://cloud-storage-exam/lunch_es.txt

Audio: gs://cloud-storage-exam/lunch_es.mp3

john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ 


Link to audio file


Lunch



