Google API Assignment
Assignment:
Take a picture that has text on it.
Upload the picture to your Cloud Storage bucket.
Use the Vision API's OCR (text detection) functionality to extract the text from the image.
Translate the text to Spanish using the Google Cloud Translation API.
Create an MP3 file from the translated text using the Text-to-Speech API.
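The upload step can be done in the Console, but it can also be scripted. The following is a minimal sketch, assuming the google-cloud-storage library is installed and credentials are configured through GOOGLE_APPLICATION_CREDENTIALS. The helper names `blob_name_for` and `upload_image` are hypothetical, not part of the assignment.

```python
import os


def blob_name_for(local_path):
    """Use the file's base name as the object name, e.g. 'lunch.jpeg'."""
    return os.path.basename(local_path)


def upload_image(bucket_name, local_path):
    """Upload a local image to gs://<bucket_name>/<base name> and return that URI."""
    # Imported inside the function so the naming helper above works
    # even where the library is not installed.
    from google.cloud import storage

    client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
    blob_name = blob_name_for(local_path)
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"
```

For example, `upload_image("cloud-storage-exam", "lunch.jpeg")` would place a local `lunch.jpeg` (a hypothetical local file) at `gs://cloud-storage-exam/lunch.jpeg`, which is the object the OCR script reads.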
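Install the required Python libraries (the script also imports google-cloud-storage)
pip install --upgrade google-cloud-vision
pip install --upgrade google-cloud-storage
pip install --upgrade google-cloud-translate
pip install --upgrade google-cloud-texttospeech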
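To confirm the installs worked before running the main script, a quick import check can help. This is a sketch; the list simply mirrors the modules `ocr_to_speech.py` imports, and `missing_modules` is a hypothetical helper name.

```python
import importlib

# Module paths imported by ocr_to_speech.py.
REQUIRED_MODULES = [
    "google.cloud.vision",
    "google.cloud.storage",
    "google.cloud.translate_v2",
    "google.cloud.texttospeech",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing
```

Running `print(missing_modules(REQUIRED_MODULES))` should print an empty list once all the packages are installed.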
Enable the required APIs
gcloud services enable vision.googleapis.com
gcloud services enable translate.googleapis.com
gcloud services enable texttospeech.googleapis.com
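To check which of the three APIs are already enabled, the output of `gcloud services list --enabled` can be compared against the required set. A sketch, assuming `gcloud` is on the PATH (as it is in Cloud Shell) and that the `value(config.name)` format projection prints one bare service name per line; the helper names are hypothetical.

```python
import subprocess

REQUIRED_SERVICES = {
    "vision.googleapis.com",
    "translate.googleapis.com",
    "texttospeech.googleapis.com",
}


def missing_services(enabled_listing):
    """Given one service name per line, return required services not in it."""
    enabled = {line.strip() for line in enabled_listing.splitlines() if line.strip()}
    return sorted(REQUIRED_SERVICES - enabled)


def list_enabled_services():
    """Ask gcloud for the services enabled in the current project."""
    result = subprocess.run(
        ["gcloud", "services", "list", "--enabled", "--format=value(config.name)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

`missing_services(list_enabled_services())` returns the services that still need a `gcloud services enable` call.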
Create a service account JSON key
The client libraries find credentials through the GOOGLE_APPLICATION_CREDENTIALS environment variable, which points to a service account key file in JSON format. Generating that key is straightforward, but it is done in the Google Cloud Console. The key acts as a "passport" that lets your local environment or server authenticate to Google APIs.
Here is the step-by-step guide to getting it done.
1. Navigate to the IAM & Admin Console
Open the Google Cloud Console.
Go to the Service Accounts page (IAM & Admin > Service Accounts).
Select your project from the dropdown at the top if it isn't already selected.
2. Create a Service Account (If you don't have one)
If you already have a service account you want to use, skip to step 3. Otherwise:
Click + CREATE SERVICE ACCOUNT at the top.
Give it a name (e.g., my-app-service-account).
Click Create and Continue.
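Grant access: assign a role that has the permissions your app needs. This assignment's script reads the image from a Cloud Storage bucket and writes its results back to it, so a role such as Storage Object Admin is appropriate.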
Click Done.
3. Generate the JSON Key
Now, you’ll extract the actual credential file:
In the list of service accounts, click on the Email address of the account you just created (or an existing one).
Click on the Keys tab at the top.
Click the ADD KEY dropdown and select Create new key.
Choose JSON as the key type and click Create.
Your browser will automatically download a .json file. Keep this file safe! It contains private keys that grant access to your cloud resources; never commit it to public GitHub repositories.
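Before wiring the downloaded file into your environment, it can be worth confirming it really is a service account key. This sketch checks a few fields that service account key files contain (`type`, `project_id`, `private_key`, `client_email`); `check_key_file` is a hypothetical helper, and it never prints the private key.

```python
import json

# A subset of the fields found in a downloaded service account key.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Load a service account key and return (project_id, client_email).

    Raises ValueError if the file does not look like a service account key.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key["project_id"], key["client_email"]
```

The returned email is the service account you granted roles to, which is a quick way to confirm you downloaded the key for the right account.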
4. Set the Environment Variable
Once you have the file (let's say it's named service-account-file.json), you need to point your system to it so your code can find it.
On macOS / Linux (Terminal)
Add this to your .bashrc, .zshrc, or run it in your current session:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
Upload the key file to your Cloud Shell home directory, then point the variable at it. For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/john_iacovacci1/cloud-project-examples-316c375c6892.json"
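If the client constructors fail with a credentials error, the usual cause is that the variable is unset or points at a path that does not exist (a typo, or a stray character copied from a web page). A small sketch of that check; `credentials_file_from_env` is a hypothetical helper, and it accepts an optional mapping so it can be tried without touching the real environment.

```python
import os


def credentials_file_from_env(environ=None):
    """Return the path in GOOGLE_APPLICATION_CREDENTIALS after checking it exists."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}")
    return path
```

Calling `credentials_file_from_env()` in the same shell session where you ran `export` should return the key path.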
ocr_to_speech.py
======================================================
import os

from google.cloud import storage
from google.cloud import texttospeech
from google.cloud import translate_v2 as translate
from google.cloud import vision

# All four clients read credentials from GOOGLE_APPLICATION_CREDENTIALS.
vision_client = vision.ImageAnnotatorClient()
storage_client = storage.Client()
translate_client = translate.Client()
tts_client = texttospeech.TextToSpeechClient()


def process_image_to_audio(bucket_name, image_blob_name):
    """OCR an image in Cloud Storage, translate it to Spanish, and save text + MP3."""
    print(f"Extract text gs://{bucket_name}/{image_blob_name} ---")

    # 1. OCR: point Vision at the object in the bucket (no local download needed).
    image_uri = f"gs://{bucket_name}/{image_blob_name}"
    image = vision.Image()
    image.source.image_uri = image_uri
    response = vision_client.text_detection(image=image)
    if response.error.message:
        raise Exception(f"Vision Error: {response.error.message}")

    texts = response.text_annotations
    if not texts:
        print("No text.")
        return
    # The first annotation holds the full detected text; the rest are single words.
    original_text = texts[0].description
    print(f"Extracted :\n{original_text}\n")

    # 2. Translate. format_="text" requests plain text instead of the default
    #    HTML format, which would escape characters such as apostrophes.
    print("Translate to Spanish")
    translation = translate_client.translate(
        original_text, target_language="es", format_="text"
    )
    translated_text = translation["translatedText"]

    # 3. Text-to-Speech: Spanish (Spain) voice, MP3 output.
    print("Generate Audio")
    synthesis_input = texttospeech.SynthesisInput(text=translated_text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="es-ES",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    tts_response = tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # 4. Save the translation and the audio next to the image in the same bucket.
    bucket = storage_client.bucket(bucket_name)
    base_name = os.path.splitext(image_blob_name)[0]  # "lunch.jpeg" -> "lunch"
    text_blob = bucket.blob(f"{base_name}_es.txt")
    text_blob.upload_from_string(translated_text)
    audio_blob = bucket.blob(f"{base_name}_es.mp3")
    audio_blob.upload_from_string(tts_response.audio_content, content_type="audio/mpeg")

    print("Completed")
    print(f"Text: gs://{bucket_name}/{base_name}_es.txt")
    print(f"Audio: gs://{bucket_name}/{base_name}_es.mp3")


if __name__ == "__main__":
    BUCKET = "cloud-storage-exam"
    IMAGE_FILE = "lunch.jpeg"
    process_image_to_audio(BUCKET, IMAGE_FILE)
===================================================
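The script sends the whole translation to Text-to-Speech in one request. The service caps input size per request (5,000 bytes at the time of writing; check the current quotas page), and a photo of a full page of text could exceed that. This sketch splits the text at whitespace into chunks under a byte limit and synthesizes them one by one. `split_for_tts` and `synthesize_long_text` are hypothetical helpers, and joining MP3 byte strings back to back is an assumption that usually plays correctly rather than a documented guarantee.

```python
def split_for_tts(text, max_bytes=5000):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes.

    Breaks only at whitespace so words stay intact; a single word longer
    than max_bytes raises ValueError instead of being cut mid-character.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if not current or len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        chunks.append(current)
        current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(tts_client, text, voice, audio_config):
    """Call synthesize_speech once per chunk and concatenate the MP3 bytes."""
    from google.cloud import texttospeech

    parts = []
    for chunk in split_for_tts(text):
        response = tts_client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=chunk),
            voice=voice,
            audio_config=audio_config,
        )
        parts.append(response.audio_content)
    return b"".join(parts)
```

Inside `process_image_to_audio`, the single `synthesize_speech` call could be swapped for `synthesize_long_text(tts_client, translated_text, voice, audio_config)`, uploading the returned bytes in place of `tts_response.audio_content`.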
Results:
john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$ python3 ocr_to_speech.py
Extract text gs://cloud-storage-exam/lunch.jpeg ---
Extracted :
Easy and affordable
lunches you'll love
Hot summer days call for laid-back sandwiches just the
way you like them. Pick your favorite meats from our deli
and pair them with fresh, seasonal produce for your
perfect combo.
Translate to Spanish
Generate Audio
Completed
Text: gs://cloud-storage-exam/lunch_es.txt
Audio: gs://cloud-storage-exam/lunch_es.mp3
john_iacovacci1@cloudshell:~/vision (cloud-project-examples)$
Link to audio file
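A `gs://` path only works with Google tools, so sharing the MP3 needs an HTTPS link. This sketch shows two options. The plain storage.googleapis.com URL works only if the object is publicly readable. A V4 signed URL gives temporary access to a private object and needs a credential that can sign, such as the service account JSON key used above. The helper names are hypothetical, and the 15-minute default is an arbitrary choice.

```python
from datetime import timedelta
from urllib.parse import quote


def public_url(bucket_name, blob_name):
    """HTTPS URL that works only if the object is publicly readable."""
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name)}"


def signed_url(bucket_name, blob_name, minutes=15):
    """Time-limited GET link for a private object (requires a signing credential)."""
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(
        version="v4", expiration=timedelta(minutes=minutes), method="GET"
    )
```

For the run above, `signed_url("cloud-storage-exam", "lunch_es.mp3")` would produce a link to the Spanish audio that expires after 15 minutes.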