Vision Example Explained
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell, a command line environment running in the Cloud.
Activate Cloud Shell
From the Cloud Console, click Activate Cloud Shell.
Before you can begin using the Vision API, run the following command in Cloud Shell to enable the API:
gcloud services enable vision.googleapis.com
You should see something like this:
Operation "operations/..." finished successfully.
Now, you can use the Vision API!
Navigate to your home directory:
cd ~
Create a Python virtual environment to isolate the dependencies:
virtualenv venv-vision
Activate the virtual environment:
source venv-vision/bin/activate
Install IPython and the Vision API client library:
pip install ipython google-cloud-vision
You should see something like this:
...
Installing collected packages: ..., ipython, google-cloud-vision
Successfully installed ... google-cloud-vision-3.4.0 ...
Now, you're ready to use the Vision API client library!
Welcome to Cloud Shell! Type "help" to get started.
Your Cloud Platform project in this session is set to uconn-customers.
Use “gcloud config set project [PROJECT_ID]” to change to a different project.
john_iacovacci1@cloudshell:~ (uconn-customers)$
gcloud services enable vision.googleapis.com
Operation "operations/acat.p2-320375012625-adfd3c6c-3941-4886-b334-f14e80b49c37" finished successfully.
john_iacovacci1@cloudshell:~ (uconn-customers)$
john_iacovacci1@cloudshell:~ (uconn-customers)$ virtualenv venv-vision
created virtual environment CPython3.9.2.final.0-64 in 640ms
creator CPython3Posix(dest=/home/john_iacovacci1/venv-vision, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/john_iacovacci1/.local/share/virtualenv)
added seed packages: pip==23.2.1, setuptools==68.2.0, wheel==0.41.2
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
john_iacovacci1@cloudshell:~ (uconn-customers)$ pip install ipython google-cloud-vision
john_iacovacci1@cloudshell:~ (uconn-customers)$ ipython
This enters interactive Python (IPython) mode:
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
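Before moving on to the examples, you can optionally run a quick sanity check inside IPython to confirm that the client library imports and that a client can be constructed with your Cloud Shell credentials. This is a minimal sketch, not part of the original codelab:
===========================================
# Optional sanity check: import the library and build a client.
# If authentication or API enablement is missing, this is where it fails.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
print(type(client).__name__)  # expected: ImageAnnotatorClient
===========================================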
4. Perform label detection
One of the Vision API's core features is identifying objects or entities in an image,
known as label annotation. Label detection identifies general objects, locations,
activities, animal species, products, and more. The Vision API takes an input image
and returns the most likely labels that apply to it, each with a confidence score.
In this example, you will perform label detection on an image (courtesy of Alex Knight) of Setagaya, a popular district in Tokyo:
Copy the following code into your IPython session:
==========================================
from typing import Sequence

from google.cloud import vision


def analyze_image_from_uri(
    image_uri: str,
    feature_types: Sequence,
) -> vision.AnnotateImageResponse:
    client = vision.ImageAnnotatorClient()

    image = vision.Image()
    image.source.image_uri = image_uri
    features = [vision.Feature(type_=feature_type) for feature_type in feature_types]
    request = vision.AnnotateImageRequest(image=image, features=features)

    response = client.annotate_image(request=request)

    return response


def print_labels(response: vision.AnnotateImageResponse):
    print("=" * 80)
    for label in response.label_annotations:
        print(
            f"{label.score:4.0%}",
            f"{label.description:5}",
            sep=" | ",
        )
=========================================================
Take a moment to study the code and see how it uses the annotate_image client library method to analyze an image for a set of given features.
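One detail worth keeping in mind: annotate_image reports image-level problems (for example, an unreadable URI) in the response's error field rather than raising an exception, so it can be worth checking that field before reading annotations. A minimal sketch, not part of the codelab:
===========================================
# Hypothetical helper: fail fast if the API reported an error for this image.
def check_response(response: vision.AnnotateImageResponse) -> None:
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")
===========================================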
Send a request with the LABEL_DETECTION feature:
=================================
image_uri = "gs://cloud-samples-data/vision/label/setagaya.jpeg"
features = [vision.Feature.Type.LABEL_DETECTION]
response = analyze_image_from_uri(image_uri, features)
print_labels(response)
You should get the following output:
=========================================================
97% | Bicycle
96% | Tire
94% | Wheel
91% | Automotive lighting
89% | Infrastructure
87% | Bicycle wheel
86% | Mode of transport
85% | Building
83% | Electricity
82% | Neighbourhood
=========================================================
In [1]: from typing import Sequence
...:
...: from google.cloud import vision
...:
...:
...: def analyze_image_from_uri(
...: image_uri: str,
...: feature_types: Sequence,
...: ) -> vision.AnnotateImageResponse:
...: client = vision.ImageAnnotatorClient()
...:
...: image = vision.Image()
...: image.source.image_uri = image_uri
...: features = [vision.Feature(type_=feature_type) for feature_type in feature_types]
...: request = vision.AnnotateImageRequest(image=image, features=features)
...:
...: response = client.annotate_image(request=request)
...:
...: return response
...:
...:
...: def print_labels(response: vision.AnnotateImageResponse):
...: print("=" * 80)
...: for label in response.label_annotations:
...: print(
...: f"{label.score:4.0%}",
...: f"{label.description:5}",
...: sep=" | ",
...: )
Press Return until the next prompt appears, then paste in the request for the picture:
In [2]:
In [2]: image_uri = "gs://cloud-samples-data/vision/label/setagaya.jpeg"
...: features = [vision.Feature.Type.LABEL_DETECTION]
...:
...: response = analyze_image_from_uri(image_uri, features)
...: print_labels(response)
=========================================================
97% | Bicycle
96% | Tire
94% | Wheel
91% | Automotive lighting
89% | Infrastructure
87% | Bicycle wheel
86% | Mode of transport
85% | Building
83% | Electricity
82% | Neighbourhood
In [3]:
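If you only care about high-confidence labels, you can filter on the score before printing. The following variation on print_labels is a sketch, not part of the codelab:
===========================================
# Hypothetical variation: keep only labels above a confidence threshold.
def print_confident_labels(response: vision.AnnotateImageResponse, min_score: float = 0.90):
    print("=" * 80)
    for label in response.label_annotations:
        if label.score >= min_score:
            print(f"{label.score:4.0%} | {label.description}")

print_confident_labels(response)  # Bicycle, Tire, Wheel, Automotive lighting
===========================================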
5. Perform text detection
Text detection performs Optical Character Recognition (OCR).
It detects and extracts text within an image with support for a broad range
of languages. It also features automatic language identification.
In this example, you will perform text detection on an image of a traffic sign:
Copy the following code into your IPython session:
def print_text(response: vision.AnnotateImageResponse):
    print("=" * 80)
    for annotation in response.text_annotations:
        vertices = [f"({v.x},{v.y})" for v in annotation.bounding_poly.vertices]
        print(
            f"{repr(annotation.description):42}",
            ",".join(vertices),
            sep=" | ",
        )
===========================================
Send a request with the TEXT_DETECTION feature:
================================================
image_uri = "gs://cloud-samples-data/vision/ocr/sign.jpg"
features = [vision.Feature.Type.TEXT_DETECTION]
response = analyze_image_from_uri(image_uri, features)
print_text(response)
================================================
You should get the following output:
=========================================================
'WAITING?\nPLEASE\nTURN OFF\nYOUR\nENGINE' | (310,821),(2225,821),(2225,1965),(310,1965)
'WAITING' | (344,821),(2025,879),(2016,1127),(335,1069)
'?' | (2057,881),(2225,887),(2216,1134),(2048,1128)
'PLEASE' | (1208,1230),(1895,1253),(1891,1374),(1204,1351)
'TURN' | (1217,1414),(1718,1434),(1713,1558),(1212,1538)
'OFF' | (1787,1437),(2133,1451),(2128,1575),(1782,1561)
'YOUR' | (1211,1609),(1741,1626),(1737,1747),(1207,1731)
'ENGINE' | (1213,1805),(1923,1819),(1920,1949),(1210,1935)
In [3]: def print_text(response: vision.AnnotateImageResponse):
...: print("=" * 80)
...: for annotation in response.text_annotations:
...: vertices = [f"({v.x},{v.y})" for v in annotation.bounding_poly.vertices]
...: print(
...: f"{repr(annotation.description):42}",
...: ",".join(vertices),
...: sep=" | ",
...: )
...:
In [4]:
In [4]: image_uri = "gs://cloud-samples-data/vision/ocr/sign.jpg"
...: features = [vision.Feature.Type.TEXT_DETECTION]
...:
...: response = analyze_image_from_uri(image_uri, features)
...: print_text(response)
=========================================================
'WAITING?\nPLEASE\nTURN OFF\nYOUR\nENGINE' | (310,821),(2225,821),(2225,1965),(310,1965)
'WAITING' | (344,821),(2025,879),(2016,1127),(335,1069)
'?' | (2057,881),(2225,887),(2216,1134),(2048,1128)
'PLEASE' | (1207,1231),(1895,1254),(1891,1374),(1203,1351)
'TURN' | (1217,1414),(1718,1434),(1712,1559),(1212,1539)
'OFF' | (1787,1437),(2133,1451),(2128,1576),(1782,1562)
'YOUR' | (1211,1609),(1741,1626),(1737,1747),(1207,1731)
'ENGINE' | (1213,1805),(1922,1819),(1919,1949),(1210,1935)
=========================================================
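Note that the first entry in text_annotations is the entire detected text block, while the remaining entries are the individual words with their own bounding boxes. If all you need is the raw text, a sketch like this (not from the codelab) is enough:
===========================================
# The first annotation holds the full detected text; the rest are per-word entries.
if response.text_annotations:
    full_text = response.text_annotations[0].description
    print(full_text)  # prints the sign text line by line: WAITING? / PLEASE / TURN OFF / YOUR / ENGINE
===========================================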
6. Perform landmark detection
Landmark detection detects popular natural and man-made structures within an image.
In this example, you will perform landmark detection on an image
(courtesy of John Towner) of the Eiffel Tower:
Copy the following code into your IPython session:
def print_landmarks(response: vision.AnnotateImageResponse, min_score: float = 0.5):
    print("=" * 80)
    for landmark in response.landmark_annotations:
        if landmark.score < min_score:
            continue
        vertices = [f"({v.x},{v.y})" for v in landmark.bounding_poly.vertices]
        lat_lng = landmark.locations[0].lat_lng
        print(
            f"{landmark.description:18}",
            ",".join(vertices),
            f"{lat_lng.latitude:.5f}",
            f"{lat_lng.longitude:.5f}",
            sep=" | ",
        )
Send a request with the LANDMARK_DETECTION feature:
image_uri = "gs://cloud-samples-data/vision/landmark/eiffel_tower.jpg"
features = [vision.Feature.Type.LANDMARK_DETECTION]
response = analyze_image_from_uri(image_uri, features)
print_landmarks(response)
You should get the following output:
=========================================================
Trocadéro Gardens | (303,36),(520,36),(520,371),(303,371) | 48.86160 | 2.28928
Eiffel Tower | (458,76),(512,76),(512,263),(458,263) | 48.85846 | 2.29435
=======================================================
def print_landmarks(response: vision.AnnotateImageResponse, min_score: float = 0.5):
    print("=" * 80)
    for landmark in response.landmark_annotations:
        if landmark.score < min_score:
            continue
        vertices = [f"({v.x},{v.y})" for v in landmark.bounding_poly.vertices]
        lat_lng = landmark.locations[0].lat_lng
        print(
            f"{landmark.description:18}",
            ",".join(vertices),
            f"{lat_lng.latitude:.5f}",
            f"{lat_lng.longitude:.5f}",
            sep=" | ",
        )
=======================================================
image_uri = "gs://cloud-samples-data/vision/landmark/eiffel_tower.jpg"
features = [vision.Feature.Type.LANDMARK_DETECTION]
response = analyze_image_from_uri(image_uri, features)
print_landmarks(response)
=========================================================
Champ De Mars | (0,0),(640,0),(640,426),(0,426) | 48.85565 | 2.29863
Pont De Bir-Hakeim | (0,0),(640,0),(640,426),(0,426) | 48.85560 | 2.28759
Eiffel Tower | (0,0),(640,0),(640,426),(0,426) | 48.85837 | 2.29448
In [7]:
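Because each landmark annotation carries latitude/longitude coordinates, you can turn the results into map links. A minimal sketch; the Google Maps URL format used here is an assumption of this example, not part of the codelab:
===========================================
# Hypothetical helper: build a map link for each detected landmark.
def landmark_map_links(response: vision.AnnotateImageResponse) -> list:
    links = []
    for landmark in response.landmark_annotations:
        lat_lng = landmark.locations[0].lat_lng
        links.append(
            f"{landmark.description}: "
            f"https://www.google.com/maps?q={lat_lng.latitude:.5f},{lat_lng.longitude:.5f}"
        )
    return links

for link in landmark_map_links(response):
    print(link)
===========================================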
7. Perform face detection
Face detection detects multiple faces within an image, along with key facial
attributes such as emotional state or whether the person is wearing headwear.
In this example, you will detect faces in the following picture (courtesy of Himanshu Singh Gurjar):
Copy the following code into your IPython session:
===========================================================
def print_faces(response: vision.AnnotateImageResponse):
    print("=" * 80)
    for face_number, face in enumerate(response.face_annotations, 1):
        vertices = ",".join(f"({v.x},{v.y})" for v in face.bounding_poly.vertices)
        print(f"# Face {face_number} @ {vertices}")
        print(f"Joy: {face.joy_likelihood.name}")
        print(f"Exposed: {face.under_exposed_likelihood.name}")
        print(f"Blurred: {face.blurred_likelihood.name}")
        print("-" * 80)
=========================================================
Send a request with the FACE_DETECTION feature:
========================================================
image_uri = "gs://cloud-samples-data/vision/face/faces.jpeg"
features = [vision.Feature.Type.FACE_DETECTION]
response = analyze_image_from_uri(image_uri, features)
print_faces(response)
=========================================================
You should get the following output:
=========================================================
# Face 1 @ (1077,157),(2146,157),(2146,1399),(1077,1399)
Joy: VERY_LIKELY
Exposed: VERY_UNLIKELY
Blurred: VERY_UNLIKELY
--------------------------------------------------------------------------------
# Face 2 @ (144,1273),(793,1273),(793,1844),(144,1844)
Joy: VERY_UNLIKELY
Exposed: VERY_UNLIKELY
Blurred: UNLIKELY
--------------------------------------------------------------------------------
# Face 3 @ (785,167),(1100,167),(1100,534),(785,534)
Joy: VERY_UNLIKELY
Exposed: LIKELY
Blurred: VERY_LIKELY
--------------------------------------------------------------------------------
In [7]: def print_faces(response: vision.AnnotateImageResponse):
...: print("=" * 80)
...: for face_number, face in enumerate(response.face_annotations, 1):
...: vertices = ",".join(f"({v.x},{v.y})" for v in face.bounding_poly.vertices)
...: print(f"# Face {face_number} @ {vertices}")
...: print(f"Joy: {face.joy_likelihood.name}")
...: print(f"Exposed: {face.under_exposed_likelihood.name}")
...: print(f"Blurred: {face.blurred_likelihood.name}")
...: print("-" * 80)
...:
In [8]: image_uri = "gs://cloud-samples-data/vision/face/faces.jpeg"
...: features = [vision.Feature.Type.FACE_DETECTION]
...:
...: response = analyze_image_from_uri(image_uri, features)
...: print_faces(response)
=========================================================
# Face 1 @ (1076,151),(2144,151),(2144,1392),(1076,1392)
Joy: VERY_LIKELY
Exposed: VERY_UNLIKELY
Blurred: VERY_UNLIKELY
--------------------------------------------------------------------------------
# Face 2 @ (784,170),(1097,170),(1097,534),(784,534)
Joy: VERY_UNLIKELY
Exposed: VERY_UNLIKELY
Blurred: VERY_UNLIKELY
--------------------------------------------------------------------------------
# Face 3 @ (664,28),(816,28),(816,205),(664,205)
Joy: VERY_UNLIKELY
Exposed: VERY_UNLIKELY
Blurred: VERY_UNLIKELY
--------------------------------------------------------------------------------
# Face 4 @ (143,1273),(793,1273),(793,1844),(143,1844)
Joy: VERY_UNLIKELY
Exposed: VERY_UNLIKELY
Blurred: VERY_UNLIKELY
--------------------------------------------------------------------------------
In [9]:
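The likelihood fields are enum values (VERY_UNLIKELY up to VERY_LIKELY), so you can compare them against vision.Likelihood members instead of matching on strings. For example, this sketch (not from the codelab) counts the faces that look at least likely joyful:
===========================================
# Likelihood values are ordered: VERY_UNLIKELY < UNLIKELY < POSSIBLE < LIKELY < VERY_LIKELY.
joyful = sum(
    face.joy_likelihood >= vision.Likelihood.LIKELY
    for face in response.face_annotations
)
print(f"{joyful} of {len(response.face_annotations)} faces look joyful")
===========================================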
8. Perform object detection
Object detection (object localization) identifies multiple objects in an image and
returns, for each one, a name, a confidence score, and a normalized bounding box.
In this example, you will perform object detection on the same image of Setagaya
(courtesy of Alex Knight) used earlier for label detection:
Copy the following code into your IPython session:
=======================================================
def print_objects(response: vision.AnnotateImageResponse):
    print("=" * 80)
    for obj in response.localized_object_annotations:
        nvertices = obj.bounding_poly.normalized_vertices
        print(
            f"{obj.score:4.0%}",
            f"{obj.name:15}",
            f"{obj.mid:10}",
            ",".join(f"({v.x:.1f},{v.y:.1f})" for v in nvertices),
            sep=" | ",
        )
===================================================
Send a request with the OBJECT_LOCALIZATION feature:
=====================================================
image_uri = "gs://cloud-samples-data/vision/label/setagaya.jpeg"
features = [vision.Feature.Type.OBJECT_LOCALIZATION]
response = analyze_image_from_uri(image_uri, features)
print_objects(response)
You should get the following output:
=========================================================
93% | Bicycle | /m/0199g | (0.6,0.6),(0.8,0.6),(0.8,0.9),(0.6,0.9)
92% | Bicycle wheel | /m/01bqk0 | (0.6,0.7),(0.7,0.7),(0.7,0.9),(0.6,0.9)
91% | Tire | /m/0h9mv | (0.7,0.7),(0.8,0.7),(0.8,1.0),(0.7,1.0)
75% | Bicycle | /m/0199g | (0.3,0.6),(0.4,0.6),(0.4,0.7),(0.3,0.7)
51% | Tire | /m/0h9mv | (0.3,0.6),(0.4,0.6),(0.4,0.7),(0.3,0.7)
In [9]: def print_objects(response: vision.AnnotateImageResponse):
...: print("=" * 80)
...: for obj in response.localized_object_annotations:
...: nvertices = obj.bounding_poly.normalized_vertices
...: print(
...: f"{obj.score:4.0%}",
...: f"{obj.name:15}",
...: f"{obj.mid:10}",
...: ",".join(f"({v.x:.1f},{v.y:.1f})" for v in nvertices),
...: sep=" | ",
...: )
...:
In [10]: image_uri = "gs://cloud-samples-data/vision/label/setagaya.jpeg"
...: features = [vision.Feature.Type.OBJECT_LOCALIZATION]
...:
...: response = analyze_image_from_uri(image_uri, features)
...: print_objects(response)
=========================================================
93% | Bicycle | /m/0199g | (0.6,0.6),(0.8,0.6),(0.8,0.9),(0.6,0.9)
92% | Bicycle wheel | /m/01bqk0 | (0.6,0.7),(0.7,0.7),(0.7,0.9),(0.6,0.9)
91% | Tire | /m/0h9mv | (0.7,0.7),(0.8,0.7),(0.8,1.0),(0.7,1.0)
75% | Bicycle | /m/0199g | (0.3,0.6),(0.4,0.6),(0.4,0.7),(0.3,0.7)
51% | Tire | /m/0h9mv | (0.3,0.6),(0.4,0.6),(0.4,0.7),(0.3,0.7)
In [11]:
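The object vertices are normalized to the [0, 1] range, so to draw boxes on the original image you need to scale them by the image dimensions. A minimal sketch; the width and height below are placeholders, not the real dimensions of the Setagaya image:
===========================================
# Hypothetical conversion: normalized vertices -> pixel coordinates.
def to_pixel_box(obj, width: int, height: int) -> list:
    return [
        (int(v.x * width), int(v.y * height))
        for v in obj.bounding_poly.normalized_vertices
    ]

# Placeholder dimensions; read the real size from the image file before using this.
for obj in response.localized_object_annotations:
    print(obj.name, to_pixel_box(obj, width=1000, height=750))
===========================================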
9. Multiple features
You've seen how to use some features of the Vision API, but there are many more, and you can request multiple features in a single call.
Here is the kind of request you can make to get all insights at once:
image_uri = "gs://..."
features = [
    vision.Feature.Type.OBJECT_LOCALIZATION,
    vision.Feature.Type.FACE_DETECTION,
    vision.Feature.Type.LANDMARK_DETECTION,
    vision.Feature.Type.LOGO_DETECTION,
    vision.Feature.Type.LABEL_DETECTION,
    vision.Feature.Type.TEXT_DETECTION,
    vision.Feature.Type.DOCUMENT_TEXT_DETECTION,
    vision.Feature.Type.SAFE_SEARCH_DETECTION,
    vision.Feature.Type.IMAGE_PROPERTIES,
    vision.Feature.Type.CROP_HINTS,
    vision.Feature.Type.WEB_DETECTION,
    vision.Feature.Type.PRODUCT_SEARCH,
]

# response = analyze_image_from_uri(image_uri, features)
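With a combined request, every feature populates its own field of the same response, so once the commented-out call above has been run with a real image URI you can reuse the printing helpers defined earlier on a single result. A sketch, kept commented out for the same reason as above:
===========================================
# Each feature fills its own list on the same AnnotateImageResponse:
# print_labels(response)
# print_text(response)
# print_landmarks(response)
# print_faces(response)
# print_objects(response)
===========================================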
And there are more possibilities, like performing detections on a batch of images, synchronously or asynchronously. Check out all the how-to guides.
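As a taste of the batch path mentioned above, the client also exposes batch_annotate_images, which takes a list of requests and returns one response per image. A minimal sketch with placeholder URIs, not part of the codelab:
===========================================
# Hypothetical batch call: annotate several images in one synchronous request.
def batch_label_images(image_uris: Sequence) -> None:
    client = vision.ImageAnnotatorClient()
    requests = []
    for uri in image_uris:
        image = vision.Image()
        image.source.image_uri = uri
        requests.append(
            vision.AnnotateImageRequest(
                image=image,
                features=[vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION)],
            )
        )
    batch_response = client.batch_annotate_images(requests=requests)
    for uri, single_response in zip(image_uris, batch_response.responses):
        print(uri)
        print_labels(single_response)
===========================================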