UCONN

NLP Codelab: NLP Example Explained

1. Overview

In this codelab, you will focus on using the Natural Language API with Python. You will learn how to perform sentiment analysis, entity analysis, syntax analysis, content classification, and text moderation.

What you'll learn

  • How to use Cloud Shell

  • How to enable the Natural Language API

  • How to authenticate API requests

  • How to install the client library for Python

  • How to perform sentiment analysis

  • How to perform entity analysis

  • How to perform syntax analysis

  • How to perform content classification

  • How to perform text moderation

What you'll need

  • A Google Cloud Project

  • A browser, such as Chrome or Firefox

  • Familiarity using Python 3



2. Setup and requirements

Self-paced environment setup

Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)

Note: You can easily access Cloud Console by memorizing its URL, which is console.cloud.google.com.


Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Note: If you're using a Gmail account, you can leave the default location set to No organization. If you're using a G Suite account, then choose a location that makes sense for your organization.

Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources. Running through this codelab shouldn't cost much, if anything at all. Be sure to follow any instructions in the "Cleaning up" section, which advises you how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell, a command line environment running in the Cloud.

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.

If you've never started Cloud Shell before, you'll be presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again).

It should only take a few moments to provision and connect to Cloud Shell.


This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

2. Run the following command in Cloud Shell to confirm that you are authenticated:


gcloud auth list


Command output

Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`


john_iacovacci1@cloudshell:~ (uconn-engr)$ gcloud config list project
[core]
project = uconn-engr

Your active configuration is: [cloudshell-11917]
john_iacovacci1@cloudshell:~ (uconn-engr)$


Note: The gcloud command-line tool is the powerful and unified command-line tool in Google Cloud. It comes preinstalled in Cloud Shell. You will notice its support for tab completion. For more information, see gcloud command-line tool overview.

3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project


Command output

[core]
project = <PROJECT_ID>


If the project is not set to your PROJECT_ID, you can set it with this command:

gcloud config set project <PROJECT_ID>


Command output

Updated property [core/project].


3. Environment setup

Before you can begin using the Natural Language API, run the following command in Cloud Shell to enable the API:

gcloud services enable language.googleapis.com


john_iacovacci1@cloudshell:~ (uconn-engr)$ gcloud services enable language.googleapis.com
john_iacovacci1@cloudshell:~ (uconn-engr)$


Now, you can use the Natural Language API!

Navigate to your home directory:

cd ~

Create a working directory for this codelab and change into it:

mkdir nlp
cd nlp

john_iacovacci1@cloudshell:~ (uconn-engr)$ mkdir nlp
john_iacovacci1@cloudshell:~ (uconn-engr)$ cd nlp
john_iacovacci1@cloudshell:~/nlp (uconn-engr)$


Create a Python virtual environment to isolate the dependencies.

Python virtual environments are useful for a number of reasons, including:

  • Separating dependencies: virtual environments keep packages for different projects isolated so they don't conflict. For example, if you're working on two projects that require different versions of a package, you can create a virtual environment for each project to keep their versions separate.

  • Ensuring consistent package versions: virtual environments help ensure that the correct package versions are used each time the software runs.

virtualenv venv-language

john_iacovacci1@cloudshell:~/nlp (uconn-engr)$ virtualenv venv-language
created virtual environment CPython3.12.3.final.0-64 in 2203ms
  creator CPython3Posix(dest=/home/john_iacovacci1/nlp/venv-language, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, via=copy, app_data_dir=/home/john_iacovacci1/.local/share/virtualenv)
    added seed packages: pip==24.2
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
john_iacovacci1@cloudshell:~/nlp (uconn-engr)$

Activate the virtual environment:

source venv-language/bin/activate

john_iacovacci1@cloudshell:~/nlp (uconn-engr)$ source venv-language/bin/activate
(venv-language) john_iacovacci1@cloudshell:~/nlp (uconn-engr)$

To stop using the virtual environment and go back to your system Python version, you can use the deactivate command.
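For example:

deactivate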

Install IPython, pandas, tabulate, and the Natural Language API client library:

pip install ipython pandas tabulate google-cloud-language


(venv-language) john_iacovacci1@cloudshell:~/nlp (uconn-engr)$ pip install ipython pandas tabulate google-cloud-language


You should see something like this:

...
Installing collected packages: ... pandas ... ipython ... google-cloud-language
Successfully installed ... google-cloud-language-2.11.0 ...


Now, you're ready to use the Natural Language API client library!

If you're setting up your own Python development environment outside of Cloud Shell, you can follow these guidelines.

In the next steps, you'll use an interactive Python interpreter called IPython, which you installed in the previous step. Start a session by running ipython in Cloud Shell:

ipython


(venv-language) john_iacovacci1@cloudshell:~/nlp (uconn-engr)$ ipython
Python 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

4. Sentiment analysis

Sentiment analysis inspects the given text and identifies the prevailing emotional opinions within the text, especially to determine expressed sentiments as positive, negative, or neutral, both at the sentence and the document levels. It is performed with the analyze_sentiment method, which returns an AnalyzeSentimentResponse.


Copy the following code into your IPython session:

from google.cloud import language


def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_sentiment(document=document)


def show_text_sentiment(response: language.AnalyzeSentimentResponse):
    import pandas as pd

    columns = ["score", "sentence"]
    data = [(s.sentiment.score, s.text.content) for s in response.sentences]
    df_sentence = pd.DataFrame(columns=columns, data=data)

    sentiment = response.document_sentiment
    columns = ["score", "magnitude", "language"]
    data = [(sentiment.score, sentiment.magnitude, response.language)]
    df_document = pd.DataFrame(columns=columns, data=data)

    format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")
    print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")
    print()
    print(f"At document level:\n{df_document.to_markdown(**format_args)}")

Perform an analysis:


# Input
text = """
Python is a very readable language, which makes it easy to understand and maintain code.
It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
One disadvantage is its speed: it's not as fast as some other programming languages.
"""

# Send a request to the API
analyze_sentiment_response = analyze_text_sentiment(text)

# Show the results
show_text_sentiment(analyze_sentiment_response)



At sentence level:
   score | sentence
---------+------------------------------------------------------------------------------------------
    +0.8 | Python is a very readable language, which makes it easy to understand and maintain code.
    +0.9 | It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks.
    -0.4 | One disadvantage is its speed: it's not as fast as some other programming languages.

At document level:
   score |   magnitude | language
---------+-------------+------------
    +0.4 |        +2.2 | en




Take a moment to test your own sentences.

For information on which languages are supported by the Natural Language API, see Language Support.

The score of the sentiment ranges between -1.0 (negative) and +1.0 (positive) and corresponds to the overall sentiment from the given information. The magnitude of the sentiment ranges from 0.0 to +inf and indicates the overall strength of sentiment from the given information. The more information provided, the higher the magnitude. For more information on how to interpret the score and magnitude sentiment values included in the analysis, see Interpreting sentiment analysis values.
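As a rough illustration, here is one way you might bucket a document's score and magnitude into a label (the 0.25 and 2.0 thresholds are assumptions chosen for this sketch, not official guidance):

def describe_sentiment(score: float, magnitude: float) -> str:
    # Thresholds are illustrative assumptions, not API recommendations.
    if score >= 0.25:
        return "positive"
    if score <= -0.25:
        return "negative"
    if magnitude >= 2.0:
        return "mixed"  # near-neutral overall score, but strong feelings both ways
    return "neutral"


sentiment = analyze_sentiment_response.document_sentiment
print(describe_sentiment(sentiment.score, sentiment.magnitude))  # e.g. positive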

Each API response returns the document's automatically detected language (as an ISO-639-1 code). It is shown here and will be skipped in the next analysis examples.

Summary

In this step, you were able to perform sentiment analysis on a string of text!

5. Entity analysis


Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities. It is performed with the analyze_entities method, which returns an AnalyzeEntitiesResponse.

Copy the following code into your IPython session:


from google.cloud import language


def analyze_text_entities(text: str) -> language.AnalyzeEntitiesResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_entities(document=document)


def show_text_entities(response: language.AnalyzeEntitiesResponse):
    import pandas as pd

    columns = ("name", "type", "salience", "mid", "wikipedia_url")
    data = (
        (
            entity.name,
            entity.type_.name,
            entity.salience,
            entity.metadata.get("mid", ""),
            entity.metadata.get("wikipedia_url", ""),
        )
        for entity in response.entities
    )
    df = pd.DataFrame(columns=columns, data=data)
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:

# Input
text = """Guido van Rossum is best known as the creator of Python,
which he named after the Monty Python comedy troupe.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_entities_response = analyze_text_entities(text)

# Show the results
show_text_entities(analyze_entities_response)

You should see an output like the following:

 name             | type         |   salience | mid       | wikipedia_url
------------------+--------------+------------+-----------+-------------------------------------------------------------
 Guido van Rossum | PERSON       |        50% | /m/01h05c | https://en.wikipedia.org/wiki/Guido_van_Rossum
 Python           | ORGANIZATION |        38% | /m/05z1_  | https://en.wikipedia.org/wiki/Python_(programming_language)
 creator          | PERSON       |         5% |           |
 Monty Python     | PERSON       |         3% | /m/04sd0  | https://en.wikipedia.org/wiki/Monty_Python
 comedy troupe    | PERSON       |         2% |           |
 Haarlem          | LOCATION     |         1% | /m/0h095  | https://en.wikipedia.org/wiki/Haarlem
 Netherlands      | LOCATION     |         1% | /m/059j2  | https://en.wikipedia.org/wiki/Netherlands


Take a moment to test your own sentences mentioning other entities.

  • For information on which languages are supported by this method, see Language Support.

  • The type of the entity is an enum that lets you classify or differentiate entities. For example, this can help distinguish the similarly named entities "T.E. Lawrence" (a PERSON) from "Lawrence of Arabia" (the film, tagged as a WORK_OF_ART). See Entity.Type.

  • The entity salience indicates the importance or relevance of this entity to the entire document text. This score can assist information retrieval and summarization by prioritizing salient entities. Scores closer to 0.0 are less important, while scores closer to 1.0 are highly important.

  • For more information, see Entity analysis.

  • You can also combine both entity analysis and sentiment analysis with the analyze_entity_sentiment method, as sketched below. See Entity sentiment analysis.
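A minimal sketch of that combination, reusing the client pattern from above (the helper name and sample sentence are illustrative):

from google.cloud import language


def analyze_text_entity_sentiment(text: str) -> language.AnalyzeEntitySentimentResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_entity_sentiment(document=document)


response = analyze_text_entity_sentiment("Python is great, but slow builds are annoying.")
for entity in response.entities:
    # Each entity carries its own sentiment in addition to its salience.
    print(f"{entity.name}: {entity.sentiment.score:+.1f}")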

Summary

In this step, you were able to perform entity analysis!


6. Syntax analysis


Syntax analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally based on word boundaries), and providing further analysis on those tokens. It is performed with the analyze_syntax method, which returns an AnalyzeSyntaxResponse.


Copy the following code into your IPython session:



from typing import Optional

from google.cloud import language


def analyze_text_syntax(text: str) -> language.AnalyzeSyntaxResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_syntax(document=document)


def get_token_info(token: Optional[language.Token]) -> list[str]:
    parts = [
        "tag",
        "aspect",
        "case",
        "form",
        "gender",
        "mood",
        "number",
        "person",
        "proper",
        "reciprocity",
        "tense",
        "voice",
    ]
    if not token:
        # Called with no token, return the column headers instead.
        return ["token", "lemma"] + parts

    text = token.text.content
    # Only show the lemma when it differs from the surface text.
    lemma = token.lemma if token.lemma != token.text.content else ""
    info = [text, lemma]
    for part in parts:
        pos = token.part_of_speech
        info.append(getattr(pos, part).name if part in pos else "")

    return info


def show_text_syntax(response: language.AnalyzeSyntaxResponse):
    import pandas as pd

    tokens = len(response.tokens)
    sentences = len(response.sentences)
    columns = get_token_info(None)
    data = (get_token_info(token) for token in response.tokens)
    df = pd.DataFrame(columns=columns, data=data)
    # Remove empty columns
    empty_columns = [col for col in df if df[col].eq("").all()]
    df.drop(empty_columns, axis=1, inplace=True)

    print(f"Analyzed {tokens} token(s) from {sentences} sentence(s):")
    print(df.to_markdown(index=False, tablefmt="presto"))

Perform an analysis:



# Input
text = """Guido van Rossum is best known as the creator of Python.
He was born in Haarlem, Netherlands.
"""

# Send a request to the API
analyze_syntax_response = analyze_text_syntax(text)

# Show the results
show_text_syntax(analyze_syntax_response)

You should see an output like the following:



Analyzed 20 token(s) from 2 sentence(s):
 token       | lemma   | tag   | case       | gender    | mood       | number   | person   | proper   | tense   | voice
-------------+---------+-------+------------+-----------+------------+----------+----------+----------+---------+---------
 Guido       |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 van         |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 Rossum      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 is          | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PRESENT |
 best        | well    | ADV   |            |           |            |          |          |          |         |
 known       | know    | VERB  |            |           |            |          |          |          | PAST    |
 as          |         | ADP   |            |           |            |          |          |          |         |
 the         |         | DET   |            |           |            |          |          |          |         |
 creator     |         | NOUN  |            |           |            | SINGULAR |          |          |         |
 of          |         | ADP   |            |           |            |          |          |          |         |
 Python      |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |
 He          |         | PRON  | NOMINATIVE | MASCULINE |            | SINGULAR | THIRD    |          |         |
 was         | be      | VERB  |            |           | INDICATIVE | SINGULAR | THIRD    |          | PAST    |
 born        | bear    | VERB  |            |           |            |          |          |          | PAST    | PASSIVE
 in          |         | ADP   |            |           |            |          |          |          |         |
 Haarlem     |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 ,           |         | PUNCT |            |           |            |          |          |          |         |
 Netherlands |         | NOUN  |            |           |            | SINGULAR |          | PROPER   |         |
 .           |         | PUNCT |            |           |            |          |          |          |         |

Take a moment to test your own sentences with other syntactic structures.


Extracting syntax information has multiple advantages. One is to extract lemmas. A lemma contains the "root" word upon which this token is based, which allows you to manage words with their canonical forms.
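For instance, here is a minimal sketch (reusing analyze_text_syntax and the language import from above; the sample text is just an illustration) that tallies canonical word forms by preferring each token's lemma over its surface text:

from collections import Counter

response = analyze_text_syntax("He was born in Haarlem. She was born in Delft.")
lemmas = Counter(
    # Fall back to the surface text when no lemma is returned.
    (token.lemma or token.text.content).lower()
    for token in response.tokens
    if token.part_of_speech.tag != language.PartOfSpeech.Tag.PUNCT
)
print(lemmas.most_common(3))  # e.g. [('be', 2), ('bear', 2), ('in', 2)]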


If you dive deeper into the response insights, you'll also find the relationships between the tokens. The online Natural Language demo offers a visual interpretation (a dependency tree) of the complete syntax analysis for examples like this one.


For more information, see the following:

  • AnalyzeSyntaxResponse

  • Language Support

  • Syntactic analysis

  • Morphology & Dependency Trees


Summary

In this step, you were able to perform syntax analysis!



7. Content classification


Content classification analyzes a document and returns a list of content categories that apply to the text found in the document. It is performed with the classify_text method, which returns a ClassifyTextResponse.


Copy the following code into your IPython session:



from google.cloud import language


def classify_text(text: str) -> language.ClassifyTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.classify_text(document=document)


def show_text_classification(text: str, response: language.ClassifyTextResponse):
    import pandas as pd

    columns = ["category", "confidence"]
    data = ((category.name, category.confidence) for category in response.categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:



# Input
text = """Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.
"""

# Send a request to the API
classify_text_response = classify_text(text)

# Show the results
show_text_classification(text, classify_text_response)

You should see an output like the following:



Text analyzed:
Python is an interpreted, high-level, general-purpose programming language.
Created by Guido van Rossum and first released in 1991, Python's design philosophy
emphasizes code readability with its notable use of significant whitespace.

 category                             |   confidence
--------------------------------------+--------------
 /Computers & Electronics/Programming |          99%
 /Science/Computer Science            |          99%


Take a moment to test your own sentences relating to other categories.

Note that you must supply a text block (document) with at least twenty tokens (words and punctuation marks); the sketch below adds a simple guard for that.
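A minimal guard, assuming a whitespace split roughly approximates the API's token count (the helper name and the approximation are illustrative):

def classify_if_long_enough(text: str, min_tokens: int = 20):
    # Rough heuristic: whitespace-separated words stand in for API tokens.
    if len(text.split()) < min_tokens:
        print(f"Skipping: ~{len(text.split())} token(s), need at least {min_tokens}.")
        return None
    return classify_text(text)


classify_if_long_enough("Too short to classify.")  # prints the skip message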


For more information, see the following docs:

  • ClassifyTextResponse

  • Language Support

  • Content Classification


Summary


In this step, you were able to perform content classification!



8. Text moderation


Powered by Google's latest PaLM 2 foundation model, text moderation identifies a wide range of harmful content, including hate speech, bullying, and sexual harassment. It is performed with the moderate_text method, which returns a ModerateTextResponse.


Copy the following code into your IPython session:



from google.cloud import language


def moderate_text(text: str) -> language.ModerateTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.moderate_text(document=document)


def show_text_moderation(text: str, response: language.ModerateTextResponse):
    import pandas as pd

    def confidence(category: language.ClassificationCategory) -> float:
        return category.confidence

    columns = ["category", "confidence"]
    categories = sorted(response.moderation_categories, key=confidence, reverse=True)
    data = ((category.name, category.confidence) for category in categories)
    df = pd.DataFrame(columns=columns, data=data)

    print(f"Text analyzed:\n{text}")
    print(df.to_markdown(index=False, tablefmt="presto", floatfmt=".0%"))

Perform an analysis:



# Input
text = """I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!
"""

# Send a request to the API
response = moderate_text(text)

# Show the results
show_text_moderation(text, response)

You should see an output like the following:



Text analyzed:
I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!

 category              |   confidence
-----------------------+--------------
 Toxic                 |          67%
 Insult                |          58%
 Profanity             |          53%
 Violent               |          48%
 Illicit Drugs         |          29%
 Religion & Belief     |          27%
 Politics              |          22%
 Death, Harm & Tragedy |          21%
 Finance               |          18%
 Derogatory            |          14%
 Firearms & Weapons    |          11%
 Health                |          10%
 Legal                 |          10%
 War & Conflict        |           7%
 Public Safety         |           5%
 Sexual                |           4%

Take a moment to test your own sentences.
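In practice, you often only care about the categories above some cutoff. A minimal sketch, reusing the response from above (the 50% threshold is an arbitrary assumption for illustration, not an API recommendation):

THRESHOLD = 0.5  # arbitrary cutoff chosen for this example
flagged = [
    category.name
    for category in response.moderation_categories
    if category.confidence >= THRESHOLD
]
print(f"Flagged: {flagged}")  # e.g. Flagged: ['Toxic', 'Insult', 'Profanity']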


For more information, see the following docs:

  • ModerateTextResponse

  • Language Support

  • Moderating Text

Summary

In this step, you were able to perform text moderation!



Disable Billing

To shut down resources and avoid billing beyond this tutorial (the "Cleaning up" step mentioned earlier): in the Cloud Console, search for Billing and go to Manage billing accounts. Open the My Projects tab, click the three-dot Actions menu next to your project, then select Disable billing.