UCONN


Chapter 15: Cloud Natural Language



  • An overview of natural language processing

  • How the Cloud Natural Language API works

  • The different types of analysis supported by Cloud Natural Language

  • How Cloud Natural Language pricing is calculated

  • An example to suggest hashtags



Google NLP



NLP explained


Natural Language Processing

Gives machines the ability to read, understand, and derive meaning from human languages.


Combines the fields of linguistics and computer science

to decipher language structure and rules, and to build models that can comprehend, break down, and extract significant details from text and speech.


Gives machines the ability to mimic human linguistic behavior.


1. Segmentation <<< break text into sentences

2. Tokenizing <<< break each sentence into words

3. Stop Words <<< flag common low-information words (forms of "to be", prepositions, etc.)

4. Stemming <<< strip prefixes and suffixes so related word forms match

5. Lemmatization <<< learning that multiple words can have the same meaning (is, am, are >>> be)

6. Speech Tagging <<< tag each word with its part of speech (noun, verb, preposition)

7. Named Entity Tagging <<< introduce the machine to groups of special words that may occur in documents

8. Machine Learning (e.g., naive Bayes classification) <<< learn human sentiment and speech
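The first few steps above can be sketched in miniature. This is a hypothetical toy pipeline covering segmentation, tokenizing, stop-word removal, and stemming; real systems use libraries such as NLTK or spaCy, and the stop-word list and suffix rules here are made up for illustration:

```python
import re

# Step 3 helper: a tiny made-up stop-word list (real lists are much longer).
STOP_WORDS = {"is", "am", "are", "the", "a", "an", "to", "of", "in", "it"}

def segment(text):
    """Step 1: break text into sentences on terminal punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Step 2: break a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stop_words(tokens):
    """Step 3: drop common low-information words."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Step 4: naive suffix stripping (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ly", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "The car is driving to the mall. It drives quickly."
for sentence in segment(text):
    tokens = remove_stop_words(tokenize(sentence))
    print([stem(t) for t in tokens])
```

Steps 5-8 (lemmatization, tagging, and classification) need trained models and are where a service like Cloud Natural Language takes over.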







Sentiment Analysis


Train a custom sentiment model:

Analyze sentiment using a labeled data set.

Sentiment scores are labeled by a person.


You need at least 100 examples to train.



Google BERT


Knowing words versus understanding meaning.


The model learns context.


It understands that words can have different meanings

depending upon what is around them.


Word order really matters.



Hymie the Robot


From the Get Smart TV show.



Bag of Words


Text data is transformed into numerical representations that machines can understand.

Each document is converted into a vector based on word counts.


Weighting those counts can emphasize important words and reduce noise in the data.
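A minimal bag-of-words sketch, assuming a made-up three-document corpus (production code would typically use something like scikit-learn's CountVectorizer):

```python
from collections import Counter

# Each document becomes a vector of word counts over a shared vocabulary.
docs = ["this car is nice", "this car is ugly", "a nice nice day"]

# Build one fixed, sorted vocabulary across all documents.
vocab = sorted({word for doc in docs for word in doc.split()})

def vectorize(doc):
    """Convert a document into a count vector aligned with `vocab`."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

print(vocab)
for doc in docs:
    print(vectorize(doc))
```

Every vector has the same length, so documents become comparable points in one space, which is what downstream classifiers need.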


Brand Awareness


Natural language processing is the act of taking text content as input and deriving some structured meaning or understanding from it as output. 


Take the sentence “I’m going to the mall” and derive {action: "going", target: "mall"}.


The sentence “Joe drives his Broncos to work” is ambiguous:

either Joe forces his bronco horses to his workplace, or he gets into one of the many Ford Bronco cars he owns.


The Cloud Natural Language API uses machine learning to process text content.


Results are best guesses—treat the output as suggestions.

15.1. How does the Natural Language API work?

The Natural Language API is a stateless API: you send it some input

(in this case, text), and the API returns a set of annotations about the text.


The Natural Language API can annotate three features of input text:


Syntax—parse a document into sentences,

finding “tokens” along the way.


Each token has a part of speech and a canonical form.


Entities—look at each token individually and do a lookup in Google’s knowledge graph, associating the token with a pointer to a specific entity in the graph.


Through the concept of salience (or “prominence”), you’ll be able to see whether

the sentence is focused on Barack Obama or whether he’s mentioned in passing.


Sentiment—the ability to understand the emotional content of a chunk of text and recognize that a given sentence expresses positive or negative emotion.

Sentiment in Politics

Political Campaigns

AI in Politics 

Obama


In the 2008 US presidential election, the Democratic National Committee (DNC) used its mobilization programs among supporters to broaden participation among citizens, stakeholders, and others, and to forecast the statistics of the election's status.

Obama used web-based platforms, sharing through social media, and smartphones to get his supporters to participate in the political processes of his election campaign.

To register, mobilize, or persuade supporters during a party's campaign, a mobile application can be a better solution. Obama used this technology to reach his most active supporters, canvassers, citizens, and stakeholders: to notify them of his approach, let them hear his speeches, and collect statistical feedback in return (such as ratings and suggestions), without ringing doorbells during the campaigning period.


RNC Data Campaign


Values should be treated as somewhat “fuzzy”—even our human brains can’t necessarily come up with perfectly correct answers.

15.2. Sentiment analysis

Sentiment analysis is recognizing the sentiment or emotion of what is said.

As humans, we can generally tell whether a given sentence is happy or sad.


“I like this car” would be considered positive.


“This car is ugly” would likely be considered negative.


“This is a car.” is a neutral sentence.


We need to track both the sentiment itself and the magnitude of the overall sentiment.


Table 15.1. Comparing sentences with similar sentiment and different magnitudes

 Sentence                                                 | Sentiment | Magnitude
----------------------------------------------------------+-----------+----------
 “This car is really pretty.”                             | Positive  | High
 “This car is ugly.”                                      | Negative  | High
 “This car is pretty. It also gets terrible gas mileage.” | Neutral   | High
 “This is a car.”                                         | Neutral   | Low

Think of the overall sentiment as a vector, which conveys both a rating of the

positivity (or negativity) and a magnitude, which expresses how strongly

that sentiment is expressed.


To get the overall sentiment and magnitude, add the per-sentence vectors into a final vector.




Where the positive and negative cancel each other out, the magnitude can help distinguish between a truly unemotional input and one where positivity and negativity neutralize one another.
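This cancellation can be sketched with a toy aggregation. This is an illustration only, not the API's actual algorithm (which is internal to the service); each sentence is assumed to carry a score in [-1, +1]:

```python
def summarize(sentence_scores):
    """Toy aggregation: net positivity vs. total emotional intensity."""
    score = sum(sentence_scores) / len(sentence_scores)  # net positivity
    magnitude = sum(abs(s) for s in sentence_scores)     # total emotion
    return score, magnitude

mixed = [0.8, -0.8]  # e.g. "This car is pretty. It gets terrible gas mileage."
neutral = [0.0]      # e.g. "This is a car."

print(summarize(mixed))    # score near zero, but large magnitude
print(summarize(neutral))  # score near zero and magnitude near zero
```

Both documents net out near zero, but only the magnitude reveals that the first one is emotionally charged rather than flat.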



When the score is close to zero, the magnitude value represents how much emotion actually went into the text.


The magnitude is a number greater than or equal to zero, with zero meaning that

the statement was truly neutral.


Enable the Natural Language API using the Cloud Console. 


The overall sentiment of that sentence was moderately positive.

With machine-learning APIs, the algorithms and underlying systems that generate the outputs are constantly learning and improving.


See something like the following:

Results for "This is a car.":

Score:     0.20000000298023224

Magnitude: 0.20000000298023224

Results for "This car is nice. It also gets terrible gas mileage!":

Score:     0

Magnitude: 1.2999999523162842

The “neutral” sentence had quite a bit of emotion, and what we thought was a neutral statement (“This is a car”) is rated slightly positive overall.


 Judging the sentiment of content is a bit of a fuzzy process


from google.cloud import language


def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:

    """Send plain text to the Natural Language API and return its sentiment."""

    client = language.LanguageServiceClient()

    document = language.Document(

        content=text,

        type_=language.Document.Type.PLAIN_TEXT,

    )

    return client.analyze_sentiment(document=document)


def show_text_sentiment(response: language.AnalyzeSentimentResponse):

    """Print per-sentence and document-level sentiment as small tables."""

    import pandas as pd


    columns = ["score", "sentence"]

    data = [(s.sentiment.score, s.text.content) for s in response.sentences]

    df_sentence = pd.DataFrame(columns=columns, data=data)


    sentiment = response.document_sentiment

    columns = ["score", "magnitude", "language"]

    data = [(sentiment.score, sentiment.magnitude, response.language)]

    df_document = pd.DataFrame(columns=columns, data=data)


    format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")

    print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")

    print()

    print(f"At document level:\n{df_document.to_markdown(**format_args)}")


In [9]: text = """

   ...: This car is nice.

   ...: """

   ...: 


In [10]: analyze_sentiment_response = analyze_text_sentiment(text)


In [11]: show_text_sentiment(analyze_sentiment_response)

At sentence level:

   score | sentence

---------+-------------------

    +0.8 | This car is nice.


At document level:

   score |   magnitude | language

---------+-------------+------------

    +0.8 |        +0.8 | en


In [12]: 


15.3. Entity recognition


Entity Recognition (NER)


NER detects special entities, such as people, places, organizations, works of art, or anything else you’d consider a proper noun.


It works by parsing the sentence for tokens and comparing those tokens against the entities that Google has stored in its knowledge graph.


The API is able to distinguish between terms that could be special, depending on their use (such as “blackberry” the fruit versus “Blackberry” the phone).

Use entity detection to determine which entities are present in your input.


The example sentence contains four distinct entities: Barack Obama, iPhone, Blackberry, and Hawaii.


The Natural Language API can distinguish between differing levels of prominence,

ranking entities according to how important they are in the sentence.

Rather than seeing only the names of the entities, you’ll see the raw entity content.


Salience is a linguistic term that refers to how important a word or phrase is in a text. In natural language processing (NLP), entity salience is a metric that measures how prominent an entity is in a text.


Entity salience scores are a prediction of what a human reader would consider to be the most important entities in a text. The scores are relative to the text and range from 0 to 1, with higher scores indicating greater importance.


What effect does the phrasing have on salience? 


const inputs = [

  'Barack Obama prefers an iPhone over a Blackberry when in Hawaii.',

  "When in Hawaii an iPhone, not a Blackberry, is Barack Obama's preferred device.",

];

Different salience values are returned for different phrasings of similar sentences.


For the sentence "Barack Obama prefers an iPhone over a Blackberry when in Hawaii.":


The most important entity is: Barack Obama (0.5521853566169739)


For the sentence "When in Hawaii an iPhone, not a Blackberry, is Barack Obama's preferred device.":


from google.cloud import language

def analyze_text_entities(text):

    client = language.LanguageServiceClient()

    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)


    response = client.analyze_entities(document=document)


    for entity in response.entities:

        print("=" * 80)

        results = dict(

            name=entity.name,

            type=entity.type_.name,

            salience=f"{entity.salience:.1%}",

            wikipedia_url=entity.metadata.get("wikipedia_url", "-"),

            mid=entity.metadata.get("mid", "-"),

        )

        for k, v in results.items():

            print(f"{k:15}: {v}")


In [5]: analyze_entities_response = analyze_text_entities(text)

   ...: 

================================================================================

name           : iPhone

type           : CONSUMER_GOOD

salience       : 67.9%

wikipedia_url  : https://en.wikipedia.org/wiki/IPhone

mid            : /m/027lnzs

================================================================================

name           : Barack Obama

type           : PERSON

salience       : 15.1%

wikipedia_url  : https://en.wikipedia.org/wiki/Barack_Obama

mid            : /m/02mjmr

================================================================================

name           : Blackberry

type           : ORGANIZATION

salience       : 10.0%

wikipedia_url  : https://en.wikipedia.org/wiki/Blackberry

mid            : /g/120z183t

================================================================================

name           : Hawaii

type           : LOCATION

salience       : 6.9%

wikipedia_url  : https://en.wikipedia.org/wiki/Hawaii

mid            : /m/03gh4


In [6]: 
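As a small illustration, salience scores like those printed above can be ranked to find the most prominent entity. The dictionary below uses rounded versions of the scores from the output, purely for illustration:

```python
# Salience scores are relative to one text and sum to (at most) 1.0.
salience = {
    "iPhone": 0.679,
    "Barack Obama": 0.151,
    "Blackberry": 0.100,
    "Hawaii": 0.069,
}

# Rank entities from most to least prominent.
ranked = sorted(salience.items(), key=lambda kv: kv[1], reverse=True)
most_important = ranked[0][0]
print(f"The most important entity is: {most_important}")
```

This ranking step is exactly what you'd use when only the top entity or two should drive downstream behavior, such as tag suggestions.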


15.6. Case study: suggesting InstaSnap hash-tags

The NL API is able to take some textual input and produce both a sentiment analysis and the entities found in the input.


First, you’d take a post’s caption as input text and send it to the Natural Language API.


Next, the Natural Language API would send back both the sentiment and any detected entities.


After that, you’d have to coerce some of the results into a format that’s

useful in this scenario and display a list of suggested tags to the user.


Coming up with some suggested tags should then look simple.
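The steps above can be sketched as a small function. The thresholds, mood tags, and entity inputs below are assumptions for illustration, not the book's actual implementation; in practice the inputs would come from the API responses shown earlier:

```python
import re

def suggest_hashtags(entity_names, sentiment_score):
    """Coerce detected entities plus a document sentiment score into tags."""
    tags = []
    for name in entity_names:
        # "Barack Obama" -> "#barackobama": lowercase and strip non-alphanumerics.
        tags.append("#" + re.sub(r"[^a-z0-9]", "", name.lower()))
    # Hypothetical thresholds: append a mood tag only for clear sentiment.
    if sentiment_score > 0.25:
        tags.append("#awesome")
    elif sentiment_score < -0.25:
        tags.append("#ugh")
    return tags

print(suggest_hashtags(["Barack Obama", "Hawaii"], 0.8))
# e.g. ['#barackobama', '#hawaii', '#awesome']
```

A real version might also filter entities by salience so only the prominent ones become tags.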

Summary

The Natural Language API is a powerful textual analysis service.

If you need to discover details about text in a scalable way, the Natural

Language API is likely a good fit for you.


The API can analyze text for entities (people, places, organizations),

syntax (tokenizing and diagramming sentences), and sentiment

(understanding the emotional content of text).


As with all machine learning today, the results from this API should

be treated as suggestions rather than absolute fact

(after all, it can be tough for people to decide whether a given sentence

is happy or sad).

