UCONN

Chapter 15. Cloud Natural Language


  • An overview of natural language processing

  • How the Cloud Natural Language API works

  • The different types of analysis supported by Cloud Natural Language

  • How Cloud Natural Language pricing is calculated

  • An example to suggest hashtags




Natural language processing is the act of taking text content as input and deriving some structured meaning or understanding from it as output. 


Take the sentence “I’m going to the mall” and derive a structured result such as

{action: "going", target: "mall"}.


The sentence “Joe drives his Broncos to work” is ambiguous: it could mean that Joe forces his bronco horses to his workplace, or that he gets into one of the many Ford Bronco cars he owns.


The Cloud Natural Language API uses machine learning to process text content.


Its results are best guesses, so treat the output as suggestions rather than as facts.

15.1. How does the Natural Language API work?

The Natural Language API is a stateless API: you send it some input (in this case, text), and the API returns a set of annotations about that text.
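That round trip can be pictured as a single JSON payload. Below is a sketch of the approximate v1 REST request body for documents:annotateText; the field names are from the public API surface as I recall them, so verify against the reference docs before relying on them:

```python
import json

# Approximate shape of one stateless request to the Natural Language API
# (POST https://language.googleapis.com/v1/documents:annotateText).
request_body = {
    "document": {
        "type": "PLAIN_TEXT",
        "content": "I'm going to the mall.",
    },
    "features": {
        "extractSyntax": True,
        "extractEntities": True,
        "extractDocumentSentiment": True,
    },
}

print(json.dumps(request_body, indent=2))
```

No state is kept between calls: each request carries the full document, and the response carries the full set of annotations.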


The Natural Language API can annotate three features of input text:


Syntax: parse a document into sentences, finding “tokens” along the way. Each token carries a part of speech and the canonical (lemmatized) form of the word.


Entities: look at each token individually and look it up in Google’s knowledge graph, associating the token with a pointer to a specific entity in that graph.


Thanks to the concept of salience (or “prominence”), you’ll be able to see whether a sentence is focused on Barack Obama or whether he’s only mentioned in passing.


Sentiment: the ability to understand the emotional content of a chunk of text and recognize that a given sentence expresses positive or negative emotion.


In the 2008 presidential election, the Democratic National Committee (DNC) used mobilization programs among supporters to broaden participation among citizens and stakeholders and to estimate the state of the race ahead of time.


The Obama campaign used web-based platforms, sharing through social media, and smartphones to get supporters to participate in the political processes of the election campaign.


To register, mobilize, or persuade supporters during a campaign, a mobile application can be an effective tool. The Obama campaign used this technology to coordinate the volunteer activity of its most active supporters, canvassers, citizens, and stakeholders: notifying them about the campaign’s approach, letting them hear speeches, and collecting statistical feedback such as ratings from them, all without ringing doorbells during the campaigning period.




Sentiment values should be treated as somewhat “fuzzy”: even our human brains can’t necessarily come up with perfectly correct answers.

15.2. Sentiment analysis

Sentiment analysis means recognizing the sentiment or emotion of what is said. As humans, we can generally tell whether a given sentence is happy or sad.


“I like this car” would be considered positive.


“This car is ugly” would likely be considered negative.


“This is a car.” is a neutral sentence.


To capture this, you need to track both the sentiment itself and the magnitude of the overall sentiment.


Table 15.1. Comparing sentences with similar sentiment and different magnitudes

Sentence                                                  | Sentiment | Magnitude
----------------------------------------------------------+-----------+----------
“This car is really pretty.”                              | Positive  | High
“This car is ugly.”                                       | Negative  | High
“This car is pretty. It also gets terrible gas mileage.”  | Neutral   | High
“This is a car.”                                          | Neutral   | Low

Think of the sentiment of each sentence as a vector, which conveys both a rating of the positivity (or negativity) and a magnitude, which expresses how strongly that sentiment is expressed.


To get the overall sentiment and magnitude of a document, add the per-sentence vectors together to get a final vector.




Where the positive and negative cancel each other out, the magnitude can help distinguish between a truly unemotional input and one where positivity and negativity neutralize one another.


When the score is close to zero, the magnitude value represents how much emotion actually went into the text.


The magnitude is a number greater than or equal to zero, with zero meaning that the statement was truly neutral.
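The cancellation idea can be sketched in plain Python. This is only an intuition-building toy, not the API’s actual algorithm (the real scoring is learned), but it shows why opposing sentences can cancel in the score while still adding up in the magnitude:

```python
# Toy model: document score averages per-sentence scores (signs cancel),
# while magnitude sums their absolute values (emotion accumulates).
def combine_sentence_scores(scores: list[float]) -> tuple[float, float]:
    score = sum(scores) / len(scores)
    magnitude = sum(abs(s) for s in scores)
    return score, magnitude

# "This car is nice. It also gets terrible gas mileage!"
print(combine_sentence_scores([0.7, -0.7]))   # score 0, but high magnitude

# "This is a car." (one nearly neutral sentence)
print(combine_sentence_scores([0.1]))         # score and magnitude both low
```

Two strongly opposed sentences land near zero on score, yet their magnitude stays large; a genuinely bland sentence is low on both.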


To try this out, enable the Natural Language API using the Cloud Console.


In the example below, the overall sentiment of the first sentence is moderately positive. Keep in mind that with machine-learning APIs, the algorithms and underlying systems that generate the outputs are constantly learning and improving, so your exact numbers may differ.


You should see something like the following:

Results for "This is a car.":

Score:     0.20000000298023224

Magnitude: 0.20000000298023224

Results for "This car is nice. It also gets terrible gas mileage!":

Score:     0

Magnitude: 1.2999999523162842

The “neutral” sentence had quite a bit of emotion.

What you might have thought was a neutral statement (“This is a car”) is rated slightly positive overall.


Judging the sentiment of content is a bit of a fuzzy process.


from google.cloud import language


def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
    # Build a plain-text document and ask the API for its sentiment.
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_sentiment(document=document)


def show_text_sentiment(response: language.AnalyzeSentimentResponse):
    import pandas as pd

    # Per-sentence scores.
    columns = ["score", "sentence"]
    data = [(s.sentiment.score, s.text.content) for s in response.sentences]
    df_sentence = pd.DataFrame(columns=columns, data=data)

    # Document-level score and magnitude.
    sentiment = response.document_sentiment
    columns = ["score", "magnitude", "language"]
    data = [(sentiment.score, sentiment.magnitude, response.language)]
    df_document = pd.DataFrame(columns=columns, data=data)

    format_args = dict(index=False, tablefmt="presto", floatfmt="+.1f")
    print(f"At sentence level:\n{df_sentence.to_markdown(**format_args)}")
    print()
    print(f"At document level:\n{df_document.to_markdown(**format_args)}")


In [9]: text = """

   ...: This car is nice.

   ...: """

   ...: 


In [10]: analyze_sentiment_response = analyze_text_sentiment(text)


In [11]: show_text_sentiment(analyze_sentiment_response)

At sentence level:

   score | sentence

---------+-------------------

    +0.8 | This car is nice.


At document level:

   score |   magnitude | language

---------+-------------+------------

    +0.8 |        +0.8 | en




15.3. Entity recognition

Entity recognition finds special entities, such as people, places, organizations, works of art, or anything else you’d consider a proper noun.


It works by parsing the text for tokens and comparing those tokens against the entities that Google has stored in its knowledge graph.


The API is able to distinguish between terms that could be special depending on their use (such as “blackberry” the fruit versus “Blackberry” the phone).

Use entity detection to determine which entities are present in your input.


The example sentence contains four distinct entities: Barack Obama, iPhone, Blackberry, and Hawaii.


The Natural Language API can also distinguish between differing levels of prominence, ranking entities according to how important they are in the sentence. Rather than just the names of the entities, the response includes each entity’s raw content and metadata.


Salience is a linguistic term that refers to how important a word or phrase is in a text. In natural language processing (NLP), entity salience is a metric that measures how prominent an entity is in a text.


Entity salience scores are a prediction of what a human reader would consider the most important entities in a text. The scores are relative to the text and range from 0 to 1, with higher scores indicating greater importance.


What effect does the phrasing have on salience?


const inputs = [
  'Barack Obama prefers an iPhone over a Blackberry when in Hawaii.',
  'When in Hawaii an iPhone, not a Blackberry, is Barack Obama\'s preferred device.',
];

Different salience values come back for different phrasings of similar sentences.


For the sentence "Barack Obama prefers an iPhone over a Blackberry when in Hawaii.":


The most important entity is: Barack Obama (0.5521853566169739)


For the sentence "When in Hawaii an iPhone, not a Blackberry, is Barack Obama's preferred device.":


from google.cloud import language


def analyze_text_entities(text):
    client = language.LanguageServiceClient()
    document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)

    response = client.analyze_entities(document=document)

    # Print each detected entity with its salience and knowledge-graph metadata.
    for entity in response.entities:
        print("=" * 80)
        results = dict(
            name=entity.name,
            type=entity.type_.name,
            salience=f"{entity.salience:.1%}",
            wikipedia_url=entity.metadata.get("wikipedia_url", "-"),
            mid=entity.metadata.get("mid", "-"),
        )
        for k, v in results.items():
            print(f"{k:15}: {v}")


In [5]: analyze_entities_response = analyze_text_entities(text)

   ...: 

================================================================================

name           : iPhone

type           : CONSUMER_GOOD

salience       : 67.9%

wikipedia_url  : https://en.wikipedia.org/wiki/IPhone

mid            : /m/027lnzs

================================================================================

name           : Barack Obama

type           : PERSON

salience       : 15.1%

wikipedia_url  : https://en.wikipedia.org/wiki/Barack_Obama

mid            : /m/02mjmr

================================================================================

name           : Blackberry

type           : ORGANIZATION

salience       : 10.0%

wikipedia_url  : https://en.wikipedia.org/wiki/Blackberry

mid            : /g/120z183t

================================================================================

name           : Hawaii

type           : LOCATION

salience       : 6.9%

wikipedia_url  : https://en.wikipedia.org/wiki/Hawaii

mid            : /m/03gh4




15.4. Syntax analysis

Syntax analysis diagrams a sentence, pointing out the various parts of speech such as phrases, verbs, nouns, participles, and adverbs.


It produces dependency graphs, which allow you to see the core of the sentence and push modifiers and other nonessential information to the side.


Consider the sentence: The farmers gave their kids fresh vegetables.


Given that sentence as input, the API builds a dependency graph. It also offers the ability to build a syntax tree, making it easier to build your own machine-learning algorithms on top of natural language inputs, such as detecting whether a sentence makes sense.


The API works by first parsing the input into sentences, tokenizing each sentence, recognizing the part of speech of each word, and building a tree of how all the words fit together.


The dependency graph can be shown as a table:

Table 15.2. Tokens in the dependency graph and their parents

Index | Text         | Parent
------+--------------+------------------
0     | ‘The’        | 1 (‘farmers’)
1     | ‘farmers’    | 2 (‘gave’)
2     | ‘gave’       | 2 (‘gave’)
3     | ‘their’      | 4 (‘kids’)
4     | ‘kids’       | 2 (‘gave’)
5     | ‘fresh’      | 6 (‘vegetables’)
6     | ‘vegetables’ | 2 (‘gave’)
7     | ‘.’          | 2 (‘gave’)



Dependency tree 
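The head-index representation in Table 15.2 can be rebuilt into an actual tree with a few lines of plain Python. This is a sketch over the table's data only, not a call to the API:

```python
# Rebuild the dependency tree from (index, text, parent_index) rows as laid
# out in Table 15.2. A token whose parent index equals its own index is the
# root of the sentence (here, 'gave').
tokens = [
    (0, "The", 1),
    (1, "farmers", 2),
    (2, "gave", 2),
    (3, "their", 4),
    (4, "kids", 2),
    (5, "fresh", 6),
    (6, "vegetables", 2),
    (7, ".", 2),
]

children: dict[int, list[int]] = {i: [] for i, _, _ in tokens}
root = None
for index, text, parent in tokens:
    if index == parent:
        root = index  # the root token points at itself
    else:
        children[parent].append(index)

def show(index: int, depth: int = 0) -> None:
    # Print the tree with indentation showing dependency depth.
    print("  " * depth + tokens[index][1])
    for child in children[index]:
        show(child, depth + 1)

show(root)
```

Running this prints 'gave' at the top with 'farmers', 'kids', 'vegetables', and the period hanging off it, matching the dependency tree the API describes.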

15.5. Understanding pricing

The Cloud Natural Language API charges based on usage: the amount of text sent for analysis, with different rates for the different types of analysis.


If you send a long document for entity recognition, it’s billed as the number of 1,000-character chunks needed to fit the entire document (Math.ceil(document.length / 1000.0)).
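The same chunk arithmetic in Python, sketching only the formula quoted above (this is not official billing code):

```python
import math

# A document is billed per started 1,000-character unit, mirroring
# Math.ceil(document.length / 1000.0) from the text above.
def billable_units(document: str, unit_size: int = 1000) -> int:
    return math.ceil(len(document) / unit_size)

print(billable_units("a" * 999))    # 1
print(billable_units("a" * 1000))   # 1
print(billable_units("a" * 1001))   # 2
```

Note that a 1,001-character document costs the same as a 2,000-character one: partial chunks round up.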

15.6. Case study: suggesting InstaSnap hashtags

The Natural Language API can take some textual input and come up with both a sentiment analysis and the entities in the input.


You’d take a post’s caption as input text and send it to the Natural Language API.


Next, the Natural Language API would send back both the sentiment and any detected entities.


After that, you’d have to coerce some of the results into a format that’s useful in this scenario and display a list of suggested tags to the user.


Coming up with some suggested tags should look simple.
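A minimal sketch of that last coercion step, assuming the entity names and document sentiment score have already been fetched from the API (the helper name, tag words, and 0.25 thresholds are made up for illustration):

```python
# Hypothetical helper: turn analysis results into hashtag suggestions.
# Entities become tags directly; the sentiment score adds a mood tag.
def suggest_hashtags(entities: list[str], sentiment_score: float) -> list[str]:
    tags = ["#" + name.lower().replace(" ", "") for name in entities]
    if sentiment_score > 0.25:
        tags.append("#happy")
    elif sentiment_score < -0.25:
        tags.append("#sad")
    return tags

print(suggest_hashtags(["Barack Obama", "Hawaii"], 0.8))
# ['#barackobama', '#hawaii', '#happy']
```

A real version might also weight tags by salience so that only prominent entities become suggestions.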

Summary

The Natural Language API is a powerful textual analysis service.

If you need to discover details about text in a scalable way, the Natural

Language API is likely a good fit for you.


The API can analyze text for entities (people, places, organizations),

syntax (tokenizing and diagramming sentences), and sentiment

(understanding the emotional content of text).


As with all machine learning today, the results from this API should be treated as suggestions rather than absolute fact (after all, it can be tough even for people to decide whether a given sentence is happy or sad).


Disable Billing

Search for “Billing”, then Manage billing accounts. Go to MY PROJECTS, click the three-dot Actions button, then hit Disable billing.