Natural Language Processing
Natural Language Processing (NLP) is a critical part of machine learning and AI. It allows computers to gain a conceptual understanding of text from any source. At its core, NLP identifies the entities within a text and the sentiment of that text, positive or negative.
Natural Language Processing is a branch of Artificial Intelligence that bridges the gap between human communication and computer understanding. By combining linguistics and computer science, NLP allows machines to read, decipher, and derive meaning from human languages.
We can take any block of text and try it out in the Google NLP demo.
Take, for example, a short article on oil prices.
Oil prices could hit $200 per barrel if the war in Iran persists through the end of June, according to strategists from Macquarie Group.
If the war were to stretch well into summer, the strategists wrote in a client note on Wednesday, prices would need to move high enough to "destroy an historically large amount of global oil demand," likely requiring Brent crude prices above $200 per barrel and pushing US gasoline prices up to roughly $7 per gallon.
If we go to Google's NLP site we can try NLP by pasting the article into the input box.
Google's Natural Language site
Paste the article into the text box and click Analyze.
It will identify all the entities in the text.
Entity recognition
Google’s NLP is designed to identify entities within a block of text. By definition, an entity is a phrase or noun representing an object or concept. Entity analysis is used to classify the entity into various categories.
Classifications include a person, location, organization, event, consumer good, price, number and others.
In the example on oil prices we get the following entity classifications
One use of entity recognition is aggregating or searching large volumes of articles for organizations like Google or Alphabet, people like Sundar Pichai, or locations like Mountain View.
Another use case is call center routing. If a person calls about a problem with a laptop in New York, the call can be routed to the correct support group. Finance applications can search for tickers, medical research applications can search for drugs, and so forth.
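As a rough sketch of the aggregation idea above, assume the entities have already been extracted into simple (name, type) pairs. The sample data and the `count_entities` helper are hypothetical and written by hand for illustration; they are not output from Google's API.

```python
from collections import Counter

# Hypothetical, hand-written entity results for three articles.
# Each entity is a (name, type) pair; a real NLP service would produce these.
ARTICLES = [
    [("Google", "ORGANIZATION"), ("Sundar Pichai", "PERSON"), ("Mountain View", "LOCATION")],
    [("Alphabet", "ORGANIZATION"), ("Google", "ORGANIZATION")],
    [("Sundar Pichai", "PERSON"), ("Google", "ORGANIZATION")],
]


def count_entities(articles, entity_type):
    """Count how many articles mention each entity of the given type."""
    counts = Counter()
    for entities in articles:
        # Use a set so an entity is counted at most once per article.
        names = {name for name, kind in entities if kind == entity_type}
        counts.update(names)
    return counts


org_counts = count_entities(ARTICLES, "ORGANIZATION")
person_counts = count_entities(ARTICLES, "PERSON")
print(org_counts.most_common())
print(person_counts.most_common())
```

The same grouping could drive routing: instead of counting, a call or email would be sent to whichever team is responsible for the product or location entity that was found.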
Entities refer to objects in the real world.
Once the text is broken down into words, Google uses the Knowledge Graph, a large interconnected encyclopedia for AI, to determine the context of each word. Think Apple the company versus apple the fruit.
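To make the Apple versus apple distinction concrete, here is a toy sketch that picks a meaning by counting overlapping context words. This is not how the Knowledge Graph works; the `SENSES` table and the `disambiguate` function are hypothetical and only illustrate the idea that surrounding words decide the meaning.

```python
import re

# Hypothetical context words for two senses of "apple". A real system would
# draw on a large knowledge base rather than a hand-written list.
SENSES = {
    "Apple (company)": {"iphone", "mac", "stock", "shares", "ceo", "launch"},
    "apple (fruit)": {"pie", "tree", "juice", "eat", "orchard", "ripe"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & context) for sense, context in senses.items()}
    best = max(scores, key=scores.get)
    # Return None when no context word matched, rather than guessing.
    return best if scores[best] > 0 else None


print(disambiguate("Apple shares rose after the iPhone launch"))
print(disambiguate("She baked an apple pie"))
```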
Salience, or prominence, is a numerical score reflecting how important an entity is to the block of text.
Google AI analyzes the text and looks at how the words are related to determine the salience score.
The score can depend on whether the entity appears in the headline or early in the text, is used frequently in the document, or is the main subject of many sentences.
A salience score close to 1.0 indicates a primary subject, a score close to 0.5 indicates a supporting topic, and a score below 0.1 indicates a subject mentioned only briefly.
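The ranges above can be turned into a small helper for reading salience scores. This is a minimal sketch: the text only gives rough anchor points (near 1.0, near 0.5, below 0.1), so the 0.75 cut-off between "primary" and "supporting" is an assumption, and `salience_label` is a hypothetical name.

```python
def salience_label(score):
    """Map a salience score in [0, 1] to a rough label.

    The 0.75 cut-off is an assumption chosen for illustration; only the
    0.1 boundary comes from the description above.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("salience must be between 0.0 and 1.0")
    if score >= 0.75:
        return "primary subject"
    if score >= 0.1:
        return "supporting topic"
    return "brief mention"


for value in (0.9, 0.5, 0.05):
    print(value, salience_label(value))
```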
Sentiment
Sentiment analysis identifies the emotion or feeling of a block of text such as an article or paragraph. The sentiment score ranges from -1.0 (very negative) to +1.0 (very positive), with scores near 0 being neutral. The score therefore indicates whether the text expresses positive or negative emotion.
The magnitude score measures the strength of the emotion or the passion behind the text. When strong
In the context of Google Cloud NLP, the magnitude is driven by the strength and frequency of emotional markers. While the model looks at entire phrases and syntax, specific words and modifiers act as the "volume dial" for these scores. Words like best, great, amazing, love, hate, ugly, and terrible correspond to strong magnitude scores. Words like might, may, fairly, average, and moderate tend to generate low magnitude scores.
While the sentiment score provides the general emotion of the text, the magnitude shows the strength, or volume, of that emotion. A high sentiment score with a high magnitude indicates a strong, intense feeling, while a low sentiment score with a low magnitude indicates a weak feeling with little intensity.
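One way to read the two numbers together is sketched below. The cut-off values (0.25 for the score, 1.0 for the magnitude) and the `describe_sentiment` helper are assumptions made for illustration, not values defined by Google. Treating a near-zero score with a high magnitude as "mixed" is also a reading choice: positive and negative passages can cancel out in the score while still adding to the magnitude.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Combine a sentiment score and a magnitude into a short description.

    Both cut-offs are illustrative assumptions and would need tuning
    for real documents.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if magnitude < 0:
        raise ValueError("magnitude cannot be negative")

    strong = magnitude >= magnitude_cutoff
    if score >= score_cutoff:
        tone = "positive"
    elif score <= -score_cutoff:
        tone = "negative"
    else:
        # Near-zero score: strong emotion in both directions reads as mixed.
        return "mixed" if strong else "neutral"
    return ("strong " if strong else "weak ") + tone


print(describe_sentiment(0.8, 3.0))
print(describe_sentiment(-0.6, 0.4))
print(describe_sentiment(0.0, 4.0))
```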
Google’s Natural Language Processing (NLP)
NLP starts with the Segmentation process. This involves taking a large block of text and breaking it down into manageable chunks or sentences.
Once the data is segmented, the next process is Tokenization, which breaks the sentences down into words or sub-words called tokens.
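Segmentation and tokenization can be sketched with regular expressions. This is a deliberately naive version using only the Python standard library; the `segment` and `tokenize` functions are hypothetical, and the sentence splitter will stumble on abbreviations such as "U.S." that production systems handle.

```python
import re


def segment(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


text = "Oil prices could rise. Gasoline may follow!"
sentences = segment(text)
tokens = [tokenize(sentence) for sentence in sentences]
print(sentences)
print(tokens)
```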
The next step is to reduce the noise in the data. This is done by removing stop words, which makes processing faster and can improve the accuracy of the results.
Words like “the”, “is”, “and”, “but”, “in”, “about”, “me”, “we”, “who”, and “what” have little impact in determining sentiment, so they are removed.
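Stop word removal amounts to filtering tokens against a fixed list. The sketch below uses only the example words given above; real stop word lists are much longer, and `remove_stop_words` is a hypothetical helper name.

```python
# Stop words taken from the examples in the text; real lists are far longer.
STOP_WORDS = {"the", "is", "and", "but", "in", "about", "me", "we", "who", "what"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, keeping their order."""
    return [token for token in tokens if token.lower() not in stop_words]


filtered = remove_stop_words(["the", "war", "is", "in", "Iran"])
print(filtered)
```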
Next, the Stemming process removes the endings of words to find the root. Stripping suffixes like “ing”, “ed”, and “ions” helps improve the speed of the process.
Alternatively, Lemmatization can be used instead of stemming; it looks up each word to find the correct root.
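The difference between the two approaches can be seen in a toy example. The stemmer below strips only the three suffixes mentioned above, and the lemmatizer looks words up in a tiny hand-written dictionary; both the `LEMMAS` table and the minimum stem length of three letters are assumptions for illustration. Real stemmers and lemmatizers use far richer rules and vocabularies.

```python
# Suffixes from the text, longest first so "ions" is tried before shorter ones.
SUFFIXES = ("ions", "ing", "ed")

# Hypothetical lookup table; a real lemmatizer uses a full dictionary.
LEMMAS = {"ran": "run", "were": "be", "prices": "price"}


def stem(word):
    """Chop a known suffix off the word, leaving at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)


for word in ("pushing", "destroyed", "requiring", "ran"):
    print(word, stem(word), lemmatize(word))
```

Note that the stemmer turns "requiring" into "requir", which is not a real word, and leaves "ran" unchanged, while the lookup maps "ran" to "run". That trade-off is why stemming is faster and lemmatization is more accurate.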
Part of Speech (POS) Tagging labels words with grammatical categories such as noun, verb, adjective and adverb. It is critical for understanding the meaning of words in the context of the sentence.
Named Entity Recognition (NER), described above, identifies all entities like organizations, people and locations.
It is important for machine learning to understand the context of human language.
The model needs to understand the tone and intent of connecting words.
Google AI introduced BERT (Bidirectional Encoder Representations from Transformers), which addresses these issues.
BERT looks at the entire string to determine how words relate to each other.
This allows the model to handle complex conversations.
By knowing the words and understanding their meaning, the model learns context and understands that a word can have different meanings depending on what is around it.
Order really matters.
Get Smart TV show
NLP is vital for AI applications. Some of these applications are listed below:
Automating email inboxes and properly routing incoming emails.
Accurately translating communication to overcome language barriers.
Summarizing customer feedback for customer service.
Providing level 1 customer support and information through AI chatbots, without users waiting for a human.
Using voice commands to retrieve directions or information.
President Obama's 2008 campaign used technology to mobilize voter turnout.
The platform it developed helped coordinate supporters, who used a mobile app to register and communicate with voters.
Using social media, the campaign was able to understand the concerns of voters in geographic areas as well as topics important to certain demographic groups.