We take it for granted every time we ask Alexa (or Siri, Cortana, or Google…) to perform a task and it responds in a manner that is almost human. But a lot goes on under the hood, and this article takes a closer look at the AI technology that makes human-machine interactions possible: natural language processing (NLP).
More to AI than Meets the Eye
Human speech is complicated. Not only do we speak diverse languages like English, Spanish, and French, but even those who speak a single language natively borrow words and phrases from other languages, almost unconsciously. There are also numerous dialects, each with its own grammar and syntax rules, slang, etc. On top of that, we mumble, stammer, and use filler words, like, well, ‘like’. Writing is just as complex, with misspelled words, abbreviations, and incorrect or omitted punctuation. Computers, on the other hand, speak “Machine”, a language comprising only zeroes and ones in countless different patterns. Bridging the two has only recently become possible through the availability of big data and the computing muscle to run complex algorithms that convert communications into zeroes and ones and produce the requested actions.
When people ask a voice assistant (Siri, Alexa, etc.) to perform an action, they may do so in myriad ways, using different words, phrases, slang, sentence fragments, etc. The application that runs the voice assistant must nevertheless understand what is requested: it has to break the language into its basic parts, understand the pieces and the context that gives them meaning, and respond appropriately… all in the space of a second or two.
How NLP Relates to AI
Natural Language Processing is new, but the idea isn’t. The benefits of being able to communicate with machines were always apparent. More than 70 years ago, programmers used punch cards (cards with holes corresponding to zeroes and ones) to interact with computers. It was a tedious, manual process that few understood and fewer could perform. Then in 1952, Bell Labs created the first speech recognition system, which could identify the ten spoken digits. It was called Audrey, but it was slow and thus quickly abandoned.
In 1971, the US Department of Defense agency DARPA (Defense Advanced Research Projects Agency) began funding speech understanding research. One result, a system named Harpy developed at Carnegie Mellon University, could recognize more than a thousand words. This was a precursor of real-time speech recognition.
Over the past 50 years, new technologies and greater computing power, coupled with the existence of massive data sets and the ability to handle them, have given us the ability to communicate with voice assistants like Siri and Alexa without skipping a beat.
Importance of NLP
Handling large volumes of text
NLP has made it possible for computers to read text, hear speech, break it down into its smallest component parts, understand it, interpret it, and respond. And computers can do it without fatigue. This is of huge benefit to many fields of human endeavor where staggering volumes of unstructured data are generated, from medical records to social media. Automation enables organizations to go through this data, analyze text, and extract the required information quickly and consistently.
How NLP Works
Step 1. Sentence Segmentation: Breaking down text/speech into individual sentences.
Step 2. Tokenization: Breaking each sentence down into its elemental component parts, the individual words or “tokens”. If you’ve ever diagrammed a sentence in school, you’ve done a similar job manually. The objective is the same: to understand the component parts, their relationships to each other, and how they work together to create meaning.
Step 3. Stemming & Lemmatization: Stemming simply chops the ends off words to get to the root. Unfortunately, this doesn’t always work, as mechanically chopping off the ends sometimes leaves a word fragment that has no meaning. Lemmatization instead groups the inflected forms of a word together and maps them to its dictionary form, the lemma.
Step 4. Filtering out the noise: Some words, known as stop words, appear so frequently that they can be safely filtered out without affecting the statistical analysis.
E.g. words like ‘a’, ‘the’, ‘and’, ‘this’, ‘is’…
Step 5. Dependency parsing: This is done to understand how the words in a sentence relate to each other.
Step 6. Assigning POS (Parts of Speech) Tags: Here each word is linked to its grammatical function to understand context. This is important when you consider how we often use nouns (even proper nouns) as verbs, for instance, “I need to google that.”
Step 7. Named Entity Recognition: In this step, the application detects named entities, for instance, a person, place, organization, etc.
Step 8. Chunking: The opposite of segmentation, chunking involves grouping individual pieces of information to form bigger fragments.
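The first four steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how a production NLP library does it: the function names, the tiny stop-word list, and the suffix list are all assumptions made for this example, and the stemmer is deliberately naive so that its weakness is visible.

```python
import re

# Assumption: a tiny hand-picked stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "the", "and", "this", "is", "to", "of", "in"}

# Assumption: a few common English suffixes, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Step 1: split the text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Step 2: lowercase the sentence and pull out runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def naive_stem(word):
    """Step 3 (stemming only): chop one known suffix off the end of a word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def remove_stop_words(tokens):
    """Step 4: drop frequent words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Run steps 1 to 4 and return one list of stems per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_stop_words(tokenize(sentence))
        result.append([naive_stem(t) for t in tokens])
    return result


if __name__ == "__main__":
    print(preprocess("The cats are sleeping. This is the quietest room!"))
    print(naive_stem("studies"))
```

Note how `naive_stem("studies")` returns the fragment `studi`, which is exactly the kind of meaningless leftover that lemmatization is meant to avoid.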
Phases of NLP
These stages can also be interpreted as the following phases:
Lexical Analysis: This is the first phase, wherein the source content is scanned and the whole text is divided into paragraphs, sentences, words, and individual characters.
Syntactical Analysis: Also known as parsing, this is done to understand the relationship between the words and phrases. E.g. “Delhi came to John.” Clearly, this sentence is incorrect, and it is thus rejected by the syntactic analyzer.
Semantic Analysis: This phase focuses on the literal meaning of the words and phrases.
Discourse Integration: In this phase, the sentences that precede a sentence and the ones that follow it are taken into account to better understand the content.
Pragmatic Analysis: This is the final NLP phase. It is done to understand the intended effect through context. For instance, “Please open the door” is interpreted as a request, not an order.
NLP Applications
There are many general applications for NLP, such as translation, autocomplete, automated speech recognition, and conversational AI/chatbots. But beyond interacting with Siri or Alexa, NLP offers many practical applications. Here are some of the most common.
Spam filtering. If you’ve ever looked through your spam folder and studied the subject lines, you’ll have noticed a lot of similarities. This is Bayesian spam filtering at work. It is a statistical NLP technique that estimates how likely a message is to be junk from how often its words appear in known spam compared with legitimate mail.
Search. Websites that provide a search bar to enable visitors to search for a specific topic are using NLP methods, viz. topic modeling, entity extraction, and/or content categorization.
Transcripts. Popular video channels offer automatic (text) transcripts of the audio. This is NLP at work.
Social media analytics and sentiment analysis. NLP is routinely used to track awareness and gauge sentiment about topics.
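To make the Bayesian spam filtering mentioned in the list above more concrete, here is a minimal naive Bayes sketch in plain Python. It is an assumption-laden toy rather than the filter any real mail provider uses: the class name, the "spam"/"ham" labels, and the training messages are all hypothetical, and it uses add-one (Laplace) smoothing so that unseen words do not produce a zero probability.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy two-class naive Bayes filter (hypothetical name and labels)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message labeled 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        # Log prior: the share of training messages carrying this label.
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + vocab_size))
        return score

    def spam_probability(self, text):
        """Return P(spam | text); needs at least one message of each label."""
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(tokens, "spam", len(vocab))
        ham = self._log_score(tokens, "ham", len(vocab))
        # Turn the two log scores into a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(ham - spam))


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    # Hypothetical training data, invented for this example.
    spam_filter.train("Win a free prize now", "spam")
    spam_filter.train("Free money, claim your prize", "spam")
    spam_filter.train("Meeting agenda for Monday", "ham")
    spam_filter.train("Lunch on Monday with the team", "ham")
    print(spam_filter.spam_probability("free prize"))
    print(spam_filter.spam_probability("monday meeting"))
```

Working in log space avoids multiplying many tiny probabilities together, which would otherwise underflow on longer messages.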
NLP for Fintech:
Insurance and financial services organizations generate a vast amount of data, both structured and unstructured. NLP can help them extract relevant information in situations like long-term contracts (with many annexes). NLP is also highly beneficial in risk assessment: it enables the organization to extract relevant data via NLP processes such as entity recognition, combine it with historical data, credit and account histories, etc., and estimate a candidate’s loan risk. NLP algorithms can save time and improve accuracy in fraud detection, too, by automating some of the processes involved in reviewing loan documentation. Document classification is another area where fintech can benefit from NLP. Algorithms that can identify content associated with a particular type of document and classify it accordingly will save auditors hours of manual labor. Fintech companies are also prime candidates for NLP use cases like sentiment analysis, which gauges trust by extracting relevant information from social media posts, opinions, etc.
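As a rough illustration of the sentiment analysis use case, the sketch below scores a post against a small word list. It assumes a lexicon-based approach, which is only one of several ways to do sentiment analysis; the word lists, function names, and the one-word negation rule are hypothetical choices made for this example, not a standard.

```python
import re

# Hypothetical mini-lexicons, invented for this example.
POSITIVE = {"trust", "reliable", "great", "helpful", "secure", "fast"}
NEGATIVE = {"fraud", "scam", "slow", "hidden", "terrible", "unreliable"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Return (#positive words - #negative words), flipping negated words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        # Simplifying assumption: a negation only affects the very next word.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in (
        "Great app, fast and secure",
        "I do not trust this bank",
        "The transfer arrived on Tuesday",
    ):
        print(post, "->", sentiment_label(post))
```

A fixed word list like this misses sarcasm and domain-specific wording, which is why trained models are usually preferred once labeled data is available.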
NLP in Healthcare:
There are several use cases of NLP in healthcare, including clinical documentation, clinical decision support, clinical trial matching, and virtual transcribing, among others.
NLP Benefits
NLP tools have never been so accessible: They help businesses process massive volumes of text data, streamline operations, improve efficiencies, enhance customer satisfaction, and reduce costs.
Text analysis at scale: Data that would take weeks of manual analysis can be processed in minutes or seconds.
Less bias, fewer errors: Humans are prone to mistakes, and their inherent bias can skew results. NLP tools, once they have been trained and are running, apply the same criteria to every document. And they are indefatigable.
Streamlined processes and cost savings: NLP allows you to scale, 24×7. It also lets you operate with minimal staff, and it frees them from repetitive tasks and the potential for manual error.
Improved customer satisfaction: NLP tools connected to your data systems can, for example, analyze your customer feedback in real time, so you can contact customers right away if needed. NLP also helps you understand your customer base, improve market segmentation, and enhance customer lifetime value.
Happier employees: Relieving employees of the tedium of repetitive tasks allows them to focus on doing their jobs better. It reduces fatigue and improves motivation.
Actionable insights: With NLP you can easily break down data related to matters such as online surveys, reviews, etc. This lets you do away with guesswork or cursory analysis, and discover real, actionable insights that can bring real results.