What is Natural Language Processing? A Complete Guide
Machines that can understand humans?
Respond to human language?
Recognise linguistic cues, comprehend context, and even take appropriate action?
Sounds like something out of an Isaac Asimov novel! Yet modern machines can do all of this and more, thanks to Natural Language Processing (NLP).
But what is NLP?
How does it work?
Why is NLP important?
What are its current applications and uses?
This detailed guide explores these frequently asked questions about this exciting area of Artificial Intelligence (AI).
A Brief History of NLP
Contrary to popular belief, NLP is not a new concept that has suddenly grown out of the 21st-century digital economy. If anything, the roots of NLP were planted more than a century ago.
The idea of NLP was born out of the concept of “Language as a Science” that was first proposed by Ferdinand de Saussure, a Swiss linguistics professor, in the early 1900s. Half a century later, in 1950, Alan Turing published his now-famous paper “Computing Machinery and Intelligence”, in which he proposed the ideas of “thinking” machines and “learning” machines.
Just a few years later, the Hodgkin-Huxley experiments showed how neurons in a living organism (they experimented on the giant axons of squids) generate and transmit electrical signals. Popular culture also explored the idea of a machine that could recognise human speech, understand natural language and even respond, notably in Stanley Kubrick’s classic 1968 science fiction film 2001: A Space Odyssey.
This masterpiece prominently featured a talking computer called “HAL”. Together, these early ideas, followed by the accelerated development of Machine Learning technologies in the 1980s, set the stage for the explosive development of NLP in ways that de Saussure, Turing and Kubrick never got to witness during their lifetimes.
NLP is one of the most talked-about areas of AI because it presents the possibility that a logic-based machine could understand and respond to humans in the most human way possible – through language.
What is Natural Language Processing in AI?
Natural language is how humans communicate with each other. Natural Language Processing is an area of AI that enables computers to understand and interpret human language inputs and respond using natural language.
Machines powered by NLP solutions can read text and hear speech, and then interpret, understand and reproduce both. They can even take appropriate action based on these inputs. Simply put, natural language processing techniques allow machines to communicate with humans via human language.
Over the years, NLP has significantly advanced by drawing on the techniques and learnings from different disciplines, including:
- Applied Linguistics
- Computational Linguistics
- Computer Science, and
- Machine Learning
Natural Language Processing Examples
In 2011, Apple’s Siri became one of the first successful NLP-based digital assistants available for consumer use. Siri contains an Automatic Speech Recognition (ASR) module that converts the spoken word into text. Combined with NLP, ASR allows users to hold conversations with Siri that feel close to natural human conversation.
Siri then matches the recognised words to predefined commands that trigger specific actions. Thus, users can simply speak to Siri and get it to do numerous things, such as:
- Play music
- Check weather updates
- Check news updates
- Create shopping lists
- Set reminders or alarms
- Make phone calls
- Send or read texts
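To make the command-matching idea concrete, here is a minimal, rules-based sketch. It assumes the speech has already been transcribed to text and uses a hypothetical keyword table. Real assistants like Siri rely on far more sophisticated, ML-based intent recognition, so treat this as an illustration of the principle, not a description of how Siri works.

```python
import re

# Hypothetical command table: each command fires on any of its keywords.
COMMANDS = {
    "play_music": {"play", "music", "song"},
    "check_weather": {"weather", "forecast", "rain"},
    "set_alarm": {"alarm", "remind", "reminder"},
    "make_call": {"call", "phone", "dial"},
}

def tokenise(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def match_command(utterance):
    """Return the command whose keywords overlap most with the utterance,
    or None if no keyword appears at all."""
    words = set(tokenise(utterance))
    best, best_hits = None, 0
    for command, keywords in COMMANDS.items():
        hits = len(words & keywords)
        if hits > best_hits:  # ties keep the earlier command
            best, best_hits = command, hits
    return best

print(match_command("Will it rain tomorrow? What's the weather like?"))  # check_weather
print(match_command("Tell me a joke"))  # None
```

Returning None for unmatched input gives the caller a clear signal to ask the user to rephrase instead of guessing.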
Since 2011 and the launch of Siri, dozens of other real-world applications and use cases have emerged for NLP. Today, NLP software tools and techniques allow:
- Amazon Alexa to play a user’s favourite songs
- Email filters to categorise incoming emails as Primary, Social, Promotional or Spam
- Apple Siri to provide weather updates
- Google Search to reveal the answer to “how many cells does an amoeba have?”
- Facebook to understand a user’s interests and analyse their browsing patterns to display related ads and posts
- Online automated translators to translate text from English to Tamil, Mandarin, Spanish, Maori, or German
- Call centre support platforms to direct customer calls either to a chatbot or a human support agent
- Voicemail applications to convert missed phone calls into text
- Analytics applications to analyse huge volumes of business data and present key insights that can support business decision-making and strategic action-taking
- Software to convert unstructured textual data into meaningful information for managers, doctors, customer account managers, and many other users
- Facebook Rosetta to visually extract useful text from videos and images in real time
Further, NLP works with text analytics solutions to extract structure and meaning from large volumes of textual data by grouping, counting and categorising words in the text. This combination is frequently used for:
- Content classification: Classify content in meaningful ways to reveal important trends
- Criminal investigations: Identify patterns (or clues) in written text, such as reports or emails, to solve crimes
- Social media analytics: Identify key influencers that a brand can work with and track audience sentiment and opinions about specific topics to guide social media strategy
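As a rough illustration of the counting step, the sketch below tokenises a few made-up customer reviews, drops a small hypothetical stop-word list, and tallies the remaining words with the standard library's `collections.Counter`. Frequent terms such as “delivery” point to the kind of trends that content classification and social media analytics build on.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration; real toolkits ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "it", "i", "my"}

def word_counts(text):
    """Count the non-stop-word tokens in one piece of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

# Made-up reviews.
reviews = [
    "The delivery was late and the packaging was damaged.",
    "Great product, but delivery took too long.",
    "Packaging was fine. Delivery was late again!",
]

totals = Counter()
for review in reviews:
    totals.update(word_counts(review))  # add this review's counts to the total

print(totals.most_common(3))  # [('delivery', 3), ('late', 2), ('packaging', 2)]
```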
NLP and Chatbots
NLP enables intelligent chatbots, such as those developed using Gupshup’s low-code chatbot platform, to effectively answer common queries, guide customers down the sales funnel, and even transfer conversations to human support representatives as needed.
In one recent study, 68% of consumers said they had a positive experience interacting with chatbots. This is one reason for their increasing popularity. These chatbots use NLP and AI to understand customer queries and appropriately react to the meaning of customer questions.
They can understand the context of the query, as well as any linguistic nuances, abbreviations, accents, and other cues to figure out the best answer to provide. They can support users on websites, via phones, and even via messaging applications like WhatsApp, Slack, Facebook Messenger, or Telegram.
NLP/AI chatbots developed with a chatbot development platform like Gupshup can be customised for a wide range of industries and use cases, including:
- Customer support
- Sales and commerce
- Marketing and promotions
- Bookings and appointments
- Information-sharing (e.g. ticket prices, restaurant locations, business hours, etc.)
- In-app support
- News and weather
- eCommerce and retail
- Food delivery
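Here is a very simplified sketch of how a chatbot might answer common queries and hand a conversation off to a human. It scores a message against hypothetical FAQ questions by word overlap (Jaccard similarity) and routes to an agent when nothing clears an assumed threshold. This is not how Gupshup's platform works internally; production chatbots typically use trained NLU models rather than raw word overlap.

```python
import re

# Hypothetical FAQ entries: canned question -> answer.
FAQS = {
    "what are your business hours": "We are open 9am to 6pm, Monday to Friday.",
    "how much does a ticket cost": "Standard tickets cost $20.",
    "where is your nearest restaurant": "Our nearest branch is on Main Street.",
}
HANDOFF_THRESHOLD = 0.3  # assumed cut-off; would need tuning on real conversations

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Shared words divided by all distinct words (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reply(message):
    words = tokens(message)
    # Find the FAQ question most similar to the incoming message.
    question, score = max(
        ((q, jaccard(words, tokens(q))) for q in FAQS),
        key=lambda pair: pair[1],
    )
    if score < HANDOFF_THRESHOLD:
        return "Let me connect you with a human agent."
    return FAQS[question]

print(reply("How much is a ticket?"))    # Standard tickets cost $20.
print(reply("My parcel never arrived"))  # Let me connect you with a human agent.
```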
How Does NLP Work?
Broadly speaking, NLP includes many techniques to interpret and respond to human language. Some methods are rules-based, while others are statistical or machine learning (ML)-based. Rules-based algorithms use carefully designed linguistic rules to understand and process text, while ML algorithms use statistical methods.
Based on the training data they’re fed, ML algorithms “learn” to process language and, over time, adjust their output to improve performance. Advanced NLP algorithms use deep learning, a form of ML built on multi-layered neural networks, to refine their internal models through repeated, iterative training on data.
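To contrast with rules-based approaches, the sketch below shows a statistical learner in miniature: a multinomial Naive Bayes classifier that “learns” how often words appear in each category from labelled examples, then labels new text. Naive Bayes is chosen only because it is a classic, compact ML technique (it has long been used for spam filtering). The training sentences and labels are made up.

```python
import math
import re
from collections import Counter

def tokenise(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per label and word occurrences per label.
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in sorted(self.doc_counts)}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label); logs avoid underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenise(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training data.
texts = ["win a free prize now", "claim your free cash reward",
         "meeting moved to friday", "can you review my report"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("free prize inside"))        # spam
print(model.predict("can we move the meeting"))  # ham
```

Nothing here is a hand-written rule: adding more labelled examples changes the counts, and with them the predictions.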
In general, NLP software breaks language down into shorter, more basic pieces called tokens. A token could be a word, a punctuation mark such as a period, or part of a word. The software then attempts to unpack the relationships between tokens to understand the context of the written or spoken text and interpret its meaning.
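Tokenisation can be sketched with a single regular expression. The pattern below is an assumption chosen for illustration. It keeps words (including contractions such as “didn't”), numbers and individual punctuation marks as separate tokens. Libraries such as NLTK and spaCy ship more robust tokenisers.

```python
import re

# Alternatives, tried in order: a word (optionally with one apostrophe part),
# a number (optionally with decimals), or any single punctuation character.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")

def tokenise(text):
    return TOKEN_PATTERN.findall(text)

print(tokenise("It rained today. I didn't bring 1 umbrella!"))
# ['It', 'rained', 'today', '.', 'I', "didn't", 'bring', '1', 'umbrella', '!']
```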
For instance, ML engines take inputs in the form of phrases, sentences, paragraphs, and even entire books. They then process this text using the grammatical rules of that language or human linguistic habits. The NLP system then looks for patterns in this data and tries to extrapolate what could come next.
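One of the simplest ways to “extrapolate what could come next” is a bigram model, which counts which word follows which in the training text. The toy sketch below predicts the most frequently observed next word. The corpus is made up, and modern systems use neural language models trained on vastly more data.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count which word follows which across the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

# Made-up corpus.
corpus = [
    "i want to book a table",
    "i want to book a flight",
    "i want to cancel a flight",
]
model = train_bigrams(corpus)
print(predict_next(model, "to"))  # book
print(predict_next(model, "a"))   # flight
```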
Data pre-processing is an important step in NLP, where the input data is first prepared and “cleaned” for an NLP machine or algorithm to understand. Tokenisation is one method of pre-processing. Other methods are:
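- Lemmatisation and stemming: Words are reduced to their root forms before being processed by the NLP algorithm. Stemming strips suffixes using simple rules (e.g. “studies” becomes “studi”), while lemmatisation uses a vocabulary to return the dictionary form (e.g. “studies” becomes “study”)
- Stop word removal: Very common words that carry little meaning, such as “the”, “is” and “and”, are removed, while more informative words are retained
- Part-of-speech tagging: Words are tagged based on their part of speech, e.g. nouns, verbs and adjectives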
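A minimal pre-processing pipeline might lowercase and tokenise the text, remove stop words, and then stem what remains. The sketch below uses a tiny hypothetical stop-word list and a deliberately crude suffix-stripping stemmer. Real stemmers such as the Porter stemmer apply many more rules. Part-of-speech tagging is left out because it needs a trained model or tagged data.

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "and", "or",
              "of", "to", "in", "on", "for"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())           # lowercase + tokenise
    kept = [t for t in tokens if t not in STOP_WORDS]      # stop word removal
    return [crude_stem(t) for t in kept]                   # stemming

print(preprocess("The customers were asking for refunds and waited on the phone"))
# ['customer', 'ask', 'refund', 'wait', 'phone']
```

The minimum stem length stops short words such as “bus” from being mangled, but a rule set this small will still mis-stem many words, which is why libraries exist for this step.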
High-level NLP Features
Many NLP tasks use high-level capabilities, such as:
- Contextual extraction: Process text to automatically pull structured, contextual information
- Content categorisation: Create document summaries, detect duplicates, index text, and support textual search
- Topic discovery and modelling: Analyse text to capture the meaning and make forecasts
- Speech-to-text and text-to-speech conversion: Convert voice commands into written text, and vice versa
- Machine translation: Automatically translate written or spoken text from one language to another
In all these applications, the NLP software takes raw language as input and applies algorithms and linguistic techniques to transform or enrich the text (written or spoken) to deliver greater value to the target audience.
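Contextual extraction can be as simple as pattern matching. The rules-based sketch below pulls email addresses, ISO-style dates and dollar amounts out of free text into a structured dictionary. The field names and regular expressions are assumptions for illustration, and production systems often combine such rules with ML-based entity extraction.

```python
import re

# Hypothetical patterns for a few kinds of structured information.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # YYYY-MM-DD only
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),     # dollar amounts
}

def extract(text):
    """Return every match for each field, keyed by field name."""
    return {field: pattern.findall(text) for field, pattern in PATTERNS.items()}

message = "Please refund $49.99 to jane.doe@example.com before 2024-03-15."
print(extract(message))
# {'email': ['jane.doe@example.com'], 'date': ['2024-03-15'], 'amount': ['$49.99']}
```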
NLP, NLU and NLG
Natural Language Understanding (NLU) is a sub-field of NLP with numerous potential applications, particularly in cognitive and conversational AI. NLU applications can independently interpret user intent, understand conversational context, and resolve any word ambiguities, mispronunciations, misspellings, slang, and other language variants before generating appropriate output.
NLU does much more than simply understand the structural format of human spoken or written language. NLU algorithms can perform semantic interpretation to understand the intended meaning of this language. Combined with language generation, this understanding allows systems to produce well-formed responses with the relevant context, subtleties and inferences that make sense to humans.
Natural Language Generation (NLG) is another sub-category of NLP. NLG refers to a machine’s ability to create its own written or spoken narrative. NLG utilises a database to understand word semantics, identify patterns, and generate new text based on this understanding.
- Example 1: NLG software would analyse a body of text to automatically generate news articles, social media posts, or tweets based on this text
- Example 2: An NLG algorithm could tap into the data from a Business Intelligence (BI) platform to map specific words, phrases or jargon, and then create a summary of findings
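The second example can be sketched with the simplest form of NLG, template filling: numbers from a hypothetical BI export are slotted into fixed sentence patterns. The function name and figures are made up, and many modern NLG systems use neural language models instead of templates.

```python
def summarise_sales(region_sales):
    """Turn a mapping of region -> (this_quarter, last_quarter) revenue
    into a short plain-English summary using fixed sentence templates."""
    sentences = []
    for region, (current, previous) in region_sales.items():
        change = (current - previous) / previous * 100  # percentage change
        direction = "rose" if change >= 0 else "fell"
        sentences.append(
            f"Revenue in {region} {direction} {abs(change):.1f}% to ${current:,.0f}."
        )
    best = max(region_sales, key=lambda r: region_sales[r][0])
    sentences.append(f"{best} was the top-performing region this quarter.")
    return " ".join(sentences)

data = {"North": (120_000, 100_000), "South": (90_000, 95_000)}  # made-up figures
print(summarise_sales(data))
# Revenue in North rose 20.0% to $120,000. Revenue in South fell 5.3% to $90,000.
# North was the top-performing region this quarter.
```

Templates guarantee grammatical, predictable output, at the cost of sounding repetitive as the number of findings grows.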
Some current uses of NLG include:
- Voice bots for customer service, marketing or promotions
- Voice assistants like Amazon Alexa, Apple Siri and Google Assistant
- Automate email marketing
- Generate contact centre agent scripts
- Summarise news reports for journalists
- Convert financial and other types of business data into human-friendly, consumable form
- Create product descriptions for eCommerce websites
Why is NLP Important?
As the digital economy explodes, so does the amount of data that’s being generated, shared, and stored. To ensure that users – particularly business users – can effectively access, process and utilise this data, new techniques are required. Here’s where NLP comes in.
NLP is especially useful to process, analyse, and make sense of unstructured textual data.
Here are the primary reasons why NLP is becoming increasingly important:
Availability of Huge Volumes of Textual Data
NLP software tools can analyse large amounts of language-based data, such as medical records and social media content, consistently and continuously. Algorithms can also be adjusted to reduce bias in this analysis and deliver the best possible output for each use case.
The Need to Structure Unstructured Data
Human language is complex, diverse and often unstructured. There are thousands of languages, each with its own grammar rules, syntax, and abbreviations. Moreover, accents, slang, punctuation, and other features also add to the complexity of both spoken and written text. In many industries, there is a need to structure this unstructured data to extract meaning from it.
Techniques like supervised and unsupervised ML and deep learning are used to model human language for many applications. On their own, however, these general-purpose techniques do not provide syntactic and semantic understanding of language, which is why NLP is required.
NLP is particularly useful for applications like speech recognition and text analytics, since it can resolve language ambiguities, handle irregularities such as spelling mistakes, and give raw unstructured data a useful, actionable, numeric and analysable structure.
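One common way to give raw text a numeric, analysable structure is the bag-of-words model, which represents each document as a vector of word counts over a shared vocabulary. The sketch below is a minimal standard-library version with made-up sentences; libraries such as scikit-learn provide optimised equivalents.

```python
import re

def build_vocabulary(documents):
    """Collect every distinct lowercase word, sorted for a stable column order."""
    words = {w for doc in documents for w in re.findall(r"[a-z]+", doc.lower())}
    return sorted(words)

def to_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [tokens.count(word) for word in vocabulary]

docs = ["Refund my order", "My order is late", "Late refund"]
vocab = build_vocabulary(docs)
print(vocab)  # ['is', 'late', 'my', 'order', 'refund']
for doc in docs:
    print(to_vector(doc, vocab))
# [0, 0, 1, 1, 1]
# [1, 1, 1, 1, 0]
# [0, 1, 0, 0, 1]
```

Once text is in this form, standard numeric tools (counting, clustering, classifiers) can work on it, although word order is discarded.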
Analyse Language Data and Make Sense of It in Various Ways
NLP algorithms are already available that can do all of the following with unstructured data:
- Parsing: Grammatically analyse a sentence by breaking it into its constituent parts and identifying the role of each. This capability is useful for complex downstream processing applications.
  - Example: “The boy cried”
  - Parsing breaks down the sentence to identify the noun (boy) and the verb (cried)
- Word segmentation: Derive word forms and identify where words are separated by white space in a string of text
  - Example: NLP can analyse a digitally scanned document to identify white space and recognise different words
- Sentence breaking: Recognise periods (full stops) and other end punctuation to place sentence boundaries in text
  - Example: “It rained today. I forgot my umbrella at home.”
  - In this text, the NLP algorithm can recognise the period and understand that the text contains two distinct sentences
- Morphological segmentation: Divide words into smaller meaningful parts (morphemes) for speech recognition and machine translation applications
  - Example: “unmistakably”
  - In this word, the NLP software will recognise the morphemes un, mistake, able and ly
- Stemming: Reduce inflected words to their root forms
  - Example: “It rained today.”
  - In this sentence, the algorithm would understand that “rained” is a form of “rain”, even though the tense and spelling are different
- Word sense disambiguation: Derive the meaning of a word based on its context
  - Example: “I will book an appointment for tomorrow.”
  - In this sentence, the algorithm can recognise that “book” does not refer to a bound collection of pages, but to reserving an appointment
- Named entity recognition (NER): Identify words that refer to named entities, such as people, organisations or products, and categorise them into groups
  - This capability is particularly useful in the healthcare industry.
  - Example: An NLP algorithm could analyse a scientific paper or news article and identify where a particular pharma company or its products are mentioned.
  - It can also differentiate between entities that look identical to extract the relevant contextual meaning.
  - Example: In the sentence “Wendy went to Wendy’s for lunch”, the NLP software can recognise the two Wendy instances (one a person’s name and the other a restaurant chain’s name) as two separate entities
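Sentence breaking is harder than splitting on every period, because abbreviations like “Dr.” also end in one. The sketch below splits at ., ! or ? when followed by whitespace or the end of the text, and skips periods that follow a small, hypothetical list of abbreviations. A real system would use a much larger list or a trained model.

```python
import re

# Illustrative (far from complete) abbreviations whose period does not end a sentence.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "e.g", "i.e", "etc"}

def split_sentences(text):
    sentences, start = [], 0
    # Candidate boundaries: end punctuation followed by whitespace or end of text.
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # an abbreviation such as "Dr.", so keep reading
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():  # trailing text without end punctuation
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("It rained today. I forgot my umbrella at home."))
# ['It rained today.', 'I forgot my umbrella at home.']
print(split_sentences("Dr. Smith is in. Call him!"))
# ['Dr. Smith is in.', 'Call him!']
```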
Word sense disambiguation and named entity recognition are both NLP techniques involving semantics, i.e. using and understanding the meaning of words and the structure of sentences.
The other techniques all involve syntax, which refers to the arrangement of words to make grammatical sense. NLP techniques leverage language syntax to assess meaning from text based on grammatical rules.
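Word sense disambiguation has a classic knowledge-based method, the simplified Lesk algorithm: pick the sense whose dictionary definition (gloss) shares the most words with the sentence. In the sketch below, the two glosses for “book” are hand-written assumptions. Real implementations draw on a lexical database such as WordNet (NLTK, for example, provides `nltk.wsd.lesk`).

```python
import re

# Hypothetical mini sense inventory: each sense of "book" with a short gloss.
SENSES = {
    "book (noun)": "a written or printed work of pages bound together that people read",
    "book (verb)": "to reserve or arrange an appointment, ticket, table or room "
                   "in advance for a future date",
}
STOP_WORDS = {"a", "an", "the", "or", "of", "to", "for", "in", "i", "will", "and", "with"}

def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS

def simplified_lesk(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the
    sentence; on a tie (including zero overlap) the first sense listed wins."""
    context = content_words(sentence)
    return max(senses, key=lambda sense: len(context & content_words(senses[sense])))

print(simplified_lesk("I will book an appointment for tomorrow."))  # book (verb)
print(simplified_lesk("She read a book with two hundred pages."))   # book (noun)
```

Because it relies only on word overlap, the method is easy to explain but fragile: a sentence that happens to share no words with either gloss falls back to the default sense.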
Which Industries Use NLP?
The global NLP market was worth $16.53 billion in 2020. By 2028, this value is projected to grow almost 8X to $127.26 billion at a CAGR of 29.4%. The world is recognising the inherent potential of NLP technology and is looking for more ways to leverage its capabilities to solve real-world challenges.
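The growth figures can be checked with the standard compound annual growth rate (CAGR) formula, which relates a start value, an end value and the number of years n between them. Taking the 2020 and 2028 values over n = 8 years gives roughly 29.1%, slightly below the quoted 29.4%. The quoted rate presumably comes from the underlying forecast, which may measure growth from a different base year.

```latex
\text{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1,
\qquad
\left(\frac{127.26}{16.53}\right)^{1/8} - 1 \approx 7.70^{0.125} - 1 \approx 0.291 = 29.1\%
```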
Several industries already use NLP successfully and for a variety of use cases. These include:
- Healthcare
  - Digitise and analyse Electronic Medical Records (EMR) to improve care delivery and patient outcomes
  - Predict adverse medical events and provide proactive treatment, e.g. for strokes, and alcohol or drug overdoses
  - Predict and prevent suicides
  - Speed up drug discovery and testing
  - Speed up clinical trials and interpret clinical trial protocols
  - Mine safety information from unstructured text like EMRs, medical literature, social media, conferences, etc.
- Financial Services and Legal
  - Analyse data and use it to design new products or services
  - Resolve business challenges such as “why are customers leaving?”
  - Speed up document processing
  - Analyse company annual reports and news articles to improve business decision-making
  - Streamline insurance underwriting and claims processing workflows
- Customer Support
  - NLP-powered chatbots assist customers with common queries
  - Understand customer intent or sentiment to provide information, make recommendations, suggest alternatives, speed up product search, etc.
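Sentiment detection can be illustrated with a lexicon-based scorer: each word carries a positive or negative weight, and a preceding negator such as “not” flips it. The word list and weights below are made up for illustration; production systems typically use trained models or curated lexicons.

```python
import re

# Tiny hand-made sentiment lexicon (illustrative weights, not a published lexicon).
LEXICON = {"great": 2, "good": 1, "helpful": 1, "fast": 1,
           "bad": -1, "slow": -1, "terrible": -2, "broken": -2}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    words = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The support agent was great and really helpful"))  # positive
print(sentiment("The app is not good"))                             # negative
```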
Natural Language Processing with Python
Python is one of the most popular programming languages for building NLP systems. Using Python, developers can create a host of algorithms and systems for use in numerous applications.
NLP libraries and frameworks such as the Natural Language Toolkit (NLTK), spaCy and Stanford CoreNLP are also useful for building NLP systems. NLTK is written in Python and is versatile enough to help developers with a host of NLP tasks, such as:
- Pre-process data
- Tokenise text by word or sentence
- Analyse text
- Create visualisations, e.g. dispersion plots or frequency distributions
- Tag parts of speech
- Stem or lemmatise words
- Chunk phrases
- Recognise named entities
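Phrase chunking groups tagged words into larger phrases such as noun phrases. In NLTK this is typically done with `nltk.RegexpParser` and a grammar such as `NP: {<DT>?<JJ>*<NN.*>+}`. The dependency-free sketch below hand-codes that one pattern so it runs without NLTK installed. The part-of-speech tags are supplied by hand here; in NLTK they would usually come from `nltk.pos_tag`.

```python
# Tokens paired with hand-supplied Penn Treebank-style tags:
# DT = determiner, JJ = adjective, NN/NNS = noun, VBD = past-tense verb, IN = preposition.

def chunk_noun_phrases(tagged):
    """Collect runs of: optional determiner, any adjectives, one or more nouns."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:  # at least one noun, so this is a noun phrase
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

sentence = [("The", "DT"), ("little", "JJ"), ("boy", "NN"), ("cried", "VBD"),
            ("in", "IN"), ("the", "DT"), ("crowded", "JJ"), ("train", "NN"),
            ("station", "NN")]
print(chunk_noun_phrases(sentence))  # ['The little boy', 'the crowded train station']
```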
Python and Python-based libraries like NLTK make it easy to explore unstructured data and create customised NLP applications for various use cases.
The development of Natural Language Processing as well as related technologies like Natural Language Understanding and Natural Language Generation has significant implications for dozens, and perhaps even hundreds, of real-world applications.
As human beings continue to generate ever-larger amounts of data, NLP techniques will play an increasingly important role in analysing this data, and making sense of it to solve real human challenges. Google Search, Alexa and chatbots are just the tip of the iceberg. When it comes to NLP, a whole wide world of applications is still waiting to be explored.
Gupshup is one of the world’s leading providers of conversational AI/NLP chatbot and conversational marketing applications. Our no-code bot building tools enable organisations in numerous industries to quickly build and deploy intelligent AI- and NLP-powered chatbots for customer support, commerce, marketing, and many other applications.
To learn how Gupshup can help you scale up your customer communications flows with NLP chatbots, talk to us. Alternatively, if you’re looking for low-code SMS, RCS or WhatsApp business solutions, explore our SMS API, WhatsApp API and RCS Business Messaging platforms.