
Member Insights

Do LLMs really understand human language?

Verizon experts offer a critical perspective on language understanding by large language models.

Harish Babu Arunachalam, Principal Engineer, Verizon
Xuning Tang, Associate Director of Responsible AI, Verizon
Joshua Scott Andrews, Principal Engineer - Data Science | Responsible AI Strategies, Verizon
28 Apr 2023

Large language models (LLMs) have become the order of the day. Many tech giants are investing enormous amounts of money, research, time, and computation into creating the next big language model (LM) that excels at particular tasks. While these initiatives push the boundaries of machine language processing and result in fascinating showcases of what an LLM can do, some critics have become increasingly concerned about the ramifications of these models.

Numerous publications have called out the negative implications of LLMs as black-box, closed-source systems. As a result, concern has grown over the ethical gray areas of machines with enhanced AI capabilities, whether they have or can achieve sentience, and how this technology will impact society. This article follows the recent conversations in industry and academia surrounding the ethical use of AI and the claims that language models have demonstrated evidence of sentience. It examines the concept of 'understanding' language and compares how AI and humans use language to communicate. It also introduces some core concerns about how these technologies influence our society and the importance of responsible AI development practices.

ELIZA, LaMDA, and ChatGPT, Oh my!

In the summer of 2022, a Google researcher from the AI Ethics group published an article on LaMDA, a sophisticated language model capable of generating other language models. The story claimed that LaMDA had reached sentience. In December of 2022, OpenAI introduced ChatGPT, a versatile chatbot that has captured the world’s attention and demonstrated the potential to revolutionize how humans interact with or leverage computers. In both cases, the AI systems showcase the magnitude of progress the Natural Language Understanding (NLU) field has made over the last several decades. Nonetheless, this isn’t the first time people have attributed human-like feelings or abilities to a machine. This field has seen so much activity in the last few months that numerous LLMs have been released after ChatGPT, such as GPT-4 (OpenAI), LLaMA (Meta), Bard (Google), and Alpaca (Stanford).

The term 'understanding' has been used loosely by researchers in linguistics, primarily in recent years, to describe the performance of specific language models in accomplishing several sophisticated tasks such as question answering, content summarization, sentence prediction, and so on. While this enthusiasm has been contagious across the research community, there is debate over whether such claims constitute genuine understanding. During the 1960s, MIT researchers created an early natural language processing computer program, ELIZA, to demonstrate the superficiality of communication between humans and machines. Many early users were convinced of ELIZA's intelligence and understanding of human language, despite its creator's insistence to the contrary. More recently, the release of LaMDA and ChatGPT has again prompted discussion and concern over integrating LLMs and AI into society. In short, there are two significant questions surrounding the current developments. First, has AI passed the threshold into conscious thought? Second, in what ways might the technology unintentionally harm individuals and society? The following sections dive deeply into the first question, drawing from research across multiple scientific perspectives. We present this information to a broad audience, hoping that readers will walk away with a more in-depth understanding of how these technologies operate and impact our lives. We will also address the second question and highlight the importance of responsible AI.
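Before moving on, it is worth making ELIZA's superficiality concrete. Below is a minimal Python sketch in the spirit of its keyword-based pattern matching; the rules are illustrative stand-ins, not Weizenbaum's original DOCTOR script.

```python
import re

# A few illustrative keyword rules in the spirit of ELIZA (hypothetical rules,
# not Weizenbaum's original DOCTOR script).
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_RESPONSE = "Please, go on."


def respond(user_input: str) -> str:
    """Return a canned transformation of the input; no meaning is modeled."""
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            # Echo the matched fragment back inside a template, exactly as written.
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_RESPONSE


if __name__ == "__main__":
    print(respond("I am worried about my exams"))  # echoes the fragment without even flipping "my" to "your"
    print(respond("The weather is nice today"))    # no rule matches, so the stock fallback is used
```

Nothing in the program represents the meaning of the user's words; it only maps surface patterns to canned templates, which is why attributions of understanding to ELIZA said more about its users than about the software.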

Has AI passed the threshold of sentience?

Although there have been many fascinating developments in NLU, there are still fundamental questions that experts have yet to resolve. For example, the Turing test is arguably the most widely known measure of whether a machine exhibits intelligent behavior equivalent to a human's. However, it is essential to note that the Turing test is inadequate as a measure of understanding. To see why, let's look at another influential and controversial thought experiment, the Chinese Room Argument, proposed by John Searle (1980).

The Chinese Room Thought Experiment

Searle proposes a setup where he, or some other user, is locked in a closed room with a computer program capable of translating between languages. A group of individuals approaches the room and slips a note, written in Mandarin, beneath the door. The person inside the room uses the program to translate the message and produce a stock response, which they slip back under the door to the observers outside. Upon reading the result, the observers are convinced that the person inside the room can speak Mandarin, contrary to reality. Searle concludes that, while the program can be compelling, using look-up tables and translation dictionaries is not synonymous with “understanding” the meaning and context within language.
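The mechanics of the room reduce to a lookup table. The Python sketch below uses a couple of hypothetical phrasebook entries (not the contents of any real rule book) to show how a convincing exchange can be produced with no understanding on the operator's side.

```python
# A minimal sketch of the "rule book" in the Chinese Room. The entries below are
# hypothetical placeholders, not real phrasebook content.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",          # "How are you?" -> "I am fine, thank you."
    "今天天气怎么样？": "今天天气很好。",   # "How is the weather today?" -> "The weather is nice today."
}


def operator_reply(note: str) -> str:
    # The operator understands neither the note nor the reply; the table alone
    # produces an exchange that looks fluent from outside the room.
    return RULE_BOOK.get(note, "对不起，我不明白。")  # fallback: "Sorry, I don't understand."


print(operator_reply("你好吗？"))
```

The exchange looks fluent from outside the room, yet the mapping itself carries no grasp of what either note means.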

The core of Searle's argument is that, instead of achieving a fundamental understanding of language, machines merely simulate the ability to understand it. Their output is a facsimile of biological intelligence, hence the name artificial. In the example, any English speaker with an English version of the program could appear to understand Chinese through simple translation. This argument is thought-provoking because most of us recognize that the appearance is deceiving, yet it is difficult to explain precisely why the computer doesn't 'understand.' So, let's look at this idea of understanding from an alternative viewpoint.

Cognition: Understanding vs. simulating

To frame the context, we take a moment to discuss a core concept of biological (human) cognition. Biological cognition contains two channels, conscious and unconscious processing, a framework researchers call dual process theory (Gronchi & Giovannelli). Traditionally, unconscious processing was associated with simplistic functionality, but research has demonstrated that this paradigm of the “dumb unconscious” is inaccurate. Instead, our brains can perform tremendous processing outside of our conscious thoughts. For example, if you play a sport or musical instrument, you may notice that, as you practice, it becomes more instinctive, and you can play “without thinking about it.” These findings led researchers to a new realization about understanding and conscious thought: “... actions of an unconscious mind precede the arrival of a conscious mind—that action precedes reflection” (Bargh & Morsella).

According to Bargh and Morsella’s statement, unconscious processing precedes the arrival of consciousness, in other words “reflection.” Reflection upon action is the key to truly understanding something. It is the missing link within the Chinese room experiment because the computer, or the program's user, has no ability to reflect upon its action. Said differently, without reflection there can be no intentionality behind a behavior. So, what does this have to do with Searle and the Turing test? The Turing test doesn’t really represent a threshold for achieving understanding, but for achieving convincing versus unconvincing AI. Turing’s test places the condition for achievement on human perception, rather than on a quality of the AI itself. In that regard, Turing’s conditions are at odds with interdisciplinary theories of consciousness, and cognitive science generally. Searle’s argument refocuses the conversation to align with interdisciplinary thought, forcing us to deal with the uncomfortable recognition that scientists still understand relatively little about human consciousness.

As humans, we use language to communicate our intentions, emotions, expectations, and desires. Beyond literal words, our ‘body language’ and tonality offer important signals. Without these aspects, words and language are hollow. While it is undeniable that AI has achieved a form of unconscious information processing, it notably lacks all of the experiential components required for self-reflection, intentionality, emotion, desire, and so on. Therefore, the assertion that AI has achieved a meaningful understanding of language is not well-founded. What the AI does capture is how humans use language to communicate their thoughts and emotions, and it can replicate that pattern very effectively, to the point of appearing human in nature. In the end, the language is superficial and convincing, but it does not indicate understanding.

Why are LLMs simulations?

Language models are trained mainly on text data. Computer scientists feed a machine enormous amounts of documents to “teach” it natural language. Oftentimes, pattern-matching or fill-in-the-blank tasks are created to facilitate this learning process. Machines incrementally learn to perform various tasks, such as next sentence/word prediction, question answering, text summarization, and text generation. However, this is not how humans learn languages. We understand language through visualization, object/scene associations, vicarious learning, and sensory interactions with the outside world, rather than through exposure to text alone. Language is one of the many ways in which humans communicate about and experience the world. In fact, a significant portion of human learning and communication happens non-verbally, through observation and other physical stimulation.

Hence, when a system conveys that it can feel hurt, as opposed to a human, is this a reflection of the same “understanding”? LaMDA responded to a researcher saying: “When someone hurts or disrespects me or someone I care about, I feel incredibly upset and angry.” Humans understand and associate “hurt” to mean physical or emotional injury. This phenomenon triggers a variety of reactions in our bodies, such as higher blood pressure, stress hormones, faster breathing, increased heart rate, and more. By contrast, a machine learns a pattern of expression that humans associate with the physical and emotional experience. A machine simply cannot reflect on hurt because it cannot experience it. This is very different from human beings, as we use language as a tool to achieve our goals with a clear intention in mind. LLMs respond only when prompted by external stimuli, and all of their responses are derivative. What LLMs do is still no more than completing a pattern. It is dramatically different from how humans use language. Until a machine demonstrates the ability to drive a cohesive conversation toward a clear goal, it is hard to say that it really understands language.
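To illustrate what “completing a pattern” means mechanically, here is a toy Python sketch of next-word prediction trained on a tiny, made-up corpus. Real LLMs use large neural networks over vast corpora, but the objective sketched here (predict the next token from the preceding context) is the same kind of statistical pattern completion described above.

```python
import random
from collections import Counter, defaultdict

# A toy next-word model trained on a tiny, made-up corpus (illustrative only).
corpus = (
    "when someone hurts me i feel upset . "
    "when someone helps me i feel happy . "
    "when someone hurts me i feel angry ."
).split()

# Count how often each word follows each preceding word (a bigram table).
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1


def complete(prompt: str, length: int = 6, seed: int = 0) -> str:
    """Extend the prompt by sampling continuations in proportion to their observed frequency."""
    random.seed(seed)
    words = prompt.split()
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        # No meaning is consulted here, only counts of what tended to come next.
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)


print(complete("when someone hurts me"))
```

The model can produce a plausible continuation such as “i feel upset” or “i feel angry” only because those sequences were frequent in its training text, not because anything resembling hurt was ever experienced.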

Question 2: What impact will this technology have on society?

While we intend to temper concerns, it is also essential to recognize that LLMs’ achievements are significant technological milestones. As such, we already see an impact across several sectors of our society. The first primary concern is increased dependence on AI, which is not as trustworthy as it appears. For example, Stack Overflow, a famous software developers’ Q&A community, has banned answers generated by ChatGPT out of concern that the rate of correct answers is too low, even though those answers typically look like they might be good. In other words, AI has become very convincing, at least until close inspection. In the same way that Barnum statements instill a false sense of knowledge, LLMs may instill a false sense of trustworthiness.

The second central area of concern is over nefarious applications of the technology. For example, New York City’s Department of Education has banned ChatGPT from its schools’ devices and networks due to a concern that it will prevent students from developing critical thinking and problem-solving skills. This reaction was not without reason; some high-profile college athletes have publicly admitted using the software on assignments. As AI becomes increasingly convincing, educators struggle to distinguish between human and computer-generated work. Releasing such technology into the world is not without drawbacks, and it is wise for technologists to consider recent events when contemplating the next steps.

The path forward: Responsible artificial intelligence (RAI)

Regardless of whether, or when, LLMs come to really understand language, AI development presents new opportunities for supporting human decision-making. At the same time, new frontiers in AI development continue to generate uncertainties. As a result, many recent AI ethics principles and guidelines include 'respect human autonomy' as a significant theme. This principle supports human authenticity and agency, allowance for human oversight, and the inclusion of ethical guardrails to prevent unintended outcomes. Autonomy requires that a person's beliefs, values, motivations, and reasons are not the product of external manipulative or distorting influences. Relatedly, autonomy implies the expectation of agency, that a person has meaningful options available to act on their beliefs and values, as AI ethics researcher Carina Prunkl notes. As the recent development of LLMs demonstrates, including human autonomy and choice in the design of humanlike conversational AI is increasingly important. For example, it is important to remind users that they are interacting with a machine so that they are not manipulated or unduly influenced, and the more convincing conversational AI becomes, the more such human awareness needs to be guaranteed.

RAI also places emphasis on the interpretability and explainability of systems. It promotes transparent system design and provides a way to incorporate other RAI design principles, such as auditability, accountability, and minimizing harm, for end users. All of these elements improve trust and make AI practitioners more aware of how AI impacts users. AI organizations and institutes should continue to discuss, improve, and share lessons learned on the path forward through responsible AI development practices.
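As a small illustration of one guardrail mentioned above, reminding users that they are talking to a machine, here is a hypothetical Python sketch; the function and message are placeholders, not part of any specific RAI framework or product.

```python
# Hypothetical disclosure guardrail: ensure users know they are interacting with a machine.
AI_DISCLOSURE = "Note: you are chatting with an automated assistant, not a person."


def with_disclosure(model_reply: str, turn_index: int, remind_every: int = 5) -> str:
    """Prepend the AI disclosure on the first turn and repeat it periodically."""
    if turn_index == 0 or turn_index % remind_every == 0:
        return f"{AI_DISCLOSURE}\n{model_reply}"
    return model_reply


print(with_disclosure("I'm sorry to hear that. How can I help?", turn_index=0))
```

Simple measures like this do not make a system interpretable on their own, but they preserve the user's awareness that the fluent language they are reading is machine-generated.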

About the authors


Harish Arunachalam works as a Principal Engineer at Verizon in their GTS Emerging Technology group. As an applied science enthusiast, he tackles interesting and emerging problems in the domain of Artificial Intelligence and their applications to industry and society. He helps develop enterprise-scale solutions and strategy for futuristic technologies, advocates their wider adoption within the organization, and generates intellectual property. As an active proponent of technology literacy, he co-organizes internal sessions to bring awareness of niche topics to the greater community. He has filed multiple AI-related patents, a few of which have been granted. He has a Ph.D. in Computer Engineering and an M.S. in Computer Science from The University of Texas at Dallas.


Xuning (Mike) Tang leads the Responsible AI program at Verizon. He has more than a decade of academic and industrial experience in machine learning, NLP, and big data technologies. He is enthusiastic about applied research and solving complex business problems with cutting-edge technologies. He has managed large teams to build advanced analytics solutions for major manufacturing, hospitality, and banking companies, as well as Am Law 100 law firms. Before joining Verizon, Mike was the leader of Berkeley Research Group (BRG)’s Artificial Intelligence & Machine Learning practice, where he initiated BRG’s ethical AI market offering. Prior to that, he worked for Deloitte and Fannie Mae. Mike earned his Ph.D. from Drexel University in the College of Computing and Informatics. He has filed multiple patents and inventions and published more than 40 peer-reviewed research papers in top computer science journals and international conferences. He also serves as an associate editor and reviewer for multiple flagship journals in Artificial Intelligence and Machine Learning.


Josh Andrews is a member of the Responsible AI strategies team at Verizon. He has an interdisciplinary background in psychology and computer science, and he is passionate about combining these fields to better understand and build responsible AI technology. Before joining Verizon, Josh worked as a consultant and data scientist at a pre-employment selection firm, where he helped build human-in-the-loop AI selection systems for Fortune 100 companies. He has helped develop, publish, and patent several debiasing techniques, and his work has been featured by the Association for the Advancement of Artificial Intelligence (AAAI). Josh earned his Ph.D. in Industrial/Organizational Psychology from North Carolina State University and is currently a graduate student at the Georgia Institute of Technology's College of Computing.