Micron technology glossary

Large language models

Large language models (LLMs) have become a cornerstone of artificial intelligence (AI), driving the rapid expansion of generative AI across various sectors. As AI becomes increasingly integrated into daily life and as organizations adopt it in their operations, understanding the pivotal role of LLMs in AI’s growth is essential. These models enable sophisticated language understanding and generation, making them crucial for advancing AI capabilities and applications. 

What is a large language model? 

Large language model definition: Large language models are a type of AI designed to understand and generate human language using natural language processing (NLP) technology. 

Language has evolved over thousands of years as a primary form of human communication. LLMs are designed to mimic this capability, allowing them to process and interpret input data and generate meaningful responses. Through contextual training and continuous learning, LLMs can effectively understand and generate text, making them powerful tools in a wide range of applications. 

These models use deep learning techniques and vast datasets to process and produce natural language, making them integral to many AI applications. 

Large language models often work in conjunction with other AI technologies. For instance, they use deep learning to analyze substantial amounts of text data, enabling them to generate coherent and contextually relevant text. Generative AI, which focuses on creating new content, is closely linked with LLMs, as they are specifically designed to produce text-based content. 

How do large language models work?

Large language models are typically composed of multiple layers of neural networks, with the parameters in each layer adjusted during the training process. During training, LLMs learn to predict the next word in a sentence based on the preceding words. They do this by assigning a probability score to each candidate next word, given the surrounding context.

As the models undergo more training, their predictions become more accurate. The more context and words a model processes during training, the more precise its output becomes. This process allows LLMs to accrue knowledge, resulting in a more accurate and humanlike output. 
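
As a rough, self-contained illustration of this next-word prediction step, the Python sketch below converts a handful of hypothetical scores (logits) for candidate next words into a probability distribution using the softmax function. The words and numbers are invented for illustration and are not taken from any real model.

```python
import math

# Hypothetical raw scores (logits) a model might assign to candidate next words
# for the prompt "The cat sat on the". Invented for illustration only.
logits = {"mat": 4.2, "sofa": 2.9, "roof": 2.1, "moon": -1.0}

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {p:.3f}")
# An LLM typically selects (or samples from) the highest-probability candidates.
```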

Two key metrics for evaluating the performance of a large language model are accuracy and precision.

  • Accuracy measures how often the model’s predictions are correct. In the context of LLMs, accuracy may refer to how often the model generates the correct word, phrase or sentiment based on the given input. High accuracy ensures that the model frequently produces correct outputs, which is crucial for tasks requiring reliable predictions. However, achieving high accuracy can sometimes come at the expense of precision, as the model might produce more correct outputs overall but with less certainty about individual predictions. 
  • Precision measures the quality of the correct predictions. For LLMs, precision might refer to how often the model’s positive predictions are correct. For example, if the model identifies certain text as positive sentiment, precision measures how often this identification is accurate. High precision indicates that the model’s positive predictions are usually correct, enhancing the trustworthiness of its outputs. However, focusing too much on precision can reduce accuracy, as the model might become overly cautious and miss some correct predictions. 
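
To make the distinction concrete, here is a minimal Python sketch that computes both metrics for a toy sentiment-classification task; the labels and predictions are invented purely for illustration.

```python
# Toy sentiment-classification results; invented for illustration only.
actual    = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos", "neg", "neg", "pos"]

# Accuracy: share of all predictions that match the true label.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Precision: share of predicted positives that really are positive.
true_positives = sum(a == "pos" and p == "pos" for a, p in zip(actual, predicted))
predicted_positives = predicted.count("pos")
precision = true_positives / predicted_positives

print(f"accuracy:  {accuracy:.2f}")   # 6 of 8 predictions correct -> 0.75
print(f"precision: {precision:.2f}")  # 2 of 3 predicted positives correct -> 0.67
```

In this toy run the model is fairly accurate overall (0.75) but less precise (about 0.67), since one of its three positive predictions is wrong.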

Model performance improves over time and can reach a point where the model filters out hateful speech, unwanted biases and factually flawed responses. This capability broadens the appeal of AI technology, particularly for organizations that want to avoid controversial output from the models. This refinement, carried out by developers or data scientists in the model's training environment, is commonly achieved through techniques such as fine-tuning and prompt tuning. 

What is the history of large language models?

Neural networks played a crucial role in the development of large language models, with numerous experiments and extensive research driving advancements in computers’ ability to better understand human language.

  • 1950s, the birth of NLP: In 1954, researchers at IBM and Georgetown University demonstrated one of the first natural language processing systems, a program that translated Russian sentences into English.
  • 1960s, significant breakthroughs: The world's first chatbot, ELIZA, was created at MIT by Joseph Weizenbaum. Although limited in its capabilities, ELIZA provided a glimpse into the future potential of large language models. 
  • 1970s, SHRDLU: Developed at MIT by Terry Winograd, SHRDLU was a software program designed to understand and process natural language, carrying on simple conversations with users about a restricted world of blocks. 
  • 1990s, development of statistical approaches: The 1990s saw the adoption of statistical approaches, significantly improving the effectiveness of large language models. N-gram models, which estimated the probability of a word based on preceding words in a sequence, were particularly influential. 
  • 2010s, OpenAI and BERT: Building on the transformer architecture introduced in 2017, OpenAI released its first generative pretrained transformer (GPT) model in 2018. That same year, Google introduced bidirectional encoder representations from transformers (BERT), marking a new era in large language modeling by highlighting the potential of pretrained models. 

What are key types of large language models?

There are three key types of large language models: transformer-based, multimodal and specialized. 

  • Transformer-based models, such as the examples below, use transformer architecture, which allows for efficient processing of large-scale text data:
    • GPT (generative pretrained transformer): Developed by OpenAI, models like GPT-3 and GPT-4 are designed to generate humanlike text and perform a wide range of language tasks. 
    • BERT (bidirectional encoder representations from transformers): Created by Google, BERT is optimized to understand the context of words in a sentence, making it effective for tasks like question answering and sentiment analysis.
    • T5 (text-to-text transfer transformer): Also created by Google, T5 treats every NLP problem as a text-to-text problem, allowing it to be highly versatile. 
    • XLNet: Building on BERT, XLNet addresses some of its predecessor's limitations by using a permutation-based training approach. 
  • Multimodal models, such as the examples below, integrate various types of data (text, audio, images and video) to provide a more comprehensive understanding of context and user interactions: 
    • CLIP (contrastive language–image pretraining): Developed by OpenAI, CLIP can understand images and text together, making it useful for tasks like captioning images and answering visual questions. 
    • DALL-E: Another OpenAI model, DALL-E generates images from textual descriptions, showcasing the potential of multimodal AI. 
  • Specialized models, such as the examples below, are designed for specific tasks or domains and are typically fine-tuned on domain-specific data, making them more effective than general-purpose models within their niche:
    • BioBERT: This variant of BERT is fine-tuned for biomedical text mining. 
    • LegalBERT: This variant of BERT is tailored for legal documents and tasks. 

These models represent significant advancements in natural language processing and understanding, enabling a wide range of applications — from chatbots and virtual assistants to content generation and language translation. 
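
As a minimal sketch of how such pretrained models are typically put to work, the example below uses the Hugging Face transformers library and its default sentiment-analysis pipeline (a BERT-style classifier). The library choice and sample sentence are assumptions for illustration, not something this glossary prescribes.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Download and load a default pretrained sentiment-analysis model
# (a BERT-style classifier fine-tuned for sentiment).
classifier = pipeline("sentiment-analysis")

print(classifier("This new memory module is impressively fast."))
# Example output: [{'label': 'POSITIVE', 'score': 0.99...}]
```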

How are large language models used?

Large language models are an invaluable part of AI, and they are used in many ways every day. One example is audio data analysis: AI tools integrated into video calls can convert audio recorded during meetings into notes, summarizing each meeting into concise action points for attendees to refer to afterward. 

Similarly, LLMs can help translate text or audio into different languages. When users visit a website with text in other languages, they can often choose to view the text in their preferred language. LLM-powered tools can then translate the text almost immediately, ensuring that nothing gets lost in translation. 
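
As an illustrative sketch, again assuming the Hugging Face transformers library and the public t5-small checkpoint rather than any specific product, a text-to-text model can translate English to French in a few lines:

```python
# Requires: pip install transformers torch sentencepiece
from transformers import pipeline

# T5 treats translation as a text-to-text task; "t5-small" is a small public checkpoint.
translator = pipeline("translation_en_to_fr", model="t5-small")

print(translator("The meeting notes are ready for review."))
# Example output: [{'translation_text': '...'}]
```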

The field of cybersecurity also benefits from large language models. Businesses can use LLMs to analyze vast amounts of cybersecurity data, which helps them identify and prevent potential threats. 

Frequently asked questions

What are the limitations of large language models?

Like other branches of AI, LLMs have limitations. These include a lack of understanding when context is missing and an inability to explain how they arrived at certain conclusions. 

What are large language models used for today?

Currently, LLM tools are widely used for creating content, such as writing blog posts. However, their applications are likely to expand as LLMs evolve.