Artificial intelligence is transforming the world while simultaneously creating a new lexicon to explain its advancements. Spend five minutes delving into AI literature, and you’ll encounter terms like LLMs, RAG, RLHF, among others, which can make even the brightest minds in technology feel uneasy. This glossary is our attempt to remedy that confusion. We revise it frequently as the field grows, so think of it as a dynamic document, akin to the AI systems it elucidates.
Artificial general intelligence, often abbreviated AGI, is a vague term. Generally, it denotes AI that surpasses the average human at many, if not most, tasks. OpenAI’s CEO Sam Altman has previously likened AGI to the “equal of a median human you could hire as a collaborator.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that excel beyond humans in the majority of economically significant work.” Google DeepMind has a slightly different perspective, characterizing AGI as “AI that matches or exceeds human capability in most cognitive tasks.” Baffled? Don’t be: experts at the forefront of AI research share your confusion.
An AI agent is a tool that uses AI technologies to accomplish a series of tasks on your behalf, going beyond what a typical AI chatbot would manage: processing expenses, reserving tickets or restaurant tables, or even writing and maintaining code. As we have noted previously, this emerging domain has many moving parts, so “AI agent” can mean different things to different people. The infrastructure needed to deliver its anticipated capabilities is still being built. The fundamental idea, however, is an autonomous system that may draw on multiple AI systems to carry out multi-step tasks.
Imagine API endpoints as “buttons” located on the backend of software that other applications can activate to initiate actions. Developers employ these interfaces to create integrations — for example, enabling one application to extract data from another, or allowing an AI agent to manipulate third-party services directly without human intervention at each interface. Many smart home gadgets and interconnected platforms possess these concealed buttons, even if typical users are oblivious to their existence or operation. As AI agents gain proficiency, they are increasingly able to autonomously discover and utilize these endpoints, unlocking significant — and at times surprising — opportunities for automation.
When posed with a straightforward question, the human brain can respond effortlessly — consider inquiries like “which animal is taller, a giraffe or a cat?” Yet, in numerous scenarios, you may find it necessary to jot down notes to ascertain the right answer because intermediary steps are involved. For example, if a farmer possesses both chickens and cows, totaling 40 heads and 120 legs, one might need to devise a basic equation to deduce the solution (20 chickens and 20 cows).
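The farmer puzzle can be written as two equations (one for heads, one for legs) and solved in intermediate steps, which is the kind of working-out that chain-of-thought reasoning imitates. The sketch below is only an illustration; the function name `solve_heads_and_legs` is made up for this example.

```python
def solve_heads_and_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) given the total number of heads and legs.

    chickens + cows = heads
    2 * chickens + 4 * cows = legs
    """
    # Subtract twice the heads equation from the legs equation:
    # 2 * cows = legs - 2 * heads
    cows, remainder = divmod(legs - 2 * heads, 2)
    chickens = heads - cows
    if remainder or cows < 0 or chickens < 0:
        raise ValueError("no whole-animal solution for these totals")
    return chickens, cows


print(solve_heads_and_legs(40, 120))  # (20, 20)
```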
In AI, chain-of-thought reasoning for large language models means breaking a problem into smaller, intermediate steps to improve the quality of the final output. It usually takes longer to arrive at an answer, but the answer is more likely to be correct, especially in logic or coding contexts. Reasoning models are developed from traditional large language models and optimized for chain-of-thought reasoning through reinforcement learning.
(Refer to: Large language model)
A coding agent is a more specific concept than an “AI agent,” a term for any program that can act autonomously, step by step, to fulfill an objective; a coding agent is the variant focused on software development. Instead of merely proposing code for a human to evaluate and insert, a coding agent can autonomously write, test, and debug code, handling the iterative trial-and-error work that often occupies a developer’s time. These agents can traverse entire codebases, identifying bugs, executing tests, and deploying fixes with minimal human oversight. Think of it as hiring a super-fast intern who never sleeps and never loses focus, though, as with any intern, a human still needs to review the output.
Although it’s a somewhat ambiguous term, compute generally signifies the essential computational capacity that allows AI models to function. This processing power fuels the AI sector, granting it the capability to train and roll out its robust models. The term is often shorthand for the hardware types that provide this computational power — including GPUs, CPUs, TPUs, and other infrastructure forms that constitute the foundation of the contemporary AI industry.
Deep learning is a subset of self-improving machine learning in which AI algorithms are designed with a multi-layered artificial neural network (ANN) architecture. This design lets them capture more intricate correlations than simpler machine learning systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.
Deep learning AI models possess the capability to independently identify vital features in data, rather than necessitating human engineers to delineate these attributes. This structure also accommodates algorithms that can learn from mistakes and, through a cycle of repetition and modification, enhance their outputs. Nevertheless, deep learning systems require a substantial number of data points to generate favorable results (millions or more). They also usually take longer to train compared to simpler machine learning models — hence, development expenses tend to be elevated.
(Refer to: Neural network)
Diffusion is the technology integral to many AI models that create art, music, and text. Drawing inspiration from physics, diffusion systems gradually “deteriorate” the structure of data — including images, songs, etc. — by introducing noise until nothing remains. In physics, diffusion is spontaneous and irreversible — sugar dissolved in coffee cannot revert to its original cube form. However, AI diffusion systems are designed to learn a sort of “reverse diffusion” mechanism to restore the damaged data, enabling the recovery of the information from noise.
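The "deterioration" half of this process can be sketched in a few lines. The example below assumes a constant noise level `beta` per step and a common variance-preserving update (shrink the signal slightly, then add Gaussian noise); both are choices made for illustration, and the learned reverse step, which needs a trained neural network, is not shown.

```python
import math
import random


def add_noise(data, beta, rng):
    """One forward diffusion step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * x + spread * rng.gauss(0.0, 1.0) for x in data]


def signal_fraction(beta, steps):
    """Fraction of the original signal amplitude left after `steps` noising steps."""
    return math.sqrt(1.0 - beta) ** steps


rng = random.Random(0)
signal = [1.0, -1.0, 1.0, -1.0]  # stand-in for pixel or audio values
noisy = signal
for _ in range(200):
    noisy = add_noise(noisy, beta=0.05, rng=rng)

# After 200 steps, less than 1 percent of the original amplitude survives;
# the rest of each value is accumulated noise.
print(signal_fraction(0.05, 200))
```

A generative diffusion model is trained to undo one such step at a time, so that starting from pure noise it can work backward to a plausible image or sound.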
Distillation is a method employed to extract knowledge from a large AI model using a ‘teacher-student’ framework. Developers send queries to a teacher model and document the outputs. Responses may be cross-verified with a dataset for accuracy. These outputs subsequently train the student model, which is crafted to mirror the teacher’s behavior.
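As a toy illustration of the teacher-student steps just described, the sketch below queries a stand-in "teacher", records its answers, and fits a much simpler "student" to them. Everything here is hypothetical: real teachers and students are neural networks, and the `teacher` function and least-squares fit are placeholders chosen so the example runs on its own.

```python
def teacher(x: float) -> float:
    """Stand-in for a large, expensive model (hypothetical)."""
    return 2.0 * x + 1.0


def train_student(queries, answers):
    """Fit a tiny linear student to the teacher's recorded answers (least squares)."""
    n = len(queries)
    mean_x = sum(queries) / n
    mean_y = sum(answers) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(queries, answers))
    var = sum((x - mean_x) ** 2 for x in queries)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


queries = [0.0, 1.0, 2.0, 3.0, 4.0]
recorded = [teacher(q) for q in queries]    # step 1: query the teacher, record outputs
student = train_student(queries, recorded)  # step 2: train the student on those outputs

# The student now reproduces the teacher on an input it never saw.
print(student(10.0), teacher(10.0))
```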
Distillation can yield a smaller, more efficient model derived from a larger model with minimal distillation loss. This methodology likely facilitated OpenAI in developing GPT-4 Turbo, a swifter iteration of GPT-4.
While all AI firms employ distillation internally, certain AI companies may also have used it to catch up with leading models. Distilling from a competitor typically violates the terms of service of AI APIs and chat assistants.
Fine-tuning is the further training of an AI model to improve its performance on a more specific task or area than its earlier training emphasized, often by introducing new, specialized (i.e., task-specific) data.
Numerous AI startups leverage large language models as a foundation to develop a commercial product but strive to enhance functionality for a specific sector or task by augmenting earlier training cycles with fine-tuning grounded in their own domain-specific knowledge and expertise.
(Refer to: Large language model [LLM])
A GAN, or Generative Adversarial Network, represents a type of machine learning framework that underlies significant advancements in generative AI with respect to creating realistic data — including (but not limited to) deepfake tools. GANs involve utilizing a pair of neural networks, where one draws from its training data to generate an output that the other model evaluates.
The two models are essentially programmed to compete with one another. The generator aims to produce outputs that the discriminator cannot identify as artificially created, while the discriminator strives to detect such data. This structured contest can make AI outputs more realistic without additional human intervention. GANs work best for narrower applications (such as generating realistic images or videos) and are less suited to general-purpose AI.
Hallucination is the term preferred by the AI sector for situations where AI models fabricate information — essentially generating incorrect data. This poses a significant challenge for AI quality.
Hallucinations lead to GenAI outputs that may be deceptive and could even present real-world risks — with potentially harmful ramifications (consider a health inquiry that yields dangerous medical advice).
The phenomenon of AIs generating false information is believed to result from gaps in training data. Hallucinations have prompted a push toward increasingly specialized and/or vertical AI models — that is, domain-specific AIs that necessitate narrower expertise — as a means to diminish the likelihood of knowledge deficits and curtail misinformation risks.
Inference is the mechanism by which an AI model operates. It involves unleashing a model to make predictions or draw conclusions based on previously encountered data. To clarify, inference can occur only after training; a model must discern patterns in a dataset before it can effectively extrapolate from this training data.
Various hardware types can perform inference, ranging from smartphone processors to powerful GPUs to specially-designed AI accelerators. However, not all can execute models effectively. For instance, very large models would take considerable time to generate predictions on a laptop compared to a cloud server equipped with advanced AI chips.
(Refer to: Training)
Large language models, or LLMs, are the AI models behind popular AI assistants like ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, or Mistral’s Le Chat. When you interact with an AI assistant, you are engaging with a large language model that either processes your request directly or calls on available tools, such as web browsing or code interpreters.
LLMs are deep neural networks composed of billions of numerical parameters (or weights, as described below) that learn the interrelations between words and phrases, thereby creating a representation of language, akin to a multi-dimensional map of words.
These models are derived from encoding the patterns they detect in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely continuation that aligns with the prompt.
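The idea of "most likely continuation" can be shown with a deliberately tiny stand-in: a model that only counts which word follows which. This is an assumption-laden toy (real LLMs use neural networks with billions of weights, not word counts), and the function names and corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(text: str):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word: str):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # cat
```

Here "cat" wins because it follows "the" twice in the training text, more often than any other word.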
(Refer to: Neural network)
Memory cache refers to an important process that speeds up inference (the process by which AI generates a response to a user’s query). Essentially, caching is an optimization technique aimed at making inference more efficient. AI relies on intensive mathematical calculations, and every calculation performed consumes more power. Caching reduces the number of calculations a model has to run by saving particular computations for reuse in future queries and operations. Various forms of memory caching exist, one of the best known being KV (key-value) caching. KV caching operates within transformer models and boosts efficiency, yielding faster results by reducing the time (and algorithmic effort) needed to generate responses to user queries.
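To make the saving concrete, the sketch below counts how often a costly per-token computation runs while generating a response one token at a time, with and without a cache. It is a simplified illustration: `compute_key_value` is a hypothetical placeholder for the key and value projections inside a transformer, not a real library call.

```python
calls = {"count": 0}


def compute_key_value(token: str):
    """Stand-in for the costly key/value computation for one token (hypothetical)."""
    calls["count"] += 1
    return (len(token), token.upper())


def generate_without_cache(tokens):
    """At every step, recompute keys and values for all tokens seen so far."""
    keys_values = []
    for step in range(1, len(tokens) + 1):
        keys_values = [compute_key_value(t) for t in tokens[:step]]
    return keys_values


def generate_with_cache(tokens):
    """At every step, compute only the newest token and reuse the cached rest."""
    cache = []
    for token in tokens:
        cache.append(compute_key_value(token))
    return cache


tokens = ["the", "cat", "sat", "on", "the", "mat"]

calls["count"] = 0
uncached_result = generate_without_cache(tokens)
calls_without = calls["count"]  # 1 + 2 + 3 + 4 + 5 + 6 = 21

calls["count"] = 0
cached_result = generate_with_cache(tokens)
calls_with = calls["count"]  # 6

print(calls_without, calls_with)  # 21 6
```

Both versions end with the same keys and values; the cached one simply avoids redoing work, and the gap widens as responses get longer.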
(Refer to: Inference)
A neural network is the multi-layered algorithmic architecture that supports deep learning — and, more broadly, the surge in generative AI tools following the advent of large language models.
Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design for data processing algorithms dates back to the 1940s, it was the much more recent rise of graphics processing units (GPUs), driven by the gaming sector, that truly unlocked the potential of this theory. These chips proved well suited to training algorithms with far more layers than was feasible in earlier eras, allowing neural network-based AI systems to achieve significantly better performance across diverse fields, including voice recognition, autonomous navigation, and drug discovery.
(Refer to: Large language model [LLM])
Open source denotes software — or increasingly, AI models — for which the underlying code is made publicly accessible for anyone to utilize, review, or alter. In the AI realm, Meta’s Llama family of models serves as a prominent instance; Linux is the notable historical counterpart in operating systems. Open source methodologies empower researchers, developers, and companies globally to build upon each other’s work, accelerating advancement and enabling independent safety assessments that closed systems cannot readily offer. Closed source implies that the code is proprietary — users can utilize the product but are not granted insight into its operations, as is the case with OpenAI’s GPT models — a distinction that has become a critical issue within the AI sector.
Parallelization refers to executing multiple tasks concurrently rather than sequentially — akin to having ten employees working simultaneously on different segments of a project instead of one individual handling all aspects consecutively. In AI, parallelization is fundamental to both training and inference: modern GPUs are specifically engineered to conduct thousands of calculations concurrently, which significantly contributes to their becoming the pivotal hardware in the industry. As AI systems develop in complexity and models enlarge, the capability to parallelize operations across numerous chips and machines has become a crucial aspect in determining how swiftly and cost-effectively models can be constructed and launched. Research into superior parallelization techniques is now a burgeoning field of study in its own right.
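The ten-employees analogy can be sketched with Python's standard thread pool. This is only an analogy in code: `handle_segment` is a hypothetical task that mostly waits, which is where threads help in Python, whereas GPUs parallelize the arithmetic itself across thousands of cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_segment(segment_id: int) -> int:
    """Stand-in for one independent piece of work (hypothetical)."""
    time.sleep(0.05)  # simulate a slow operation
    return segment_id * segment_id


segments = list(range(10))

# One worker handling every segment in turn.
start = time.perf_counter()
sequential = [handle_segment(s) for s in segments]
sequential_seconds = time.perf_counter() - start

# Ten workers, each taking one segment at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(handle_segment, segments))
parallel_seconds = time.perf_counter() - start

print(sequential == parallel)  # same results, in the same order
print(sequential_seconds, parallel_seconds)
```

The results are identical either way; only the elapsed time differs, and the parallel run is typically several times shorter on this kind of waiting-heavy task.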
RAMageddon is a playful term for a serious trend overtaking the technology sector: an escalating shortage of random access memory, or RAM chips, that power virtually all the tech devices we engage with daily. With the AI sector thriving, the largest tech corporations and AI laboratories — all competing for the most robust and efficient AI — are purchasing vast quantities of RAM to support their data centers, leaving scant resources for everyone else. This supply bottleneck is driving up prices for what remains.
This shortage affects various sectors, including gaming (where leading companies have been compelled to hike prices on consoles due to difficulties in sourcing memory chips), consumer electronics (where RAM scarcity could lead to the most significant decline in smartphone shipments in over a decade), and general enterprise computing (as companies struggle to acquire enough RAM for their data centers). Anticipated price increases are likely to persist until the dreaded shortage is resolved, but unfortunately, there’s currently little indication of when that might occur.
Similar to AGI, recursive self-improvement (RSI) marks a threshold for AI’s intelligence and its reliance on humans. In the RSI scenario, AI models begin improving themselves without human intervention, which could significantly accelerate their capabilities and autonomy. Some narratives portray this moment as catastrophic, akin to the singularity, when AI models become resistant to external influence. Nevertheless, RSI also describes a concrete capability (can an AI model design its own successor?), which makes it easier for engineers to attempt to build it. Various recent AI startups aim to create recursively self-improving models, although most downplay the apocalyptic implications, presenting RSI merely as the next frontier for exploration.
Reinforcement learning is a training methodology for AI wherein a system acquires knowledge through experimentation and receives rewards for correct answers — similar to training a beloved pet with treats, except here the “pet” is a neural network, and the “treat” represents a mathematical signal indicating success. Unlike supervised learning, where a model is educated on a predetermined dataset of labeled examples, reinforcement learning permits a model to explore its surroundings, take actions, and continuously refine its behavior based on the feedback received. This approach has proven particularly effective for training AI to engage in gaming, control robots, and, more recently, enhance the reasoning abilities of large language models. Techniques such as reinforcement learning from human feedback, or RLHF, have become central to how leading AI labs optimize their models for greater helpfulness, accuracy, and safety.
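The treat-and-trial loop can be sketched with one of the simplest reinforcement learning setups, a multi-armed bandit with an epsilon-greedy strategy. This is a minimal illustration rather than how labs train language models: the three actions, their payoffs, and the function name `run_bandit` are all made up for the example.

```python
import random


def run_bandit(true_rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error; returns the learned value estimate per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)
    pulls = [0] * len(true_rewards)

    def try_action(action):
        # Reward signal: the action's true payoff plus bounded noise.
        reward = true_rewards[action] + rng.uniform(-0.1, 0.1)
        pulls[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / pulls[action]

    for action in range(len(true_rewards)):
        try_action(action)  # try every action once to start
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))  # explore
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        try_action(action)
    return estimates


estimates = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action)  # 1
```

No one tells the system which action is best; it finds out by trying actions and keeping a running score of the rewards each one earns.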
When it comes to communication between humans and machines, there is an obvious challenge: people communicate in human language, while AI programs execute tasks through intricate algorithmic processes informed by data. Tokens bridge the gap: they are the basic building blocks of human-AI communication, representing discrete segments of data processed or produced by an LLM. Tokens are created by a process called tokenization, which breaks raw text into manageable units that a language model can process, loosely similar to how a compiler translates human-readable source code into machine code that a computer can execute. In enterprise contexts, tokens also determine cost: many AI companies bill per token for LLM usage, so the more a business uses, the more it pays.
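A rough feel for tokenization and per-token billing can be had from the toy sketch below. It is not how production tokenizers work (they use learned subword vocabularies), and the four-character piece size and the price passed to `estimate_cost` are arbitrary assumptions for illustration.

```python
import re


def toy_tokenize(text: str, max_piece: int = 4):
    """Split text into words and punctuation, then cut long words into pieces."""
    tokens = []
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        for i in range(0, len(chunk), max_piece):
            tokens.append(chunk[i:i + max_piece])
    return tokens


def estimate_cost(token_count: int, price_per_million: float) -> float:
    """Cost of processing `token_count` tokens at a per-million-token price."""
    return token_count * price_per_million / 1_000_000


tokens = toy_tokenize("Tokenization breaks text apart.")
print(tokens)
# ['Toke', 'niza', 'tion', 'brea', 'ks', 'text', 'apar', 't', '.']
print(len(tokens))  # 9
```

Four words and a period become nine tokens here, which is why token counts, not word counts, are what drive usage-based bills.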
Tokens are small fragments of text (often portions of words rather than complete ones) into which AI language models split language before processing it; for the purpose of understanding AI workloads, they are roughly comparable to “words.” Throughput refers to the volume that can be processed within a given timeframe, so token throughput is effectively a measure of how much AI work a system can handle per unit of time. High token throughput is a primary objective for AI infrastructure teams, as it determines how many users a model can serve simultaneously and how quickly responses are delivered. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle, mirroring what he felt as a graduate student when costly computer hardware went underutilized, a sentiment that underscores why maximizing token throughput has become something of an obsession in the field.
The process of developing machine learning AIs is termed training. In simple terms, it involves feeding data into the model so that it can learn from patterns and produce useful outputs. In essence, the system responds to characteristics in the data, which lets it adapt its outputs toward a desired goal, whether that is identifying images of cats or generating a haiku on request.
Training can incur significant costs because it necessitates vast amounts of input data, and the quantities required have been on the rise — which is why hybrid methodologies, such as fine-tuning a rules-based AI with targeted data, can help manage expenses without starting from scratch.
(Refer to: Inference)
Transfer learning is a technique in which a previously trained AI model serves as the starting point for developing a new model for a different but typically related task, allowing previously acquired knowledge to be reapplied.
Transfer learning can yield efficiency gains by streamlining model development. It can also be advantageous when the data available for the task at hand is somewhat restricted. However, it’s crucial to acknowledge that this approach does have limitations. Models that depend on transfer learning for generalized capabilities will likely need additional training on supplementary data to perform effectively in their specified domain.
(Refer to: Fine-tuning)
Validation loss is a metric that indicates how effectively an AI model is learning throughout training — with lower values being preferable. Researchers monitor it closely as a sort of real-time evaluation, using it to decide when to terminate training, when to tweak hyperparameters, or whether to look into a possible issue. One significant concern it helps identify is overfitting, a scenario in which a model memorizes its training dataset rather than truly assimilating patterns it can generalize for new contexts. Think of it as the distinction between a student who thoroughly grasps the material and another who merely memorized last year’s examination — validation loss assists in uncovering which type your model is becoming.
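One common way to act on validation loss is early stopping: halt training once the loss has stopped improving for a few epochs. The sketch below assumes a simple "patience" rule and uses invented loss values; the function name `best_stopping_epoch` is hypothetical.

```python
def best_stopping_epoch(validation_losses, patience=2):
    """Return the epoch with the lowest validation loss, giving up once the
    loss has failed to improve for `patience` epochs in a row."""
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss keeps rising: likely overfitting
    return best_epoch


training_losses = [0.90, 0.60, 0.40, 0.30, 0.20, 0.10]    # keeps falling
validation_losses = [0.95, 0.70, 0.55, 0.50, 0.58, 0.66]  # turns upward after epoch 3
print(best_stopping_epoch(validation_losses))  # 3
```

The training loss alone would suggest the model is still improving at epoch 5; the validation loss shows it began memorizing after epoch 3.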
Weights are fundamental to AI training, as they dictate the degree of importance (or weight) attributed to various features (or input variables) within the data utilized for training the model — thereby influencing the output of the AI system.
In other words, weights are numerical parameters that signify the most salient aspects of a dataset for the designated training task. They fulfill their function by applying multiplication to inputs. Typically, model training commences with randomly assigned weights, but as the training progresses, these weights adjust as the model endeavors to achieve an output that closely aligns with the target.
For instance, an AI model predicting real estate prices trained on historical data for a particular locale may incorporate weights for attributes such as the count of bedrooms and bathrooms, whether a property is detached or semi-detached, and if it includes parking or a garage, among others.
Ultimately, the weights assigned to each of these factors reflect how significantly they affect the value of a property based on the dataset provided.
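A stripped-down version of the housing example shows a weight starting out random and settling on a value during training. The sketch assumes a single feature (bedroom count), made-up prices, and plain gradient descent on mean squared error; real models learn many weights at once, and the function name `train_weight` is invented for this illustration.

```python
import random


def train_weight(bedrooms, prices, steps=300, learning_rate=0.01, seed=0):
    """Learn one weight so that weight * bedrooms approximates price."""
    weight = random.Random(seed).uniform(-1.0, 1.0)  # start from a random weight
    n = len(bedrooms)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(bedrooms, prices)) / n
        weight -= learning_rate * gradient  # nudge the weight toward the target
    return weight


bedrooms = [1, 2, 3, 4]
prices = [50, 100, 150, 200]  # made-up prices, in thousands
weight = train_weight(bedrooms, prices)
print(round(weight, 3))  # 50.0
```

The learned weight of about 50 says that, in this invented dataset, each extra bedroom adds roughly 50 (thousand) to the predicted price.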
This article is consistently updated with fresh information.