Understanding AI Agents: How They Work

Published 29 March 2025 by

Martin Hamilton

Introduction

AI agents actually are a great niche to start a business in. Let’s look under the hood and see how they work. Just like humans need a brain, memory, and tools to do their job, AI agents need specific components to function correctly.

The Five Key Parts of an AI Agent

1. The Brain (LLM)

Every AI agent needs a brain, which in the AI world we call a large language model (LLM). You’ve probably heard of some of these: GPT from OpenAI, Claude from Anthropic, Gemini from Google, etc. You can think of the LLM as a super smart intern who can understand your instructions in plain English and figure out how to get things done. Without this brain, all the other parts would be useless—like having a desk full of office supplies but no one sitting there to use them.

2. Instructions (Prompting)

The brain needs instructions on how to behave, and this is called prompting. Writing a prompt for an agent is how you program much of its behavior rather than having to code it manually. This is what makes building AI agents so much more accessible to non-coders, as the programming is done through clearly written instructions rather than actual code.

3. Memory

Imagine trying to have a conversation with someone who forgets everything you said 30 seconds ago. Memory allows your agent to remember what you talked about in previous messages, keep track of tasks, build on previous conversations, and even learn from past interactions. The good news is that most AI agent platforms handle this memory component automatically.

4. External Knowledge (Optional)

AI models like GPT and Gemini are pre-trained on huge amounts of data, but that data is cut off at a certain point (e.g., 2024). It’s like having a new employee who only knows what they learned in school. You can give an AI agent additional knowledge through PDFs of company documents, spreadsheets with product information, customer service transcripts, or any other text-based information. Without this added knowledge, agents will be limited to general information and couldn’t handle specific business tasks.

5. Tools

Tools are what transform an AI agent from just being able to chat to being able to actually get things done. You can think of tools like giving your digital employee access to different software—just like you might give a new hire access to your email, calendar, or CRM system. These tools let your agent check real-time data, update databases, send messages, create documents, and much more. The really powerful part is when agents use multiple tools together to solve complex problems.

A Real Example

Say you want an agent to handle customer support. When the agent receives a message:

The brain immediately understands the prompt and what the customer is asking
It checks its recent memory before replying to understand the full context
If the customer wants specific information, it will use its external knowledge
It may use tools to update a customer’s account or process a refund when needed

All of this happens in seconds during the conversation, which is why AI agents are such game changers.

The Three Ingredients Framework

A more practical framework for understanding how to build AI agents is what I call the “three ingredients.” You only have three elements to plan when creating an AI agent:

Knowledge: The external data that you want the agent to use when answering
Tools: The different actions you want the agent to take (e.g., saving contact info to CRM, getting live stock data, sending emails)
Prompting: The glue that ties everything together and determines how the agent behaves

While an agent has five components, your main focus as an AI agent builder is on these three ingredients.

Understanding Tools in Depth

How the Web, APIs, and Tools Work

Tools are by far the most powerful part of AI agents. To understand them, we need to cover the basics of how software and the internet work.

Tools allow agents to take action rather than just chat. Agents use APIs, just like we do when we use the internet—we’re making dozens of requests to APIs and getting responses back without realizing it.

For example, when you click on a YouTube video:

Your browser sends a request to YouTube servers saying “I want to watch this video”
YouTube servers send back all the data needed
Your browser unpacks that data and plays the video on your screen

This request-response pattern happens with almost everything online. We get pretty websites and apps that make it easy for us to use software via APIs, but under the hood, it’s still two computers talking back and forth.

APIs are like waiters in a restaurant—they take your order (request) to the kitchen (servers) and bring back your food (response).

There are two main types of requests:

GET requests: Asking for information (checking weather, looking up prices)
POST requests: Sending information (posting a tweet, sending an email)

AI agents use these same APIs as their “buttons” to do things. Each tool an agent has access to is essentially an API it can call. These tools come in two flavors:

Pre-made integrations (Google Calendar, Gmail) that are ready to use
Custom-made tools that we can build ourselves

Anatomy of a Tool

Let’s break down how a tool is made using a simple example of a text capitalization tool:

Function: Something that does work—in this case, takes in text and makes it uppercase
API: Stands for application programming interface and it wraps around the function, making it accessible over the internet
Schema: A one-page instruction manual on how to use the API

The magic is that we can explain to our agent how to use this API just by explaining how it works in natural language.

How Schemas Work

A schema tells the agent:

What the tool does
What information it needs as input
What information to expect as output

Modern AI like ChatGPT can read these instructions and understand not just how to use the tool but when to use it.

For example, if we gave an agent our capitalization tool and asked it to “please capitalize this text: All work and no play makes johnny a dull boy,” the agent would:

Read the schema and see that there’s a tool for capitalizing text
Check the requirements and see that it needs text input
Extract “all work and no play makes johnny a dull boy” from our message
Send that to the API where the capitalization function does its work
Receive the capitalized text back
Format a natural language response: “Here’s your capitalized text: ALL WORK AND NO PLAY MAKES JOHNNY A DULL BOY”

The agent gets back raw computer data (JSON) but can transform it into natural conversation, like having an employee who can read technical information and explain it in plain English.

The Power of Multiple Tools

When you understand this pattern, you’ll never see the internet the same way again. Every action online is just requests and responses, and we can build our own tools and AI agents to automate all of it. Instead of manually searching the web, copying information, pasting it into spreadsheets, and sending emails, an AI agent can do it all automatically using tools.

The real magic happens when AI agents are given multiple tools to work with. Obviously, an agent that just capitalizes text isn’t very useful, but when they can use multiple tools together, their capabilities become truly powerful.

Related content: