AI Explained

Pulling Back the Curtain on How Modern AI Actually Works

An understandable explanation of what AI is and how it works, written for normal humans in everyday language.

How many times have you used AI today? Or heard it mentioned this week? You might be surprised if you actually stop and count. AI is already integrated into our everyday lives.

But do you really understand what AI actually is? A big black box where stuff goes in and magic comes out, right?

Nope! But here’s the thing… you’re far from alone if you don’t fully get it yet.

So let’s pull back the proverbial curtain and understand AI.

The Definition: AI

AI is short for artificial intelligence and is used to describe almost everything these days. If in doubt, say it’s AI and most people won’t bat an eye.

Spam filters in email? AI.
Netflix recommendations for your next binge fest? AI.
The bidet ads showing up right after Alexa overhears your kid say you’re out of toilet paper? AI.
Waymo cars circling an Atlanta cul-de-sac ad nauseam? AI.
Your high school friend’s IG photos looking like she’s 20 years younger than you? AI.

That’s just the start of a list, and it’s already a lot to lump under one AI umbrella. The concept of AI has been around since the fifties, but it’s only recently that the “what it can do” factor has changed dramatically. Even bidets are “smart” now.

So let’s narrow in on a definition: AI is software built to perform tasks that typically require human intelligence.

These tasks include things like recognizing what’s in a picture, understanding and responding to spoken language, making movie recommendations, and evaluating options to make smart choices.

Some of those tasks have been around much longer than mainstream AI. So what’s different recently?

AI differs from traditional software in how it’s built.

The Shift: From Rules to Learning

The software we’ve been using for years follows rules. Developers provide explicit instructions: if this, then that. If the user hits SUBMIT, store form fields to the database. If the user presses PLAY, start the video playback. If the user presses a button repeatedly, track a rage click. And so on.

The traditional program does exactly what it is told to do and nothing else. If it is asked to do something that it doesn’t have a specifically programmed rule for, it can’t handle it.

AI is different. It learns from examples rather than following rules.

Take spam filters.

The old way? Rules.

If a Nigerian prince needs help getting $$$ out of the country, flag spam.
If the subject line is ALL CAPS followed by !!!!!!!!!!!!, flag spam.
If it’s from Big Bank’s Security Team but the email is cooldude@xk67mail.ru, flag spam.
If a lawyer in Lagos asks for bank details to wire $9 million, flag spam.
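
Here's a rough sketch of what the rules approach could look like in Python. The function name looks_like_spam, the bigbank.com address, and the exact wording of each check are made up for illustration; real filters were far longer, but the shape is the same.

```python
import re

def looks_like_spam(sender: str, subject: str, body: str) -> bool:
    """Hand-written rules: each one catches only what it spells out."""
    text = body.lower()
    # Rule 1: the classic prince who needs help moving money.
    if "nigerian prince" in text:
        return True
    # Rule 2: an all-caps subject ending in a run of exclamation marks.
    if re.fullmatch(r"[A-Z0-9 ]+!{3,}", subject):
        return True
    # Rule 3: claims to be the bank's security team, but the sender
    # address is not the bank's (bigbank.com is a hypothetical domain).
    if "security team" in text and not sender.endswith("@bigbank.com"):
        return True
    return False

# Caught by rule 3.
print(looks_like_spam("cooldude@xk67mail.ru", "Account notice",
                      "From Big Bank's Security Team"))
# A new variation no rule was written for slips straight through.
print(looks_like_spam("x@example.com", "Hi",
                      "A barrister in Accra needs your bank details."))
```

The second call is the weak spot: change a few words and the rules have nothing to say about it.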

The new way? No rules.

AI models were literally shown millions of emails labeled spam or not spam. It’s nice of spammers to provide us with so much training material… but I digress.

Over time, AI learned to identify spam that rules could not. The model identified patterns for what constitutes spam versus legit emails.

The fundamental AI concept here is the shift from defined rules to learned patterns. Everything else is icing on the cake.
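
To make "learned patterns" a bit more concrete, here's a minimal sketch of a filter that learns from labeled examples instead of rules. It uses simple word counting (a technique often called naive Bayes), which is an assumption chosen for readability and not necessarily what any real spam filter runs. The six training emails are invented, and "ham" is just the common nickname for not-spam.

```python
import math
from collections import Counter

# Tiny invented training set; a real filter would see millions of emails.
TRAINING = [
    ("wire money to claim your prize now", "spam"),
    ("urgent bank details needed to release funds", "spam"),
    ("claim your free prize money today", "spam"),
    ("lunch meeting moved to noon tomorrow", "ham"),
    ("here are the notes from the meeting", "ham"),
    ("can you review my report tomorrow", "ham"),
]

def train(examples):
    """Count how often each word shows up under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Pick the label whose learned word patterns fit the text best."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label, words in counts.items():
        total = sum(words.values())
        # Add-one smoothing so an unseen word does not zero out a score.
        scores[label] = sum(
            math.log((words[w] + 1) / (total + len(vocab)))
            for w in text.split()
        )
    return max(scores, key=scores.get)

counts = train(TRAINING)
# Neither sentence appears in the training data.
print(classify(counts, "send bank details to release the prize"))
print(classify(counts, "notes from the report meeting"))
```

Nobody wrote a rule about "bank details" here. The model picked up on which words tend to travel with which label.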

The Process: How AI Learns

At the core of modern AI is something called a neural network; more on this in a bit. For now, just think of a neural network as a huge, complex web of connections that improves itself the more data it sees.

Picture this: your kid slaps a handful of slime in their hair. The more you tease it out, the more it untangles. It’s not a great analogy though, as pretty soon you just grab scissors and embrace the bald spot. AI would never.

But that’s the idea. There’s a thing called a neural network and now we need a massive dataset of examples. For the spam model, beaucoup emails. For an image model, pictures. For a language model, lots of written… well, basically the whole internet.

You give the model an example, ask it to make a guess, then tell it how close it was. Good guess? Pattern gets stronger. Bad guess? Weaker.

Rinse, repeat. On steroids. Seriously, this is done millions of times. Example in, feedback given, pattern strength adjusted accordingly. That’s model training.

The numbers that get those tiny tweaks in each round are called weights. Weights determine how important each piece of info is to the network. At the end of training, these weights can reliably produce good guesses, or outputs.
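
The guess, feedback, adjust loop can be sketched in a few lines. This is a minimal illustration with a single node and two invented features, not how a production model is trained; the names guess and train, the step size, and the number of rounds are all assumptions.

```python
import math

# Each example: ([mentions "prize", mentions "meeting"], answer).
# Answer 1.0 means spam, 0.0 means not spam. Invented data.
EXAMPLES = [
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
    ([0.0, 1.0], 0.0),
    ([0.0, 0.0], 0.0),
]

def guess(weights, bias, features):
    """Weighted sum of the inputs, squashed into a 0-to-1 score."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-total))

def train(examples, rounds=1000, step=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(rounds):
        for features, answer in examples:
            # Feedback: how far off was the guess, and in which direction?
            error = answer - guess(weights, bias, features)
            # Nudge each weight a little to shrink that error next time.
            weights = [w + step * error * x
                       for w, x in zip(weights, features)]
            bias += step * error
    return weights, bias

weights, bias = train(EXAMPLES)
print(weights, bias)
print(guess(weights, bias, [1.0, 0.0]))  # close to 1: looks like spam
print(guess(weights, bias, [0.0, 1.0]))  # close to 0: looks fine
```

The weights start at zero and know nothing. Every pass nudges them slightly, and after enough passes the guesses line up with the answers.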

Put the whole thing together - the neural network, the training, the weights, the patterns - and you have a model.

No rules anywhere. Network, weights, patterns.

While rules would limit the model to what it has been explicitly given, the AI model can make sense of things it has never seen before.

That’s kinda mind blowing.

The Neural Network: Deep Learning

If not a cheesy pile of spaghetti or slimy bald spot, what is a neural network?

A neural network is a technical architecture consisting of interconnected nodes arranged in layers. Think of it as layers and layers of mini marshmallows connected with toothpicks. The toothpicks connect marshmallows between neighboring layers.

Now visualize each marshmallow as a tiny calculator: a node. The node / calculator / marshmallow receives numbers, applies what it learned during training, and passes new numbers out. That’s it. Get numbers in, act on training, pass numbers on.
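
One marshmallow, in code, might look something like this. It's a sketch under assumptions: the function name node is made up, and the "zero out negatives" step (known as ReLU) is one common choice among several.

```python
def node(inputs, weights, bias):
    """Numbers in, apply what was learned, pass a number on."""
    # Multiply each incoming number by its learned weight and add up.
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    # Pass the result on only if it is positive (ReLU activation).
    return max(0.0, total)

# Two incoming numbers, two learned weights, one number out.
print(round(node([0.5, 0.2], [0.8, -0.4], 0.1), 2))  # 0.42
```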

Nodes are organized into three types of layers: input, hidden, and output. There can be hundreds of hidden layers and this is where the deep in deep learning comes from. More layers allow more complexity in the model’s patterns. Each layer transforms incoming data before passing it out so it makes sense that more layers, more processing, more complexity.
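
Stacking layers is then just a matter of feeding one layer's output into the next. Here's a minimal sketch with one hidden layer and one output layer; the weights are hand-picked for easy arithmetic rather than learned, and the names layer and forward are illustrative.

```python
def layer(inputs, weight_rows, biases):
    """One layer: every node sees all inputs and emits one number."""
    return [
        max(0.0, b + sum(x * w for x, w in zip(inputs, row)))
        for row, b in zip(weight_rows, biases)
    ]

def forward(inputs, layers):
    """Pass the data through each layer in order."""
    values = inputs
    for weight_rows, biases in layers:
        values = layer(values, weight_rows, biases)
    return values

network = [
    # Hidden layer: 3 numbers in, 2 numbers out.
    ([[1.0, -1.0, 0.5], [0.5, 0.5, 0.5]], [0.0, 0.0]),
    # Output layer: 2 numbers in, 1 number out.
    ([[1.0, 1.0]], [0.0]),
]

print(forward([1.0, 2.0, 3.0], network))  # [3.5]
```

A deep network is the same loop with many more entries in that list.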

So, let’s say we’re working with an animal recognition model where we provide a picture and the model returns the species.

The input layer takes in the raw data; in this example, the raw data is a picture of an animal. There’s a node for each and every pixel in the image. Each node processes its pixel value and passes the result to the next layer. So if the pixel is red, the RGB value of red (255,0,0) is passed on to the first of the hidden layers.

Hidden layers are where the actual pattern detection happens. Early layers might be identifying edges in an image, so they would be seeking out neighboring nodes where the values are significantly different. If the pixel next to our red node is the same color red, it’s likely more of whatever is red in the picture. But if the next door pixel is green, visually we’d probably be seeing an edge in the picture.
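
The "is my neighbor different?" idea fits in a couple of lines. This sketch compares each pixel to the one on its right along a single row of red-channel values; it's a hand-written stand-in for what a trained layer learns on its own, and edge_strength is a made-up name.

```python
def edge_strength(row):
    """Difference between each pixel and its right-hand neighbor."""
    return [abs(right - left) for left, right in zip(row, row[1:])]

# Red channel for three red pixels followed by two green ones.
red_channel = [255, 255, 255, 0, 0]
print(edge_strength(red_channel))  # [0, 0, 255, 0]
```

The one big number marks the spot where red stops and green starts. That's the edge.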

Remember those weights our model determined in training? Those weights are key here. As values are passed between nodes, they’re multiplied by the weights. That determines how much influence each node has on the next.

The model steps from layer to layer, recognizing increasingly abstract and complex patterns. Middle layers start combining the edges detected by earlier layers into shapes and parts. These curved edges close to each other match a learned pattern and are probably an eye. This pattern of edges and curves looks like fur texture. These shapes clustered together suggest a four-legged body. The model doesn’t actually call them eyes, fur, and body, but the patterns are becoming clearer.

In even deeper layers, the patterns suggest that these combined shapes look like an animal face and this combo of snout, ears, fur, and body shape match the pattern learned for dogs.

Deeper still, the layers can separate subtle color combinations, ear shapes, and snout length to home in on a breed and ultimately identify a golden retriever vs a yellow lab.

In many models, deeper layers compress information and the number of nodes decreases. After starting with one node per pixel, the model distills down to a handful of nodes representing possible outputs. A square image of 1024 x 1024 pixels, for example, means we’d start with over one million nodes. During processing, models significantly compress huge amounts of pixel-level data into bigger chunks of learned patterns.
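
The arithmetic behind that compression is quick to check. The three-answer output layer below is an assumption for illustration, using the simplified one-node-per-pixel picture.

```python
# One input node per pixel (a simplification).
width, height = 1024, 1024
input_nodes = width * height

# Hypothetical output layer: one node per possible answer.
output_nodes = 3  # golden retriever, yellow lab, other

print(input_nodes)                  # 1048576
print(input_nodes // output_nodes)  # 349525
```

So in this toy setup, each output node ends up summarizing roughly 350,000 input nodes' worth of picture.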

Finally, in the output layer, a confidence score for each possible answer is reached. Golden retriever: 92%, yellow lab: 5%, other: 3%. The highest score wins, so in this case the model answers that the animal photo is of a golden retriever.
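
One common way to turn a network's raw output numbers into percentages that add up to 100 is a function called softmax. The sketch below assumes that approach. The raw scores are invented, so the percentages it prints are not meant to match any particular example.

```python
import math

def softmax(scores):
    """Turn raw scores into confidence values that sum to 1."""
    biggest = max(scores.values())
    # Subtracting the biggest score keeps exp() from overflowing.
    exps = {name: math.exp(s - biggest) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

raw_scores = {"golden retriever": 4.0, "yellow lab": 1.0, "other": 0.0}
confidence = softmax(raw_scores)

for name, value in confidence.items():
    print(f"{name}: {value:.0%}")

# The answer is simply the label with the highest confidence.
print(max(confidence, key=confidence.get))
```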

Each layer builds on the one before it, moving from raw pixels to actual understanding. Or at least something that looks a lot like what we call understanding.

[Image: a golden retriever with a pixel overlay]

RAW PIXELS

The model receives the entire image as numbers for each pixel in the red, green, and blue channels.

The image is a grid of pixel values (color + brightness).

EDGES

Convolutional filters respond strongly to edges. The model builds a map of where edges exist.

Early layers detect simple edges like lines and corners at many locations and orientations.

The Big Idea

AI doesn't look at one small part; it transforms the entire image through many layers, building up from simple patterns to rich, meaningful understanding.

✓ Each layer sees the whole image.
✓ Simple patterns combine to form complex ideas.
✓ The final layers reason about what (and which breed) it is.

The Modern Differentiator: Transformers

The concepts we’ve covered so far have been available for years. But AI feels totally different than it did just five years ago.

Neural networks for language historically worked with data sequentially. One piece of data at a time, in order. To read a sentence, AI went word by word and had limited memory to store earlier context. This greatly limited how well systems could interpret language, nuance, and bigger relationships between ideas.

In 2017, along came a research paper introducing a revolutionary architecture called the transformer. Transformers process entire inputs at the same time and model relationships between all parts simultaneously. So instead of word by word with limited memory as above, a transformer can read the whole sentence in its entirety and determine how each word relates to every other word. This is called attention.

Take this sentence:

The trophy didn’t fit in the suitcase because it was too big.

We know immediately that the trophy was too big to fit in the suitcase, not that the suitcase was too big. A sequential reading model struggles to get that right. But a transformer that processes the whole sentence simultaneously gets it.
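
Here's a toy sketch of the attention calculation, to show the mechanics of one word scoring itself against every other word at once. Big caveat: the word vectors are hand-picked so that "it" leans toward "trophy". A real transformer learns its vectors during training and uses separate learned projections for queries, keys, and values, none of which is shown here. The function names are illustrative.

```python
import math

def softmax(xs):
    biggest = max(xs)
    exps = [math.exp(x - biggest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score one word against every word, then normalize to sum to 1."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

def attend(query, keys, values):
    """Blend all the word vectors, weighted by how relevant each one is."""
    weights = attention_weights(query, keys)
    return [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]

# Hand-picked 2-number vectors (assumptions, not learned values).
words = ["trophy", "suitcase", "it"]
vectors = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.5]]

weights = attention_weights(vectors[2], vectors)  # "it" looks at everything
for word, w in zip(words, weights):
    print(f"it -> {word}: {w:.2f}")
print(attend(vectors[2], vectors, vectors))
```

With these made-up vectors, "it" puts more of its attention on "trophy" than on "suitcase". In a trained model, that preference comes from the data, not from someone picking friendly numbers.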

So while we often default to saying that AI predicts the next word, it’s actually much cooler than that. AI went from reading word by word like moving your finger across the page to consuming the whole page all at once and instantly connecting every part to each other.

But wait! There’s more. Transformers can also scale. The more data and computing power they get, the better they get. Continuously. The relationship between scale and capability is a really, really, really big deal.

So from the paper published in 2017 to the launch of ChatGPT in 2022, transformer models were given more data, more power, and more training than ever before.

The new era of AI isn’t a new idea, it’s just running at phenomenal scale.

The Response: Constructed, Not Retrieved

Ask Google a question and it searches a database. Looks up a topic and retrieves an answer.

Ask AI a question and it generates a response from learned patterns. Takes everything it has been given, both now and in training, and constructs what should come next.
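
A very small sketch of "constructed, not retrieved": the toy below learns which word tends to follow which, then builds a sentence one word at a time. Real language models use neural networks and far richer patterns than a word-pair table, so treat this purely as an illustration. The training text and function names are made up.

```python
import random
from collections import defaultdict

def learn_patterns(text):
    """Record every word that has followed each word in the text."""
    follows = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)
    return follows

def generate(follows, start, length, seed=0):
    """Build a response word by word from the learned patterns."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:
            break  # no pattern learned for this word
        words.append(rng.choice(options))
    return " ".join(words)

training_text = (
    "the dog chased the ball and the cat chased the dog "
    "and the ball rolled under the couch"
)
patterns = learn_patterns(training_text)
print(generate(patterns, "the", 8))
```

Nothing is looked up. The output can be a sentence that never appeared in the training text, and it can read smoothly while being flat-out untrue.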

That difference explains why AI can act incredibly knowledgeable across almost all subjects, but can also be confidently, arrogantly, completely wrong. It provides what patterns suggest, not necessarily what is correct.

Both sides of that coin make total sense once you understand what’s actually happening behind the curtain.

The Truth: What Exists Today

Spoiler alert: despite the headlines, artificial general intelligence (AGI) doesn’t exist yet. AI that can do anything a human can do isn’t here.

We’re in a time of narrow AI with systems that are really good at specific tasks, but different models need to be used for different purposes.

The AI telling you to watch Pluribus because you loved Wall-E is not the same as the AI telling your bank that even though you just got back from Switzerland, the charge for yodeling lessons is fraudulent. Each model is trained for its domain.

Within its domain, current AI can exceed human performance, but move outside that specific expertise and who knows what you’ll get. A model that can write a killer essay can still miscount the letters in a word.

As of now, no one knows when, or even if, true AGI is possible. But we know that what we have now is leaps and bounds ahead of five years ago, and pretty darn impressive in its own right.

The Reality: Many Flavors

We also have different names for the various systems within the overall AI category, and each works in a very different way within its domain. Many hybrid systems exist today that combine expertise. Some are multimodal within one model, but many are actually different models working together rather than one model crossing domains.

Generative AI creates content
Predictive AI forecasts outcomes and makes recommendations
Computer Vision AI interprets images
Speech AI understands and generates spoken language
Agentic AI performs actions

Not all AI is created equal.

The Why Now: Triple Convergence

AI has been researched for over seventy years, yet it feels like everything just changed.

Three things came together in a perfect storm scenario:

Transformer architecture providing a new way to process language and context at the same time
Scalable computing power capable of training models previously unimaginable in size
Massive amounts of training data representing a significant chunk of human written knowledge (aka the internet)

The convergence of those three things created capabilities vastly different than what came before. It’s not a new version of the same thing, but something new. Not a magic box, but something that makes sense and is truly understandable.

Who knows what’s next, but this isn’t a moment that’s passing. This AI thing is compounding.