
What Is RAG?

By Scott Rippey

RAG gets thrown around like a magic spell, usually right before someone tries to sell you something. So let's pull it apart and look at what is actually inside, because it is far simpler than the acronym lets on.

RAG stands for Retrieval Augmented Generation. Underneath that mouthful is a genuinely plain idea: when you ask an AI a question, it first goes and finds the relevant pieces of your content, and then it answers using them - instead of guessing from whatever it happened to be trained on. That is the whole trick behind an AI that actually knows your business: your pricing, your policies, last week's support tickets.

The machinery that makes this work is not exotic, even though the jargon tries hard to convince you otherwise. To really understand RAG - and to build one that holds up - you only need to follow how those relevant pieces get stored and then found. So let's build it up one piece at a time, starting with where your content lives.

Two kinds of databases

Let's start with the basic split, because everything else sits on top of it.

A traditional database is the kind running quietly behind almost every app you have ever used - Postgres, MySQL, MongoDB. It stores structured data in rows and columns (or documents and fields), and it matches that data exactly. Ask it for "marketing strategies" and it hands back the rows that literally contain those two words. It has no idea that "promotion tactics," "branding approaches," and "advertising methods" are the very same thing you were after - it is matching characters, not concepts. And honestly, that is exactly what you want most of the time. When you are looking up a customer by their ID, an order by its number, or a user by their email, exact is the entire point.
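To make the "matching characters, not concepts" point concrete, here is a minimal sketch of exact matching. The rows and the exact_search helper are invented for illustration and stand in for what a real database does with a literal text filter.

```python
# Toy stand-in for an exact-match lookup; the rows are invented for illustration.
rows = [
    {"id": 1, "title": "Marketing strategies for small shops"},
    {"id": 2, "title": "Promotion tactics that work"},
    {"id": 3, "title": "Branding approaches for startups"},
]

def exact_search(rows, phrase):
    """Return rows whose title literally contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [row for row in rows if needle in row["title"].lower()]

matches = exact_search(rows, "marketing strategies")
print([row["id"] for row in matches])  # prints [1]
```

Rows 2 and 3 are about the same idea, but they never show up, because none of their characters match the phrase.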

A vector database works differently. Alongside each piece of content, it stores a kind of numerical fingerprint of what that content means, and it searches by similarity rather than by exact words. Ask it for "marketing strategies" and it will happily surface the promotion, branding, and advertising material too - not because those documents share any words with your query, but because their fingerprints sit close to the fingerprint of your question.

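|                              | Traditional database         | Vector database                              |
|------------------------------|------------------------------|----------------------------------------------|
| Stores                       | Exact text, rows and fields  | Meaning, as numbers                          |
| Matches on                   | Exact words                  | Concepts                                     |
| "marketing strategies" finds | Only that phrase             | Promotion, branding, advertising too         |
| Best for                     | IDs, records, precise lookups | Questions, ideas, "find me things like this" |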

Neither one is better than the other; they simply answer different questions. The rest of this is really about how that second kind works - and, as you will see, how it quietly folds right into the first.

Turning meaning into numbers

So how does a database come to understand meaning in the first place? The answer is a thing called an embedding, and it is the real engine under everything here.

An embedding is just a list of numbers that captures the meaning of a piece of text. You hand a sentence to an AI model, and it hands back a vector - that long list of numbers - representing what the sentence is actually about. The magic, such as it is, comes down to one property: things that mean similar things end up with similar numbers. "Car repair" and "fix my automobile" land right next to each other, even though they do not share a single word.

You do not need to picture the math to use it. The one idea worth holding onto is this: similar meaning, similar numbers. Once your content is stored as these vectors, "find me related things" turns into "find me nearby numbers" - and finding nearby numbers is something a computer can do blindingly fast, even across millions of items.
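If you do want a peek at "nearby numbers," one common way to measure closeness is cosine similarity. The sketch below assumes hand-made three-number vectors purely for illustration; a real embedding model would produce far longer ones, and the values here are not output from any actual model.

```python
import math

def cosine_similarity(a, b):
    """Return a score near 1.0 for vectors pointing the same way, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-number "embeddings" (an assumption for the demo, not real model output).
vectors = {
    "car repair": [0.9, 0.1, 0.0],
    "fix my automobile": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.1, 0.9],
}

query = vectors["car repair"]
for text, vector in vectors.items():
    print(f"{text}: {cosine_similarity(query, vector):.2f}")
```

With these made-up values, "fix my automobile" scores close to 1 against "car repair" while the cake recipe scores close to 0, which is all "similar meaning, similar numbers" amounts to in practice.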

Modern embedding models pack a surprising amount into each vector, typically anywhere from a few hundred to a few thousand numbers per piece of text. More numbers give the model more room to capture subtle shades of meaning, at the cost of more storage and slower comparisons.
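A quick back-of-the-envelope calculation shows what those vector sizes mean for storage. This sketch assumes each number is stored as a 4-byte float and uses 1,536 numbers per vector and one million chunks as illustrative figures; your model and your content will differ.

```python
def embedding_storage_bytes(num_chunks, dimensions, bytes_per_number=4):
    """Raw size of the vectors alone, ignoring indexes and the chunk text itself."""
    return num_chunks * dimensions * bytes_per_number

# Illustrative figures only: one million chunks, 1,536 numbers each, 4 bytes per number.
size = embedding_storage_bytes(num_chunks=1_000_000, dimensions=1536)
print(f"{size / 1e9:.2f} GB")  # prints 6.14 GB
```

That is comfortably within reach of an ordinary database server, which is worth keeping in mind when deciding where the vectors should live.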

What a vector database actually is

Here is the part the hype tends to skip right over. A vector database is not some exotic new technology you have to go acquire. It is simply your own content, broken into chunks, with each chunk stored next to its embedding and a pointer back to where it came from.

That really is the whole recipe:

  • Chunk it. Take your articles, support docs, or product descriptions and split them into bite-sized pieces.
  • Embed each chunk. Run every piece through an embedding model to get its vector.
  • Store them together. Save the chunk, its vector, and a reference back to the original record, all side by side.

Then, when someone searches, you embed their question the exact same way, find the chunks whose vectors sit closest to it, and pull back the originals. "Vector database" is really just a name for storing your data like this. You are not replacing your information with anything magical - you are simply keeping a meaning-searchable copy of it right alongside the real thing.
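The recipe above can be sketched end to end in a few dozen lines. One loud assumption: toy_embed below is a stand-in that just counts words, so it only notices shared words, not meaning. In a real system you would swap it for a call to an actual embedding model. Every function name and sample document here is hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a sparse word-count vector.
    It captures shared words only, not meaning; it exists to keep the sketch runnable."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def chunk_text(text, max_words=12):
    """Chunk it: split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_store(documents):
    """Embed each chunk and store it beside its vector and a pointer to its source."""
    store = []
    for source_id, text in documents.items():
        for chunk in chunk_text(text):
            store.append({"source_id": source_id, "chunk": chunk, "vector": toy_embed(chunk)})
    return store

def search(store, question, top_k=2):
    """Embed the question the same way, then return the closest chunks."""
    query_vector = toy_embed(question)
    ranked = sorted(store, key=lambda row: cosine_similarity(query_vector, row["vector"]), reverse=True)
    return ranked[:top_k]

# Invented sample content.
documents = {
    "refund-policy": "Refunds are available within 30 days of purchase with a receipt.",
    "shipping-faq": "Standard shipping takes five business days inside the country.",
}
store = build_store(documents)
best = search(store, "how many days do refunds take", top_k=1)[0]
print(best["source_id"], "->", best["chunk"])
```

Notice that each stored row keeps the chunk, its vector, and the source id together, which is what lets a search hit lead you back to the original record.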

And that is RAG

Now we have every piece we need, so let's snap them together - because this assembly is exactly the thing we started with. This is RAG, in full.

Remember the problem: an AI model only knows what it was trained on. It has never seen your pricing, your policies, or last week's support tickets. RAG closes that gap in three steps, and now you can see precisely what each word in the name is doing:

  1. Retrieve - take the user's question, embed it, and find the most relevant chunks of your content by meaning.
  2. Augment - hand those chunks to the AI model as context, right alongside the original question.
  3. Generate - let the model write its answer using your actual content, not just its training.

That is the entire idea. Retrieval Augmented Generation is nothing more than "find the relevant stuff first, then let the AI answer with it in hand." It is also why a well-built RAG system hallucinates less and can cite real sources: the model is working from documents you handed it, not from a hazy memory of the internet.
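Here is a minimal sketch of those three steps wired together. It rests on two assumptions: retrieval is done with a crude shared-word count instead of real vector similarity, and the model is whatever function you pass in as generate (fake_model is a hypothetical placeholder that simply echoes its prompt). The sample chunks are invented.

```python
import re

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, top_k=2):
    """Retrieve: rank chunks by shared words (a crude stand-in for vector similarity)."""
    question_words = words(question)
    ranked = sorted(chunks, key=lambda c: len(question_words & words(c["text"])), reverse=True)
    return ranked[:top_k]

def augment(question, retrieved):
    """Augment: place the retrieved chunks in front of the question as context."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the context below and cite the source in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question, chunks, generate):
    """Generate: hand the augmented prompt to whatever model function is supplied."""
    return generate(augment(question, retrieve(question, chunks)))

# Invented sample content.
chunks = [
    {"source": "pricing", "text": "The Pro plan costs 49 dollars per month."},
    {"source": "policies", "text": "Refunds are available within 30 days."},
    {"source": "tickets", "text": "Last week a customer reported a login bug."},
]

def fake_model(prompt):
    # Hypothetical placeholder for a real model call; it just echoes the prompt.
    return prompt

print(answer("How much does the Pro plan cost?", chunks, generate=fake_model))
```

Because each chunk carries its source label into the prompt, the model has something concrete to cite, which is where the "can cite real sources" benefit comes from.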

You probably do not need a separate database

For years, the standard advice was to run two databases side by side: your normal one for records, and a dedicated vector database next to it, with a bunch of machinery to keep the two in sync. For most businesses, that advice is simply out of date now.

Postgres - one of the most common databases in the world - can store and search vectors directly, through an extension called pgvector. Supabase includes it as an extension you can switch on. That means your embeddings can live in the very same database as the data they came from. One place. Your records and their meaning-searchable chunks, sitting together, queried together, with no second system to sync and no extra moving parts to babysit.

The practical takeaway: if you are already on Postgres or Supabase, you can almost certainly add semantic search without standing up a separate vector database at all. Start there, and reach for more only when you have a real reason to.
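As a rough idea of what that looks like with pgvector, here is a sketch of the SQL involved, held in Python strings so the helper can be run on its own. The table and column names (doc_chunks, article_id, and so on) are hypothetical, the three-number vector size is only for the demo, and the %(query)s placeholder assumes a driver that uses that named-parameter style, such as psycopg.

```python
def to_pgvector_literal(vector):
    """Format a list of numbers as the '[x,y,z]' text form that pgvector accepts."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical schema: table and column names are made up for illustration.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS doc_chunks (
    id         bigserial PRIMARY KEY,
    article_id bigint NOT NULL,       -- pointer back to the original record
    chunk      text NOT NULL,
    embedding  vector(3) NOT NULL     -- 3 only for the demo; match your model's size
);
"""

# <=> is pgvector's cosine-distance operator: smaller means more similar.
SEARCH_SQL = """
SELECT article_id, chunk
FROM doc_chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT 5;
"""

params = {"query": to_pgvector_literal([0.9, 0.1, 0.0])}
print(params["query"])  # prints [0.9,0.1,0.0]
```

The chunks table sits right next to your ordinary tables, so joining a search hit back to its article is a plain SQL join rather than a call to a second system.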

When you outgrow your own database

There does come a point where a dedicated vector database genuinely earns its keep - think tens of millions of vectors and up, or query volume heavy enough to need purpose-built scaling. That is where tools like Pinecone, Weaviate, Qdrant, Milvus, and Chroma come in. They are engineered specifically for similarity search at serious scale, and they are very good at it.

But that is a scale-up decision, not a starting line. Most businesses never come close to needing it. Reaching for a specialized vector database on day one is usually how you end up maintaining two systems to do the work one could have handled on its own.

Making it genuinely good

By now you have the whole machine: meaning stored as vectors, sitting in your database, feeding the right context to an AI. Getting some results back is the easy part. Getting the right results back, reliably, is what separates a slick demo from something you would actually put in front of a customer. Two upgrades do most of that work, and they matter more and more as your content grows. Think of them as bolt-ons that stack on top of everything we have built so far.

Hybrid search - meaning plus keywords. Searching purely by meaning has one real blind spot: exact terms. Product codes, part numbers, people's names, acronyms - the vector can drift right past the single document that contains the exact string you typed. Plain old keyword search nails those every time. So you run both at once and combine the results, and each one catches what the other would have missed. This is widely treated as the sensible default for serious search, not some optional extra.
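The post does not say how to combine the two result lists, so the sketch below picks one widely used option, reciprocal rank fusion, as an assumption. Each document earns 1 / (k + rank) from every list it appears in, with k = 60 as the conventional constant. The document ids and result lists are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; appearing high in any list, or in several, scores well."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Invented result lists for a query like "replacement filter for model XJ-200".
semantic_hits = ["filters-overview", "air-quality-guide", "xj-200-parts"]
keyword_hits = ["xj-200-parts", "xj-200-manual"]

merged = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(merged[0])  # prints xj-200-parts
```

The semantic list ranked the exact part page third, but the keyword list put it first, and the fused ranking lifts it to the top. A nice property of this approach is that it works on ranks alone, so you never have to reconcile two differently scaled scores.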

Reranking - a second, sharper read. Your first pass casts a wide net cheaply, grabbing maybe the top fifty candidates. A reranker then takes that shortlist and reads each one far more carefully against the original question, pushing the genuinely best few up to the top. Services like Cohere's reranker do exactly this. Because you only spend the slow, expensive step on a small handful of candidates, the whole thing stays fast.
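The two-stage shape is easy to sketch even without a real reranking model. Both scoring functions below are stand-ins chosen for illustration: the first pass counts shared words (playing the role of cheap vector search), and careful_score counts shared two-word phrases (playing the role of the slower, sharper reranker). The corpus is invented, and none of this reflects any particular vendor's API.

```python
import re

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cheap_first_pass(question, corpus):
    """Stand-in for vector search: rank by number of shared words."""
    question_words = set(tokens(question))
    return sorted(corpus, key=lambda text: len(question_words & set(tokens(text))), reverse=True)

def careful_score(question, text):
    """Stand-in for a reranking model: count shared two-word phrases."""
    def word_pairs(value):
        toks = tokens(value)
        return set(zip(toks, toks[1:]))
    return len(word_pairs(question) & word_pairs(text))

def two_stage_search(question, corpus, shortlist=50, keep=1):
    candidates = cheap_first_pass(question, corpus)[:shortlist]  # wide, cheap net
    reranked = sorted(candidates, key=lambda text: careful_score(question, text), reverse=True)
    return reranked[:keep]  # only the best few survive

# Invented sample content.
corpus = [
    "Your email and password are not shared; reset links expire.",
    "The password reset email can take a few minutes.",
    "Our office is closed on public holidays.",
]
question = "password reset email not arriving"
print("first pass top hit:", cheap_first_pass(question, corpus)[0])
print("after reranking:   ", two_stage_search(question, corpus)[0])
```

The first pass favors the privacy sentence because it shares the most words, while the careful pass promotes the sentence that actually talks about the password reset email. The expensive scorer only ever sees the shortlist, which is what keeps the whole thing fast.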

And here is the part that ties them together: a reranker can only reorder what the first pass hands it, and it leans heavily on meaning, so on its own it can still miss exact codes and names - which is precisely what keyword search is best at. Hybrid search makes sure the right candidates reach the shortlist; reranking makes sure the best of them end up on top. That is why strong systems tend to layer both on. They are the polish you add once the foundation underneath is solid.

The plain version

Strip away every bit of terminology and it comes down to this:

RAG is just letting an AI find the relevant pieces of your own content first, then answer using them.

Embeddings turn meaning into numbers. A vector database is your content stored as those numbers, so it can be found by meaning instead of exact words. Your existing database can very likely do all of it. And when you want the results to be genuinely good, you add keyword matching back into the mix and a reranking pass on top.

No magic, and no exotic infrastructure required. Just your own data, organized so the search finally works the way people actually think.