A strong Laravel RAG tutorial should focus on one thing: grounded answers drawn from your own documents, not from generic model memory. In this guide, we build a simple retrieval pipeline you can run in production.
What RAG solves for PHP teams
RAG (Retrieval-Augmented Generation) helps when answers must come from internal knowledge:
- product docs,
- onboarding guides,
- policy pages,
- runbooks.
Without retrieval, models may produce plausible but incorrect details. With retrieval, you pass relevant chunks as context and constrain the model to answer only from them.
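Before wiring up Laravel, the grounding step can be sketched as a plain prompt builder: retrieved chunks become explicit context, and the instruction tells the model to refuse when the context does not contain the answer. The function name and chunk format here are illustrative, not part of any library:

```php
<?php

// Minimal sketch of grounding: context first, then the question, then an
// explicit refusal instruction. $chunks is a plain array of text snippets.
function buildGroundedPrompt(string $question, array $chunks): string
{
    $context = implode("\n\n---\n\n", $chunks);

    return "Context:\n{$context}\n\n"
        . "Question:\n{$question}\n\n"
        . "Answer only from the context above. If the answer is missing, say so.";
}

$prompt = buildGroundedPrompt(
    'How do refunds work?',
    ['Refunds are processed within 14 days.', 'Contact support to start a refund.']
);
```

The rest of the guide builds exactly this shape, with retrieval feeding the `$chunks` array.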
Build a document chunk and embedding pipeline
Create a command that chunks source text and stores embeddings.
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

class IndexKnowledgeBaseCommand extends Command
{
    protected $signature = 'ai:index-kb {source}';

    public function handle(): int
    {
        $source = (string) $this->argument('source');
        $text = file_get_contents($source);

        if ($text === false || $text === '') {
            $this->error('Could not read source file.');

            return self::FAILURE;
        }

        // Fixed-size character chunks. Start simple; refine later
        // (token-aware sizing and overlap improve retrieval precision).
        $chunks = collect(str_split($text, 1200));

        foreach ($chunks as $idx => $chunk) {
            $embedding = Http::withToken(config('services.openai.key'))
                ->post('https://api.openai.com/v1/embeddings', [
                    'model' => 'text-embedding-3-small',
                    'input' => $chunk,
                ])
                ->throw()
                ->json('data.0.embedding');

            DB::table('doc_chunks')->updateOrInsert(
                ['source' => $source, 'chunk_index' => $idx],
                [
                    'content' => $chunk,
                    'embedding' => json_encode($embedding),
                    'updated_at' => now(),
                ]
            );
        }

        $this->info('Indexed knowledge base successfully.');

        return self::SUCCESS;
    }
}

Example migration skeleton:
Schema::create('doc_chunks', function (Blueprint $table) {
    $table->id();
    $table->string('source');
    $table->unsignedInteger('chunk_index');
    $table->longText('content');
    $table->json('embedding');
    $table->timestamps();
    $table->unique(['source', 'chunk_index']);
});

A plain json column works for storage, but the `<=>` distance operator used in the next section requires the pgvector extension and a vector column type (for example, `vector(1536)` to match text-embedding-3-small). Swap the column type once the extension is installed.

Retrieve top chunks and answer with grounded context
<?php

namespace App\Services;

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

class RagAnswerService
{
    public function answer(string $question): array
    {
        $vector = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $question,
            ])
            ->throw()
            ->json('data.0.embedding');

        // `<=>` is pgvector's cosine distance operator; it assumes the
        // embedding column uses the vector type (see the migration note).
        $chunks = DB::table('doc_chunks')
            ->select(['id', 'source', 'content'])
            ->orderByRaw('embedding <=> ?', [json_encode($vector)])
            ->limit(4)
            ->get();

        $context = $chunks
            ->map(fn ($c) => "Source: {$c->source}\n{$c->content}")
            ->implode("\n\n---\n\n");

        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/responses', [
                'model' => config('services.openai.model'),
                'instructions' => 'Answer only from provided context. If the answer is missing, say "I do not know based on the provided docs."',
                'input' => "Context:\n{$context}\n\nQuestion:\n{$question}",
            ])
            ->throw()
            ->json();

        // The raw Responses API JSON nests the text under the output array;
        // `output_text` is an SDK convenience property, not a response field.
        return [
            'answer' => (string) data_get($response, 'output.0.content.0.text', ''),
            'sources' => $chunks->pluck('source')->unique()->values()->all(),
        ];
    }
}

Real-world scenario: support assistant for product docs
Your support agents answer feature questions all day. Replies must match what docs actually say.
With this RAG flow:
- the agent enters the customer's question,
- the service retrieves the best-matching chunks from the docs,
- the answer comes back with a source list,
- ambiguous cases escalate to a human.
Result: faster responses and fewer hallucinated claims.
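The escalation step can be as simple as a distance threshold on the retrieved chunks. This is a sketch, not the service above: the function name and the 0.55 threshold are assumptions to tune against your own retrieval data (pgvector's `<=>` returns cosine distance, where lower means closer):

```php
<?php

// Hypothetical escalation rule: if even the best chunk is too far from the
// question embedding, hand off to a human instead of generating an answer.
// The 0.55 default is an assumption; tune it on real queries.
function shouldEscalate(array $distances, float $threshold = 0.55): bool
{
    if ($distances === []) {
        return true; // Nothing retrieved at all: always escalate.
    }

    return min($distances) > $threshold; // Best match is still too far away.
}
```

In the flow above, you would call this with the distances of the retrieved chunks before invoking generation at all, saving tokens on questions the docs cannot answer.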
Common mistakes when implementing RAG in Laravel
- Using chunks too large for precise retrieval.
- Not re-indexing docs after edits.
- Returning answers without source attribution.
- Treating the top-1 similarity match as always trustworthy.
- Letting the model answer without retrieval context.
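Several of these mistakes trace back to chunking. The `str_split` approach in the indexing command is the simplest possible baseline; a step up is overlapping, multibyte-safe chunks, sketched here with a function name and defaults of our own choosing:

```php
<?php

// Character-based chunking with overlap, so that sentences cut at a chunk
// boundary still appear whole in the neighboring chunk. Token-aware sizing
// is a further refinement not shown here.
function chunkWithOverlap(string $text, int $size = 1200, int $overlap = 200): array
{
    if ($overlap >= $size) {
        throw new InvalidArgumentException('Overlap must be smaller than chunk size.');
    }

    $chunks = [];
    $step = $size - $overlap;
    $length = mb_strlen($text);

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = mb_substr($text, $start, $size);

        if ($start + $size >= $length) {
            break; // Last chunk reached the end of the text.
        }
    }

    return $chunks;
}
```

Dropping this in for `str_split` in the indexing command keeps the rest of the pipeline unchanged, since chunks are still stored by index.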
Production checklist
- Define chunk size and overlap per doc type.
- Schedule re-indexing on doc updates.
- Track retrieval hit rate and no-answer rate.
- Show sources in UI for every generated answer.
- Add confidence thresholds for escalation.
- Keep PII out of retrievable chunks when possible.
FAQ
1) Do I need a vector database immediately?
No. Start with your existing database if vector operations are available, then migrate when scale requires it.
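If you start without vector operations in the database, one interim option is to fetch a modest set of candidate chunks and rank them in application code. A cosine similarity sketch in plain PHP (the function name is ours; it assumes equal-length, nonzero vectors):

```php
<?php

// Cosine similarity between two embedding vectors, computed in PHP.
// Workable for small corpora held in memory; move ranking into the
// database (e.g. pgvector) once the table grows.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Sorting chunks by this score descending approximates what `embedding <=> ?` does in-database, at the cost of loading every embedding into PHP.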
2) How many chunks should I pass to generation?
Start with 3 to 6 chunks. Tune based on quality and token cost.
3) Can RAG guarantee no hallucinations?
No. It reduces hallucinations, but guardrails and evals are still required.
Series navigation and references
- Previous: Prompt Engineering in PHP: Structured Output That Doesn't Break
- Foundation: AI for PHP and Web Developers: Complete 6-Part Series
- Next: Production AI in PHP: Guardrails, Cost Control, and Evals
The next part shows how to add cost control, moderation, and eval loops to this stack.
Official references: