A strong Laravel RAG tutorial should focus on one thing: grounded answers drawn from your own documents, not from generic model memory. In this guide, we build a simple retrieval pipeline you can run in production.
What RAG solves for PHP teams
RAG (Retrieval-Augmented Generation) helps when answers must come from internal knowledge:
- product docs,
- onboarding guides,
- policy pages,
- runbooks.
Without retrieval, models may produce plausible but incorrect details. With retrieval, you pass relevant chunks as context and constrain the model to answer only from them.
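Before wiring up Laravel, the grounding step can be sketched as a plain prompt builder: retrieved chunks become explicit context, and the instruction tells the model to refuse when the context does not contain the answer. The function name and chunk format here are illustrative, not part of any library:

```php
<?php

// Minimal sketch of grounding: context first, then the question, then an
// explicit refusal instruction. $chunks is a plain array of text snippets.
function buildGroundedPrompt(string $question, array $chunks): string
{
    $context = implode("\n\n---\n\n", $chunks);

    return "Context:\n{$context}\n\n"
        . "Question:\n{$question}\n\n"
        . "Answer only from the context above. If the answer is missing, say so.";
}

$prompt = buildGroundedPrompt(
    'How do refunds work?',
    ['Refunds are processed within 14 days.', 'Contact support to start a refund.']
);
```

The rest of the guide builds exactly this shape, with retrieval feeding the `$chunks` array.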
Build a document chunk and embedding pipeline
Create a command that chunks source text and stores embeddings.
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

class IndexKnowledgeBaseCommand extends Command
{
    protected $signature = 'ai:index-kb {source}';

    public function handle(): int
    {
        $source = (string) $this->argument('source');
        $text = file_get_contents($source);

        if ($text === false || $text === '') {
            $this->error('Could not read source file.');

            return self::FAILURE;
        }

        // Fixed-size character chunks. Start simple; refine later
        // (token-aware sizing and overlap improve retrieval precision).
        $chunks = collect(str_split($text, 1200));

        foreach ($chunks as $idx => $chunk) {
            $embedding = Http::withToken(config('services.openai.key'))
                ->post('https://api.openai.com/v1/embeddings', [
                    'model' => 'text-embedding-3-small',
                    'input' => $chunk,
                ])
                ->throw()
                ->json('data.0.embedding');

            DB::table('doc_chunks')->updateOrInsert(
                ['source' => $source, 'chunk_index' => $idx],
                [
                    'content' => $chunk,
                    'embedding' => json_encode($embedding),
                    'updated_at' => now(),
                ]
            );
        }

        $this->info('Indexed knowledge base successfully.');

        return self::SUCCESS;
    }
}

Example migration skeleton:
Schema::create('doc_chunks', function (Blueprint $table) {
    $table->id();
    $table->string('source');
    $table->unsignedInteger('chunk_index');
    $table->longText('content');
    $table->json('embedding');
    $table->timestamps();
    $table->unique(['source', 'chunk_index']);
});

A plain json column works for storage, but the `<=>` distance operator used in the next section requires the pgvector extension and a vector column type (for example, `vector(1536)` to match text-embedding-3-small). Swap the column type once the extension is installed.

Retrieve top chunks and answer with grounded context
<?php

namespace App\Services;

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

class RagAnswerService
{
    public function answer(string $question): array
    {
        $vector = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $question,
            ])
            ->throw()
            ->json('data.0.embedding');

        // `<=>` is pgvector's cosine distance operator; it assumes the
        // embedding column uses the vector type (see the migration note).
        $chunks = DB::table('doc_chunks')
            ->select(['id', 'source', 'content'])
            ->orderByRaw('embedding <=> ?', [json_encode($vector)])
            ->limit(4)
            ->get();

        $context = $chunks
            ->map(fn ($c) => "Source: {$c->source}\n{$c->content}")
            ->implode("\n\n---\n\n");

        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/responses', [
                'model' => config('services.openai.model'),
                'instructions' => 'Answer only from provided context. If the answer is missing, say "I do not know based on the provided docs."',
                'input' => "Context:\n{$context}\n\nQuestion:\n{$question}",
            ])
            ->throw()
            ->json();

        // The raw Responses API JSON nests the text under the output array;
        // `output_text` is an SDK convenience property, not a response field.
        return [
            'answer' => (string) data_get($response, 'output.0.content.0.text', ''),
            'sources' => $chunks->pluck('source')->unique()->values()->all(),
        ];
    }
}

Real-world scenario: support assistant for product docs
Your support agents answer feature questions all day. Replies must match what docs actually say.
With this RAG flow:
- the agent enters the customer's question,
- the service retrieves the best-matching chunks from the docs,
- the answer comes back with a source list,
- ambiguous cases escalate to a human.
Result: faster responses and fewer hallucinated claims.
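The escalation step can be as simple as a distance threshold on the retrieved chunks. This is a sketch, not the service above: the function name and the 0.55 threshold are assumptions to tune against your own retrieval data (pgvector's `<=>` returns cosine distance, where lower means closer):

```php
<?php

// Hypothetical escalation rule: if even the best chunk is too far from the
// question embedding, hand off to a human instead of generating an answer.
// The 0.55 default is an assumption; tune it on real queries.
function shouldEscalate(array $distances, float $threshold = 0.55): bool
{
    if ($distances === []) {
        return true; // Nothing retrieved at all: always escalate.
    }

    return min($distances) > $threshold; // Best match is still too far away.
}
```

In the flow above, you would call this with the distances of the retrieved chunks before invoking generation at all, saving tokens on questions the docs cannot answer.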
Common mistakes when implementing RAG in Laravel
- Using chunks too large for precise retrieval.
- Not re-indexing docs after edits.
- Returning answers without source attribution.
- Treating the top-1 similarity match as always trustworthy.
- Letting the model answer without retrieval context.
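Several of these mistakes trace back to chunking. The `str_split` approach in the indexing command is the simplest possible baseline; a step up is overlapping, multibyte-safe chunks, sketched here with a function name and defaults of our own choosing:

```php
<?php

// Character-based chunking with overlap, so that sentences cut at a chunk
// boundary still appear whole in the neighboring chunk. Token-aware sizing
// is a further refinement not shown here.
function chunkWithOverlap(string $text, int $size = 1200, int $overlap = 200): array
{
    if ($overlap >= $size) {
        throw new InvalidArgumentException('Overlap must be smaller than chunk size.');
    }

    $chunks = [];
    $step = $size - $overlap;
    $length = mb_strlen($text);

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = mb_substr($text, $start, $size);

        if ($start + $size >= $length) {
            break; // Last chunk reached the end of the text.
        }
    }

    return $chunks;
}
```

Dropping this in for `str_split` in the indexing command keeps the rest of the pipeline unchanged, since chunks are still stored by index.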
Production checklist
- Define chunk size and overlap per doc type.
- Schedule re-indexing on doc updates.
- Track retrieval hit rate and no-answer rate.
- Show sources in UI for every generated answer.
- Add confidence thresholds for escalation.
- Keep PII out of retrievable chunks when possible.
FAQ
1) Do I need a vector database immediately?
No. Start with your existing database if vector operations are available, then migrate when scale requires it.
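If you start without vector operations in the database, one interim option is to fetch a modest set of candidate chunks and rank them in application code. A cosine similarity sketch in plain PHP (the function name is ours; it assumes equal-length, nonzero vectors):

```php
<?php

// Cosine similarity between two embedding vectors, computed in PHP.
// Workable for small corpora held in memory; move ranking into the
// database (e.g. pgvector) once the table grows.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Sorting chunks by this score descending approximates what `embedding <=> ?` does in-database, at the cost of loading every embedding into PHP.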
2) How many chunks should I pass to generation?
Start with 3 to 6 chunks. Tune based on quality and token cost.
3) Can RAG guarantee no hallucinations?
No. It reduces hallucinations, but guardrails and evals are still required.
Series navigation and references
- Previous: Prompt Engineering in PHP: Structured Output That Doesn't Break
- Foundation: AI for PHP and Web Developers: Complete 6-Part Series
- Next: Production AI in PHP: Guardrails, Cost Control, and Evals
The next part shows how to add cost control, moderation, and eval loops to this stack.
Official references: