If your first AI prototype worked locally but failed under real traffic, this guide is for you. In this Laravel OpenAI API integration tutorial, we will build a resilient endpoint with streaming output, retry logic, and clean fallback behavior.
Start with a resilient architecture
A production-ready endpoint needs four things from day one:
- strict request validation,
- timeout and retry strategy,
- user-safe error handling,
- observability (latency, status, and request IDs).
A simple flow looks like this:
- Validate input and attach a trace ID.
- Call OpenAI from a dedicated service class.
- Retry only transient failures (429/5xx).
- Stream partial responses to the UI when useful.
- Log structured telemetry for debugging and cost tracking.
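The "retry only transient failures" step reduces to a status-code predicate plus a backoff schedule. Here is a minimal sketch in JavaScript, purely for illustration; the Laravel service later in this guide applies the same status list server-side:

```javascript
// Retry only rate limits (429) and server errors (5xx); fail fast on other 4xx.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function isRetryable(status) {
  return RETRYABLE_STATUSES.has(status);
}

// Linear backoff: 200ms after attempt 1, 400ms after attempt 2, and so on.
function backoffMs(attempt) {
  return 200 * attempt;
}
```

Keeping this policy in one place makes it easy to audit and change, regardless of which side enforces it.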
Implement an OpenAI client service in Laravel
Add a small service layer so all API behavior (timeouts, retries, model config) stays in one place.
```php
<?php

namespace App\Services;

use Illuminate\Http\Client\ConnectionException;
use Illuminate\Http\Client\PendingRequest;
use Illuminate\Http\Client\RequestException;
use Illuminate\Support\Facades\Http;

class OpenAIClient
{
    public function http(): PendingRequest
    {
        return Http::withToken(config('services.openai.key'))
            ->acceptJson()
            ->timeout(45)
            ->retry(
                3,
                // Linear backoff: 200ms, then 400ms, then 600ms.
                fn (int $attempt) => 200 * $attempt,
                fn ($exception, PendingRequest $request) => $this->isRetryable($exception)
            );
    }

    public function createResponse(string $input, ?string $instructions = null): array
    {
        $payload = [
            'model' => config('services.openai.model'),
            'input' => $input,
        ];

        if ($instructions) {
            $payload['instructions'] = $instructions;
        }

        return $this->http()
            ->post('https://api.openai.com/v1/responses', $payload)
            ->throw()
            ->json();
    }

    private function isRetryable(\Throwable $exception): bool
    {
        // Connection-level failures (timeouts, DNS) are transient: retry them.
        if ($exception instanceof ConnectionException) {
            return true;
        }

        // Retry rate limits and server errors only; other 4xx should fail fast.
        $status = $exception instanceof RequestException
            ? $exception->response->status()
            : null;

        return in_array($status, [429, 500, 502, 503, 504], true);
    }
}
```

Set the config once in config/services.php:
```php
'openai' => [
    'key' => env('OPENAI_API_KEY'),
    'model' => env('OPENAI_MODEL', 'gpt-5'),
],
```

Expose a streaming endpoint for faster UX
This pattern keeps the interface responsive for longer generations. Note that the controller below wraps a single, non-streaming API call in a server-sent event envelope; for true token-by-token streaming you would also set `'stream' => true` in the payload and forward chunks as they arrive.
```php
<?php

namespace App\Http\Controllers;

use App\Services\OpenAIClient;
use Illuminate\Http\Request;
use Illuminate\Support\Str;
use Symfony\Component\HttpFoundation\StreamedResponse;

class CustomerReplyStreamController extends Controller
{
    public function __invoke(Request $request, OpenAIClient $openAI): StreamedResponse
    {
        $data = $request->validate([
            'message' => ['required', 'string', 'max:5000'],
            'context' => ['nullable', 'string', 'max:5000'],
        ]);

        return response()->stream(function () use ($openAI, $data) {
            $traceId = (string) Str::uuid();

            echo "event: meta\n";
            echo 'data: ' . json_encode(['trace_id' => $traceId]) . "\n\n";
            ob_flush();
            flush();

            try {
                $input = "Context: " . ($data['context'] ?? 'none') . "\n\nUser: " . $data['message'];
                $result = $openAI->createResponse($input, 'Draft a concise and accurate support reply.');

                echo "event: done\n";
                echo 'data: ' . json_encode([
                    // The raw Responses API returns an array of output items;
                    // pull the first text block out of the first message item.
                    'output' => data_get($result, 'output.0.content.0.text', ''),
                    'trace_id' => $traceId,
                ]) . "\n\n";
            } catch (\Throwable $e) {
                report($e);

                echo "event: error\n";
                echo 'data: ' . json_encode([
                    'message' => 'AI service is temporarily unavailable. Please try again.',
                    'trace_id' => $traceId,
                ]) . "\n\n";
            }

            ob_flush();
            flush();
        }, 200, [
            'Content-Type' => 'text/event-stream',
            'Cache-Control' => 'no-cache',
            'X-Accel-Buffering' => 'no',
        ]);
    }
}
```

Register the route. Since `EventSource` in the browser can only issue GET requests, expose the endpoint as GET:

```php
use App\Http\Controllers\CustomerReplyStreamController;

Route::get('/ai/customer-reply-stream', CustomerReplyStreamController::class);
```

Minimal browser consumer:
```javascript
const source = new EventSource('/ai/customer-reply-stream?message=' + encodeURIComponent(message));

source.addEventListener('done', (event) => {
  const payload = JSON.parse(event.data);
  renderOutput(payload.output);
  source.close();
});

source.addEventListener('error', () => {
  showError('Could not generate a reply right now.');
  source.close();
});
```

Real-world scenario: customer inquiry console
Imagine your support team handles 500+ inquiries per day. Agents need first drafts in under 5 seconds.
With this setup you get:
- stable endpoint behavior under burst traffic,
- user-visible progress instead of frozen UI,
- trace IDs in logs for fast incident triage,
- safer customer messaging when provider errors happen.
Common mistakes in Laravel OpenAI API integration
- Retrying every error, including invalid requests (4xx) that should fail fast.
- Leaking provider error payloads directly to end users.
- Skipping per-request trace IDs.
- Calling the API directly from controllers without a shared service layer.
- Ignoring timeout boundaries, which can exhaust worker capacity.
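The "leaking provider error payloads" mistake above is easy to avoid with a small mapper that logs the raw error internally but hands the user only safe copy plus a trace ID. A sketch (function name hypothetical):

```javascript
// Map any provider failure to user-safe copy; raw details go to logs only.
function toUserError(error, traceId) {
  // Log the full details server-side for triage (console stands in for a logger).
  console.error('ai_call_failed', { traceId, message: error.message });

  // Never echo provider messages or payloads back to the end user.
  return {
    message: 'AI service is temporarily unavailable. Please try again.',
    trace_id: traceId,
  };
}
```

The trace ID is the bridge: the user can quote it to support, and support can find the full provider error in the logs.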
Production checklist
- Retry only transient failures (429, 5xx).
- Log `trace_id`, `latency_ms`, `status_code`, and `model`.
- Define safe fallback copy for failed generations.
- Add throttling on endpoint and per-user quotas.
- Monitor p95 latency and error-rate alerts.
- Add a feature flag so AI features can be rolled back quickly.
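For the monitoring item: p95 is the 95th percentile of observed request latencies, i.e. the value that 95% of requests stay under. A nearest-rank sketch:

```javascript
// Nearest-rank percentile: sort ascending, take the ceil(p% * n)-th value.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Example: latencies in ms from recent requests; a single slow outlier
// dominates p95 even though the average looks healthy.
const latencies = [120, 140, 150, 160, 180, 200, 240, 300, 450, 1200];
const p95 = percentile(latencies, 95); // 1200
```

This is why p95 alerts catch provider degradation that mean-latency dashboards hide.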
FAQ
1) Is streaming required for every use case?
No. For short outputs you can use a normal JSON response. Stream when outputs are longer or UX sensitivity is high.
2) Should retries happen on frontend or backend?
Backend. It centralizes policy and prevents inconsistent client behavior.
3) How do I avoid vendor lock-in?
Keep provider calls in one service class and expose provider-agnostic methods to the rest of your app.
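One way to structure that, sketched here in JavaScript with hypothetical names: hide each vendor behind the same minimal interface and select the adapter from config.

```javascript
// Provider-agnostic interface: every adapter exposes the same createReply(input).
const providers = {
  openai: {
    // A real adapter would call the OpenAI API; stubbed here for illustration.
    createReply: async (input) => `draft reply for: ${input}`,
  },
  // Adding another vendor means adding another adapter; callers never change.
};

function makeAIClient(name) {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown AI provider: ${name}`);
  return provider;
}
```

The rest of the app depends only on `createReply`, so swapping vendors becomes a config change rather than a refactor.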
Series navigation and references
- Previous: AI for PHP Developers: Start Here with Laravel
- Foundation: AI for PHP and Web Developers: Complete 6-Part Series
- Next: Prompt Engineering in PHP: Structured Output That Doesn't Break
Keep following this series for more production-first Laravel AI patterns.
- Newsletter: Get practical build notes
- Secondary contact: Start a conversation
Official references: