
Laravel OpenAI API Integration: Build a Production-Ready Endpoint

Learn Laravel OpenAI API integration with streaming responses, retries, and robust error handling for stable customer inquiry endpoints in production.

Feb 28, 2026 · 4 min read · Nivesh Saharan

If your first AI prototype worked locally but failed under real traffic, this guide is for you. In this Laravel OpenAI API integration tutorial, we will build a resilient endpoint with streaming output, retry logic, and clean fallback behavior.

Start with a resilient architecture

A production-ready endpoint needs four things from day one:

  • strict request validation,
  • timeout and retry strategy,
  • user-safe error handling,
  • observability (latency, status, and request IDs).

A simple flow looks like this:

  1. Validate input and attach a trace ID.
  2. Call OpenAI from a dedicated service class.
  3. Retry only transient failures (429/5xx).
  4. Stream partial response to UI when useful.
  5. Log structured telemetry for debugging and cost tracking.
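Step 5 rarely needs more than timing the call and writing one structured log line per request. A minimal sketch using Laravel's `Log` facade; the `openai.request` message and the helper name `withTelemetry` are illustrative, not part of any framework API:

```php
<?php

use Illuminate\Support\Facades\Log;
use Illuminate\Support\Str;

// Hypothetical helper: wraps an OpenAI call with timing and a structured log line.
function withTelemetry(callable $call, string $model): mixed
{
    $traceId = (string) Str::uuid();
    $start = microtime(true);
    $status = 200;

    try {
        // The callable receives the trace ID so it can forward it downstream.
        return $call($traceId);
    } catch (\Throwable $e) {
        $status = $e->getCode() ?: 500;

        throw $e;
    } finally {
        // One structured entry per request: enough for latency dashboards,
        // cost tracking by model, and incident triage by trace_id.
        Log::info('openai.request', [
            'trace_id' => $traceId,
            'latency_ms' => (int) round((microtime(true) - $start) * 1000),
            'status_code' => $status,
            'model' => $model,
        ]);
    }
}
```

Keeping the log keys identical across success and failure paths makes aggregation queries trivial later.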

Implement an OpenAI client service in Laravel

Add a small service layer so all API behavior (timeouts, retries, model config) stays in one place.

<?php
 
namespace App\Services;
 
use Illuminate\Http\Client\PendingRequest;
use Illuminate\Support\Facades\Http;
 
class OpenAIClient
{
    public function http(): PendingRequest
    {
        return Http::withToken(config('services.openai.key'))
            ->acceptJson()
            ->timeout(45)
            ->retry(
                3,
                fn (int $attempt) => 200 * $attempt,
                fn ($exception, PendingRequest $request) => $this->isRetryable($exception)
            );
    }
 
    public function createResponse(string $input, ?string $instructions = null): array
    {
        $payload = [
            'model' => config('services.openai.model'),
            'input' => $input,
        ];
 
        if ($instructions) {
            $payload['instructions'] = $instructions;
        }
 
        return $this->http()
            ->post('https://api.openai.com/v1/responses', $payload)
            ->throw()
            ->json();
    }
 
    private function isRetryable(\Throwable $exception): bool
    {
        // Connection-level failures (DNS, TCP, timeout) never carry a response
        // and are transient by nature, so retry them too.
        if ($exception instanceof \Illuminate\Http\Client\ConnectionException) {
            return true;
        }

        // Only RequestException exposes the HTTP response; `response` is a
        // public property, not a method.
        $status = $exception instanceof \Illuminate\Http\Client\RequestException
            ? $exception->response->status()
            : null;

        return in_array($status, [429, 500, 502, 503, 504], true);
    }
}
}

Set config once in config/services.php:

'openai' => [
    'key' => env('OPENAI_API_KEY'),
    'model' => env('OPENAI_MODEL', 'gpt-5'),
],

Expose a streaming endpoint for faster UX

This pattern delivers the response over server-sent events (SSE), so the UI gets immediate feedback (a meta event with the trace ID) while the generation runs. Here the model call itself is a single request; the same transport can later carry token-level chunks if you enable the API's streaming mode.

<?php
 
namespace App\Http\Controllers;
 
use App\Services\OpenAIClient;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;
 
class CustomerReplyStreamController extends Controller
{
    public function __invoke(Request $request, OpenAIClient $openAI): StreamedResponse
    {
        $data = $request->validate([
            'message' => ['required', 'string', 'max:5000'],
            'context' => ['nullable', 'string', 'max:5000'],
        ]);
 
        return response()->stream(function () use ($openAI, $data) {
            $traceId = (string) \Illuminate\Support\Str::uuid();
 
            echo "event: meta\n";
            echo 'data: ' . json_encode(['trace_id' => $traceId]) . "\n\n";
            if (ob_get_level() > 0) {
                ob_flush(); // ob_flush() errors when no output buffer is active
            }
            flush();
 
            try {
                $input = "Context: " . ($data['context'] ?? 'none') . "\n\nUser: " . $data['message'];
                $result = $openAI->createResponse($input, 'Draft a concise and accurate support reply.');
 
                echo "event: done\n";
                echo 'data: ' . json_encode([
                    // The raw Responses API JSON nests text under output[].content[];
                    // a top-level `output_text` field is an SDK convenience only.
                    'output' => data_get($result, 'output.0.content.0.text', ''),
                    'trace_id' => $traceId,
                ]) . "\n\n";
            } catch (\Throwable $e) {
                report($e);
 
                echo "event: error\n";
                echo 'data: ' . json_encode([
                    'message' => 'AI service is temporarily unavailable. Please try again.',
                    'trace_id' => $traceId,
                ]) . "\n\n";
            }
 
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        }, 200, [
            'Content-Type' => 'text/event-stream',
            'Cache-Control' => 'no-cache',
            'X-Accel-Buffering' => 'no',
        ]);
    }
}
Register the route in your routes file. EventSource can only issue GET requests, so the streaming endpoint is registered as GET:

use App\Http\Controllers\CustomerReplyStreamController;
 
Route::get('/ai/customer-reply-stream', CustomerReplyStreamController::class);

Minimal browser consumer:

const source = new EventSource('/ai/customer-reply-stream?message=' + encodeURIComponent(message));
 
source.addEventListener('done', (event) => {
  const payload = JSON.parse(event.data);
  renderOutput(payload.output);
  source.close();
});
 
source.addEventListener('error', () => {
  showError('Could not generate a reply right now.');
  source.close();
});

Real-world scenario: customer inquiry console

Imagine your support team handles 500+ inquiries per day. Agents need first drafts in under 5 seconds.

With this setup you get:

  • stable endpoint behavior under burst traffic,
  • user-visible progress instead of frozen UI,
  • trace IDs in logs for fast incident triage,
  • safer customer messaging when provider errors happen.

Common mistakes in Laravel OpenAI API integration

  • Retrying every error, including invalid requests (4xx) that should fail fast.
  • Leaking provider error payloads directly to end users.
  • Skipping per-request trace IDs.
  • Calling the API directly from controllers without a shared service layer.
  • Ignoring timeout boundaries, which can exhaust worker capacity.

Production checklist

  • Retry only transient failures (429, 5xx).
  • Log trace_id, latency_ms, status_code, and model.
  • Define safe fallback copy for failed generations.
  • Throttle the endpoint and enforce per-user quotas.
  • Monitor p95 latency and alert on error rate.
  • Put the feature behind a flag so AI can be rolled back quickly.
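The throttling item maps directly onto Laravel's built-in rate limiter. A sketch; the limiter name `ai` and the limit of 10 requests per minute are placeholders to tune for your traffic:

```php
<?php

// In a service provider's boot() method (e.g. App\Providers\AppServiceProvider):
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

RateLimiter::for('ai', function (Request $request) {
    // Per-user quota when authenticated, per-IP quota otherwise.
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// Then attach it to the endpoint registration in your routes file by
// chaining ->middleware('throttle:ai') onto the route definition.
```

Rate limiting at the edge protects both your worker pool and your provider bill before a single token is generated.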

FAQ

1) Is streaming required for every use case?

No. For short outputs you can use a normal JSON response. Stream when outputs are longer or UX sensitivity is high.
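For the short-output case, the same service class can back a plain JSON endpoint. A sketch reusing the `OpenAIClient` from above; the controller name and response shape are illustrative:

```php
<?php

namespace App\Http\Controllers;

use App\Services\OpenAIClient;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;

class CustomerReplyController extends Controller
{
    public function __invoke(Request $request, OpenAIClient $openAI): JsonResponse
    {
        $data = $request->validate([
            'message' => ['required', 'string', 'max:5000'],
        ]);

        $result = $openAI->createResponse(
            $data['message'],
            'Draft a concise and accurate support reply.'
        );

        // Same extraction path as the streaming controller: the Responses API
        // nests generated text under output[].content[].
        return response()->json([
            'output' => data_get($result, 'output.0.content.0.text', ''),
        ]);
    }
}
```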

2) Should retries happen on frontend or backend?

Backend. It centralizes policy and prevents inconsistent client behavior.

3) How do I avoid vendor lock-in?

Keep provider calls in one service class and expose provider-agnostic methods to the rest of your app.
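One way to sketch that boundary; the interface name `ReplyDrafter` and its method are illustrative, not a Laravel convention:

```php
<?php

namespace App\Contracts;

// Provider-agnostic boundary: controllers and jobs depend only on this
// interface, never on the OpenAI-specific client.
interface ReplyDrafter
{
    public function draft(string $message, ?string $instructions = null): string;
}

// Bind a provider-specific implementation in a service provider, e.g.:
// $this->app->bind(\App\Contracts\ReplyDrafter::class, \App\Services\OpenAIReplyDrafter::class);
```

Swapping providers then becomes a one-line binding change rather than an application-wide refactor.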

Series navigation

Keep following this series for more production-first Laravel AI patterns.
