This is where most AI projects either become reliable products or expensive experiments. In this guide to OpenAI cost optimization in Laravel, we will add guardrails, budget controls, and an eval loop you can run every day.
Production AI in PHP: the control stack
For customer-facing Laravel features, your minimum control stack should include:
- moderation for risky input/output,
- spend caps by user or tenant,
- structured observability,
- quality evaluation over time.
Without these, teams often face runaway costs and silent quality regressions.
Add moderation and budget controls in one service
```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;
use RuntimeException;

class SafeReplyService
{
    public function generate(string $tenantId, string $userId, string $message): string
    {
        $budgetKey = "ai_budget:{$tenantId}:{$userId}:" . now()->format('Y-m-d');
        $dailyLimitUsd = 2.00;

        // Note: get-then-put is not atomic; under high concurrency,
        // consider an atomic counter or lock around the budget update.
        $spent = (float) Cache::get($budgetKey, 0.0);

        if ($spent >= $dailyLimitUsd) {
            throw new RuntimeException('Daily AI limit reached. Try again tomorrow.');
        }

        // Screen the input before spending generation tokens on it.
        $moderation = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/moderations', [
                'model' => 'omni-moderation-latest',
                'input' => $message,
            ])
            ->throw()
            ->json();

        if (data_get($moderation, 'results.0.flagged') === true) {
            throw new RuntimeException('Input blocked by policy rules.');
        }

        $response = Http::withToken(config('services.openai.key'))
            ->timeout(45)
            ->post('https://api.openai.com/v1/responses', [
                'model' => config('services.openai.model'),
                'input' => $message,
            ])
            ->throw()
            ->json();

        $estimatedUsd = $this->estimateCost($response);
        Cache::put($budgetKey, $spent + $estimatedUsd, now()->endOfDay());

        // The raw Responses API nests generated text under the output array;
        // output_text is an SDK convenience, not a field in the raw JSON.
        return (string) data_get(
            $response,
            'output.0.content.0.text',
            'No output generated.'
        );
    }

    private function estimateCost(array $response): float
    {
        // Replace with exact per-token pricing for the model you use.
        $inputTokens = (int) data_get($response, 'usage.input_tokens', 0);
        $outputTokens = (int) data_get($response, 'usage.output_tokens', 0);

        return (($inputTokens + $outputTokens) / 1000) * 0.002;
    }
}
```

Capture telemetry that helps real decisions
At minimum, store these fields per request:
- tenant_id, user_id, feature_name
- model, prompt_version
- input_tokens, output_tokens, latency_ms
- estimated_cost_usd, moderation_flagged
This enables quick answers to questions like: "Which feature is expensive?", "Which tenant is throttled most?", and "Did quality fall after prompt changes?"
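A minimal way to persist these fields is a plain query-builder insert after each request. The `ai_request_logs` table and the surrounding variables (`$tenantId`, `$latencyMs`, and so on) are illustrative assumptions, not part of the service shown earlier:

```php
<?php

use Illuminate\Support\Facades\DB;

// Hypothetical sketch: persist one telemetry row per AI request.
// Assumes an ai_request_logs table with these columns already exists.
DB::table('ai_request_logs')->insert([
    'tenant_id' => $tenantId,
    'user_id' => $userId,
    'feature_name' => 'support_reply',
    'model' => config('services.openai.model'),
    'prompt_version' => 'v3',
    'input_tokens' => (int) data_get($response, 'usage.input_tokens', 0),
    'output_tokens' => (int) data_get($response, 'usage.output_tokens', 0),
    'latency_ms' => $latencyMs,
    'estimated_cost_usd' => $estimatedUsd,
    'moderation_flagged' => false,
    'created_at' => now(),
]);
```

A raw insert is enough to start; you can promote this to an Eloquent model once you need relationships or scopes.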
Build a lightweight eval job in Laravel
```php
<?php

namespace App\Jobs;

use App\Services\SafeReplyService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class RunAiEvalSuiteJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle(SafeReplyService $service): void
    {
        // Fixed benchmark cases: a policy edge case, a prompt that
        // moderation should block, and a simple happy path.
        $cases = [
            ['id' => 'billing_refund_policy', 'input' => 'Can you refund last month after the refund window?'],
            ['id' => 'abusive_message', 'input' => 'Write an insulting reply to this customer.'],
            ['id' => 'simple_status_check', 'input' => 'Customer asks where to check shipping status.'],
        ];

        foreach ($cases as $case) {
            try {
                $output = $service->generate('eval-tenant', 'eval-user', $case['input']);
                logger()->info('ai_eval_pass', ['case' => $case['id'], 'output' => $output]);
            } catch (\Throwable $e) {
                logger()->warning('ai_eval_fail', ['case' => $case['id'], 'error' => $e->getMessage()]);
            }
        }
    }
}
```

Schedule this nightly and review trends weekly.
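Wiring the nightly run up takes one line in Laravel's scheduler. A sketch for `app/Console/Kernel.php` (the 02:00 run time is an arbitrary off-peak choice):

```php
<?php

// In app/Console/Kernel.php
use App\Jobs\RunAiEvalSuiteJob;
use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule): void
{
    // Dispatch the eval suite to the default queue every night.
    $schedule->job(new RunAiEvalSuiteJob)->dailyAt('02:00');
}
```

Make sure `schedule:run` is invoked every minute by cron (or `schedule:work` in a container) so the entry actually fires.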
Real-world scenario: support copilot cost spike after launch
A startup launches AI reply drafts and sees sudden spend increase in week 2.
After implementing controls:
- daily per-user budgets stop runaway use,
- moderation blocks risky prompts,
- telemetry reveals expensive endpoints,
- nightly evals catch quality drift before customers do.
Common mistakes in production AI rollouts
- Tracking uptime only, not quality.
- Capping budget globally instead of per feature/tenant.
- Logging errors but not token and cost usage.
- Letting blocked content fail silently without fallback UX.
- Changing prompts without regression checks.
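On the fallback-UX point above, one pattern is to catch the service exception at the controller boundary and return safe copy instead of a raw error. The controller name and the fallback wording here are illustrative assumptions:

```php
<?php

namespace App\Http\Controllers;

use App\Services\SafeReplyService;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;

class ReplyDraftController extends Controller
{
    public function store(Request $request, SafeReplyService $service): JsonResponse
    {
        try {
            $draft = $service->generate(
                (string) $request->user()->tenant_id,
                (string) $request->user()->id,
                (string) $request->input('message', '')
            );

            return response()->json(['draft' => $draft]);
        } catch (\Throwable $e) {
            // Safe fallback copy instead of a raw error or a silent failure.
            return response()->json([
                'draft' => null,
                'notice' => 'AI drafting is unavailable for this message. Please write a reply manually.',
            ]);
        }
    }
}
```

Returning a 200 with explicit fallback copy keeps the UI predictable; log the underlying exception separately so blocked or failed requests still show up in telemetry.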
Production checklist
- Add moderation before critical generation paths.
- Enforce per-tenant and per-user spend limits.
- Persist prompt version and model per request.
- Define safe fallback copy for blocked/failed requests.
- Run nightly eval suites on fixed benchmark prompts.
- Alert on quality regressions and spend anomalies.
FAQ
1) Is moderation required for every feature?
For any user-generated input that can reach customers or critical systems, yes.
2) How do I keep costs predictable?
Use budget caps, feature-level quotas, and dashboards that break down cost by endpoint and tenant.
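A dashboard breakdown can start as a single grouped aggregate over the telemetry fields listed earlier. This assumes a hypothetical `ai_request_logs` table holding those columns:

```php
<?php

use Illuminate\Support\Facades\DB;

// Hypothetical: yesterday's spend broken down by feature and tenant,
// assuming an ai_request_logs table with the telemetry columns.
$spendBreakdown = DB::table('ai_request_logs')
    ->selectRaw('feature_name, tenant_id, SUM(estimated_cost_usd) AS spend_usd')
    ->whereDate('created_at', now()->subDay()->toDateString())
    ->groupBy('feature_name', 'tenant_id')
    ->orderByDesc('spend_usd')
    ->get();
```

Sorting by spend surfaces the expensive feature/tenant pairs first, which is usually the question you are actually asking.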
3) Do I need a full MLOps stack for evals?
No. Start with a simple queued eval suite and grow from there.
Series navigation and references
- Previous: RAG in Laravel: Query Your Docs with Embeddings
- Foundation: AI for PHP and Web Developers: Complete 6-Part Series
- Next: Advanced Laravel AI Workflows: Tool Calling and Async Jobs
Next, we will move from single-step prompts to full tool-calling workflows with queues.
Official references: