Logging and Monitoring
stdapi.ai provides production observability. It emits structured JSON logs for every request, stream, and background task, and integrates with OpenTelemetry (OTel) for traces and metrics. This guide shows how to enable observability, read the logs, and correlate signals across systems.
- At a glance: JSON logs to STDOUT (perfect for AWS CloudWatch Logs). One event per line.
- Correlation: All events for a request share the same id, which is also returned as x-request-id.
- ECS friendly: ECS forwards container STDOUT to CloudWatch Logs automatically.
- Traces (optional): Enable OTEL_ENABLED=true to export spans to X‑Ray, Jaeger, Tempo, etc.
- Payload logging (optional): Enable LOG_REQUEST_PARAMS=true only for targeted debugging.
Quick start (2 minutes)
Set these environment variables, then restart the service (see the Configuration Guide for details):
# Set minimum log level (optional, defaults to "info")
# Options: info, warning, error, critical, disabled
export LOG_LEVEL=warning
# Enable OpenTelemetry tracing
export OTEL_ENABLED=true
export OTEL_SERVICE_NAME=stdapi
# 0.0–1.0 (10% example)
export OTEL_SAMPLE_RATE=0.1
# Include request/response payloads in logs (for debugging ONLY)
export LOG_REQUEST_PARAMS=true
# Log client IP addresses (requires ENABLE_PROXY_HEADERS for real client IPs)
export LOG_CLIENT_IP=true
export ENABLE_PROXY_HEADERS=true # When behind ALB/CloudFront
Sensitive data and cost impact
Enabling LOG_REQUEST_PARAMS may expose sensitive content in logs. Use only in development or during targeted troubleshooting. Redact secrets before sharing logs externally.
Additionally, logging full request/response payloads can dramatically increase log volume and costs, especially for large LLM prompts, tool calls, and generated outputs. In AWS CloudWatch Logs, ingestion and storage costs scale with log size. Prefer short retention, targeted sampling, and temporary enablement only when needed.
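As an illustration of redacting before sharing, here is a minimal Python sketch that blanks out the payload and client IP fields of a single JSON log line. The function name redact_log_line and the placeholder text are hypothetical; the field names come from the Common fields table in this guide. It replaces whole fields rather than detecting secrets, so other fields such as error_detail may still need a manual check.

```python
import json

# Fields that can carry user content or personal data
# (names taken from the Common fields table in this guide).
SENSITIVE_FIELDS = ("request_params", "request_response", "client_ip")


def redact_log_line(line, placeholder="[REDACTED]"):
    """Replace sensitive fields in one JSON log line and return the new JSON line."""
    event = json.loads(line)
    for field in SENSITIVE_FIELDS:
        if field in event:
            event[field] = placeholder
    return json.dumps(event)
```

Run it over each line of an exported log file before attaching the file to a ticket or issue.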
Client IP Logging
When LOG_CLIENT_IP=true:
- The client_ip field is added to request logs.
- The client IP is added as the client.address attribute to OpenTelemetry spans (when OTEL_ENABLED=true).
To log the real client IP address (instead of the proxy IP), also enable ENABLE_PROXY_HEADERS=true when running behind AWS ALB, CloudFront, or other reverse proxies. See the Configuration Guide for details.
CloudWatch best practice
JSON to STDOUT is optimal for CloudWatch Logs Insights. In AWS ECS, the task’s log driver forwards container STDOUT to CloudWatch Logs automatically.
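Before restarting the service, it can help to sanity-check the variables described in this section. The following Python sketch is not part of stdapi.ai: the function name check_observability_env is hypothetical, and it assumes boolean flags are enabled with the literal string true, as in the quick-start example.

```python
VALID_LOG_LEVELS = {"info", "warning", "error", "critical", "disabled"}


def check_observability_env(env):
    """Return a list of human-readable problems found in `env` (a dict of env vars)."""
    problems = []

    # LOG_LEVEL is optional and defaults to "info" per this guide.
    level = env.get("LOG_LEVEL", "info")
    if level not in VALID_LOG_LEVELS:
        problems.append(f"LOG_LEVEL={level!r} is not one of {sorted(VALID_LOG_LEVELS)}")

    # OTEL_SAMPLE_RATE must be a number between 0.0 and 1.0.
    rate = env.get("OTEL_SAMPLE_RATE")
    if rate is not None:
        try:
            in_range = 0.0 <= float(rate) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            problems.append(f"OTEL_SAMPLE_RATE={rate!r} must be between 0.0 and 1.0")

    # Assumption: flags are enabled with the literal string "true".
    if env.get("LOG_REQUEST_PARAMS") == "true":
        problems.append("LOG_REQUEST_PARAMS=true logs payloads; use for debugging only")
    if env.get("LOG_CLIENT_IP") == "true" and env.get("ENABLE_PROXY_HEADERS") != "true":
        problems.append("LOG_CLIENT_IP=true without ENABLE_PROXY_HEADERS may log the proxy IP")

    return problems
```

Calling check_observability_env(dict(os.environ)) in a deployment script returns an empty list when nothing looks off.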
Event types
stdapi.ai emits five kinds of JSON events (one per line):
| Event | Description |
|---|---|
| start | Emitted once at server startup. Includes startup metadata and warnings. |
| stop | Emitted on graceful shutdown. Includes uptime. |
| request | One per HTTP request. Method, path, status, timings, and optional request/response. |
| request_stream | Streaming segments (SSE/audio). Indicates streaming activity and duration. |
| background | Background tasks correlated to the parent request. |
Common fields
Each event shares core fields and may add type‑specific ones.
| Field | Applies to | Description |
|---|---|---|
| type | all | One of start, stop, request, request_stream, background |
| level | all | info, warning, error, critical (controlled by LOG_LEVEL) |
| date | all | RFC3339, timezone‑aware timestamp |
| server_id | all | Instance identifier |
| error_detail | all | Optional list of formatted exception strings |
| id | request, request_stream, background | Correlation ID (also returned as x-request-id) |
| execution_time_ms | request, request_stream, background | Duration of the handled block, in milliseconds |
| method | request | HTTP method |
| path | request | Request path |
| status_code | request | Final HTTP status code |
| client_ip | request | Client IP address (if LOG_CLIENT_IP=true) |
| client_user_agent | request | When provided by client |
| model_id | request | Targeted model (if applicable) |
| voice_id | request | TTS voice (if applicable) |
| request_user_id, request_org_id | request | Propagated identifiers (if applicable) |
| request_params | request | Sanitized request payload (if LOG_REQUEST_PARAMS=true) |
| request_response | request | Sanitized response payload (if LOG_REQUEST_PARAMS=true) |
| event | background | Background operation name |
| server_start_time_ms, server_warnings | start | Startup metrics and warnings |
| server_uptime_ms | stop | Uptime at shutdown |
Understanding warnings and errors
- For request events, default log levels are derived from the final HTTP status: 4xx → warning, 5xx → error. Unexpected server crashes (like HTTP 500) may appear as critical.
- Authentication/authorization: For security, client responses for 401 and 403 include only generic messages. Full diagnostic details are captured in server logs under error_detail and can be correlated via id (see x-request-id).
- server_warnings (on the start event) often highlights missing configuration and features that have been disabled as a result (for example, no S3 bucket configured disables certain image/audio features).
- error_detail (on any event) contains formatted exception traces and diagnostic hints, which frequently point to missing configuration, unavailable dependencies, or disabled features.
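The status-to-level rule above can be sketched in Python as follows. This is an illustration only, not the service's implementation: the function name is hypothetical, returning info for non-error statuses is an assumption based on the example events in this guide, and critical cannot be derived from the status code alone because it depends on how the failure occurred.

```python
def default_level_for_status(status_code):
    """Map a final HTTP status code to the default log level described above."""
    if 500 <= status_code <= 599:
        return "error"
    if 400 <= status_code <= 499:
        return "warning"
    # Assumption: every other status is logged at "info".
    return "info"
```

One possible use is in alerting: an event whose level is higher than this default (for example critical on a 500) points to an unexpected failure rather than a handled error.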
Correlating logs and traces
- Group events by id to reconstruct a full request lifecycle (request → stream(s) → background).
- The x-request-id response header exposes the same value so external systems can propagate correlation.
- With OTel enabled, a root span named like POST /v1/... is created and carries attributes: http.method, http.url, http.user_agent, request.id, server.id, http.status_code, and duration_ms.
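Grouping by id can also be done offline on exported log lines. The Python sketch below is one way to do it under stated assumptions: the function names are hypothetical, the input is one JSON event per line, and timestamps look like the examples in this guide (RFC3339 with a Z suffix or a numeric offset).

```python
import json
from collections import defaultdict
from datetime import datetime


def parse_date(value):
    """Parse an RFC3339 timestamp such as 2025-01-01T12:00:00Z."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_events_by_id(lines):
    """Group JSON log lines by correlation id, each group ordered by timestamp."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # start and stop events carry no id, so they are skipped here.
        if "id" in event:
            groups[event["id"]].append(event)
    for events in groups.values():
        events.sort(key=lambda e: parse_date(e["date"]))
    return dict(groups)
```

Each value in the returned dictionary is the lifecycle of one request: the request event, any request_stream events, and any background events, in time order.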
Do and Don’t for correlation
- Do propagate x-request-id across client → service → downstreams when possible.
- Do use request_stream durations to account for total user‑perceived latency.
- Don’t generate your own request IDs for the same hop; prefer the provided one.
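A client that has received a response from stdapi.ai can forward the returned x-request-id to its own downstream calls. The helper below is a hypothetical sketch of that step; it only copies the header value from a response and makes no assumption about whether stdapi.ai accepts an incoming x-request-id.

```python
def downstream_headers(response_headers, extra=None):
    """Build headers for a downstream call, forwarding x-request-id when present."""
    headers = dict(extra or {})
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in response_headers.items():
        if name.lower() == "x-request-id":
            headers["x-request-id"] = value
            break
    return headers
```

Logging the same value on the client side makes it possible to search the service logs for the matching id later.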
Reading the logs (what to look for)
- High latency: Inspect execution_time_ms on the request event. If the response was streamed, also sum request_stream durations. Combine with OTel spans to locate downstream delays (model provider, S3, etc.).
- Errors: Look for level=critical and error_detail (formatted exceptions). With OTel, the span is marked error with attributes error=true and error.message.
When to open a GitHub issue
If you encounter level=critical events, capture representative JSON log lines (redacting sensitive data) and open an issue at https://github.com/stdapi-ai/stdapi.ai/issues. Include information about the failing request to help reproduce the issue.
- Payload issues: Temporarily enable LOG_REQUEST_PARAMS=true to validate requests/responses, then disable.
- Client identification: client_user_agent and optional request_user_id/request_org_id help tie requests to users.
- Routing confirmation: model_id and voice_id confirm which provider/model/voice handled the request.
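For the high-latency check, the request and request_stream durations can be added up per request from exported log lines. This Python sketch uses a hypothetical function name and assumes, as the guidance above implies, that the two durations are additive rather than overlapping.

```python
import json
from collections import defaultdict


def total_latency_ms(lines):
    """Sum execution_time_ms of request and request_stream events per correlation id."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Background events are excluded: they do not delay the client response here.
        if event.get("type") in ("request", "request_stream") and "execution_time_ms" in event:
            totals[event["id"]] += event["execution_time_ms"]
    return dict(totals)
```

With the example events later in this guide (842 ms for the request and 1234 ms for the stream), the total for id a1b2c3d4 would be 2076 ms.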
Controlling log verbosity
The LOG_LEVEL environment variable controls which log events are written to STDOUT. Set it to filter out lower-severity events. For detailed configuration options, see the Logging Level section in the Configuration Guide.
- info (default): All events are logged (info, warning, error, critical).
- warning: Only warnings and higher severity (warning, error, critical). Recommended for production.
- error: Only errors and critical events.
- critical: Only critical events.
- disabled: No log output (not recommended).
# Production example: reduce log volume while maintaining visibility
export LOG_LEVEL=warning
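The threshold behaviour of LOG_LEVEL can be pictured with a small Python sketch. It illustrates the semantics listed above and is not the service's implementation; the function name and the numeric severity ranks are hypothetical.

```python
# Severity ranks, lowest to highest (ordering taken from the level list above).
SEVERITY = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def is_logged(event_level, log_level="info"):
    """Return True if an event at `event_level` passes the LOG_LEVEL threshold."""
    if log_level == "disabled":
        return False
    return SEVERITY[event_level] >= SEVERITY[log_level]
```

The same predicate can be applied to locally captured logs to preview how much volume a stricter LOG_LEVEL would remove.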
Reducing CloudWatch Costs
In high-traffic production environments, setting LOG_LEVEL=warning or LOG_LEVEL=error can significantly reduce CloudWatch Logs ingestion and storage costs by filtering out routine info-level events. This is especially effective when combined with appropriate retention policies.
Additionally, infrastructure routes are automatically excluded from logging to reduce noise: /docs, /favicon.ico, /health, /openapi.json, /redoc.
OpenTelemetry integration
When OTEL_ENABLED=true:
- A span is created per request and for streaming/background blocks.
- Spans carry request.id and server.id for correlation.
- 4xx/5xx status_code marks the span with an error status.
- Sampling is controlled via OTEL_SAMPLE_RATE.
For exporters and advanced setup, rely on standard OTel environment variables supported by your exporter/backend.
Example events
Example — Request with payload logging enabled
{
"type": "request",
"level": "info",
"date": "2025-01-01T12:00:00Z",
"server_id": "stdapi-1",
"id": "a1b2c3d4",
"method": "POST",
"path": "/v1/chat/completions",
"status_code": 200,
"model_id": "anthropic.claude-sonnet-4-5-20250929-v1:0",
"execution_time_ms": 842,
"request_params": {"messages": [{"role": "user", "content": "..."}]},
"request_response": {"id": "cmpl_...", "choices": [...], "usage": {...}}
}
Example — Streaming segment (SSE/audio)
{
"type": "request_stream",
"level": "info",
"date": "2025-01-01T12:00:01Z",
"server_id": "stdapi-1",
"id": "a1b2c3d4",
"execution_time_ms": 1234
}
Example — Background work correlated to a request
{
"type": "background",
"level": "info",
"date": "2025-01-01T12:00:02Z",
"server_id": "stdapi-1",
"id": "a1b2c3d4",
"event": "image-upload-s3",
"execution_time_ms": 97
}
Example — Error with captured details
{
"type": "request",
"level": "critical",
"date": "2025-01-01T12:00:05Z",
"server_id": "stdapi-1",
"id": "e9f0a1b2",
"method": "POST",
"path": "/v1/images/edits",
"status_code": 500,
"error_detail": ["Traceback (most recent call last): ..."],
"execution_time_ms": 12
}
CloudWatch Logs Insights: ready‑to‑use queries
These examples assume JSON logs in CloudWatch Logs (default with ECS awslogs/awsfirelens). Adjust the log group and time range.
1) Follow a specific request across request/stream/background
fields @timestamp, type, level, path, event, status_code, execution_time_ms
| filter id = "<paste-request-id>"
| sort @timestamp asc
Tip
Copy the request ID from the x-request-id response header or any request log line. Expect one request, optional request_stream entries, and background entries.
2) Find recent errors with context
fields @timestamp, level, type, path, status_code, id, error_detail
| filter level in ["error", "critical"]
| sort @timestamp desc
| limit 100
3) High-latency endpoints (P95/P99)
fields path, execution_time_ms
| filter type = "request" and ispresent(execution_time_ms)
| stats pct(execution_time_ms, 95) as p95_ms, pct(execution_time_ms, 99) as p99_ms, avg(execution_time_ms) as avg_ms by path
| sort p95_ms desc
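The same per-path statistics can be approximated offline from exported log lines. The Python sketch below uses hypothetical function names and the nearest-rank percentile method, so its results may differ slightly from what pct() returns in CloudWatch Logs Insights.

```python
import json
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_by_path(lines):
    """Return {path: {"p95_ms", "p99_ms", "avg_ms"}} for request events."""
    durations = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "request" and "execution_time_ms" in event:
            durations[event["path"]].append(event["execution_time_ms"])
    stats = {}
    for path, values in durations.items():
        values.sort()
        stats[path] = {
            "p95_ms": percentile(values, 95),
            "p99_ms": percentile(values, 99),
            "avg_ms": sum(values) / len(values),
        }
    return stats
```

Sorting the result by p95_ms in descending order gives the same ranking as the query above.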
AWS service-level logs and metrics
Beyond stdapi.ai logs and OTel traces, use AWS-native signals from the underlying AI services to validate provider behavior, monitor throttling/latency, and audit access. Enable only what you need: some options can capture content and increase costs. For full, up-to-date details, refer to the official AWS documentation.
- CloudWatch Metrics: Throughput, latency, throttling, and error rates per service/region.
- CloudTrail: Control-plane auditing of API calls (who did what, when, from where).
- Content/Invocation logging: Optional features that may record inputs/outputs. Use with caution and encryption/retention controls.
- Correlation: Service logs won’t include the stdapi.ai x-request-id. Correlate by time window, region, model/voice/job identifiers, and volume. Use the stdapi.ai model_id, voice_id, and execution_time_ms fields to narrow windows.
- AWS Bedrock Invocation logging (optional): Export invocation metadata and, if enabled, content to CloudWatch Logs/S3/Firehose. Treat prompts/completions as sensitive; manage retention and KMS.
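To narrow a time window for a given request, one option is to derive it from the event's date and execution_time_ms. The Python sketch below is an assumption-heavy illustration: the function name and the 5-second slack are hypothetical, and because this guide does not state whether date marks the start or the end of the handled block, the window is padded by the duration in both directions.

```python
from datetime import datetime, timedelta


def provider_time_window(event, slack_seconds=5):
    """Estimate a (start, end) window in which the provider call likely happened."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    logged_at = datetime.fromisoformat(event["date"].replace("Z", "+00:00"))
    duration = timedelta(milliseconds=event.get("execution_time_ms", 0))
    slack = timedelta(seconds=slack_seconds)
    # Assumption: `date` may be either the start or the end of the block,
    # so extend the window by the full duration on both sides.
    return logged_at - duration - slack, logged_at + duration + slack
```

Use the returned bounds as the time range when searching the AWS service logs, together with the region and model_id or voice_id.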
Troubleshooting checklist
- No logs visible: Ensure you are reading container STDOUT. On ECS/Kubernetes, verify the log driver and retention.
- Missing request_params: Confirm LOG_REQUEST_PARAMS=true and restart after changing environment variables.
- No traces: Verify OTEL_ENABLED=true and that exporters are configured and reachable.
- Correlation missed: Ensure clients read and propagate x-request-id for multi‑hop requests.