AI Agent Monitoring: From "Firefighting After the Fact" to "Early Warning"

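1. Opening: The Incident That Made Me Take Monitoring Seriously

Hi everyone, I'm Lao Jin (老金).

Last year, our AI Agent service suddenly started misbehaving:

Average response time jumped from 2 seconds to 30 seconds
The error rate jumped from 0.1% to 35%
Customer support was flooded with complaint calls

The real problem: it took us 40 minutes to notice that anything was wrong.

The reason was simple: we had no proper monitoring and alerting system.

After that incident, I made monitoring a top priority for our AI Agent work.

In this article, I want to share the practices we settled on for monitoring AI Agents.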

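2. Core Metrics for AI Agent Monitoring

2.1 Performance Metrics

Metric	Description	Alert threshold
Response time (P50/P99)	Request processing time	P99 > 10s
Throughput (QPS)	Requests handled per second	Drops by > 50%
LLM time to first token	Time until the first token is returned	> 3s
Token generation speed	Tokens generated per second	< 10 tokens/s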
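The last two rows are the ones people most often ask me how to measure. Below is a minimal sketch of timing a streaming response, assuming your LLM client exposes tokens as an async iterator. `fake_llm_stream` and `StreamStats` are hypothetical names used only for this illustration; swap in your real streaming client.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class StreamStats:
    first_token_latency: Optional[float]  # seconds until the first token; None if no tokens
    tokens_per_second: float              # speed measured after the first token arrived
    token_count: int


async def measure_stream(tokens: AsyncIterator[str]) -> StreamStats:
    """Consume a token stream and record first-token latency and generation speed."""
    start = time.perf_counter()
    first_at = None
    count = 0
    async for _ in tokens:
        if first_at is None:
            first_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first_at is None:
        return StreamStats(None, 0.0, 0)
    gen_time = end - first_at
    # Only the tokens that followed the first one count toward generation speed.
    speed = (count - 1) / gen_time if gen_time > 0 and count > 1 else 0.0
    return StreamStats(first_at - start, speed, count)


async def fake_llm_stream(n: int = 5, delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for a real streaming LLM client."""
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"tok{i}"


stats = asyncio.run(measure_stream(fake_llm_stream()))
print(stats)
```

The two numbers map directly onto the "> 3s" and "< 10 tokens/s" thresholds in the table; in a real service they would be fed into a histogram rather than printed.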

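2.2 Quality Metrics

Metric	Description	Alert threshold
Error rate	Share of requests that fail	> 1%
Timeout rate	Share of requests that time out	> 0.5%
Hallucination detection rate	Share of responses flagged as hallucinations	> 5%
User satisfaction	User feedback score	< 4.0/5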

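2.3 Cost Metrics

Metric	Description	Alert threshold
Token consumption	Tokens per request	Abnormal growth > 30%
LLM call cost	Cost per hour / per day	Over budget
Vector database queries	Retrieval count / latency	> 2x expected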
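To show how these cost metrics can be derived from raw token counts, here is a rough sketch. The model names and per-token prices in `PRICE_PER_1K` are made up for illustration; use your provider's actual price list.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; replace with your provider's real rates.
PRICE_PER_1K = {
    "model-a": {"input": 0.01, "output": 0.03},
    "model-b": {"input": 0.003, "output": 0.015},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage) -> float:
    """Cost of one request in USD, priced separately for input and output tokens."""
    price = PRICE_PER_1K[usage.model]
    return (usage.input_tokens * price["input"]
            + usage.output_tokens * price["output"]) / 1000


def token_growth(current_avg: float, baseline_avg: float) -> float:
    """Relative growth of tokens per request versus a baseline (0.3 means +30%)."""
    return (current_avg - baseline_avg) / baseline_avg


def over_budget(hourly_costs: list[float], daily_budget: float) -> bool:
    """True once the hourly costs recorded so far today exceed the daily budget."""
    return sum(hourly_costs) > daily_budget


print(request_cost(Usage("model-a", input_tokens=1000, output_tokens=500)))
print(token_growth(current_avg=1400, baseline_avg=1000) > 0.30)
```

Pricing input and output tokens separately matters because output tokens are usually the more expensive side, so a prompt change that makes answers longer shows up in cost before it shows up in request counts.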

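2.4 Business Metrics

Metric	Description	Alert threshold
Task completion rate	Share of tasks the agent completes successfully	< 90%
Tool call success rate	Function call success rate	< 95%
Human intervention rate	Share of tasks that need human intervention	> 10%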
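As a sketch of how the three business metrics could be computed from per-task records and checked against the thresholds in the table, consider the following. `TaskRecord` and its fields are an assumed record shape, not something prescribed by any framework.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    completed: bool      # did the agent finish the task on its own?
    tool_calls: int      # number of tool (function) calls made
    tool_failures: int   # how many of those calls failed
    needed_human: bool   # was a human pulled in?


# "min" means alert when the value drops below the limit, "max" when it rises above it.
THRESHOLDS = {
    "task_completion_rate": ("min", 0.90),
    "tool_call_success_rate": ("min", 0.95),
    "human_intervention_rate": ("max", 0.10),
}


def business_metrics(records):
    """Aggregate per-task records into the three business rates."""
    if not records:
        raise ValueError("need at least one record")
    total = len(records)
    calls = sum(r.tool_calls for r in records)
    failures = sum(r.tool_failures for r in records)
    return {
        "task_completion_rate": sum(r.completed for r in records) / total,
        "tool_call_success_rate": (calls - failures) / calls if calls else 1.0,
        "human_intervention_rate": sum(r.needed_human for r in records) / total,
    }


def breached(metrics):
    """Return the names of metrics that violate their threshold."""
    out = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics[name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            out.append(name)
    return out


records = [TaskRecord(True, 2, 0, False)] * 8 + [TaskRecord(False, 2, 0, True)] * 2
metrics = business_metrics(records)
print(metrics, breached(metrics))
```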


3. Monitoring System Architecture

3.1 Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                     AI Agent Service                    │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Agent 1  │  │ Agent 2  │  │ Agent N  │               │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘               │
│       │             │             │                     │
│       └─────────────┼─────────────┘                     │
│                     │                                   │
│              ┌──────▼──────┐                            │
│              │ Metrics SDK │ ← instrumentation          │
│              └──────┬──────┘                            │
└─────────────────────┼───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
  ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
  │ Prometheus│ │   Loki    │ │  Jaeger/  │
  │ (metrics) │ │  (logs)   │ │  Zipkin   │
  │           │ │           │ │ (traces)  │
  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
              ┌───────▼───────┐
              │   Grafana     │ ← visualization
              │  (Dashboard)  │
              └───────┬───────┘
                      │
              ┌───────▼───────┐
              │ AlertManager  │ ← alerting
              └───────────────┘
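Prometheus collects metrics by scraping an HTTP endpoint that returns them in a plain-text exposition format. To make that concrete, here is a toy, stdlib-only counter that renders that format. `TinyCounter` is purely illustrative and skips label-value escaping; in a real service you would use `prometheus_client` instead.

```python
from collections import defaultdict


class TinyCounter:
    """Toy in-memory counter that renders the Prometheus text exposition format."""

    def __init__(self, name, doc, label_names):
        self.name = name
        self.doc = doc
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # One time series per distinct combination of label values.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self):
        lines = [
            f"# HELP {self.name} {self.doc}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            # Note: real implementations escape quotes, backslashes and newlines here.
            label_str = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{label_str}}} {value}")
        return "\n".join(lines) + "\n"


requests = TinyCounter("agent_request_total", "Total agent requests",
                       ["agent_name", "status"])
requests.inc(agent_name="support", status="success")
requests.inc(agent_name="support", status="success")
requests.inc(agent_name="support", status="error")
print(requests.render())
```

Each distinct label combination becomes its own time series, which is why high-cardinality labels (user IDs, request IDs) should stay out of metrics and go into logs or traces instead.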

3.2 Core Instrumentation

    import time
    from functools import wraps

    from prometheus_client import Counter, Histogram

    # Define metrics
    REQUEST_COUNT = Counter(
        'agent_request_total',
        'Total agent requests',
        ['agent_name', 'status']
    )
    REQUEST_LATENCY = Histogram(
        'agent_request_latency_seconds',
        'Request latency in seconds',
        ['agent_name'],
        buckets=[0.1, 0.5, 1, 2, 5, 10, 30, 60]
    )
    TOKEN_USAGE = Counter(
        'agent_token_usage_total',
        'Total tokens used',
        ['agent_name', 'model', 'type']  # type: input/output
    )
    HALLUCINATION_COUNT = Counter(
        'agent_hallucination_total',
        'Hallucination detections',
        ['agent_name', 'severity']
    )


    class AgentMonitor:
        def __init__(self, agent_name):
            self.agent_name = agent_name

        def track_request(self, func):
            """Decorator: track request count, latency, and token usage."""
            @wraps(func)
            async def wrapper(*args, **kwargs):
                start_time = time.perf_counter()
                status = "success"

                try:
                    result = await func(*args, **kwargs)

                    # Record token usage if the result carries it
                    if hasattr(result, 'token_usage'):
                        TOKEN_USAGE.labels(
                            agent_name=self.agent_name,
                            model=result.model,
                            type='input'
                        ).inc(result.token_usage.input)
                        TOKEN_USAGE.labels(
                            agent_name=self.agent_name,
                            model=result.model,
                            type='output'
                        ).inc(result.token_usage.output)

                    return result

                except Exception:
                    status = "error"
                    raise

                finally:
                    # Count the request, whether it succeeded or failed
                    REQUEST_COUNT.labels(
                        agent_name=self.agent_name,
                        status=status
                    ).inc()

                    # Record latency
                    REQUEST_LATENCY.labels(
                        agent_name=self.agent_name
                    ).observe(time.perf_counter() - start_time)

            return wrapper


    # Usage example
    monitor = AgentMonitor("customer_service_agent")

    @monitor.track_request
    async def process_query(user_input):
        # Handle the request... (run_agent is a placeholder for your own agent entry point)
        result = await run_agent(user_input)
        return result

4. Key Monitoring Dashboards

4.1 Real-Time Overview Panel

┌──────────────────────────────────────────────────────────────────────┐
│                    AI Agent Real-Time Monitoring                     │
├──────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │ QPS         │  │ Error rate  │  │ P99 latency │  │ Token usage │  │
│  │ 1,234       │  │ 0.3%        │  │ 2.3s        │  │ 5.2M/h      │  │
│  │ ↑ 12%       │  │ ↓ 0.1%      │  │ ↑ 0.2s      │  │ ↑ 8%        │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │               Request volume trend (past 1 hour)               │  │
│  │  ▁▂▃▅▆▇█▇▆▅▃▂▁▁▂▃▅▆▇█▇▆▅▃▂▁▁▂▃▅▆▇█▇▆▅▃▂▁                       │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌────────────────────┐  ┌────────────────────┐                      │
│  │ Requests by agent  │  │ Errors by type     │                      │
│  │   ┌──────┐         │  │   ┌───────┐        │                      │
│  │   │Agent1│ 45%     │  │   │Timeout│ 40%    │                      │
│  │   │Agent2│ 30%     │  │   │LLM    │ 35%    │                      │
│  │   │Agent3│ 25%     │  │   │Tool   │ 25%    │                      │
│  │   └──────┘         │  │   └───────┘        │                      │
│  └────────────────────┘  └────────────────────┘                      │
└──────────────────────────────────────────────────────────────────────┘

4.2 Cost Analysis Panel

┌────────────────────────────────────────────────────────────┐
│                  Cost Analysis Dashboard                   │
├────────────────────────────────────────────────────────────┤
│  Today's cost: $127.50    │   Month to date: $3,842.30     │
│  Budget used: 42.5%       │   Projected month-end: $9,000  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                   Daily cost trend                   │  │
│  │  $200 ┤                                              │  │
│  │  $150 ┤    ▂▃▅▆▇█▇▆▅▃▂                               │  │
│  │  $100 ┤                                              │  │
│  │   $50 ┤                                              │  │
│  │       └───────────────────────────────────────────   │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Cost breakdown:                                     │  │
│  │  • GPT-4 calls: $89.50 (70%)                         │  │
│  │  • Claude calls: $28.00 (22%)                        │  │
│  │  • Vector retrieval: $10.00 (8%)                     │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
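A "projected month-end" figure like the one on this panel can be produced in several ways. The simplest is linear extrapolation from the month-to-date spend, sketched below with made-up numbers. It assumes that today's spend is already complete and that usage is roughly flat across the month, which is often not true, so treat it as a first approximation.

```python
import calendar
from datetime import date


def project_month_end(month_to_date: float, today: date) -> float:
    """Linear extrapolation: average daily spend so far times the days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date / today.day * days_in_month


def budget_used(month_to_date: float, monthly_budget: float) -> float:
    """Fraction of the monthly budget already spent (0.3 means 30%)."""
    return month_to_date / monthly_budget


# Illustrative numbers only: $3,000 spent by June 10 against a $10,000 budget.
projected = project_month_end(3000.0, date(2025, 6, 10))
print(f"projected month-end: ${projected:,.2f}")
print(f"budget used: {budget_used(3000.0, 10000.0):.1%}")
```

Alerting on the projection rather than on the spend so far is what turns a cost panel into an early warning: you hear about the overrun in week two, not on the 30th.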

5. Alert Rule Design

5.1 Example Alert Rules

    # Prometheus alerting rules
    groups:
      - name: agent_alerts
        rules:
          # High error rate
          - alert: HighErrorRate
            expr: |
              sum(rate(agent_request_total{status="error"}[5m]))
                / sum(rate(agent_request_total[5m])) > 0.01
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Agent error rate too high"
              description: "Error rate is {{ $value | humanizePercentage }}, above the 1% threshold"

          # High latency
          - alert: HighLatency
            expr: |
              histogram_quantile(0.99, rate(agent_request_latency_seconds_bucket[5m])) > 10
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Agent response time too long"
              description: "P99 latency is {{ $value }}s"

          # Abnormal token consumption
          # Written as a ratio so that $value is the growth multiple used in the description.
          - alert: AbnormalTokenUsage
            expr: |
              sum(rate(agent_token_usage_total[1h]))
                / sum(rate(agent_token_usage_total[1h] offset 1d)) > 2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Abnormal growth in token consumption"
              description: "Current consumption is {{ $value }}x the same period yesterday"

          # Too many hallucination detections
          - alert: HighHallucinationRate
            expr: |
              sum(rate(agent_hallucination_total[10m]))
                / sum(rate(agent_request_total[10m])) > 0.05
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Hallucination detection rate too high"
              description: "{{ $value | humanizePercentage }} of responses were flagged as possible hallucinations"
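One thing worth understanding about the latency rule: `histogram_quantile` does not see raw latencies, only cumulative bucket counts, and it estimates the quantile by linear interpolation inside the bucket where the target rank falls. The sketch below mirrors that estimation in plain Python so you can see how bucket choice affects the result. The sample counts are invented, and the function assumes at least one finite bucket plus a final `+Inf` bucket.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Same bucket bounds as REQUEST_LATENCY; the counts are illustrative.
sample = [(0.1, 0), (0.5, 10), (1, 40), (2, 80), (5, 95),
          (10, 98), (30, 100), (60, 100), (math.inf, 100)]
print(histogram_quantile(0.50, sample))
print(histogram_quantile(0.99, sample))
```

With these counts the P99 lands inside the 10s to 30s bucket, so the estimate depends heavily on interpolation across a 20-second span. If the 10s threshold matters to you, adding a few buckets around it (say 15 and 20) makes the alert noticeably more precise.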

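5.2 Alert Severity Levels

Level	Description	Response requirement	Notification channels
P0 Emergency	Service unavailable	Respond within 5 minutes	Phone + SMS + IM
P1 Critical	Functionality severely degraded	Respond within 15 minutes	SMS + IM
P2 Warning	Needs attention	Respond within 1 hour	IM + email
P3 Info	Informational notice	Handle within 24 hours	Email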
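A severity table like this is easiest to enforce when it lives in code or config rather than in people's heads. Here is a minimal sketch of the table as a lookup that a notifier could consult. `SeverityPolicy`, `POLICIES`, and `notify` are hypothetical names, and the function only returns what it would send instead of calling any real phone, SMS, or IM API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SeverityPolicy:
    description: str
    respond_within: timedelta
    channels: tuple[str, ...]


# Mirrors the severity table: level -> response deadline and notification channels.
POLICIES = {
    "P0": SeverityPolicy("Service unavailable", timedelta(minutes=5), ("phone", "sms", "im")),
    "P1": SeverityPolicy("Functionality severely degraded", timedelta(minutes=15), ("sms", "im")),
    "P2": SeverityPolicy("Needs attention", timedelta(hours=1), ("im", "email")),
    "P3": SeverityPolicy("Informational notice", timedelta(hours=24), ("email",)),
}


def notify(severity: str, message: str) -> list[str]:
    """Return the notifications that would be sent; raises KeyError on unknown levels."""
    policy = POLICIES[severity]
    return [f"[{severity}] via {channel}: {message}" for channel in policy.channels]


for line in notify("P1", "Tool call success rate below 95%"):
    print(line)
```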



6. Logging and Distributed Tracing

6.1 Structured Logging

    import structlog

    logger = structlog.get_logger()

    # `llm` stands for your own async LLM client; it is not defined here.
    async def process_request(request_id, user_input):
        # Bind request_id once so every log line for this request carries it
        log = logger.bind(request_id=request_id)
        log.info("request_started", user_input_length=len(user_input))

        try:
            # LLM call
            log.info("llm_call_started", model="gpt-4")
            response = await llm.call(user_input)
            log.info("llm_call_completed",
                     tokens_used=response.usage.total_tokens,
                     latency=response.latency)

            # Tool calls
            if response.tool_calls:
                for tool_call in response.tool_calls:
                    log.info("tool_call_started",
                             tool_name=tool_call.name,
                             arguments=tool_call.arguments)
                    # ...

            log.info("request_completed", response_length=len(response.content))
            return response

        except Exception as e:
            log.error("request_failed",
                      error_type=type(e).__name__,
                      error_message=str(e))
            raise
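If you have not used structured logging before, it helps to see what one log line looks like. The sketch below reproduces the "bind once, log key-value events" idea using only the standard library, so it runs without structlog installed. `JsonFormatter` and `BoundLogger` are names invented for this example, and the output shape is only an approximation of what a JSON-configured structlog would emit.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname.lower()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, ensure_ascii=False)


class BoundLogger(logging.LoggerAdapter):
    """Merge bound context (e.g. request_id) with per-call fields."""

    def process(self, msg, kwargs):
        fields = {**self.extra, **kwargs.pop("fields", {})}
        kwargs["extra"] = {"fields": fields}
        return msg, kwargs


stream = io.StringIO()  # capture output here; a real service would log to stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
base = logging.getLogger("agent_demo")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

log = BoundLogger(base, {"request_id": "req-123"})
log.info("llm_call_completed", fields={"tokens_used": 42})
print(stream.getvalue())
```

Because every line is a JSON object with the same `request_id`, a log store such as Loki can filter all events for one request with a single label or field query.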

6.2 Distributed Tracing

    from opentelemetry import trace

    tracer = trace.get_tracer(__name__)

    # `llm` and `execute_tool` stand for your own client and tool runner; they are not defined here.
    async def process_request(request):
        with tracer.start_as_current_span("process_request") as span:
            span.set_attribute("user_id", request.user_id)
            span.set_attribute("query", request.query[:100])

            # LLM call
            with tracer.start_as_current_span("llm_call") as llm_span:
                llm_span.set_attribute("model", "gpt-4")
                response = await llm.call(request.query)
                llm_span.set_attribute("tokens", response.usage.total_tokens)

            # Tool calls: one parent span, plus one child span per tool
            if response.tool_calls:
                with tracer.start_as_current_span("tool_calls"):
                    for tool_call in response.tool_calls:
                        with tracer.start_as_current_span(f"tool_{tool_call.name}"):
                            result = await execute_tool(tool_call)

            return response
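Spans only join into one trace if the trace ID travels with the request when it crosses a service boundary. The W3C Trace Context standard does this with a `traceparent` HTTP header. The sketch below builds and parses that header by hand for the version `00` layout, purely to show what is on the wire; in practice OpenTelemetry's propagators do this for you, and the helper names here are my own.

```python
import re
import secrets
from typing import Optional

# version(2 hex) - trace-id(32 hex) - parent span id(16 hex) - flags(2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random trace ID and span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {"trace_id": trace_id, "span_id": span_id,
            "sampled": bool(int(flags, 16) & 1)}


def child_traceparent(parent: str) -> str:
    """Header for an outgoing call: same trace ID, fresh span ID."""
    ctx = parse_traceparent(parent)
    if ctx is None:
        return new_traceparent()
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_traceparent(incoming))
```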

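7. Final Thoughts

Monitoring is not a cost; it is an investment.
A good monitoring system lets you:

Detect problems quickly: from 40 minutes down to 30 seconds
Pinpoint causes quickly: distributed tracing makes the failure path obvious
Prevent problems: trend forecasting warns you ahead of time

Core principles:

Monitoring first: set up monitoring before you go live
Tiered alerts: don't let alerts turn into noise
Traceability: every request can be traced end to end

I'm Lao Jin. See you next time!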

This article is also published on the WeChat official account 【技术老金】. Feel free to follow.
