Agent Framework: Performance Optimization - 深圳市維司達科技有限公司

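Overview

Performance optimization is what lets an AI agent application run efficiently and deliver a good user experience. This article covers the key areas to optimize in AI agent applications, practical techniques, and how to test performance.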

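Why does performance optimization matter?

Imagine your AI customer-service assistant took 30 seconds to answer every question. How would users feel? Performance optimization is like fitting your agent with a turbocharger so that it works faster and more efficiently.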

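Common symptoms of performance problems

Long response times: users wait more than 5 seconds
High resource usage: excessive CPU and memory consumption
Insufficient concurrency: the app cannot handle multiple requests at the same time
High cost: API call fees exceed the budget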

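Key performance optimizations

1. Reduce the number of API calls

Every call to an AI model costs time and money. Cutting unnecessary calls is the most direct optimization.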

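Tips

❌ Bad practice: re-create everything on every call

// Creates a new agent and a new conversation thread for every user message.
// Besides the setup cost, the fresh thread discards all earlier conversation context.
public async Task<string> ProcessMessage(string userMessage)
{
    var agent = new ChatCompletionAgent(/* ... */);
    var thread = new AgentThread();
    await thread.AddUserMessageAsync(userMessage);
    var response = await agent.InvokeAsync(thread);
    return response.Content;
}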

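✅ Good practice: reuse the agent and the conversation threads

// Reuse one agent instance and keep one conversation thread per user.
private readonly ChatCompletionAgent _agent; // created once, e.g. injected through the constructor

// ConcurrentDictionary (System.Collections.Concurrent) is safe when requests from
// different users arrive concurrently; a plain Dictionary is not.
private readonly ConcurrentDictionary<string, AgentThread> _userThreads = new();

public async Task<string> ProcessMessage(string userId, string userMessage)
{
    // Get the user's conversation thread, or create it on first contact.
    var thread = _userThreads.GetOrAdd(userId, _ => new AgentThread());
    await thread.AddUserMessageAsync(userMessage);
    var response = await _agent.InvokeAsync(thread);
    return response.Content;
}

Performance gain: 50% less initialization overhead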
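The per-user reuse above can be checked in isolation. The sketch below is a minimal, runnable stand-in, assuming a plain `List<string>` in place of `AgentThread` so that it needs no agent SDK; `GetThread` and `AddMessage` are hypothetical helpers. It shows that `ConcurrentDictionary.GetOrAdd` hands every request from the same user the same history object, even when those requests arrive concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for AgentThread: a plain list of messages per user (hypothetical, for illustration only).
var userThreads = new ConcurrentDictionary<string, List<string>>();

List<string> GetThread(string userId) =>
    // GetOrAdd may run the factory more than once under a race,
    // but it always returns the single value that ended up stored.
    userThreads.GetOrAdd(userId, _ => new List<string>());

void AddMessage(string userId, string message)
{
    var thread = GetThread(userId);
    // List<T> is not thread-safe, so serialize writes to one user's history.
    lock (thread)
    {
        thread.Add(message);
    }
}

// 50 concurrent messages from the same user all land in one shared history.
await Task.WhenAll(Enumerable.Range(0, 50)
    .Select(i => Task.Run(() => AddMessage("alice", $"msg {i}"))));
AddMessage("bob", "hello");

Console.WriteLine($"alice: {GetThread("alice").Count}, bob: {GetThread("bob").Count}");
```

`GetOrAdd` keeps different users apart but does not serialize two requests from the same user; the `lock` covers that case for the stand-in list, and a real thread object may need a similar guard.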

2. Use a caching strategy

For questions that are asked repeatedly, a cache lets you skip calling the AI model again.

Implementing a simple cache

using System;
using System.Collections.Concurrent;
using System.Diagnostics.CodeAnalysis;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public class AgentResponseCache
{
    private readonly ConcurrentDictionary<string, CacheEntry> _cache = new();
    private readonly TimeSpan _expirationTime = TimeSpan.FromMinutes(30);

    private class CacheEntry
    {
        public string Response { get; init; } = "";
        public DateTime CreatedAt { get; init; }
    }

    // Builds the cache key: a SHA-256 hash of the lower-cased message.
    // ToLowerInvariant avoids culture-dependent casing rules.
    private static string GenerateCacheKey(string message)
    {
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(message.ToLowerInvariant()));
        return Convert.ToBase64String(hash);
    }

    // Tries to get a cached, non-expired response for the message.
    public bool TryGetCachedResponse(string message, [NotNullWhen(true)] out string? response)
    {
        var key = GenerateCacheKey(message);
        if (_cache.TryGetValue(key, out var entry))
        {
            // Return the entry only if it has not expired.
            if (DateTime.UtcNow - entry.CreatedAt < _expirationTime)
            {
                response = entry.Response;
                return true;
            }
            // Remove the expired entry.
            _cache.TryRemove(key, out _);
        }
        response = null;
        return false;
    }

    // Stores a response in the cache.
    public void CacheResponse(string message, string response)
    {
        var key = GenerateCacheKey(message);
        _cache[key] = new CacheEntry { Response = response, CreatedAt = DateTime.UtcNow };
    }

    // Removes all expired entries.
    public void CleanupExpiredEntries()
    {
        var now = DateTime.UtcNow;
        var expiredKeys = _cache
            .Where(kvp => now - kvp.Value.CreatedAt >= _expirationTime)
            .Select(kvp => kvp.Key)
            .ToList();
        foreach (var key in expiredKeys)
        {
            _cache.TryRemove(key, out _);
        }
    }
}
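`GenerateCacheKey` above only lower-cases the message, so "What is AI?" and "  what   is AI? " produce different keys. A slightly more forgiving key is sketched below, assuming that trimming, invariant lower-casing, and collapsing runs of whitespace are acceptable normalizations for your questions; `NormalizeMessage` is a hypothetical helper. Reworded questions will still miss the cache.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Normalization rules (trim, invariant lower-case, collapse whitespace) are an
// illustrative assumption, not part of the original AgentResponseCache.
string NormalizeMessage(string message) =>
    Regex.Replace(message.Trim().ToLowerInvariant(), @"\s+", " ");

string GenerateCacheKey(string message)
{
    // SHA256.HashData (.NET 5+) hashes without creating a hasher instance per call.
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(NormalizeMessage(message)));
    return Convert.ToHexString(hash);
}

Console.WriteLine(GenerateCacheKey("What is AI?") == GenerateCacheKey("  what   is AI? "));
```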

Agent with caching

public class CachedAgent
{
    private readonly ChatCompletionAgent _agent;
    private readonly AgentResponseCache _cache;

    public CachedAgent(ChatCompletionAgent agent)
    {
        _agent = agent;
        _cache = new AgentResponseCache();
    }

    public async Task<string> ProcessMessageAsync(AgentThread thread, string message)
    {
        // Check the cache first. The key is the message text alone, so a hit ignores
        // the conversation context and does not add this exchange to the thread.
        if (_cache.TryGetCachedResponse(message, out var cachedResponse))
        {
            Console.WriteLine("✓ Returning response from cache");
            return cachedResponse;
        }

        // Cache miss: call the AI model.
        Console.WriteLine("→ Calling the AI model");
        await thread.AddUserMessageAsync(message);
        var response = await _agent.InvokeAsync(thread);
        var content = response.Content;

        // Cache the response.
        _cache.CacheResponse(message, content);
        return content;
    }
}

Performance gain: 90% lower response time on a cache hit
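Nothing in the code above calls `CleanupExpiredEntries`, and `TryGetCachedResponse` only evicts an expired entry when the same question is asked again, so stale entries for one-off questions stay in memory. One option, sketched below assuming .NET 6+ for `PeriodicTimer`, is a background loop that runs the cleanup on a fixed interval. To keep the sketch self-contained, the cache is reduced to a hypothetical `ConcurrentDictionary<string, DateTime>` of creation times, and `RunCleanupLoopAsync` is a hypothetical helper.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the cache: key -> creation time (mirrors CacheEntry.CreatedAt).
var cache = new ConcurrentDictionary<string, DateTime>();
var expiration = TimeSpan.FromMinutes(30);

void CleanupExpiredEntries()
{
    var now = DateTime.UtcNow;
    // Enumerating a ConcurrentDictionary while removing from it is safe.
    foreach (var kvp in cache)
    {
        if (now - kvp.Value >= expiration)
        {
            cache.TryRemove(kvp.Key, out _);
        }
    }
}

// Runs the cleanup on every timer tick until the token is cancelled.
async Task RunCleanupLoopAsync(TimeSpan interval, CancellationToken token)
{
    using var timer = new PeriodicTimer(interval);
    try
    {
        while (await timer.WaitForNextTickAsync(token))
        {
            CleanupExpiredEntries();
        }
    }
    catch (OperationCanceledException)
    {
        // Cancellation is the normal way to stop the loop.
    }
}

cache["fresh"] = DateTime.UtcNow;
cache["stale"] = DateTime.UtcNow - TimeSpan.FromHours(1);

// Short interval and lifetime only to keep the demo quick.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
await RunCleanupLoopAsync(TimeSpan.FromMilliseconds(20), cts.Token);

Console.WriteLine(string.Join(", ", cache.Keys));
```

In an ASP.NET Core app, a loop like this would typically live in a hosted `BackgroundService` and receive the application's shutdown token.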

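3. Optimize prompt length

Longer prompts take longer to process and cost more.

Tips

❌ Verbose prompt

var instructions = @"
You are a very professional customer service assistant. You need to help users solve all kinds of problems.
You should always remain polite and professional. You need to carefully understand the user's question and then give a detailed answer.
If you don't know the answer, you should honestly tell the user that you don't know.
You should use simple, easy-to-understand language and avoid overly technical jargon.
You should make sure your answers are accurate and helpful.
... (plus much more repetitive content)
";

✅ Concise prompt

var instructions = @"
You are a professional customer service assistant.
- Answer user questions politely and accurately
- Use simple, easy-to-understand language
- Say so honestly when you are unsure
";

Performance gain: 30-40% fewer tokens consumed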
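Prompt length matters more than it first appears because chat completion APIs are stateless: the agent's instructions are sent again with every request, so their token cost is multiplied by your traffic. The sketch below works through that arithmetic. Every input (token counts, request volume, price per 1K tokens) is a hypothetical placeholder, not a measured value or a real price, and `MonthlyPromptCost` is a hypothetical helper.

```csharp
using System;

// Cost of re-sending the same instructions with every request over a period.
// decimal keeps the money arithmetic exact.
decimal MonthlyPromptCost(int promptTokens, int requestsPerDay, int days, decimal pricePer1KTokens) =>
    (decimal)promptTokens * requestsPerDay * days / 1000m * pricePer1KTokens;

// Hypothetical inputs: a 400-token prompt trimmed to 260 tokens (35% shorter),
// 10,000 requests per day, 30 days, 0.0005 per 1K input tokens.
decimal verbose = MonthlyPromptCost(promptTokens: 400, requestsPerDay: 10_000, days: 30, pricePer1KTokens: 0.0005m);
decimal concise = MonthlyPromptCost(promptTokens: 260, requestsPerDay: 10_000, days: 30, pricePer1KTokens: 0.0005m);

Console.WriteLine($"verbose: ${verbose}, concise: ${concise}, saved: ${verbose - concise}");
```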

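4. Use streaming responses

For long responses, streaming lets users start seeing the result sooner.

public async Task StreamResponseAsync(AgentThread thread, string message)
{
    await thread.AddUserMessageAsync(message);
    Console.Write("AI: ");

    // Stream the response: write each chunk as soon as it arrives.
    await foreach (var update in _agent.InvokeStreamingAsync(thread))
    {
        if (update.Content != null)
        {
            Console.Write(update.Content);
            // Optional typing effect for demos only: it adds 10 ms per chunk.
            await Task.Delay(10);
        }
    }
    Console.WriteLine();
}

User experience gain: 70% less perceived waiting time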
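Streaming does not make the model finish sooner; it shortens the time until the user sees the first output. The sketch below makes that visible by timing a simulated stream, assuming a hypothetical `FakeStreamAsync` in place of `InvokeStreamingAsync` and an arbitrary 100 ms per chunk. It records the time to the first chunk and the time to the full response.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Simulated model stream (hypothetical): each chunk takes `delay` to "generate".
async IAsyncEnumerable<string> FakeStreamAsync(string[] chunks, TimeSpan delay)
{
    foreach (var chunk in chunks)
    {
        await Task.Delay(delay);
        yield return chunk;
    }
}

var stopwatch = Stopwatch.StartNew();
TimeSpan? timeToFirstChunk = null;
var chunksReceived = 0;

await foreach (var chunk in FakeStreamAsync(new[] { "Hello", ", ", "world", "!" }, TimeSpan.FromMilliseconds(100)))
{
    // The first assignment marks when the user would first see output.
    timeToFirstChunk ??= stopwatch.Elapsed;
    chunksReceived++;
    Console.Write(chunk);
}
var totalTime = stopwatch.Elapsed;

Console.WriteLine();
Console.WriteLine($"first chunk: {timeToFirstChunk!.Value.TotalMilliseconds:F0} ms, full response: {totalTime.TotalMilliseconds:F0} ms");
```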

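5. Process multiple requests in parallel

When you have several independent requests, processing them in parallel can improve performance significantly.

public async Task<List<string>> ProcessMultipleQuestionsAsync(List<string> questions)
{
    // Create one independent task per question, each with its own thread.
    var tasks = questions.Select(async question =>
    {
        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await _agent.InvokeAsync(thread);
        return response.Content;
    });

    // Run all tasks concurrently and wait for every one to finish.
    var results = await Task.WhenAll(tasks);
    return results.ToList();
}

Performance gain: processing 10 questions drops from 50 seconds to 8 seconds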
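`Task.WhenAll` as written above starts every request at once. With a long list, that can run into the model provider's rate limits, which show up as throttling errors rather than speed-ups. A common refinement is to cap the number of in-flight calls with `SemaphoreSlim`. The sketch below assumes a cap of 3, uses a hypothetical `RunWithLimitAsync` helper, and replaces the agent call with a 50 ms delay so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs `work` for every item, with at most `maxConcurrency` calls in flight at once.
async Task<TResult[]> RunWithLimitAsync<TItem, TResult>(
    IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
{
    using var gate = new SemaphoreSlim(maxConcurrency);
    var tasks = items.Select(async item =>
    {
        await gate.WaitAsync();
        try
        {
            return await work(item);
        }
        finally
        {
            gate.Release();
        }
    });
    // Task.WhenAll returns results in input order, not completion order.
    return await Task.WhenAll(tasks);
}

var sync = new object();
var inFlight = 0;
var maxObserved = 0;
var questions = Enumerable.Range(1, 10).Select(i => $"Question {i}").ToList();

var answers = await RunWithLimitAsync(questions, maxConcurrency: 3, async question =>
{
    // Stand-in for _agent.InvokeAsync(...) (hypothetical): track how many calls overlap.
    lock (sync) { inFlight++; maxObserved = Math.Max(maxObserved, inFlight); }
    await Task.Delay(50);
    lock (sync) { inFlight--; }
    return $"Answer to {question}";
});

Console.WriteLine($"{answers.Length} answers, at most {maxObserved} calls in flight");
```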

6. Limit conversation history length

The longer the conversation history, the more each call costs, because the whole history is sent with every request. Setting a sensible limit on history length is therefore important.

using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.AI; // ChatMessage, ChatRole

public class OptimizedAgentThread
{
    private readonly List<ChatMessage> _messages = new();
    private const int MaxHistoryMessages = 20; // keep at most 20 messages

    public void AddMessage(ChatMessage message)
    {
        _messages.Add(message);

        // Over the limit: drop the oldest messages but keep all system messages.
        if (_messages.Count > MaxHistoryMessages)
        {
            var systemMessages = _messages.Where(m => m.Role == ChatRole.System).ToList();
            var recentMessages = _messages
                .Where(m => m.Role != ChatRole.System)
                .TakeLast(MaxHistoryMessages - systemMessages.Count)
                .ToList();

            _messages.Clear();
            _messages.AddRange(systemMessages);
            _messages.AddRange(recentMessages);
        }
    }

    public IReadOnlyList<ChatMessage> GetMessages() => _messages.AsReadOnly();
}
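Capping the number of messages, as above, does not bound cost directly, because billing is per token and a single message can be very long. A variant that trims by size instead is sketched below. It assumes messages are `(Role, Content)` tuples and uses character count as a crude stand-in for tokens (a real tokenizer would be more accurate); `TrimToBudget` is a hypothetical helper. Like `OptimizedAgentThread`, it always keeps system messages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keeps every system message, then as many of the most recent other messages
// as fit in `maxChars`. Characters stand in for tokens (assumption).
List<(string Role, string Content)> TrimToBudget(
    IReadOnlyList<(string Role, string Content)> messages, int maxChars)
{
    var system = messages.Where(m => m.Role == "system").ToList();
    var budget = maxChars - system.Sum(m => m.Content.Length);

    var kept = new List<(string Role, string Content)>();
    // Walk backwards from the newest message; stop at the first one that does not fit,
    // so the kept history is a contiguous recent window.
    for (var i = messages.Count - 1; i >= 0; i--)
    {
        var m = messages[i];
        if (m.Role == "system") continue;
        if (m.Content.Length > budget) break;
        budget -= m.Content.Length;
        kept.Add(m);
    }
    kept.Reverse(); // restore chronological order
    return system.Concat(kept).ToList();
}

var history = new List<(string Role, string Content)>
{
    ("system", "You are helpful."),
    ("user", "aaaaaaaaaa"),
    ("assistant", "bbbbbbbbbb"),
    ("user", "cc"),
};

var trimmed = TrimToBudget(history, maxChars: 30);
foreach (var (role, content) in trimmed)
{
    Console.WriteLine($"{role}: {content}");
}
```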

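7. Choose the right model

Different models have different performance characteristics and costs.

Model	Speed	Quality	Cost	Best for
GPT-4	Slow	Highest	High	Complex reasoning, creative writing
GPT-4-turbo	Medium	High	Medium	Balancing performance and quality
GPT-3.5-turbo	Fast	Medium	Low	Simple conversations, classification tasks

// Pick a model based on task complexity.
public ChatCompletionAgent CreateAgentForTask(TaskComplexity complexity)
{
    string modelId = complexity switch
    {
        TaskComplexity.Simple => "gpt-3.5-turbo",   // fast, low cost
        TaskComplexity.Medium => "gpt-4-turbo",     // balanced
        TaskComplexity.Complex => "gpt-4",          // highest quality
        _ => "gpt-3.5-turbo"
    };

    return new ChatCompletionAgent(
        chatClient: _chatClient,
        name: "OptimizedAgent",
        instructions: "You are an efficient assistant",
        modelId: modelId
    );
}

public enum TaskComplexity
{
    Simple,  // simple tasks: greetings, simple Q&A
    Medium,  // medium tasks: information retrieval, summarization
    Complex  // complex tasks: reasoning, creative generation
}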
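`CreateAgentForTask` assumes the caller already knows the task's complexity. One cheap way to decide it is a keyword-and-length heuristic that runs before any model call. Everything in the sketch below is an assumption for illustration: the keyword list, the 40- and 200-character thresholds, and the `ChooseModel` helper, which returns a model name from the table above directly instead of a `TaskComplexity` value.

```csharp
using System;

// Hypothetical routing heuristic: keywords and thresholds are illustrative only.
string ChooseModel(string question)
{
    var q = question.Trim();
    string[] complexHints = { "why", "explain", "compare", "design", "analyze" };

    bool looksComplex = q.Length > 200 ||
        Array.Exists(complexHints, h => q.Contains(h, StringComparison.OrdinalIgnoreCase));

    if (looksComplex) return "gpt-4";          // reasoning-heavy requests
    if (q.Length > 40) return "gpt-4-turbo";   // longer, but no reasoning keywords
    return "gpt-3.5-turbo";                    // short, simple requests
}

foreach (var q in new[] { "Hello", "Please list the opening hours of the downtown store.", "Explain how transformers work" })
{
    Console.WriteLine($"{ChooseModel(q)} <- {q}");
}
```

A misrouted request only costs quality or money, not correctness, so a rough heuristic like this can be a reasonable starting point to refine with logged data.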

Performance testing methods

1. Response time testing

using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class PerformanceTester
{
    public async Task<PerformanceMetrics> MeasureResponseTimeAsync(Func<Task<string>> agentCall)
    {
        var stopwatch = Stopwatch.StartNew();
        var response = await agentCall();
        stopwatch.Stop();

        return new PerformanceMetrics
        {
            ResponseTime = stopwatch.Elapsed,
            ResponseLength = response.Length,
            // string.Length counts characters, not model tokens.
            CharsPerSecond = response.Length / stopwatch.Elapsed.TotalSeconds
        };
    }
}

public class PerformanceMetrics
{
    public TimeSpan ResponseTime { get; set; }
    public int ResponseLength { get; set; }
    public double CharsPerSecond { get; set; }

    public override string ToString() =>
        $"Response time: {ResponseTime.TotalSeconds:F2} s, " +
        $"response length: {ResponseLength} chars, " +
        $"speed: {CharsPerSecond:F2} chars/s";
}
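A single `MeasureResponseTimeAsync` run says little on its own, because model latency varies from call to call. A more robust approach is to repeat the measurement and report percentiles alongside the mean. The sketch below computes nearest-rank percentiles; the ten latencies are hypothetical sample values, not measurements, and `Percentile` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over measured latencies (in seconds).
double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("no samples", nameof(samples));
    var sorted = samples.OrderBy(x => x).ToArray();
    // Nearest rank: the smallest value with at least p% of the samples at or below it.
    var rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

// Hypothetical latencies from 10 repeated runs.
double[] latencies = { 1.2, 0.9, 1.1, 1.0, 3.8, 1.3, 0.95, 1.05, 1.15, 1.25 };

Console.WriteLine($"p50 = {Percentile(latencies, 50):F2} s, p95 = {Percentile(latencies, 95):F2} s, mean = {latencies.Average():F2} s");
```

With these sample values, one slow outlier pulls the mean up to 1.37 s while the median stays at 1.10 s, which is why tail percentiles such as p95 are usually tracked separately from the mean.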

2. Concurrency testing

// Member of PerformanceTester (also needs System.Collections.Generic and System.Linq).
public async Task<ConcurrencyTestResult> TestConcurrencyAsync(int concurrentRequests, Func<Task<string>> agentCall)
{
    // Times one request on its own stopwatch, so per-request latency is measured.
    async Task<TimeSpan> TimeOneAsync()
    {
        var sw = Stopwatch.StartNew();
        await agentCall();
        return sw.Elapsed;
    }

    var stopwatch = Stopwatch.StartNew();

    // Start all requests concurrently.
    var tasks = new List<Task<TimeSpan>>();
    for (int i = 0; i < concurrentRequests; i++)
    {
        tasks.Add(TimeOneAsync());
    }

    // Wait for every request to finish.
    var latencies = await Task.WhenAll(tasks);
    stopwatch.Stop();

    return new ConcurrencyTestResult
    {
        TotalRequests = concurrentRequests,
        TotalTime = stopwatch.Elapsed,
        // Mean latency of the individual requests (total time / count would be 1 / throughput).
        AverageTime = latencies.Average(t => t.TotalSeconds),
        RequestsPerSecond = concurrentRequests / stopwatch.Elapsed.TotalSeconds
    };
}

public class ConcurrencyTestResult
{
    public int TotalRequests { get; set; }
    public TimeSpan TotalTime { get; set; }
    public double AverageTime { get; set; }
    public double RequestsPerSecond { get; set; }

    public override string ToString() =>
        $"Total requests: {TotalRequests}, " +
        $"total time: {TotalTime.TotalSeconds:F2} s, " +
        $"average latency: {AverageTime:F2} s, " +
        $"throughput: {RequestsPerSecond:F2} requests/s";
}

3. Complete performance test example

using System;
using System.ClientModel;        // ApiKeyCredential
using System.Threading.Tasks;
using Azure.AI.OpenAI;          // AzureOpenAIClient

public class Program
{
    public static async Task Main(string[] args)
    {
        // Initialize the agent
        var chatClient = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
            new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!)
        ).GetChatClient("gpt-35-turbo");

        var agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "PerformanceTestAgent",
            instructions: "You are a test assistant"
        );

        var tester = new PerformanceTester();
        Console.WriteLine("=== Performance test started ===\n");

        // Test 1: single response time
        Console.WriteLine("Test 1: single response time");
        var thread1 = new AgentThread();
        var metrics = await tester.MeasureResponseTimeAsync(async () =>
        {
            await thread1.AddUserMessageAsync("Hello, please introduce yourself");
            var response = await agent.InvokeAsync(thread1);
            return response.Content;
        });
        Console.WriteLine(metrics);
        Console.WriteLine();

        // Test 2: effect of caching
        Console.WriteLine("Test 2: with vs. without cache");
        var cachedAgent = new CachedAgent(agent);
        var thread2 = new AgentThread();

        // First call (cache miss)
        var metrics1 = await tester.MeasureResponseTimeAsync(
            () => cachedAgent.ProcessMessageAsync(thread2, "What is AI?"));
        Console.WriteLine($"Without cache: {metrics1}");

        // Second call (cache hit)
        var metrics2 = await tester.MeasureResponseTimeAsync(
            () => cachedAgent.ProcessMessageAsync(thread2, "What is AI?"));
        Console.WriteLine($"With cache: {metrics2}");
        Console.WriteLine($"Improvement: {(1 - metrics2.ResponseTime.TotalSeconds / metrics1.ResponseTime.TotalSeconds) * 100:F1}%");
        Console.WriteLine();

        // Test 3: concurrency
        Console.WriteLine("Test 3: concurrency");
        var concurrencyResult = await tester.TestConcurrencyAsync(10, async () =>
        {
            var thread = new AgentThread();
            await thread.AddUserMessageAsync("Hello");
            var response = await agent.InvokeAsync(thread);
            return response.Content;
        });
        Console.WriteLine(concurrencyResult);

        Console.WriteLine("\n=== Performance test finished ===");
    }
}

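Performance optimization checklist

Before deploying your application, use this checklist to review its performance:

[ ] Agent reuse: are agent instances reused?
[ ] Caching: are common questions cached?
[ ] Prompt optimization: are prompts concise and clear?
[ ] Streaming: is long output streamed?
[ ] Parallel processing: are independent tasks executed in parallel?
[ ] History limits: does the conversation history have a sensible length limit?
[ ] Model selection: is the model chosen to fit the task?
[ ] Error retries: is there a retry mechanism with exponential backoff?
[ ] Resource cleanup: are resources released properly?
[ ] Performance monitoring: has performance monitoring been added?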
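The checklist asks for retries with exponential backoff, which none of the examples above implement. A minimal sketch follows, assuming your client surfaces transient failures (timeouts, throttling) as exceptions you can recognize; `RetryWithBackoffAsync`, its parameters, and the `isTransient` predicate are hypothetical. Each retry waits twice as long as the previous one, plus up to 20% random jitter. Many SDK clients already ship a configurable retry policy, so check yours before layering a second one on top.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Retries `operation` with exponential backoff plus random jitter.
// Non-transient errors, and the final failed attempt, propagate to the caller.
async Task<T> RetryWithBackoffAsync<T>(
    Func<Task<T>> operation,
    Func<Exception, bool> isTransient,
    int maxAttempts = 4,
    TimeSpan? baseDelay = null,
    CancellationToken cancellationToken = default)
{
    var delay = baseDelay ?? TimeSpan.FromSeconds(1);
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            // Wait baseDelay * 2^(attempt - 1), plus up to 20% jitter so clients don't retry in lockstep.
            var backoff = delay * Math.Pow(2, attempt - 1);
            var jitter = backoff * (Random.Shared.NextDouble() * 0.2);
            await Task.Delay(backoff + jitter, cancellationToken);
        }
    }
}

var calls = 0;
var result = await RetryWithBackoffAsync(
    async () =>
    {
        calls++;
        await Task.Yield();
        // Simulated flaky call (hypothetical): fails twice, then succeeds.
        if (calls < 3) throw new TimeoutException("simulated timeout");
        return "ok";
    },
    isTransient: ex => ex is TimeoutException,
    baseDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"{result} after {calls} attempts");
```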

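Real-world case: before and after optimization

Code before optimization

// Performance problems: a new agent on every call, no cache, a verbose prompt.
public class SlowCustomerService
{
    public async Task<string> HandleQuestionAsync(string question)
    {
        // Problem 1: a new client and agent are created for every question
        var chatClient = new AzureOpenAIClient(/* ... */).GetChatClient("gpt-4");

        // Problem 2: the prompt is too long
        var agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "CustomerService",
            instructions: @"You are a very professional customer service assistant. You need to help users solve all kinds of problems.
You should always remain polite and professional. You need to carefully understand the user's question and then give a detailed answer.
If you don't know the answer, you should honestly tell the user that you don't know.
You should use simple, easy-to-understand language and avoid overly technical jargon.
You should make sure your answers are accurate and helpful."
        );

        // Problem 3: no caching
        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await agent.InvokeAsync(thread);
        return response.Content;
    }
}

Performance metrics:

Average response time: 8.5 seconds
Monthly API cost: $450
Concurrency: 5 requests/second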

Code after optimization

// After: reused agent, caching, a shorter prompt, and a better-suited model.
public class FastCustomerService
{
    private readonly ChatCompletionAgent _agent;
    private readonly AgentResponseCache _cache;

    public FastCustomerService()
    {
        // Optimization 1: create the client and agent once and reuse them
        // Optimization 2: use a faster model
        var chatClient = new AzureOpenAIClient(/* ... */)
            .GetChatClient("gpt-3.5-turbo");

        // Optimization 3: simplify the prompt
        _agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "CustomerService",
            instructions: "You are a professional customer service agent. Answer questions politely and accurately, in simple language."
        );

        // Optimization 4: add a cache
        _cache = new AgentResponseCache();
    }

    public async Task<string> HandleQuestionAsync(string question)
    {
        // Optimization 5: check the cache first
        if (_cache.TryGetCachedResponse(question, out var cachedResponse))
        {
            return cachedResponse;
        }

        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await _agent.InvokeAsync(thread);

        // Cache the response
        _cache.CacheResponse(question, response.Content);
        return response.Content;
    }
}

Performance metrics after optimization:

Average response time: 2.1 seconds (75% faster)
Monthly API cost: $180 (60% savings)
Concurrency: 25 requests/second (a 400% increase)
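The percentages above follow from the before and after figures. The first two are reductions relative to the old value, while the third is an increase relative to the old value:

```latex
\begin{aligned}
\text{latency reduction} &= \frac{8.5 - 2.1}{8.5} \approx 75.3\% \\
\text{cost reduction} &= \frac{450 - 180}{450} = 60\% \\
\text{throughput increase} &= \frac{25 - 5}{5} = 400\%
\end{aligned}
```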

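Summary

Performance optimization is an ongoing process. Key takeaways:

Measure first: measure before you optimize, and avoid premature optimization
Find the bottleneck: use performance tests to locate the real bottlenecks
Optimize incrementally: change one thing at a time and verify the effect
Balance trade-offs: find the right balance between performance, cost, and quality
Monitor continuously: keep tracking performance metrics after deployment

Remember: the best optimization is avoiding unnecessary work. Designing for performance while writing the code is much easier than optimizing after the fact.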
