The Complete Configuration Hierarchy of LiteLLM Proxy
- litellm_settings: core runtime and logging settings
- general_settings: general proxy and security settings
- router_settings: routing, retry, and fallback policies
- model_list: model definitions and cost information
- environment_variables: environment variable mappings
- callback_settings: callback-service-specific configuration

Declaring Models and Their Metadata
```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```
```yaml
model_list:
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: 2024-02-15-preview
```
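Whichever provider backs a deployment, clients address it by its model_name alias; the proxy resolves the alias to the litellm_params underneath. A minimal sketch of the OpenAI-compatible request body a client would send (build_chat_request is a hypothetical helper for illustration, not part of LiteLLM):

```python
import json

# Hypothetical helper: build an OpenAI-compatible chat request body for the proxy.
# "azure-gpt4" is the model_name alias from model_list above; the proxy — not the
# client — maps it to azure/gpt-4 plus its api_base / api_key / api_version.
def build_chat_request(model_alias, user_message):
    return {
        "model": model_alias,  # proxy-side alias, never the raw provider model id
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_chat_request("azure-gpt4", "Summarize this document.")
print(json.dumps(body, indent=2))
```

The same body works against `POST http://localhost:4000/v1/chat/completions` for any alias in model_list, which is what makes swapping providers transparent to callers.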
Advanced Routing, Retries, and Cooldowns
```yaml
router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true
  cooldown_time: 30
  disable_cooldowns: false
  retry_policy:
    AuthenticationErrorRetries: 3
    TimeoutErrorRetries: 3
  fallbacks: [{"gpt-3.5-turbo": ["gpt-4"]}]
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"}
```
```yaml
router_settings:
  load_balancing:
    strategy: "weighted_round_robin"
    weights:
      - model_name: gpt-4
        weight: 0.7
      - model_name: claude-3-opus
        weight: 0.3
```
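As an illustration of what weighted selection amounts to (a sketch of the concept, not LiteLLM's internal router): each pick is drawn in proportion to the configured weights, so over many requests gpt-4 receives roughly 70% of the traffic.

```python
import random

# Deployments and weights mirrored from the router_settings block above
deployments = ["gpt-4", "claude-3-opus"]
weights = [0.7, 0.3]

def pick_deployment(rng):
    # random.choices draws one deployment with probability proportional to its weight
    return rng.choices(deployments, weights=weights, k=1)[0]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
picks = [pick_deployment(rng) for _ in range(1000)]
share = picks.count("gpt-4") / len(picks)
print(f"gpt-4 share over 1000 picks: {share:.2f}")  # close to 0.70
```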
Semantic Caching with Redis, S3, and Qdrant
```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    password: your_password
    namespace: litellm.caching.caching
    ttl: 600
    # Semantic cache
    qdrant_semantic_cache_embedding_model: openai-embedding
    similarity_threshold: 0.8
```
```yaml
cache_params:
  type: s3
  s3_bucket_name: cache-bucket-litellm
  s3_region_name: us-west-2
  s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
  s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
```
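The semantic cache treats a new prompt as a hit when its embedding is close enough to a cached prompt's embedding. A sketch of the threshold decision (cosine similarity compared against similarity_threshold: 0.8; the vectors here are toys standing in for real embedding output):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

SIMILARITY_THRESHOLD = 0.8  # similarity_threshold from cache_params above

def is_cache_hit(query_vec, cached_vec):
    # Hit when the query embedding is at least this similar to a cached one
    return cosine_similarity(query_vec, cached_vec) >= SIMILARITY_THRESHOLD

# Toy vectors for two near-identical prompts vs. two unrelated ones
print(is_cache_hit([1.0, 0.1], [1.0, 0.2]))  # True  — nearly parallel
print(is_cache_hit([1.0, 0.0], [0.0, 1.0]))  # False — orthogonal
```

Raising similarity_threshold trades cache hit rate for answer fidelity: at 1.0 only exact-duplicate embeddings hit; near 0 almost everything does.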
JWT, KMS, and Access Control
```yaml
general_settings:
  master_key: sk-xxx
  enable_jwt_auth: false
  key_management_system: google_kms
  use_google_kms: true
  disable_master_key_return: false
  allowed_ips: ["192.168.1.0/24"]
  allowed_routes: ["/v1/chat/completions", "/v1/embeddings"]
  admin_only_routes: ["/model/list", "/key/list"]
```
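allowed_ips accepts CIDR ranges, so a single entry like 192.168.1.0/24 admits a whole subnet. Python's stdlib ipaddress module shows the membership test such a filter performs (a sketch of the concept, not LiteLLM's implementation):

```python
import ipaddress

# allowed_ips from the config above, parsed into networks
ALLOWED = [ipaddress.ip_network("192.168.1.0/24")]

def ip_allowed(client_ip):
    # True if the client address falls inside any allowed CIDR range
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED)

print(ip_allowed("192.168.1.42"))  # True  — inside the /24
print(ip_allowed("10.0.0.7"))      # False — outside every allowed range
```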
```yaml
general_settings:
  database_url: "postgresql://user:pass@localhost:5432/litellm"
  max_parallel_requests: 50
  global_max_parallel_requests: 200
```
Monitoring, Tracing, and Debugging
```yaml
litellm_settings:
  success_callback: ["langfuse", "openmeter"]
  failure_callback: ["sentry"]
  callbacks: ["otel"]
  service_callbacks: ["datadog", "prometheus"]
  turn_off_message_logging: false
  redact_user_api_key_info: false
  set_verbose: false
  json_logs: true
```
```yaml
litellm_settings:
  langfuse_default_tags: ["cache_hit", "user_api_key_alias"]
  langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
  langfuse_secret_key: os.environ/LANGFUSE_SECRET_KEY
```
Core API Keys You Must Set
- OPENAI_API_KEY: OpenAI services
- ANTHROPIC_API_KEY: Anthropic services
- AZURE_API_KEY / AZURE_API_VERSION: Azure OpenAI
- GOOGLE_API_KEY: Google AI / Gemini
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY: AWS services
- GROQ_API_KEY: Groq high-speed inference
- COHERE_API_KEY: Cohere services
- REPLICATE_API_KEY: Replicate services
- HUGGINGFACE_API_KEY: HuggingFace services
- LANGFUSE_SECRET_KEY: Langfuse tracing
- DD_API_KEY: Datadog monitoring

Fallbacks, Fault Tolerance, and Network Configuration
```yaml
litellm_settings:
  default_fallbacks: ["claude-opus"]
  content_policy_fallbacks: [{"gpt-3.5-turbo": ["claude-opus"]}]
  context_window_fallbacks: [{"gpt-3.5-turbo": ["gpt-4"]}]
  request_timeout: 60
  force_ipv4: false
  num_retries: 3
```
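The three fallback lists are consulted for different failure types: content_policy_fallbacks on moderation rejections, context_window_fallbacks on context-length errors, and default_fallbacks for everything else. A sketch of that dispatch (the error-kind labels are illustrative strings, not LiteLLM's exception names):

```python
# Fallback tables mirrored from the litellm_settings block above
CONTENT_POLICY = {"gpt-3.5-turbo": ["claude-opus"]}
CONTEXT_WINDOW = {"gpt-3.5-turbo": ["gpt-4"]}
DEFAULT = ["claude-opus"]

def pick_fallbacks(model, error_kind):
    # Route the failure to the matching fallback table, else the default list
    if error_kind == "content_policy":
        return CONTENT_POLICY.get(model, DEFAULT)
    if error_kind == "context_window":
        return CONTEXT_WINDOW.get(model, DEFAULT)
    return DEFAULT

print(pick_fallbacks("gpt-3.5-turbo", "context_window"))  # ['gpt-4']
print(pick_fallbacks("gpt-3.5-turbo", "timeout"))         # ['claude-opus']
```

Keeping context-window fallbacks pointed at larger-context models (here gpt-4) is what makes this dispatch useful: a generic fallback could land on a model with the same limit and fail again.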
```yaml
general_settings:
  background_health_checks: true
  health_check_interval: 300
  disable_health_checks: false
```
Containerized Deployment and Scaling
```yaml
version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/litellm
      - LITELLM_MASTER_KEY=sk-1234
    command: ["--config", "/app/config.yaml", "--port", "4000", "--num_workers", "8"]
    depends_on:
      - db
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-stable
          ports:
            - containerPort: 4000
```
Integrations with Popular Tools and Services
```shell
export ANTHROPIC_BASE_URL="http://0.0.0.0:4000"
export ANTHROPIC_AUTH_TOKEN="$LITELLM_MASTER_KEY"
```
```json
{
  "github.copilot.advanced": {
    "debug.overrideProxyUrl": "http://localhost:4000",
    "debug.testOverrideProxyUrl": "http://localhost:4000"
  }
}
```
```python
config_list = [{
    "model": "my-fake-model",
    "api_base": "http://localhost:4000",
    "api_type": "open_ai",
    "api_key": "NULL",
}]
```
Concurrency Control and Performance Tuning
- max_parallel_requests: maximum concurrency per deployment
- global_max_parallel_requests: global maximum concurrency
- num_workers: number of worker processes
- request_timeout: request timeout
- background_health_checks: background health checks
- disable_retry_on_max_parallel_request_limit_error: disable retries when the concurrency limit is hit

```yaml
general_settings:
  max_parallel_requests: 100
  global_max_parallel_requests: 500
  proxy_batch_write_at: 10
  proxy_batch_polling_interval: 3600
```
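max_parallel_requests behaves like a per-deployment semaphore: requests beyond the limit wait for a slot instead of running. A self-contained asyncio sketch of that bound (the numbers are illustrative, and this is the concept, not LiteLLM's implementation):

```python
import asyncio

LIMIT = 3  # stand-in for max_parallel_requests

async def run_requests(total):
    sem = asyncio.Semaphore(LIMIT)
    active = 0
    peak = 0

    async def handle(_):
        nonlocal active, peak
        async with sem:             # acquire a concurrency slot
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # yield, simulating upstream latency
            active -= 1

    await asyncio.gather(*(handle(i) for i in range(total)))
    return peak

peak = asyncio.run(run_requests(20))
print(f"peak concurrency: {peak}")  # never exceeds LIMIT
```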
Slack, Email, and Enterprise Alerting
```yaml
general_settings:
  alerting: ["slack", "email"]
  alerting_threshold: 300
  alerting_args:
    slack_webhook_url: "https://hooks.slack.com/services/xxx"
    alert_channels:
      budget_alerts: "#budget-alerts"
      system_alerts: "#system-alerts"
  spend_report_frequency: "1d"
  alert_to_webhook_url:
    budget_alert: "https://webhook.site/xxx"
    system_alert: "https://webhook.site/yyy"
```
```yaml
general_settings:
  budget_local: 100.0
  budget_duration: 30d
  budget_alert_buffer: 0.8
```
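Assuming budget_alert_buffer is the fraction of the budget at which alerts begin firing, the config above alerts once spend reaches 80% of the $100 budget, i.e. $80. A sketch of that arithmetic (the threshold semantics are our reading of the setting, not confirmed behavior):

```python
BUDGET = 100.0      # budget_local from the config above
ALERT_BUFFER = 0.8  # budget_alert_buffer — assumed to mean "alert at 80% spent"

def should_alert(spend):
    # Fire a budget alert once spend crosses the buffer line
    return spend >= BUDGET * ALERT_BUFFER

print(should_alert(75.0))  # False — below the 80.0 alert line
print(should_alert(85.0))  # True  — above it
```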
PostgreSQL, Connection Pooling, and Graceful Degradation
```yaml
general_settings:
  database_url: "postgresql://user:pass@localhost:5432/litellm"
  database_connection_pool_limit: 100
  database_connection_timeout: 60
  disable_spend_logs: false
  store_prompts_in_spend_logs: true
  allow_requests_on_db_unavailable: true
  maximum_spend_logs_retention_period: 30d
```
```yaml
cache_params:
  type: redis
  redis_startup_nodes:
    - host: "127.0.0.1"
      port: "7001"
  service_name: "mymaster"
  sentinel_nodes:
    - ["localhost", 26379]
```
Advanced Controls and Extended Features
- allowed_routes: restrict which API routes are reachable
- enforce_user_param: require every request to include a 'user' parameter
- pass_through_endpoints: pass-through endpoint configuration
- store_model_in_db: store model information in the database
- ui_access_mode: "admin_only": restrict the UI to administrators only
- enable_oauth2_auth: enable OAuth2 authentication
- custom_auth: custom authentication logic
- key_generation_settings: control who may generate keys

```yaml
general_settings:
  enable_oauth2_auth: true
  oauth2_providers:
    - name: "google"
      client_id: "your-google-client-id"
      client_secret: "your-google-client-secret"
    - name: "github"
      client_id: "your-github-client-id"
      client_secret: "your-github-client-secret"
```
New Capabilities: Responses API, MCP, and More
```yaml
model_list:
  - model_name: openai/o1-pro
    litellm_params:
      model: openai/o1-pro
      api_key: os.environ/OPENAI_API_KEY
      reasoning_effort: "medium"
```
```yaml
litellm_settings:
  mcp_aliases:
    github: github_mcp_server
    zapier: zapier_mcp_server
mcp_servers:
  - name: github_mcp_server
    command: "npx"
    args: ["@modelcontextprotocol/server-github"]
```
```yaml
litellm_settings:
  guardrails:
    - prompt_injection:
        callbacks: [lakera_prompt_injection]
        default_on: true
    - pii_masking:
        callbacks: [presidio]
        default_on: false
    - hide_secrets_guard:
        callbacks: [hide_secrets]
        default_on: false
```
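Per-guardrail default_on decides which callbacks run when a request does not opt into guardrails explicitly: default_on: true guardrails always run, the others only when requested. A sketch of that filtering over the config above (the request-side opt-in shape is illustrative):

```python
# Guardrail config mirrored from the YAML above
GUARDRAILS = [
    {"name": "prompt_injection", "callbacks": ["lakera_prompt_injection"], "default_on": True},
    {"name": "pii_masking", "callbacks": ["presidio"], "default_on": False},
    {"name": "hide_secrets_guard", "callbacks": ["hide_secrets"], "default_on": False},
]

def active_callbacks(requested=None):
    """Callbacks to run: default_on guardrails plus any the request asked for."""
    active = []
    for g in GUARDRAILS:
        if g["default_on"] or (requested and g["name"] in requested):
            active.extend(g["callbacks"])
    return active

print(active_callbacks())                 # ['lakera_prompt_injection']
print(active_callbacks({"pii_masking"}))  # ['lakera_prompt_injection', 'presidio']
```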
Proxy Server Management and Monitoring
- litellm --config config.yaml: start the proxy server
- litellm --port 4000: start on a specific port
- litellm --num_workers 8: set the number of worker processes
- litellm --debug: enable debug mode

```shell
# Set environment variables
export LITELLM_PROXY_URL=http://localhost:4000
export LITELLM_PROXY_API_KEY=sk-your-key

# Create a virtual key
litellm create-key --max_budget 10.00 --duration 30d

# View usage statistics
litellm spend --days 7
```
```shell
# Health check
curl http://localhost:4000/health

# Model list
curl http://localhost:4000/model/list \
  -H "Authorization: Bearer sk-your-master-key"
```
Common Errors and Solutions
```yaml
# Increase timeouts
litellm_settings:
  request_timeout: 120
  num_retries: 5
```
```
# Adjust the number of worker processes
--num_workers 4

# Enable connection pooling
database_connection_pool_limit: 50
```
```yaml
# Configure RPM limits
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      rpm: 60  # requests per minute
```
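rpm: 60 caps a deployment at 60 requests in any rolling minute; excess requests are rejected or queued until older ones age out of the window. A minimal sliding-window limiter sketch (the concept only, not LiteLLM's implementation):

```python
import time
from collections import deque

class RpmLimiter:
    """Allow at most `rpm` requests in any rolling 60-second window."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.stamps = deque()  # timestamps of requests inside the window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop requests that have aged out of the 60-second window
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) < self.rpm:
            self.stamps.append(now)
            return True
        return False

limiter = RpmLimiter(rpm=2)
print(limiter.allow(now=0.0))   # True
print(limiter.allow(now=1.0))   # True
print(limiter.allow(now=2.0))   # False — window full
print(limiter.allow(now=61.0))  # True  — first request aged out
```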
Further Reading and Configuration References