Streaming

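The OpenRouter API can stream responses from any model. This is useful for building chat interfaces and other applications whose UI should update as the model generates its response. (Original source: https://openrouter.ai/docs/api/reference/streaming)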

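To enable streaming, set the stream parameter to true in your request. The model then streams the response to the client in chunks instead of returning the entire response at once.

The following example shows how to stream a response and process it: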

import { OpenRouter } from '@openrouter/sdk';

const openRouter = new OpenRouter({
  apiKey: '{{API_KEY_REF}}',
});

const question = 'How would you build the tallest building ever?';

const stream = await openRouter.chat.send({
  model: '{{MODEL}}',
  messages: [{ role: 'user', content: question }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) {
    console.log(content);
  }

  // Final chunk includes usage stats
  if (chunk.usage) {
    console.log('Usage:', chunk.usage);
  }
}

Additional Information

For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to keep the connection from timing out. These comments look like:

: OPENROUTER PROCESSING

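Per the SSE specification, comment payloads can be safely ignored. However, you can use them to improve the UX if you wish, for example by showing a dynamic loading indicator.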
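
As a rough illustration of the comment handling described above, the sketch below classifies a single line from a raw SSE stream. The names SseLine and parseSseLine are hypothetical, and the "data: [DONE]" end-of-stream sentinel is an assumption borrowed from OpenAI-compatible APIs rather than something stated in this section.

```typescript
// Hypothetical helper: classify one line of a raw SSE stream.
type SseLine =
  | { kind: 'comment'; text: string } // e.g. ": OPENROUTER PROCESSING"
  | { kind: 'data'; payload: string } // raw text after "data:", usually JSON
  | { kind: 'done' }                  // assumed "[DONE]" sentinel
  | { kind: 'other' };                // blank lines and fields not handled here

function parseSseLine(line: string): SseLine {
  // Per the SSE specification, a line starting with ":" is a comment.
  if (line.startsWith(':')) {
    return { kind: 'comment', text: line.slice(1).trim() };
  }
  if (line.startsWith('data:')) {
    const payload = line.slice('data:'.length).trim();
    // Assumption: the stream ends with a literal "[DONE]" payload.
    if (payload === '[DONE]') {
      return { kind: 'done' };
    }
    return { kind: 'data', payload };
  }
  return { kind: 'other' };
}
```

A caller could show a loading indicator whenever kind is 'comment' and pass the payload to JSON.parse only when kind is 'data'.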

The generation ID is returned in the X-Generation-Id response header on all endpoints (chat completions, completions, responses, and messages), which is useful for debugging and correlating requests.
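
A minimal sketch of reading that header follows. The helper name getGenerationId and the HeaderReader type are hypothetical; the only detail taken from the text is the header name X-Generation-Id.

```typescript
// Minimal structural type so the helper accepts the Fetch API's Headers
// object or any other container that exposes get().
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: read the generation ID for logging and correlation.
// Returns null when the header is absent.
function getGenerationId(headers: HeaderReader): string | null {
  // The Fetch API's Headers.get() matches header names case-insensitively.
  return headers.get('X-Generation-Id');
}
```

With a fetch-style response object, this could be called as getGenerationId(response.headers) and the result attached to log lines for that request.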

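Some SSE client implementations may not parse the payload according to the specification, which leads to an uncaught error when you JSON.parse a non-JSON payload. We recommend the following clients: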

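Stream Cancellation

A streaming request can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.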

Provider support:

Supported:

OpenAI, Azure, Anthropic
Fireworks, Mancer, Recursal
AnyScale, Lepton, OctoAI
Novita, DeepInfra, Together
Cohere, Hyperbolic, Infermatic
Avian, XAI, Cloudflare
SFCompute, Nineteen, Liquid
Friendli, Chutes, DeepSeek

Not currently supported:

AWS Bedrock, Groq, Modal
Google, Google AI Studio, Minimax
HuggingFace, Replicate, Perplexity
Mistral, AI21, Featherless
Lynn, Lambda, Reflection
SambaNova, Inflection, ZeroOneAI
AionLabs, Alibaba, Nebius
Kluster, Targon, InferenceNet
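
Aborting the connection, as described at the start of this section, might look like the sketch below, which uses the standard AbortController. Everything here is illustrative: startCancellableStream and FetchLike are hypothetical names, the endpoint URL is an assumption not given in this section, and the fetch function is injected so the sketch does not depend on a particular runtime.

```typescript
// Assumption: a fetch-compatible function is passed in by the caller.
type FetchLike = (
  url: string,
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
    signal: AbortSignal;
  },
) => Promise<unknown>;

// Assumed endpoint URL; it does not appear in this section.
const CHAT_COMPLETIONS_URL = 'https://openrouter.ai/api/v1/chat/completions';

function startCancellableStream(
  fetchImpl: FetchLike,
  apiKey: string,
  model: string,
  prompt: string,
) {
  const controller = new AbortController();
  const response = fetchImpl(CHAT_COMPLETIONS_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
    // Aborting this signal closes the underlying connection.
    signal: controller.signal,
  });
  return {
    response,
    // Call cancel() to abort the request and stop the stream.
    cancel: () => controller.abort(),
  };
}
```

When the global fetch is passed as fetchImpl, calling cancel() makes the pending request reject with an AbortError, so the caller should be prepared to catch it.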

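Error Handling During Streaming

OpenRouter handles errors differently depending on when they occur during streaming:

Errors Before Any Tokens Are Sent

If an error occurs before any tokens have been streamed to the client, OpenRouter returns a standard JSON error response with the appropriate HTTP status code: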

{
  "error": {
    "code": 400,
    "message": "Invalid model specified"
  }
}

Common HTTP status codes include:

400: Bad Request (invalid parameters)
401: Unauthorized (invalid API key)
402: Payment Required (insufficient credits)
429: Too Many Requests (rate limited)
502: Bad Gateway (provider error)
503: Service Unavailable (no providers available)
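
The status codes and the JSON error body above can be combined into a single log message, as in the sketch below. The names STATUS_MEANINGS, ErrorBody, and describePreStreamError are hypothetical, and the fallback text for unlisted codes is an assumption.

```typescript
// Status codes and meanings taken from the list above.
const STATUS_MEANINGS: Record<number, string> = {
  400: 'Bad Request (invalid parameters)',
  401: 'Unauthorized (invalid API key)',
  402: 'Payment Required (insufficient credits)',
  429: 'Too Many Requests (rate limited)',
  502: 'Bad Gateway (provider error)',
  503: 'Service Unavailable (no providers available)',
};

// Shape of the JSON error body shown above.
interface ErrorBody {
  error: { code: number; message: string };
}

// Hypothetical helper: turn a pre-stream error response into one log line.
function describePreStreamError(status: number, bodyText: string): string {
  const meaning = STATUS_MEANINGS[status] ?? 'Unexpected status';
  let detail = '';
  try {
    const parsed = JSON.parse(bodyText) as Partial<ErrorBody>;
    detail = parsed.error?.message ?? '';
  } catch {
    // The body was not usable JSON; fall back to the status meaning alone.
  }
  return detail ? `${status} ${meaning}: ${detail}` : `${status} ${meaning}`;
}
```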

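Errors After Tokens Have Been Sent (Mid-Stream)

If an error occurs after some tokens have already been streamed to the client, OpenRouter cannot change the HTTP status code, which is already 200 OK. Instead, the error is sent as a Server-Sent Event (SSE) with a unified structure: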

data: {"id":"cmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"openai/gpt-4o","provider":"openai","error":{"code":"server_error","message":"Provider disconnected unexpectedly"},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}

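Key characteristics of mid-stream errors:

The error appears at the top level, alongside the standard response fields (id, object, created, and so on)
A choices array with finish_reason: "error" is included so that the stream terminates properly
The HTTP status remains 200 OK because the headers have already been sent
The stream terminates after this unified error event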
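
For clients that read the raw SSE stream, the characteristics above suggest checking each parsed data payload for a top-level error field. The sketch below does that; StreamChunk and inspectChunk are hypothetical names, and the type covers only the fields that appear in the example event.

```typescript
// Only the fields used here; real chunks carry more (id, object, created, ...).
interface StreamChunk {
  error?: { code: string | number; message: string };
  choices?: Array<{
    delta?: { content?: string };
    finish_reason?: string | null;
  }>;
}

interface ChunkInfo {
  content: string;       // text delta, empty if none
  error: string | null;  // top-level error message, if present
  terminated: boolean;   // true when finish_reason is "error"
}

// Hypothetical helper: inspect the JSON payload of one "data:" event.
function inspectChunk(payload: string): ChunkInfo {
  const chunk = JSON.parse(payload) as StreamChunk;
  const choice = chunk.choices?.[0];
  return {
    content: choice?.delta?.content ?? '',
    error: chunk.error?.message ?? null,
    terminated: choice?.finish_reason === 'error',
  };
}
```

A reader loop could stop consuming the stream as soon as error is non-null, since no further events follow the error event.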

Code Example

The following example shows how to handle both types of error correctly in a streaming implementation:

import { OpenRouter } from '@openrouter/sdk';

const openRouter = new OpenRouter({
  apiKey: '{{API_KEY_REF}}',
});

async function streamWithErrorHandling(prompt: string) {
  try {
    const stream = await openRouter.chat.send({
      model: '{{MODEL}}',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    });

    for await (const chunk of stream) {
      // Check for errors in chunk
      if ('error' in chunk) {
        console.error(`Stream error: ${chunk.error.message}`);
        if (chunk.choices?.[0]?.finish_reason === 'error') {
          console.log('Stream terminated due to error');
        }
        return;
      }

      // Process normal content
      const content = chunk.choices?.[0]?.delta?.content;
      if (content) {
        console.log(content);
      }
    }
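  } catch (error) {
    // Handle pre-stream errors; the caught value is `unknown` under strict
    // TypeScript settings, so narrow it before reading `message`
    const message = error instanceof Error ? error.message : String(error);
    console.error(`Error: ${message}`);
  }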
}

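API-Specific Behavior

Different API endpoints may handle streaming errors in slightly different ways:

OpenAI Chat Completions API: returns an ErrorResponse directly if no chunks have been processed; if some chunks have been processed, includes the error information in the response
OpenAI Responses API: may convert certain error codes (such as context_length_exceeded) into a successful response with finish_reason: "length" instead of treating them as errors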
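
Because a truncated response may arrive as a success with finish_reason: "length" rather than as an error, client code may want to inspect the finish reason explicitly. The sketch below is one way to do so; FinishStatus and classifyFinishReason are hypothetical names, and treating every other non-null value as a normal finish is an assumption.

```typescript
type FinishStatus = 'in_progress' | 'failed' | 'truncated' | 'finished';

// Hypothetical helper: map a finish_reason value to a coarse status.
function classifyFinishReason(finishReason: string | null | undefined): FinishStatus {
  // No finish_reason yet: the stream is still producing tokens.
  if (finishReason == null) return 'in_progress';
  // Mid-stream errors are reported with finish_reason "error".
  if (finishReason === 'error') return 'failed';
  // "length" may hide a converted error such as context_length_exceeded.
  if (finishReason === 'length') return 'truncated';
  // Assumption: any other value is a normal completion.
  return 'finished';
}
```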