
GPT-5 Thinking Mode

This page only documents GPT reasoning behavior that was revalidated against Crazyrouter production on 2026-03-22. The current primary example uses:
  • gpt-5.4
  • POST /v1/responses
Claude does not currently support POST /v1/responses. If you are integrating Claude, use POST /v1/messages or POST /v1/chat/completions instead of the request shape on this page.
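For Claude integrations, the request body follows the Chat Completions shape rather than the Responses shape above. A minimal sketch of building that payload, assuming a Chat Completions-style endpoint; the model identifier here is a hypothetical placeholder, not a verified name:

```python
import json

claude_payload = {
    "model": "claude-example-model",  # hypothetical placeholder; use your provider's real Claude model ID
    "messages": [
        {"role": "user", "content": "Which is larger, 9.11 or 9.9?"}
    ],
}

# POST this body to /v1/chat/completions; /v1/messages uses its own,
# slightly different shape documented by Anthropic.
body = json.dumps(claude_payload)
print(body)
```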
POST /v1/responses

Basic usage

curl https://crazyrouter.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-5.4",
    "input": "Which is larger, 9.11 or 9.9? Explain briefly.",
    "reasoning": {
      "effort": "high",
      "summary": "detailed"
    }
  }'

Verified response shape

Verified output.type values:
["reasoning", "message"]
Typical reasoning item shape:
{
  "id": "rs_xxx",
  "type": "reasoning",
  "encrypted_content": "...",
  "summary": [
    {
      "type": "summary_text",
      "text": "..."
    }
  ]
}
The final answer is returned through the message item:
{
  "type": "message",
  "content": [
    {
      "type": "output_text",
      "text": "..."
    }
  ]
}

reasoning parameter

Field    | Type   | Description
-------- | ------ | -----------------------------------------------------------
effort   | string | Reasoning intensity. Verified values include low, medium, and high.
summary  | string | Summary granularity. concise and detailed were revalidated.
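The table above can be enforced client-side before sending a request. A small illustrative helper, using only the values revalidated here (the function name is ours, not part of any SDK):

```python
# Values revalidated in the table above; other values may exist upstream
# but were not rechecked here.
EFFORT_VALUES = {"low", "medium", "high"}
SUMMARY_VALUES = {"concise", "detailed"}

def build_reasoning(effort, summary=None):
    """Build a reasoning dict, rejecting values not revalidated above."""
    if effort not in EFFORT_VALUES:
        raise ValueError(f"unverified effort: {effort!r}")
    reasoning = {"effort": effort}
    if summary is not None:
        if summary not in SUMMARY_VALUES:
            raise ValueError(f"unverified summary: {summary!r}")
        reasoning["summary"] = summary
    return reasoning

print(build_reasoning("high", "detailed"))
```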

Practical rule

  • If you only want stronger reasoning, send effort
  • If you need an inspectable reasoning summary, send both effort and summary
In the current recheck:
  • with only effort, the reasoning item existed but summary could be an empty array
  • with summary: "detailed", summary_text was returned reliably
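Because summary can legitimately be an empty array when only effort is sent, extraction code should not assume summary_text is present. A defensive sketch over raw response dicts; the sample items mirror the verified shapes above, and the function name is ours:

```python
def extract_summary_texts(output_items):
    """Collect summary_text parts from reasoning items.
    Returns [] when summary was omitted and the array came back empty."""
    texts = []
    for item in output_items:
        if item.get("type") == "reasoning":
            for part in item.get("summary", []):
                if part.get("type") == "summary_text":
                    texts.append(part["text"])
    return texts

# With only effort: summary may be an empty array.
only_effort = [{"type": "reasoning", "encrypted_content": "...", "summary": []}]
# With summary: "detailed": summary_text is returned reliably.
with_summary = [{"type": "reasoning",
                 "summary": [{"type": "summary_text", "text": "step 1..."}]}]

print(extract_summary_texts(only_effort))
print(extract_summary_texts(with_summary))
```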

Extract the thinking summary

Python
response = client.responses.create(
    model="gpt-5.4",
    input="Design a high-concurrency message queue architecture.",
    reasoning={
        "effort": "high",
        "summary": "detailed"
    }
)

for item in response.output:
    if item.type == "reasoning":
        for part in item.summary:
            if part.type == "summary_text":
                print("Thinking summary:", part.text)
    elif item.type == "message":
        for content in item.content:
            if content.type == "output_text":
                print("Final answer:", content.text)

Streaming thinking

The following Responses SSE event names were revalidated in production:
  • response.reasoning_summary_part.added
  • response.reasoning_summary_text.delta
  • response.reasoning_summary_text.done
  • response.output_text.delta
  • response.output_text.done
  • response.completed
Example:
Python
stream = client.responses.create(
    model="gpt-5.4",
    input="Explain briefly why 9.9 is larger than 9.11.",
    reasoning={
        "effort": "high",
        "summary": "detailed"
    },
    stream=True
)

for event in stream:
    if event.type == "response.reasoning_summary_text.delta":
        print(f"[Thinking summary] {event.delta}", end="")
    elif event.type == "response.output_text.delta":
        print(event.delta, end="")

Current recommendation

  • If you need a visible reasoning field, prefer gpt-5.4 with the Responses API
  • Do not treat gpt-5.4 Chat Completions reasoning_content as the current primary contract
  • If you only care about the final answer, Chat Completions with reasoning_effort is still fine
Reasoning mode increases both latency and token usage. Higher effort generally costs more.
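The cost impact can be made concrete with a back-of-envelope estimator. The per-token prices below are hypothetical placeholders, and billing reasoning tokens at the output rate is an assumption; substitute your provider's real rates:

```python
# Hypothetical prices for illustration only; use your provider's real rates.
PRICE_PER_1K_INPUT = 0.005
PRICE_PER_1K_OUTPUT = 0.015

def estimate_cost(input_tokens, output_tokens):
    """Assumption: reasoning tokens are billed as output tokens, so higher
    effort raises output_tokens and therefore total cost."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Same prompt, low vs high effort: more reasoning tokens, higher cost.
low = estimate_cost(200, 300)
high = estimate_cost(200, 1800)
print(round(low, 4), round(high, 4))
```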