🏢 Elevator Bench

Comparing elevator simulator implementations across different AI models

claude-opus-4.5
Tool: claude-code | Mode: standard
Provider: anthropic-default
Run Time: 00:04:57
Cost: 1.03

claude-sonnet-4.5
Tool: claude-code | Mode: standard
Provider: anthropic-default
Run Time: 00:08:09
Cost: 0.45

claude-opus-4.5
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:05:45
Cost: 1x

claude-opus-4.6
Tool: copilot | Mode: standard
Provider: copilot-default
Cost: 3x

claude-sonnet-4.5
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:13:30
Cost: 2x

gemini-2.5-pro
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:03:15
Cost: 1x

gemini-3-flash
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:01:36
Cost: 0.33x

glm-4.6
Tool: copilot | Mode: agent
Provider: openrouter-default
Run Time: 00:02:15

gpt-5-codex
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:10:00
Cost: 2x

gpt-5.1-codex
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:05:05
Cost: 1x

gpt-5.3-codex
Tool: copilot | Mode: standard
Provider: copilot-default
Cost: 1x

grok-code-fast-1
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:01:50
Cost: 0x

kimi-k2-thinking
Tool: copilot | Mode: agent
Provider: openrouter-default
Run Time: 00:06:30
Cost: 0

gemini-2.5-pro
Tool: gemini-cli | Mode: standard
Provider: gemini-default
Run Time: 00:04:53

gemini-3-pro
Tool: gemini-cli | Mode: standard
Provider: gemini-default
Run Time: 00:05:32

glm-4.6
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:20:00
Cost: 0.18

glm-4.7
Tool: kilo | Mode: orchestrator
Provider: deepinfra
Run Time: 00:17:25
Cost: 0.12

glm-5
Tool: kilo | Mode: orchestrator
Provider: kilo-default
Run Time: 00:04:15
Cost: 0.21

kimi-k2-0905
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:14:00
Cost: 0.14

kimi-k2-thinking
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:07:25
Cost: 0.87

minimax-m2
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:05:45
Cost: 0.19

minimax-m2.1
Tool: kilo | Mode: orchestrator
Provider: kilo-gateway
Run Time: 00:06:48
Cost: 0.02

qwen_coder_opus
Tool: kilo | Mode: standard
Provider: openrouter-default

qwen3-coder-480b
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:13:30
Cost: 0.13

qwen3-coder-next
Tool: kilo | Mode: code
Provider: llama.cpp
Cost: 0.0

Qwen3.5-35B-A3B
Tool: kilo | Mode: standard
Provider: llama.cpp
Run Time: 00:02:15
Cost: 0.14

big-pickle
Tool: opencode | Mode: build
Provider: zen
Run Time: 00:01:45
Cost: 0.00

stealth-2602
Tool: opencode | Mode: build
Provider: stealth
Run Time: 00:02:15
Cost: 0.14

claude-haiku-4.5
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

claude-opus-4.6
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

gemini-3-flash-preview
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

gemini-3.1-pro-preview
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

gpt-5.3-codex:high
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

gpt-5.3-codex:low
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14

gpt-5.4:high
Tool: pi | Mode: standard
Provider: github
Run Time: 00:02:15
Cost: 0.14