🏢 Elevator Bench

Comparing elevator simulator implementations across different AI models

claude-opus-4.5
Tool: claude-code | Mode: standard
Provider: anthropic-default
Run Time: 00:04:57
Cost: 1.03

claude-sonnet-4.5
Tool: claude-code | Mode: standard
Provider: anthropic-default
Run Time: 00:08:09
Cost: 0.45

claude-opus-4.5
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:05:45
Cost: 1x

claude-sonnet-4.5
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:13:30
Cost: 2x

gemini-2.5-pro
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:03:15
Cost: 1x

gemini-3-flash
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:01:36
Cost: 0.33x

glm-4.6
Tool: copilot | Mode: agent
Provider: openrouter-default
Run Time: 00:02:15

gpt-5-codex
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:10:00
Cost: 2x

gpt-5.1-codex
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:05:05
Cost: 1x

grok-code-fast-1
Tool: copilot | Mode: agent
Provider: copilot-default
Run Time: 00:01:50
Cost: 0x

kimi-k2-thinking
Tool: copilot | Mode: agent
Provider: openrouter-default
Run Time: 00:06:30
Cost: 0

gemini-2.5-pro
Tool: gemini-cli | Mode: standard
Provider: gemini-default
Run Time: 00:04:53

gemini-3-pro
Tool: gemini-cli | Mode: standard
Provider: gemini-default
Run Time: 00:05:32

glm-4.6
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:20:00
Cost: 0.18

glm-4.7
Tool: kilo | Mode: orchestrator
Provider: deepinfra
Run Time: 00:17:25
Cost: 0.12

glm-5
Tool: kilo | Mode: orchestrator
Provider: kilo-default
Run Time: 00:04:15
Cost: 0.21

kimi-k2-0905
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:14:00
Cost: 0.14

kimi-k2-thinking
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:07:25
Cost: 0.87

minimax-m2
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:05:45
Cost: 0.19

minimax-m2.1
Tool: kilo | Mode: orchestrator
Provider: kilo-gateway
Run Time: 00:06:48
Cost: 0.02

qwen3-coder-480b
Tool: kilo | Mode: orchestrator
Provider: openrouter-default
Run Time: 00:13:30
Cost: 0.13

big-pickle
Tool: opencode | Mode: build
Provider: zen
Run Time: 00:01:45
Cost: 0.00

stealth-2602
Tool: opencode | Mode: build
Provider: stealth
Run Time: 00:02:15
Cost: 0.14