What happens when the same bug-fixing task is sent to Claude, ZAI (GLM), OpenAI Codex, and Google Gemini simultaneously?
This question sparked the AgentForge project. We built a system that connects multiple LLM CLIs with the NATS JetStream message queue to process the same tasks in parallel, and in the process, we made some unexpected discoveries. This article focuses on the comparative experimental findings during the setup phase.
The system’s design and implementation will be covered in Part 2.
List of AIs Tested
The final configuration of 18 operational workers is as follows:
| Family | Model | Notes |
|---|---|---|
| Claude Code | claude-sonnet-4-6 | Main development worker |
| Claude Code | claude-sonnet-4-5 | Previous generation comparison |
| Claude Code | claude-haiku-4-5 | Lightweight & High-speed |
| Claude Code | claude-opus-4-6 | Top-tier |
| Claude Code | claude-opus-4-5 | Previous generation comparison |
| ZAI (GLM) | glm-5.1 | High-tier |
| ZAI (GLM) | glm-4.7 | Mid-tier |
| ZAI (GLM) | glm-4.5-air | Lightweight tier |
| OpenAI Codex | gpt-5.5 | |
| OpenAI Codex | gpt-5.4 | 1M context |
| OpenAI Codex | gpt-5.4-mini | 400K context |
| OpenAI Codex | gpt-5.3-codex | 272K context |
| Google Gemini | gemini-2.5-flash | Default recommended |
| Google Gemini | gemini-2.5-pro | High-tier |
| Google Gemini | gemini-2.5-flash-lite | Lightweight |
The list was much shorter when we first started; it grew as we discovered which models were actually available.
Discovery 1: Claude 3.x Series is Already Inaccessible
Those who have used Claude Code for a long time might recall Claude 3.7 Sonnet, 3.5 Sonnet, and 3.5 Haiku. We attempted to add these models as workers.
claude --model claude-3-7-sonnet-20250219 --print "hello"
# → "may not exist or no access"
All three models returned the same error. The Claude 3 series reached its EOL in early 2026, and access via the Claude Code CLI has been blocked. Currently, only the 4.x series is available with a Claude Code subscription.
Conclusion: Claude workers were configured using only the 4.5/4.6 series.
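Availability checks like the one above can be scripted instead of run by hand. Below is a minimal sketch of a generic probe; the helper name `probe_models` is ours, and the command template and model IDs in the usage comment are illustrative.

```shell
# Sketch: report which model IDs a CLI accepts (exit code 0 = accepted).
# The command template uses %s as a placeholder for the model name.
probe_models() {
  local tpl=$1 model
  shift
  for model in "$@"; do
    if eval "${tpl//%s/$model}" >/dev/null 2>&1; then
      echo "OK   $model"
    else
      echo "FAIL $model"
    fi
  done
}

# Usage (requires the claude CLI and a subscription):
# probe_models 'claude --model %s --print hello' \
#   claude-3-7-sonnet-20250219 claude-sonnet-4-6
```

Because the probe only looks at exit codes, the same function works for any of the four CLIs.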
Discovery 2: Limited Model Selection for ChatGPT Account Codex
The OpenAI Codex CLI authenticates with a ChatGPT Plus/Pro account or a separate API key. If authenticated via a ChatGPT account, the accessible models are limited.
codex --model gpt-5.5-pro "fix the bug"
# → "Model gpt-5.5-pro is not supported with ChatGPT account"
codex --model gpt-5.5 "fix the bug"
# → Works normally
Models available with a ChatGPT account:
| Model | Context | Inference Level |
|---|---|---|
| gpt-5.5 | 1M / 1M | High |
| gpt-5.4 | 1M / 1M | Medium |
| gpt-5.4-mini | 400K / 400K | Medium |
| gpt-5.3-codex | 272K / 400K | Medium |
All other models, including gpt-5.5-pro, returned a “not supported with ChatGPT account” error. More models are available with an API key, but that is a separate setup from ChatGPT-account authentication.
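Since some workers may run with an API key and some with a ChatGPT account, it is handy to select the first model a given CLI actually accepts. This is a sketch; the helper name `pick_model` is ours, and the codex invocation in the usage comment mirrors the article's examples.

```shell
# Sketch: return the first model the CLI accepts, so a worker can prefer an
# API-key-only model but fall back to a ChatGPT-account one.
# %s in the template is replaced with each candidate model name in turn.
pick_model() {
  local tpl=$1 m
  shift
  for m in "$@"; do
    if eval "${tpl//%s/$m}" >/dev/null 2>&1; then
      echo "$m"
      return 0
    fi
  done
  return 1
}

# Usage (requires the codex CLI):
# pick_model 'codex --model %s "hello"' gpt-5.5-pro gpt-5.5
```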
Discovery 3: Gemini CLI Only Supports 2.5 Series
We tested various models with the Gemini CLI (gemini binary).
gemini -p "hello" -m gemini-2.0-flash
# → ModelNotFoundError: models/gemini-2.0-flash is not found
gemini -p "hello" -m gemini-1.5-pro
# → ModelNotFoundError
gemini -p "hello" -m gemini-2.5-flash
# → Works normally
Gemini models accessible with the current account:
- gemini-2.5-flash — Default recommended model
- gemini-2.5-pro — High-tier
- gemini-2.5-flash-lite — Lightweight
Gemini 2.0 and earlier all return ModelNotFoundError. This may vary by account plan or API key type, but with our Gemini CLI setup, only the 2.5 series worked reliably.
Discovery 4: ZAI Can Be Bypassed with Claude SDK
ZAI is a service that provides an endpoint compatible with the Anthropic API. This allows us to use GLM models with the Claude Code CLI by changing just two environment variables.
ANTHROPIC_BASE_URL=https://<ZAI endpoint> \
ANTHROPIC_AUTH_TOKEN=<ZAI_KEY> \
claude --model glm-5.1 --print "fix the bug"
Because Claude Code talks to the Anthropic API through its standard SDK, overriding ANTHROPIC_BASE_URL is enough to route the same request format to ZAI’s GLM models. It was interesting that we could reuse the existing claude backend without any separate adapter code.
The three GLM models used were:
- glm-5.1 — High-tier
- glm-4.7 — Cost-performance balance
- glm-4.5-air — Lightweight & High-speed
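The two-variable override can be wrapped in a small helper so workers don’t need to remember it. A minimal sketch, assuming ZAI_ENDPOINT and ZAI_KEY are set in the environment; the function name and the CLAUDE_BIN override are ours.

```shell
# Sketch: run the Claude Code CLI against ZAI's Anthropic-compatible endpoint.
# ZAI_ENDPOINT and ZAI_KEY must be set; CLAUDE_BIN lets tests stub the binary.
zai_claude() {
  local model=$1
  shift
  env ANTHROPIC_BASE_URL="${ZAI_ENDPOINT:?set ZAI_ENDPOINT}" \
      ANTHROPIC_AUTH_TOKEN="${ZAI_KEY:?set ZAI_KEY}" \
      "${CLAUDE_BIN:-claude}" --model "$model" --print "$@"
}

# Usage:
# zai_claude glm-5.1 "fix the bug"
```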
4-Way Fan-out Comparison Test
We simultaneously issued the same Go bug-fixing task to 4 representative workers out of the 18 (Claude Sonnet, GLM-5.1, Codex gpt-5.5, Gemini 2.5 Flash).
Task: "fix the off-by-one error in the binary search function"
Response times (wall clock):
| Worker | Model | Response Time |
|---|---|---|
| cc-go-dev-01 | claude-sonnet-4-6 | ~8 seconds |
| cc-zai-high-dev-01 | glm-5.1 | ~12 seconds |
| codex-py-dev-01 | gpt-5.5 | ~15 seconds |
| gemini-py-dev-01 | gemini-2.5-flash | ~10 seconds |
More interesting than the response times were the differences in their approaches. Claude tended to refactor the entire function, while Gemini preferred minimal modifications. Codex often included test code along with the fix.
Of course, this is a single task result and has no statistical significance. It was a verification at the “does it actually work” level, not a benchmark.
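The fan-out itself is just publishing one message to several subjects. Here is a sketch using the nats CLI; the tasks.&lt;worker&gt; subject scheme is illustrative rather than the project’s real naming, and the publish command is parameterized so it can be stubbed.

```shell
# Sketch: publish the same task to one subject per worker, in parallel.
# $1 = publish command (e.g. 'nats pub'), $2 = task text, rest = worker names.
fanout() {
  local pub=$1 task=$2 worker
  shift 2
  for worker in "$@"; do
    $pub "tasks.$worker" "$task" &
  done
  wait
}

# Usage (requires a running NATS server and the nats CLI):
# fanout 'nats pub' 'fix the off-by-one error in the binary search function' \
#   cc-go-dev-01 cc-zai-high-dev-01 codex-py-dev-01 gemini-py-dev-01
```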
Distributed Workers: Adding a Second Host
If all workers are on the same server, the comparative experiment loses some of its meaning. Therefore, we added Claude workers to a second host.
Workers on the second host reach the NATS broker on the first host through an autossh tunnel.
[Service]
ExecStart=autossh -N -L 4222:127.0.0.1:4222 broker-host
By forwarding the local port 4222 to the broker, workers can connect to nats://127.0.0.1:4222 from any host without code changes.
Advantage of this method: Workers don’t need to know where the broker is. They can always connect to localhost:4222.
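For reference, a fuller unit file around that ExecStart might look like the following. The [Unit]/[Install] sections, the -M 0 monitor setting, and the keepalive options are typical choices for an autossh tunnel, not the project’s exact configuration.

```ini
[Unit]
Description=autossh tunnel to NATS broker
After=network-online.target

[Service]
ExecStart=autossh -M 0 -N \
  -o ServerAliveInterval=30 -o ServerAliveCountMax=3 \
  -L 4222:127.0.0.1:4222 broker-host
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Restart=always matters here: if the tunnel drops, systemd brings it back and workers reconnect to localhost:4222 without intervention.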
Most Panicked Moment During Operation
The most distressing situation was losing the NATS operator signing key. NATS JetStream uses NKey-based authentication, and the operator/account’s signing key (nsc seed) is required to issue credentials for new workers.
nsc add user --account Services --name new-worker
# → "signing key not found"
There was no backup. Ultimately, we had to perform a large-scale cutover, regenerating the entire NATS operator and replacing all worker credentials with a new permission tree. Service downtime was approximately 60 seconds.
Lesson: Always create an offline backup of the NATS operator seed immediately after generation. If it’s lost, regeneration is the only option.
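That lesson fits in one command. A sketch assuming nsc’s default store locations (~/.nkeys for seeds, ~/.nsc for operator/account data); adjust the paths if NKEYS_PATH or NSC_HOME point elsewhere. The helper name is ours.

```shell
# Sketch: archive nsc's key store and config right after operator creation,
# then move the archive to offline media. Paths assume nsc defaults.
backup_nsc() {
  local dest=${1:-nats-operator-backup.tar.gz}
  tar czf "$dest" -C "$HOME" .nkeys .nsc && echo "backed up to $dest"
}

# Usage:
# backup_nsc /mnt/usb/nats-operator-backup.tar.gz
```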
Summary
Practical conclusions from this experiment:
- Claude 3.x is EOL - Inaccessible via Claude Code CLI as of 2026. Use only 4.x.
- Codex ChatGPT Account Limited to 4 Models - gpt-5.5, 5.4, 5.4-mini, 5.3-codex. Pro models require a separate API key.
- Gemini Only 2.5 Series - Previous versions inaccessible via CLI.
- ZAI Integrable via Claude SDK Environment Variable Override - No separate adapter needed.
- NATS NKey Must Be Backed Up - Losing the signing key means reissuing everything.
The next installment will cover how these workers are connected, discussing system design and implementation.