What happens when the same bug-fixing task is sent to Claude, ZAI (GLM), OpenAI Codex, and Google Gemini simultaneously?
This question sparked the AgentForge project. We built a system that connects multiple LLM CLIs with the NATS JetStream message queue to process the same tasks in parallel, and in the process, we made some unexpected discoveries. This article focuses on the comparative experimental findings during the setup phase.
The system’s design and implementation will be covered in Part 2.
List of AIs Tested
The final configuration of 18 operational workers is as follows:
| Family | Model | Notes |
|---|---|---|
| Claude Code | claude-sonnet-4-6 | Main development worker |
| Claude Code | claude-sonnet-4-5 | Previous generation comparison |
| Claude Code | claude-haiku-4-5 | Lightweight & High-speed |
| Claude Code | claude-opus-4-6 | Top-tier |
| Claude Code | claude-opus-4-5 | Previous generation comparison |
| ZAI (GLM) | glm-5.1 | High-tier |
| ZAI (GLM) | glm-4.7 | Mid-tier |
| ZAI (GLM) | glm-4.5-air | Lightweight tier |
| OpenAI Codex | gpt-5.5 | |
| OpenAI Codex | gpt-5.4 | 1M context |
| OpenAI Codex | gpt-5.4-mini | 400K context |
| OpenAI Codex | gpt-5.3-codex | 272K context |
| Google Gemini | gemini-2.5-flash | Default recommended |
| Google Gemini | gemini-2.5-pro | High-tier |
| Google Gemini | gemini-2.5-flash-lite | Lightweight |
The list was much shorter when we first started; it grew as we discovered which models were actually available.
Discovery 1: Claude 3.x Series is Already Inaccessible
Those who have used Claude Code for a long time might recall Claude 3.7 Sonnet, 3.5 Sonnet, and 3.5 Haiku. We attempted to add these models as workers.
claude --model claude-3-7-sonnet-20250219 --print "hello"
# → "may not exist or no access"
All three models returned the same error. The Claude 3 series reached its EOL in early 2026, and access via the Claude Code CLI has been blocked. Currently, only the 4.x series is available with a Claude Code subscription.
Conclusion: Claude workers were configured using only the 4.5/4.6 series.
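Availability checks like the one above can be scripted instead of run by hand. Below is a minimal sketch of a generic probe; the helper name `probe_models` is ours, and the command template and model IDs in the usage comment are illustrative.

```shell
# Sketch: report which model IDs a CLI accepts (exit code 0 = accepted).
# The command template uses %s as a placeholder for the model name.
probe_models() {
  local tpl=$1 model
  shift
  for model in "$@"; do
    if eval "${tpl//%s/$model}" >/dev/null 2>&1; then
      echo "OK   $model"
    else
      echo "FAIL $model"
    fi
  done
}

# Usage (requires the claude CLI and a subscription):
# probe_models 'claude --model %s --print hello' \
#   claude-3-7-sonnet-20250219 claude-sonnet-4-6
```

Because the probe only looks at exit codes, the same function works for any of the four CLIs.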
Discovery 2: Limited Model Selection for ChatGPT Account Codex
The OpenAI Codex CLI authenticates with a ChatGPT Plus/Pro account or a separate API key. If authenticated via a ChatGPT account, the accessible models are limited.
codex --model gpt-5.5-pro "fix the bug"
# → "Model gpt-5.5-pro is not supported with ChatGPT account"
codex --model gpt-5.5 "fix the bug"
# → Works normally
Models available with a ChatGPT account:
| Model | Context | Inference Level |
|---|---|---|
| gpt-5.5 | 1M / 1M | High |
| gpt-5.4 | 1M / 1M | Medium |
| gpt-5.4-mini | 400K / 400K | Medium |
| gpt-5.3-codex | 272K / 400K | Medium |
All other models, including gpt-5.5-pro, returned a “not supported with ChatGPT account” error. More models are available with an API key, but that is a separate setup from ChatGPT-account authentication.
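Since some workers may run with an API key and some with a ChatGPT account, it is handy to select the first model a given CLI actually accepts. This is a sketch; the helper name `pick_model` is ours, and the codex invocation in the usage comment mirrors the article's examples.

```shell
# Sketch: return the first model the CLI accepts, so a worker can prefer an
# API-key-only model but fall back to a ChatGPT-account one.
# %s in the template is replaced with each candidate model name in turn.
pick_model() {
  local tpl=$1 m
  shift
  for m in "$@"; do
    if eval "${tpl//%s/$m}" >/dev/null 2>&1; then
      echo "$m"
      return 0
    fi
  done
  return 1
}

# Usage (requires the codex CLI):
# pick_model 'codex --model %s "hello"' gpt-5.5-pro gpt-5.5
```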
Discovery 3: Gemini CLI Only Supports 2.5 Series
We tested various models with the Gemini CLI (gemini binary).
gemini -p "hello" -m gemini-2.0-flash
# → ModelNotFoundError: models/gemini-2.0-flash is not found
gemini -p "hello" -m gemini-1.5-pro
# → ModelNotFoundError
gemini -p "hello" -m gemini-2.5-flash
# → Works normally
Gemini models accessible with the current account:
- gemini-2.5-flash — Default recommended model
- gemini-2.5-pro — High-tier
- gemini-2.5-flash-lite — Lightweight
Gemini 2.0 and earlier all return ModelNotFoundError. This may vary by account plan or API key type, but with our Gemini CLI setup, only the 2.5 series worked reliably.
Discovery 4: ZAI Can Be Bypassed with Claude SDK
ZAI is a service that provides an endpoint compatible with the Anthropic API. This allows us to use GLM models with the Claude Code CLI by changing just two environment variables.
ANTHROPIC_BASE_URL=https://<ZAI endpoint> \
ANTHROPIC_AUTH_TOKEN=<ZAI_KEY> \
claude --model glm-5.1 --print "fix the bug"
Because Claude Code talks to the Anthropic API through its standard SDK, overriding ANTHROPIC_BASE_URL is enough to route the same request format to ZAI’s GLM models. It was interesting that we could reuse the existing claude backend without any separate adapter code.
The three GLM models used were:
- glm-5.1 — High-tier
- glm-4.7 — Cost-performance balance
- glm-4.5-air — Lightweight & High-speed
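The two-variable override can be wrapped in a small helper so workers don’t need to remember it. A minimal sketch, assuming ZAI_ENDPOINT and ZAI_KEY are set in the environment; the function name and the CLAUDE_BIN override are ours.

```shell
# Sketch: run the Claude Code CLI against ZAI's Anthropic-compatible endpoint.
# ZAI_ENDPOINT and ZAI_KEY must be set; CLAUDE_BIN lets tests stub the binary.
zai_claude() {
  local model=$1
  shift
  env ANTHROPIC_BASE_URL="${ZAI_ENDPOINT:?set ZAI_ENDPOINT}" \
      ANTHROPIC_AUTH_TOKEN="${ZAI_KEY:?set ZAI_KEY}" \
      "${CLAUDE_BIN:-claude}" --model "$model" --print "$@"
}

# Usage:
# zai_claude glm-5.1 "fix the bug"
```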
4-Way Fan-out Comparison Test
We simultaneously issued the same Go bug-fixing task to 4 representative workers out of the 18 (Claude Sonnet, GLM-5.1, Codex gpt-5.5, Gemini 2.5 Flash).
Task: "fix the off-by-one error in the binary search function"
Response times (wall clock):
| Worker | Model | Response Time |
|---|---|---|
| cc-go-dev-01 | claude-sonnet-4-6 | ~8 seconds |
| cc-zai-high-dev-01 | glm-5.1 | ~12 seconds |
| codex-py-dev-01 | gpt-5.5 | ~15 seconds |
| gemini-py-dev-01 | gemini-2.5-flash | ~10 seconds |
More interesting than the response times were the differences in their approaches. Claude tended to refactor the entire function, while Gemini preferred minimal modifications. Codex often included test code along with the fix.
Of course, this is a single task result and has no statistical significance. It was a verification at the “does it actually work” level, not a benchmark.
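The fan-out itself is just publishing one message to several subjects. Here is a sketch using the nats CLI; the tasks.&lt;worker&gt; subject scheme is illustrative rather than the project’s real naming, and the publish command is parameterized so it can be stubbed.

```shell
# Sketch: publish the same task to one subject per worker, in parallel.
# $1 = publish command (e.g. 'nats pub'), $2 = task text, rest = worker names.
fanout() {
  local pub=$1 task=$2 worker
  shift 2
  for worker in "$@"; do
    $pub "tasks.$worker" "$task" &
  done
  wait
}

# Usage (requires a running NATS server and the nats CLI):
# fanout 'nats pub' 'fix the off-by-one error in the binary search function' \
#   cc-go-dev-01 cc-zai-high-dev-01 codex-py-dev-01 gemini-py-dev-01
```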
Distributed Workers: Adding a Second Host
If all workers are on the same server, the comparative experiment loses some of its meaning. Therefore, we added Claude workers to a second host.
Workers on the second host reach the NATS broker on the first host through an autossh tunnel.
[Service]
ExecStart=autossh -N -L 4222:127.0.0.1:4222 broker-host
By forwarding the local port 4222 to the broker, workers can connect to nats://127.0.0.1:4222 from any host without code changes.
Advantage of this method: Workers don’t need to know where the broker is. They can always connect to localhost:4222.
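For reference, a fuller unit file around that ExecStart might look like the following. The [Unit]/[Install] sections, the -M 0 monitor setting, and the keepalive options are typical choices for an autossh tunnel, not the project’s exact configuration.

```ini
[Unit]
Description=autossh tunnel to NATS broker
After=network-online.target

[Service]
ExecStart=autossh -M 0 -N \
  -o ServerAliveInterval=30 -o ServerAliveCountMax=3 \
  -L 4222:127.0.0.1:4222 broker-host
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Restart=always matters here: if the tunnel drops, systemd brings it back and workers reconnect to localhost:4222 without intervention.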
Most Panicked Moment During Operation
The most distressing situation was losing the NATS operator signing key. NATS JetStream uses NKey-based authentication, and the operator/account’s signing key (nsc seed) is required to issue credentials for new workers.
nsc add user --account Services --name new-worker
# → "signing key not found"
There was no backup. Ultimately, we had to perform a large-scale cutover, regenerating the entire NATS operator and replacing all worker credentials with a new permission tree. Service downtime was approximately 60 seconds.
Lesson: Always create an offline backup of the NATS operator seed immediately after generation. If it’s lost, regeneration is the only option.
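That lesson fits in one command. A sketch assuming nsc’s default store locations (~/.nkeys for seeds, ~/.nsc for operator/account data); adjust the paths if NKEYS_PATH or NSC_HOME point elsewhere. The helper name is ours.

```shell
# Sketch: archive nsc's key store and config right after operator creation,
# then move the archive to offline media. Paths assume nsc defaults.
backup_nsc() {
  local dest=${1:-nats-operator-backup.tar.gz}
  tar czf "$dest" -C "$HOME" .nkeys .nsc && echo "backed up to $dest"
}

# Usage:
# backup_nsc /mnt/usb/nats-operator-backup.tar.gz
```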
Summary
Practical conclusions from this experiment:
- Claude 3.x is EOL - Inaccessible via Claude Code CLI as of 2026. Use only 4.x.
- Codex ChatGPT Account Limited to 4 Models - gpt-5.5, 5.4, 5.4-mini, 5.3-codex. Pro models require a separate API key.
- Gemini Only 2.5 Series - Previous versions inaccessible via CLI.
- ZAI Integrable via Claude SDK Environment Variable Override - No separate adapter needed.
- NATS NKey Must Be Backed Up - Losing the signing key means reissuing everything.
The next installment will cover how these workers are connected, discussing system design and implementation.