Getting a second opinion from two other AIs
1 April 2026
I’ve been using Claude Code daily for about six months, and I trust it more than I probably should. But relying on a single tool narrows your thinking: you only ever see one model’s view of the problem.
I kept wanting a second opinion from a different model. Actually getting one was annoying. Copy the context, open another tool, paste it in, lose half the file references, re-explain what I’m looking at. Some tools don’t even run locally without fiddling with API keys. I’d do it once, get something useful back, then not bother again because the friction was too high.
What I wanted was to stay in Claude Code and say “ask someone else.” So I built that.
The skill
I made /consult. When I run it, Claude takes whatever I’m working on, writes a prompt, and hands it to a bash script that runs Codex CLI (OpenAI) and OpenCode (Gemini) in parallel with the same prompt. Both responses come back into Claude’s context, and it presents them side by side: where they agree, where they diverge, and a combined take.
A bash script and a markdown file. No extra API keys. No lost context because Claude writes the prompt itself.
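The fan-out part of that script can be sketched in a few lines of bash. This is a hypothetical reconstruction, not the actual script: the subcommands (`codex exec`, `opencode run`) and the output format are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the /consult helper: send the same prompt to
# two CLIs in parallel and print both answers for Claude to read back.
# `codex exec` and `opencode run` are assumed entry points, not verified.

consult() {
  local prompt="$1"
  local codex_out opencode_out
  codex_out=$(mktemp)
  opencode_out=$(mktemp)

  # Launch both models in the background with the identical prompt.
  codex exec "$prompt" >"$codex_out" 2>/dev/null &
  local codex_pid=$!
  opencode run "$prompt" >"$opencode_out" 2>/dev/null &
  local opencode_pid=$!

  # Wait for both; tolerate either one failing or returning nothing.
  wait "$codex_pid" || true
  wait "$opencode_pid" || true

  echo "=== Codex (OpenAI) ==="
  cat "$codex_out"
  echo "=== OpenCode (Gemini) ==="
  cat "$opencode_out"

  rm -f "$codex_out" "$opencode_out"
}
```

Running the two CLIs in the background and waiting on both means the total wall time is the slower of the two calls rather than their sum, which matters when each one takes minutes.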
What I noticed
They agree on the big stuff. If both models flag the same problem independently, it’s almost certainly real. That signal is hard to get from a single model because you have nothing to calibrate against.
They disagree on how to fix things. One will suggest something that sounds reasonable until you read the other’s take and realise it was bad advice. That’s the whole point. The disagreements are more valuable than the agreements.
Each one catches things the other misses. One spots a structural issue, the other spots a tone problem. Alone, you’d get half the picture and think it was complete.
Neither response is as useful as reading them against each other.
The gaps
Claude still makes the final call. It writes the prompt, reads both responses, writes the synthesis. I suspect there’s a bias toward whichever response is longer, and I haven’t tested that properly.
The CLI tools take a couple of minutes and aren’t always reliable. Codex occasionally returns nothing. OpenCode wraps its output in formatting noise.
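One way to deal with the formatting noise is a cleanup pass between the CLI and Claude’s context. This is only a sketch, and it assumes the noise is ANSI colour codes and wrapping markdown fences, which is a guess about what OpenCode actually emits.

```shell
# Hypothetical cleanup filter; the kinds of noise it strips (ANSI escape
# sequences, markdown fence lines) are assumptions about OpenCode's output.
clean_output() {
  sed -e 's/\x1b\[[0-9;]*m//g' -e '/^```/d'
}
```

A filter like this would slot into the pipeline between the CLI call and the file Claude reads, e.g. `opencode run "$prompt" | clean_output`.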
But after months of running everything through one model, I’m reading AI suggestions the way I’d read code reviews from two different people. Looking for what one noticed that the other didn’t.