NeurIPS 2026 · Multi-Agent RTL · Self-Verifying

Verilog in seconds.
No testbench needed.

ChipMATE is a self-trained multi-agent framework. A Verilog agent and a Python reference-model agent cross-verify each other on random stimuli — no golden oracle. Try it directly below.

$
try:
↓ scroll
80.1%
VerilogEval V2 pass@1
ChipMATE-Agents-9B
200×
Smaller than DeepSeek V4
9B vs 1.6T parameters
0
Golden testbenches needed
self-verified at train + inference
2
Cooperating agents
Verilog ⇄ Python reference model
The architecture

Two agents. Cross-verified.

Most RTL generators rely on a golden testbench — a curated oracle that says "this is correct". ChipMATE has neither at train nor inference time. Instead, two agents independently propose implementations of the same spec, then probe each other with random stimuli. Agreement is the only signal.

01 / GENERATE

Verilog agent

Proposes a synthesizable top_module implementation. Optimized during RL with reward shaped on cross-agreement, not on test-pass rate. Sees only the natural-language spec + port skeleton — no reference solutions.

02 / REFERENCE

Python ref-model agent

Independently writes a Python function that emulates the same spec — cycle- accurate for sequential logic. This is the oracle proxy: any mismatch with the Verilog DUT under random inputs is a candidate bug in either agent.

03 / VERIFY

Cross-verify harness

Compiles the Verilog under iverilog, drives random port stimuli, compares cycle-by-cycle against the Python model. Reports a match rate ∈ [0, 1] and a structured diagnostic when they disagree.

04 / REFINE

Feedback loop

Disagreement traces are fed back as natural-language diagnostics for the next turn. After ≤ 5 rounds, the loop either reaches match_rate = 1.0 or surfaces the best candidate. No oracle, no curated testset.

By the numbers

Punching 200× above its weight

ChipMATE's two open-source checkpoints (4B and 9B base) beat DeepSeek V4 — a 1.6-trillion-parameter MoE — on VerilogEval V2 pass@1. Below: per-model accuracy on the public benchmark.

Model Type Params VerilogEval V2 pass@1
GPT-4o API ~undisclosed 53.4%
Claude 3.5 Sonnet API ~undisclosed 59.7%
DeepSeek V4 API · open 1.6T (MoE) 71.2%
ChipMATE-Agents-4B open · self-hostable 4B 75.0%
ChipMATE-Agents-9B open · self-hostable 9B 80.1%
ChipMATE multi-agent training overview
Multi-agent RL training: two agents cross-verify on procedurally-generated specs, the disagreement signal becomes the reward. No golden testbench, no curated dataset.