NeurIPS 2026 · Multi-Agent RTL · Self-Verifying

Verilog in seconds.
No testbench needed.

ChipMATE is a self-trained multi-agent framework. A Verilog agent and a Python reference-model agent cross-verify each other on random stimuli — no golden oracle. Try it directly below.

How does it work? → Read the paper

80.1%

VerilogEval V2 pass@1

ChipMATE-Agents-9B

200×

Smaller than DeepSeek V4

9B vs 1.6T parameters

0

Golden testbenches needed

self-verified at train + inference

2

Cooperating agents

Verilog ⇄ Python reference model

The architecture

Two agents. Cross-verified.

Most RTL generators rely on a golden testbench: a curated oracle that says "this is correct". ChipMATE uses none, at either training or inference time. Instead, two agents independently propose implementations of the same spec, then probe each other with random stimuli. Agreement is the only signal.

01 / GENERATE

Verilog agent

Proposes a synthesizable top_module implementation. Optimized during RL with reward shaped on cross-agreement, not on test-pass rate. Sees only the natural-language spec + port skeleton — no reference solutions.

02 / REFERENCE

Python ref-model agent

Independently writes a Python function that emulates the same spec, cycle-accurate for sequential logic. This is the oracle proxy: any mismatch with the Verilog DUT under random inputs is a candidate bug in either agent.
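To make "cycle-accurate" concrete, here is a minimal sketch of what such a reference model could look like for a hypothetical spec: a 4-bit up-counter with synchronous reset and enable. The class name, the port names, and the one-call-per-clock-edge `step` interface are assumptions for illustration, not ChipMATE's actual output format.

```python
class CounterRef:
    """Hypothetical reference model: 4-bit up-counter, synchronous reset, enable."""

    WIDTH = 4

    def __init__(self) -> None:
        self.q = 0  # register state, mirrors the DUT's output register

    def step(self, reset: int, enable: int) -> int:
        """Advance one clock edge and return q as seen after that edge."""
        if reset:
            self.q = 0
        elif enable:
            self.q = (self.q + 1) % (1 << self.WIDTH)  # wrap at 2**WIDTH
        return self.q


ref = CounterRef()
# 17 enabled cycles: counts 1..15, wraps to 0, then continues at 1.
trace = [ref.step(reset=0, enable=1) for _ in range(17)]
```

Keeping the state in one object and advancing it once per clock edge is what lets the model be compared against the DUT cycle by cycle rather than only at the end of a run.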

03 / VERIFY

Cross-verify harness

Compiles the Verilog under iverilog, drives random port stimuli, and compares the outputs cycle-by-cycle against the Python model. Reports a match rate ∈ [0, 1] and a structured diagnostic when they disagree.
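The comparison step can be sketched roughly as follows. This is a simplified illustration, not the project's harness: the Verilog simulation is replaced by a plain Python callable so the example runs on its own, and `cross_verify`, `make_counter`, and the diagnostic fields are all hypothetical names.

```python
import random
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, int]
StepFn = Callable[[Inputs], int]  # one clock cycle: port inputs -> observed output


def cross_verify(dut_step: StepFn, ref_step: StepFn, port_widths: Dict[str, int],
                 cycles: int = 200, seed: int = 0) -> Tuple[float, List[dict]]:
    """Drive both models with the same random stimuli; return (match_rate, mismatches)."""
    rng = random.Random(seed)
    matched = 0
    mismatches: List[dict] = []
    for cycle in range(cycles):
        # Draw a random value of the right bit width for every input port.
        stimulus = {name: rng.getrandbits(width) for name, width in port_widths.items()}
        got, expected = dut_step(stimulus), ref_step(stimulus)
        if got == expected:
            matched += 1
        else:
            mismatches.append({"cycle": cycle, "inputs": stimulus,
                               "dut": got, "ref": expected})
    return matched / cycles, mismatches


def make_counter(ignore_enable: bool = False) -> StepFn:
    """Toy 4-bit counter; ignore_enable=True injects a bug for demonstration."""
    state = {"q": 0}

    def step(inp: Inputs) -> int:
        if inp["reset"]:
            state["q"] = 0
        elif inp["enable"] or ignore_enable:
            state["q"] = (state["q"] + 1) % 16
        return state["q"]

    return step


ports = {"reset": 1, "enable": 1}
rate_ok, diag_ok = cross_verify(make_counter(), make_counter(), ports)
rate_bad, diag_bad = cross_verify(make_counter(ignore_enable=True), make_counter(), ports)
```

With two identical models the match rate is 1.0 and the mismatch list is empty. With the injected bug, the rate drops below 1.0 and each mismatch entry records the cycle, the stimulus, and both observed outputs. Note that a mismatch alone does not say which side is wrong, which is why the text treats it as a candidate bug in either agent.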

04 / REFINE

Feedback loop

Disagreement traces are fed back as natural-language diagnostics for the next turn. After at most 5 rounds, the loop either reaches match_rate = 1.0 or surfaces the best candidate seen so far. No oracle, no curated test set.
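The control flow of this loop might look like the sketch below. It is an assumption-laden simplification: `refine`, `propose`, and `verify` are hypothetical names, a candidate is treated as an opaque string, and the agents and harness are replaced by toy stubs so the example runs standalone.

```python
from typing import Callable, List, Optional, Tuple

Candidate = str                                      # e.g. generated source text
Propose = Callable[[Optional[str]], Candidate]       # previous diagnostic -> new candidate
Verify = Callable[[Candidate], Tuple[float, str]]    # candidate -> (match_rate, diagnostic)


def refine(propose: Propose, verify: Verify,
           max_rounds: int = 5) -> Tuple[Optional[Candidate], float, int]:
    """Iterate until match_rate == 1.0 or max_rounds is reached; keep the best candidate."""
    best: Optional[Candidate] = None
    best_rate = -1.0
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        candidate = propose(feedback)
        rate, diagnostic = verify(candidate)
        if rate > best_rate:
            best, best_rate = candidate, rate
        if rate == 1.0:
            return best, best_rate, round_idx        # full agreement: stop early
        feedback = diagnostic                        # feed the mismatch report forward
    return best, best_rate, max_rounds


# Toy stand-ins for the agents and the harness (illustrative values only).
attempts = iter(["v0", "v1", "v2"])
toy_rates = {"v0": 0.40, "v1": 0.85, "v2": 1.0}
seen_feedback: List[Optional[str]] = []


def toy_propose(feedback: Optional[str]) -> Candidate:
    seen_feedback.append(feedback)
    return next(attempts)


def toy_verify(candidate: Candidate) -> Tuple[float, str]:
    return toy_rates[candidate], f"{candidate}: match_rate={toy_rates[candidate]}"


best, best_rate, rounds_used = refine(toy_propose, toy_verify)
```

Tracking the best candidate separately from the latest one matters because a later round can regress. If no round reaches full agreement, the loop still returns the highest-scoring attempt.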

By the numbers

Punching 200× above its weight

ChipMATE's two open-source checkpoints (4B and 9B base) beat DeepSeek V4 — a 1.6-trillion-parameter MoE — on VerilogEval V2 pass@1. Below: per-model accuracy on the public benchmark.

Model	Type	Params	VerilogEval V2 pass@1
GPT-4o	API	undisclosed	53.4%
Claude 3.5 Sonnet	API	undisclosed	59.7%
DeepSeek V4	API · open	1.6T (MoE)	71.2%
ChipMATE-Agents-4B	open · self-hostable	4B	75.0%
ChipMATE-Agents-9B	open · self-hostable	9B	80.1%
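For readers who want the size comparison spelled out, the short calculation below derives the parameter ratios from the counts in the table. It assumes the 1.6T figure is DeepSeek V4's total parameter count, and it shows that "200×" is a rounded figure for the 9B checkpoint.

```python
# Parameter counts as listed in the table above.
DEEPSEEK_V4_PARAMS = 1.6e12
checkpoints = {"ChipMATE-Agents-4B": 4e9, "ChipMATE-Agents-9B": 9e9}

# How many times smaller each checkpoint is than the 1.6T model.
ratios = {name: DEEPSEEK_V4_PARAMS / params for name, params in checkpoints.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}x fewer parameters")
```

The 9B checkpoint comes out at roughly 178× smaller and the 4B checkpoint at 400× smaller.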

Figure: ChipMATE multi-agent training overview. During multi-agent RL training, two agents cross-verify on procedurally generated specs, and the disagreement signal becomes the reward. No golden testbench, no curated dataset.