Razavi-Bench

Tasks: 50
Rubric: 0–4
Models: 3
Judges: 2

An expert-curated benchmark that tests whether frontier AI
models can reason about analog circuit design.

Leaderboard

# | Model | Score

Score Distribution

Judge Agreement

Evaluation Rubric

Score | Meaning | Criteria
4 | Correct | Correct conclusion and reasoning; topology, device roles, dominant mechanism, trend, and key assumptions are right.
3 | Mostly correct | Main conclusion is right, with a minor omission, imprecision, or modeling flaw that does not change the result.
2 | Partially correct | Identifies some relevant mechanism, but misses an important circuit detail, trend, or design consequence.
1 | Mostly incorrect | Main conclusion is wrong, but the answer contains a small amount of relevant circuit understanding.
0 | Incorrect / unusable | Fundamentally wrong, internally inconsistent, or based on a mistaken topology/device/connection.

Judges score each answer against a golden solution, prioritizing analog-circuit reasoning over surface similarity to the reference text. See GitHub for full details.
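As a rough illustration of how two judges' rubric scores could be combined into a leaderboard score and an agreement figure, here is a minimal Python sketch. The aggregation rule (average the two judges per task, then average over tasks), the two agreement measures, and every name and number in it are assumptions made for illustration; the benchmark's actual method is described on GitHub and may differ.

```python
from statistics import mean

# Rubric labels taken from the table above; scores are integers 0 to 4.
RUBRIC = {
    4: "Correct",
    3: "Mostly correct",
    2: "Partially correct",
    1: "Mostly incorrect",
    0: "Incorrect / unusable",
}


def validate(scores):
    """Raise ValueError if any score falls outside the 0 to 4 rubric."""
    for s in scores:
        if s not in RUBRIC:
            raise ValueError(f"score {s!r} is outside the 0-4 rubric")


def model_score(judge_a, judge_b):
    """Assumed aggregation: mean of the two judges per task, then mean over tasks."""
    validate(judge_a)
    validate(judge_b)
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must score the same tasks")
    return mean((a + b) / 2 for a, b in zip(judge_a, judge_b))


def agreement(judge_a, judge_b):
    """Return (exact-match rate, within-one-point rate) between two judges."""
    pairs = list(zip(judge_a, judge_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one


# Made-up scores for five tasks, for illustration only (not benchmark data).
judge_a = [4, 3, 2, 0, 4]
judge_b = [4, 2, 2, 1, 2]

print(round(model_score(judge_a, judge_b), 2))  # 2.4
print(agreement(judge_a, judge_b))              # (0.4, 0.8)
```

Exact-match and within-one rates are simple to read but do not correct for chance agreement; a weighted kappa would be a stricter alternative if the judges' score distributions are skewed.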