ChemGraph Leaderboard

ChemGraph Leaderboard provides a reproducible evaluation of agentic AI frameworks and large language models (LLMs) for computational chemistry and materials science.

Models are evaluated daily on 14 chemistry queries grouped into 8 task categories:

Category         Queries  Description
SMILES Lookup    2        Convert molecule names to SMILES strings
Coordinate Gen   2        Generate 3D coordinates from SMILES
Geometry Opt     1        Geometry optimization with DFT/ML potentials
Vib Frequency    1        Vibrational frequency analysis
Thermochem       1        Thermochemical properties (enthalpy, entropy, Gibbs)
Dipole           1        Dipole moment calculation
Energy           3        Single-point energy and geometry opt with JSON extraction
Reaction Gibbs   3        Reaction Gibbs free energy for multi-step workflows

Each model's score reflects its ability to follow structured tool protocols, generate physically meaningful results, and reason across chemistry-specific contexts. An LLM judge scores each result as correct or incorrect (binary accuracy), and numerical values are accepted within a 5% relative tolerance.
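To make the 5% relative tolerance concrete, here is a minimal Python sketch of such a check. It is illustrative only: the leaderboard's scoring is done by an LLM judge, and the function name, the choice to measure the error relative to the reference value, and the zero-reference fallback are all assumptions, not the judge's actual implementation.

```python
def within_relative_tolerance(predicted: float, reference: float,
                              rel_tol: float = 0.05) -> bool:
    """Return True if `predicted` lies within `rel_tol` of `reference`.

    Hypothetical helper: the error is measured relative to the reference
    value, which is an assumption about how the tolerance is applied.
    """
    if reference == 0.0:
        # Relative error is undefined for a zero reference; fall back to
        # an absolute comparison (an assumption made for this sketch).
        return abs(predicted) <= rel_tol
    return abs(predicted - reference) <= rel_tol * abs(reference)


# Made-up numbers for a dipole moment in Debye: 1.90 is within 5% of 1.85,
# while 2.00 is not.
print(within_relative_tolerance(1.90, 1.85))
print(within_relative_tolerance(2.00, 1.85))
```

Under this reading, a binary verdict for a numerical query is simply whether the reported value passes the check.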

Use this leaderboard to explore how different models and agents perform across core chemistry tasks, from small-molecule modeling to multi-step reaction workflows.

{
  "headers": [
    "T", "Model", "Average ⬆️",
    "SMILES Lookup", "Coordinate Gen", "Geometry Opt", "Vib Frequency",
    "Thermochem", "Dipole", "Energy", "Reaction Gibbs",
    "Type", "Architecture", "Precision", "Hub License",
    "#Params (B)", "Hub ❤️", "Available on the hub", "Model sha"
  ],
  "data": [
    [1, "openai/gpt-4o", 75, 50, 50, 100, 100, 100, 100, 66.67, 33.33, "", "?", "float16", "?", 0, 0, true, "main"],
    [2, "openai/gpt-5.4", 58.33, 100, 100, 0, 0, 100, 100, 33.33, 33.33, "", "?", "float16", "?", 0, 0, true, "main"],
    [3, "openai/gpt-5.2", 58.33, 100, 100, 0, 0, 100, 100, 33.33, 33.33, "", "?", "float16", "?", 0, 0, true, "main"],
    [4, "openai/gpt-5.1", 54.17, 100, 100, 0, 0, 100, 100, 33.33, 0, "", "?", "float16", "?", 0, 0, true, "main"],
    [5, "anthropic/claude-opus-4.6", 41.67, 100, 100, 0, 0, 100, 0, 33.33, 0, "", "?", "float16", "?", 0, 0, true, "main"]
  ],
  "metadata": null
}
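The Average column in the data above is consistent with an unweighted mean of the eight category scores, where each category score is the percentage of that category's queries judged correct. The Python sketch below reproduces the openai/gpt-4o row under that reading. The per-category correct counts are inferred from the published percentages (for example, 66.67 on a 3-query category implies 2 correct), and the function names are hypothetical; this is not the leaderboard's own scoring code.

```python
# Number of queries per category, taken from the category table.
QUERIES_PER_CATEGORY = {
    "SMILES Lookup": 2,
    "Coordinate Gen": 2,
    "Geometry Opt": 1,
    "Vib Frequency": 1,
    "Thermochem": 1,
    "Dipole": 1,
    "Energy": 3,
    "Reaction Gibbs": 3,
}

# Correct-answer counts for openai/gpt-4o, inferred from its published
# percentages (an assumption, not data released by the leaderboard).
GPT4O_CORRECT = {
    "SMILES Lookup": 1,
    "Coordinate Gen": 1,
    "Geometry Opt": 1,
    "Vib Frequency": 1,
    "Thermochem": 1,
    "Dipole": 1,
    "Energy": 2,
    "Reaction Gibbs": 1,
}


def category_scores(correct: dict[str, int]) -> dict[str, float]:
    """Percentage of queries judged correct in each category."""
    return {
        name: round(100.0 * correct[name] / total, 2)
        for name, total in QUERIES_PER_CATEGORY.items()
    }


def average_score(scores: dict[str, float]) -> float:
    """Unweighted mean over categories, so each category counts equally."""
    return round(sum(scores.values()) / len(scores), 2)


scores = category_scores(GPT4O_CORRECT)
print(scores["Energy"], scores["Reaction Gibbs"])
print(average_score(scores))
```

Because the mean is taken over categories rather than over the 14 individual queries, a single-query category such as Dipole carries the same weight as a three-query category such as Energy.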