ChemGraph Leaderboard

ChemGraph Leaderboard provides a reproducible evaluation of agentic AI frameworks and large language models (LLMs) for computational chemistry and materials science.

This leaderboard benchmarks models on a diverse set of tasks, including:

  • Molecular geometry optimization, vibration analysis, and thermochemistry estimation.
  • Reaction thermodynamics prediction (enthalpy, Gibbs free energy).
  • Tool-usage accuracy in multi-agent workflows.

Each model’s score reflects its ability to follow structured tool protocols, generate physically meaningful results, and reason within chemistry-specific contexts.
The benchmark results were generated offline and are published as part of the ChemGraph paper.

Use this leaderboard to explore how different models and agents perform across core chemistry tasks, from small-molecule modeling to multi-step reaction workflows.
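The task columns in the leaderboard table appear to follow an `input2target` naming pattern (for example `smi2opt` or `react2gibbs`), with a `_multiagent` suffix on the multi-agent variants. This reading is inferred from the headers and is not defined on this page. Under the same assumption, `name` stands for a molecule name, `smi` for a SMILES string and `react` for a reaction. The sketch below uses a hypothetical helper, `parse_task`, to split a column name into those parts. This makes it easy to group scores by input type or to separate the single-agent and multi-agent reaction tasks:

```python
from typing import NamedTuple

MULTIAGENT_SUFFIX = "_multiagent"


class TaskSpec(NamedTuple):
    source: str       # assumed input kind, e.g. "name", "smi", "react"
    target: str       # assumed requested output, e.g. "opt", "vib", "gibbs"
    multiagent: bool  # True for the *_multiagent column variants


def parse_task(column: str) -> TaskSpec:
    """Split a column such as 'react2gibbs_multiagent' into its parts.

    Hypothetical helper: the input2target convention is an assumption
    read off the leaderboard headers, not a documented ChemGraph API.
    """
    multiagent = column.endswith(MULTIAGENT_SUFFIX)
    base = column[: -len(MULTIAGENT_SUFFIX)] if multiagent else column
    source, sep, target = base.partition("2")
    if not sep or not source or not target:
        raise ValueError(f"not an input2target column: {column!r}")
    return TaskSpec(source, target, multiagent)


if __name__ == "__main__":
    for col in ["name2smi", "smi2opt", "react2gibbs", "react2gibbs_multiagent"]:
        print(col, parse_task(col))
```

Non-task columns such as `Average ⬆️` or `Precision` are rejected with a `ValueError`, so the helper can also be used to filter the task columns out of the full header list.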

Leaderboard results, with one row per leaderboard column and one column per model:

Column                     anthropic/claude-3.5-haiku  openai/gpt-4o-mini  qwen/Qwen2.5-14B
T                          1                           2                   3
Average ⬆️                  88.96                       85.93               63.33
name2smi                   93.33                       100                 93.33
name2coord                 86.67                       90                  86.67
name2opt                   100                         100                 86.67
name2vib                   83.33                       90                  60
name2gibbs                 100                         100                 60
name2file                  100                         86.67               70
smi2coord                  86.67                       93.33               90
smi2opt                    93.33                       100                 80
smi2vib                    80                          86.67               86.67
smi2gibbs                  96.67                       93.33               80
smi2file                   96.67                       86.67               76.67
react2enthalpy             66.67                       40                  13.33
react2gibbs                68.89                       48.89               17.78
react2enthalpy_multiagent  86.67                       86.67               24.44
react2gibbs_multiagent     95.56                       86.67               24.44
Type
Architecture               ?                           ?                   Qwen2ForCausalLM
Precision                  float16                     float16             float16
Hub License                ?                           ?                   ?
#Params (B)                0                           0                   0
Hub ❤️                      0                           0                   0
Available on the hub       false                       false               true
Model sha                  main                        main                main

Model links:

  • anthropic/claude-3.5-haiku: https://huggingface.co/anthropic/claude-3.5-haiku
  • openai/gpt-4o-mini: https://huggingface.co/openai/gpt-4o-mini
  • qwen/Qwen2.5-14B: https://huggingface.co/qwen/Qwen2.5-14B
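As a consistency check on the table, each model's Average equals the unweighted mean of its 15 task scores (name2smi through react2gibbs_multiagent), rounded to two decimals. The T values 1, 2 and 3 follow the same descending order of Average. The sketch below recomputes both from scores copied out of the table. `average` and `rank` are hypothetical helpers written for illustration; they are not part of any ChemGraph tooling:

```python
# Task columns in table order; scores are copied from the leaderboard above.
TASKS = [
    "name2smi", "name2coord", "name2opt", "name2vib", "name2gibbs", "name2file",
    "smi2coord", "smi2opt", "smi2vib", "smi2gibbs", "smi2file",
    "react2enthalpy", "react2gibbs",
    "react2enthalpy_multiagent", "react2gibbs_multiagent",
]

SCORES = {
    "anthropic/claude-3.5-haiku": [93.33, 86.67, 100, 83.33, 100, 100, 86.67, 93.33,
                                   80, 96.67, 96.67, 66.67, 68.89, 86.67, 95.56],
    "openai/gpt-4o-mini": [100, 90, 100, 90, 100, 86.67, 93.33, 100,
                           86.67, 93.33, 86.67, 40, 48.89, 86.67, 86.67],
    "qwen/Qwen2.5-14B": [93.33, 86.67, 86.67, 60, 60, 70, 90, 80,
                         86.67, 80, 76.67, 13.33, 17.78, 24.44, 24.44],
}


def average(scores: list[float]) -> float:
    """Unweighted mean over all task columns, rounded to two decimals."""
    if len(scores) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} scores, got {len(scores)}")
    return round(sum(scores) / len(scores), 2)


def rank(table: dict[str, list[float]]) -> list[tuple[int, str, float]]:
    """Sort models by average score, highest first, numbering from 1."""
    averaged = sorted(((average(s), m) for m, s in table.items()), reverse=True)
    return [(i, m, avg) for i, (avg, m) in enumerate(averaged, start=1)]


if __name__ == "__main__":
    for position, model, avg in rank(SCORES):
        print(f"{position}  {model:<28}{avg:.2f}")
```

Running the script prints the three models in the same order and with the same averages as the table. Because every task carries equal weight, the three react2* single-agent and multi-agent columns together pull down models that struggle on reaction workflows, as the Qwen2.5-14B row shows.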