Foundations

SWE-bench

A benchmark for evaluating large language models (LLMs) on real-world software engineering tasks drawn from GitHub issues and the pull requests that resolved them.
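
As a rough illustration of how such an evaluation can be scored, the sketch below assumes each task records two groups of tests after a model's patch is applied: tests that were failing and should now pass, and tests that were passing and should stay passing. The `TaskResult` class, the `is_resolved` and `resolve_rate` helpers, and the instance IDs are hypothetical names chosen for this example and are not the official harness API.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one task's test outcomes after applying a model's patch."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # tests expected to go from failing to passing
    pass_to_pass: dict[str, bool]  # tests expected to keep passing (no regressions)


def is_resolved(result: TaskResult) -> bool:
    # A task counts as resolved only if every target test now passes
    # and no previously passing test has regressed.
    # Note: an empty test group is treated as trivially satisfied here.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks resolved; defined as 0.0 for an empty list.
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up outcomes for two illustrative tasks.
results = [
    TaskResult("example__repo-1", {"test_fix": True}, {"test_existing": True}),
    TaskResult("example__repo-2", {"test_fix": True}, {"test_existing": False}),
]
print(resolve_rate(results))  # 0.5
```

The second task fixes its target test but breaks an existing one, so it does not count as resolved under this assumed criterion.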

Related topics