Benchmark for evaluating LLMs on real-world software engineering tasks from GitHub.
Foundations
The Agentic Workflow