Overview

Berkeley Function Calling Leaderboard (BFCL)

BFCL is a benchmark that evaluates how effectively LLMs use functions and tools. BFCL V3 introduces multi-turn and multi-step scenarios that test models on complex interactions requiring state tracking, implicit actions, and error recovery. It covers diverse categories, including Python functions, REST APIs, SQL queries, and domain-specific tools, across more than 1,000 test cases.
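To make the idea of scoring function calls across turns concrete, here is a minimal sketch of one way a multi-turn test case could be checked. It is an illustration only and not BFCL's actual scoring method: the `ToolCall` type, the `call_matches` and `score_turns` helpers, and the example functions `get_weather` and `send_email` are all hypothetical, and the sketch assumes strict exact-match comparison of names and arguments.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical representation of one function call: a name plus keyword arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def call_matches(predicted: ToolCall, expected: ToolCall) -> bool:
    """Return True when the function name and all arguments agree exactly."""
    return predicted.name == expected.name and predicted.arguments == expected.arguments


def score_turns(predicted: list[list[ToolCall]], expected: list[list[ToolCall]]) -> float:
    """Return the fraction of turns whose predicted calls match the expected calls in order."""
    if not expected:
        return 0.0
    correct = 0
    for turn_index, expected_calls in enumerate(expected):
        # A missing turn counts as an empty list of calls.
        predicted_calls = predicted[turn_index] if turn_index < len(predicted) else []
        same_length = len(predicted_calls) == len(expected_calls)
        if same_length and all(
            call_matches(p, e) for p, e in zip(predicted_calls, expected_calls)
        ):
            correct += 1
    return correct / len(expected)


# Hypothetical two-turn conversation: the second turn depends on the first turn's result.
expected = [
    [ToolCall("get_weather", {"city": "Berkeley"})],
    [ToolCall("send_email", {"to": "a@example.com", "body": "Sunny"})],
]
predicted = [
    [ToolCall("get_weather", {"city": "Berkeley"})],
    [ToolCall("send_email", {"to": "a@example.com", "body": "Rainy"})],
]

print(score_turns(predicted, expected))  # 0.5: the first turn matches, the second does not
```

Exact matching is the simplest possible choice here. A real evaluator would likely need to tolerate equivalent argument values and to verify the state left behind after each turn, which this sketch does not attempt.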

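from benchthing import Bench

# Create a client for the BFCL benchmark.
bench = Bench("bfcl")

# Start an evaluation run. `yourLanguageModels` is a placeholder:
# define it with the models you want to evaluate before calling run().
bench.run(
    benchmark="bfcl",
    task_id="1",
    models=yourLanguageModels
)

# Fetch the result using the same task_id that was passed to run().
result = bench.get_result("1")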
