GAIA Benchmark
Let’s dive into the GAIA benchmark - a rigorous test designed to challenge AI agents’ problem-solving skills!
🤔 What’s GAIA?
GAIA is a benchmark that tests AI systems on a wide range of real-world tasks requiring fundamental abilities such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency.
🚀 Let’s Get Started!
0. Prepare the Docker images:
We assume that you have followed the instructions to prepare the Docker images for the server, client, and ReAct agent.
1. Launch the Milvus service:
First, we need to launch the Milvus service. It’s like the librarian of our AI team, helping to organize and retrieve information quickly.
docker-compose -f dockerfiles/compose/milvus.yaml up
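Because `up` without `-d` stays in the foreground, the next step is usually run from a second terminal once Milvus is accepting connections. A minimal sketch of a readiness check follows. The helper name `wait_for_port` is hypothetical, and the port is an assumption: 19530 is Milvus's default gRPC port, but whether `milvus.yaml` publishes it on the host is not stated here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper, not part of this repository.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port is open,
            # not that the service behind it has finished initializing.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 19530)` would block for up to a minute, assuming the compose file exposes that port on localhost.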
2. Launch the server and the client:
Now, let’s bring our server and client agents online. This command sets up the collaborative environment.
docker-compose -f dockerfiles/compose/gaia.yaml up
3. Let the Games Begin!
With everything in place, it’s time to start the GAIA benchmark.
python scripts/gaia/test_gaia.py --level <level_in_gaia> --max_workers <thread_number>
Replace <level_in_gaia> with your chosen difficulty (1, 2, or 3). Set <thread_number> to decide how many tasks the team handles simultaneously.
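As an illustration of filling in the two placeholders, the sketch below builds the command with concrete values and checks them first. The helper names `build_gaia_command` and `run_gaia` are hypothetical, the script path and flags are taken from the command above, and it assumes you run it from the repository root so the relative path resolves.

```python
import subprocess
import sys
from typing import List

VALID_LEVELS = (1, 2, 3)  # difficulty levels named in this guide


def build_gaia_command(level: int, max_workers: int) -> List[str]:
    """Build the argument list for scripts/gaia/test_gaia.py (hypothetical helper)."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level}")
    if max_workers < 1:
        raise ValueError(f"max_workers must be at least 1, got {max_workers}")
    return [
        sys.executable,
        "scripts/gaia/test_gaia.py",
        "--level", str(level),
        "--max_workers", str(max_workers),
    ]


def run_gaia(level: int, max_workers: int) -> int:
    """Run the benchmark script and return its exit code.

    Assumes the current working directory is the repository root.
    """
    return subprocess.run(build_gaia_command(level, max_workers)).returncode
```

Calling `run_gaia(1, 4)` would correspond to `python scripts/gaia/test_gaia.py --level 1 --max_workers 4`, that is, level 1 with four tasks in flight at once.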