
🤖 Autonomous Agentic System – GAIA Benchmark Solver

Final project for the Hugging Face Agents Course. I developed an autonomous agent that solves complex, multi-step tasks from the GAIA Benchmark (General AI Assistants), which require real-world tool usage and multimodal reasoning.

The concept: A robust agentic workflow built with LangGraph that follows a Thought-Action-Observation cycle to decompose 20 validation queries into executable steps, navigating through technical constraints like API rate limits and data extraction challenges.
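
The Thought-Action-Observation cycle can be illustrated with a minimal plain-Python sketch. This is not the project's LangGraph graph: `Step`, `AgentState`, `run_agent`, `scripted_policy`, and `calculator` are hypothetical names chosen for illustration, and the scripted policy stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One decision of the agent: a thought plus either a tool call or a final answer."""
    thought: str
    action: Optional[str]            # tool name, or None when the agent is done
    action_input: str = ""
    final_answer: Optional[str] = None


@dataclass
class AgentState:
    """State carried across reasoning cycles."""
    question: str
    observations: List[str] = field(default_factory=list)


Policy = Callable[[AgentState], Step]   # stands in for the LLM (assumption)
Tool = Callable[[str], str]


def run_agent(question: str, policy: Policy, tools: Dict[str, Tool],
              max_steps: int = 8) -> str:
    """Repeat Thought -> Action -> Observation until the policy returns an answer."""
    state = AgentState(question)
    for _ in range(max_steps):
        step = policy(state)                      # Thought
        if step.action is None:
            return step.final_answer or ""
        tool = tools.get(step.action)             # Action
        if tool is None:
            observation = f"error: unknown tool {step.action!r}"
        else:
            observation = tool(step.action_input)
        state.observations.append(observation)    # Observation
    return "no answer within step budget"


def scripted_policy(state: AgentState) -> Step:
    """Hypothetical stand-in for the model: call one tool, then answer."""
    if not state.observations:
        return Step("I need to compute the sum.", "calculator", "2+3")
    return Step("I have the result.", None, final_answer=state.observations[-1])


def calculator(expression: str) -> str:
    """Toy tool that only handles 'a+b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


answer = run_agent("What is 2 + 3?", scripted_policy, {"calculator": calculator})
```

The `max_steps` budget is one simple way to keep a loop like this from running indefinitely when the model never produces a final answer.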

Technical highlights:

  • Resilient Model Orchestration: Implemented a fallback & routing strategy using Gemini 2.5 Pro as the primary brain, with automatic switching to Gemini Flash, Mistral, or Groq-hosted models to bypass free-tier rate limits without interrupting the execution flow.

  • Advanced Tool Engineering: Instead of overloading the context window with many small tools, I developed a utils.py library of complex functions. The agent uses a refined set of "Super-Tools" (Web Search, Excel manipulation, Audio Transcription, API interaction) that handle internal logic complexity autonomously.

  • Multimodal Innovation: Engineered a custom Video Analysis sub-agent. Since no free direct video-to-text API was available, I built a pipeline that extracts frames and metadata from the video to reconstruct temporal context for the LLM.

  • Custom RAG Architecture: Integrated ChromaDB with a specialized retrieval algorithm optimized for the specific nuances of the GAIA dataset, ensuring the agent retrieves only the most relevant context for its reasoning steps.

  • Observability & Evaluation: Ran a self-hosted Langfuse instance locally to monitor traces, evaluate agent costs, and debug the ReAct (Reasoning and Acting) loops without incurring cloud platform fees.

  • Full-Stack Deployment: Interface built with Gradio and hosted on Hugging Face Spaces, managed via Git for version control and CI/CD.
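
The fallback strategy from the first highlight can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `RateLimitError`, `call_with_fallback`, `primary`, and `backup` are hypothetical names, and the two stub functions stand in for real provider clients.

```python
from typing import Callable, List, Sequence, Tuple


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'quota exceeded' error."""


ModelFn = Callable[[str], str]


def call_with_fallback(prompt: str,
                       models: Sequence[Tuple[str, ModelFn]]) -> Tuple[str, str]:
    """Try each model in priority order and return (model_name, reply).

    Only rate-limit errors trigger a switch to the next model; any other
    exception propagates so that real bugs are not silently hidden.
    """
    errors: List[str] = []
    for name, model in models:
        try:
            return name, model(prompt)
        except RateLimitError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models rate-limited: " + "; ".join(errors))


def primary(prompt: str) -> str:
    """Stub for the primary model; always simulates an exhausted quota."""
    raise RateLimitError("quota exceeded")


def backup(prompt: str) -> str:
    """Stub for a fallback model; echoes the prompt."""
    return f"echo: {prompt}"


used, reply = call_with_fallback("hello", [("primary", primary), ("backup", backup)])
```

Returning the name of the model that answered makes it possible to record in the traces which provider handled each step.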

Results: The agent correctly solved 16 "Level 1" GAIA tasks, demonstrating a high degree of autonomy in tool selection and the ability to maintain state across multiple reasoning cycles.

View certification