Multimodal AI life coach agent.

I built this project to showcase my passion for building end-to-end AI systems.

Coach Aria is more than a chatbot. It is a fully functional, multimodal AI agent that can:

Have meaningful coaching conversations powered by Llama 3.3
Understand your voice using Whisper STT
Understand images you send using Llama 4 Scout Vision
Speak back to you with natural voice synthesis via ElevenLabs
Generate contextual images using Stable Diffusion XL
Remember important details about you across sessions using vector memory
Intelligently route between text, audio, and image responses

I built this to demonstrate that I can design, architect, and implement complex AI systems from scratch rather than just follow tutorials.

Features in Action

Voice Understanding

Aria can listen to your voice messages and transcribe them in real time using Whisper.

Voice messages are transcribed using Whisper Large v3 Turbo via Groq

Image Understanding

Aria can analyze and understand images you send, providing detailed descriptions and context-aware responses.

Aria uses the Llama 4 Scout Vision model to understand and describe images

Voice Response

Aria can speak back to you with natural, expressive voice synthesis.

https://github.com/BhimPrasadAdhikari/life-coach/raw/main/public/screenshots/videos/agent_speak.mp4

Natural voice responses powered by ElevenLabs TTS

AI Image Generation

Aria can generate contextual images based on the conversation.

Images generated by Stable Diffusion XL based on coaching conversations

What Can Aria Do?

Capability	Description	Technologies
Conversational Coaching	Empathetic, growth-mindset-based conversations	LangGraph, Groq Llama 3.3
Voice Input	Transcribes your voice messages in real time	Whisper Large v3 Turbo via Groq
Voice Output	Sends voice responses that sound natural	ElevenLabs TTS
Image Understanding	Analyzes images you send and responds intelligently	Llama 4 Scout Vision via Groq
Image Generation	Creates visualizations based on conversation context	Stable Diffusion XL via HuggingFace
Long-Term Memory	Remembers your goals, challenges, and preferences	Vector Store + Semantic Search
Smart Routing	Automatically decides when to respond with text, voice, or images	LLM-powered intent classification
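
To make the Smart Routing row more concrete, here is a minimal sketch of how a router could map a message to one of the three response types. It is an illustration, not the project's actual code: `keyword_classifier` is a hypothetical stand-in for the LLM-powered intent classification, and the names `route_response` and `VALID_ROUTES` are assumptions made for this example.

```python
from typing import Callable, Literal

Route = Literal["text", "audio", "image"]
VALID_ROUTES = ("text", "audio", "image")


def keyword_classifier(message: str) -> str:
    """Hypothetical stand-in for the LLM call: guesses a route from keywords."""
    lowered = message.lower()
    if any(word in lowered for word in ("draw", "picture", "image", "visualize")):
        return "image"
    if any(word in lowered for word in ("voice", "say it", "speak", "audio")):
        return "audio"
    return "text"


def route_response(
    message: str, classify: Callable[[str], str] = keyword_classifier
) -> Route:
    """Return the response type, falling back to text on unexpected output."""
    label = classify(message).strip().lower()
    return label if label in VALID_ROUTES else "text"
```

Passing the classifier in as a parameter means an LLM-backed function can replace the keyword version without touching the fallback logic, which guards against a model returning a label outside the three supported routes.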

System Architecture

High-Level Workflow

Detailed Node Operations

How Each Node Works

Node	Purpose	What It Does
Memory Retrieval	Context Loading	Searches the vector store for relevant past memories using semantic similarity
Router	Intent Classification	Uses an LLM to classify whether the user wants a text, audio, or image response
Conversation	Text Response	Generates an empathetic coaching response using Llama 3.3
Audio	Voice Response	Generates a text response, then converts it to speech via ElevenLabs
Image	Visual Response	Creates scenario prompt → enhances it → generates image via SDXL
Memory Saving	Learning	Analyzes the conversation for important facts and stores them in the vector DB
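
The node sequence in the table above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the LangGraph implementation: `embed` is a toy bag-of-words function standing in for a real embedding model, `MemoryStore` is an in-memory stand-in for the vector DB, and `run_turn` uses placeholder strings where the LLM, ElevenLabs, and SDXL calls would go. Unlike the real Memory Saving node, it stores every message instead of extracting important facts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryStore:
    """In-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def save(self, fact: str) -> None:
        self.items.append((fact, embed(fact)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored facts with non-zero similarity, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), fact) for fact, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:k] if score > 0]


def run_turn(message: str, store: MemoryStore, route: str = "text") -> dict:
    """Run one turn: retrieve memories, respond on the chosen route, save a memory."""
    state = {"message": message, "route": route}
    state["memories"] = store.search(message)  # Memory Retrieval node
    reply = f"(coach reply using {len(state['memories'])} memories)"  # LLM placeholder
    if route == "audio":  # Audio node: text plus synthesized speech
        state["response"] = {"text": reply, "audio": b"<speech bytes placeholder>"}
    elif route == "image":  # Image node: text plus a prompt for the image model
        state["response"] = {"text": reply, "image_prompt": f"Illustration of: {message}"}
    else:  # Conversation node: text only
        state["response"] = {"text": reply}
    store.save(message)  # Memory Saving node (this sketch stores every message)
    return state
```

Keeping all per-turn data in one `state` dictionary mirrors how graph-based agent frameworks pass shared state from node to node.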

Tech Stack

Layer	Technology
Framework	LangGraph
LLM	Groq (Llama 3.3 70B)
Vision	Llama 4 Scout Vision
STT	Whisper Large v3 Turbo
TTS	ElevenLabs
Image Gen	Stable Diffusion XL
Vector DB	Qdrant
Interface	Chainlit
Deployment	Google Cloud Run

Getting Started

Prerequisites

Python 3.13+
API Keys for: Groq, ElevenLabs, HuggingFace, Qdrant

Installation

bash
# Clone the repository
git clone https://github.com/BhimPrasadAdhikari/whatsapp-agent.git
cd whatsapp-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Environment Setup

Copy the example environment file and add your API keys:

bash
cp .env.example .env

Then edit .env with your API keys. 📖 See API Setup Guide for detailed instructions on obtaining all API keys.

env
GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
HF_TOKEN=your_huggingface_token
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_api_key
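
Before starting the app, it can help to check that none of these variables are missing or still set to the placeholder values. The snippet below is an optional sketch, not part of the repository: `missing_settings` and `REQUIRED_KEYS` are names made up for this example, and it assumes the values from .env have already been loaded into the process environment.

```python
import os

REQUIRED_KEYS = (
    "GROQ_API_KEY",
    "ELEVENLABS_API_KEY",
    "HF_TOKEN",
    "QDRANT_URL",
    "QDRANT_API_KEY",
)


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # Treat the "your_..." values from the example file as not configured.
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing
```

Calling `missing_settings()` with no arguments checks the real environment; passing a dictionary makes the function easy to test.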

Run the Application

bash
chainlit run interfaces/chainlit/app.py -w

Let's Connect!

I'm actively looking for internship and full-time opportunities in AI/ML Engineering.

LinkedIn

License

This project is open source and available under the MIT License.

Built with ❤️ by Bhim Prasad Adhikari
