AI/ML
In Progress
February 15, 2026

Multimodal AI life coach agent.

A multimodal AI life coach agent built with LangGraph that understands your voice, sees your images, and speaks back to you. Powered by the open-source models Llama 3.3, Whisper, and Stable Diffusion XL.



About this Project

I built this project to showcase my passion for building end-to-end AI systems.

Coach Aria isn't just another chatbot — it's a fully-functional, multimodal AI agent that can:

  • Have meaningful coaching conversations powered by Llama 3.3
  • Understand your voice using Whisper STT
  • Understand images you send using Llama 4 Scout Vision
  • Speak back to you with natural voice synthesis via ElevenLabs
  • Generate contextual images using Stable Diffusion XL
  • Remember important details about you across sessions using vector memory
  • Intelligently route between text, audio, and image responses

I built this to demonstrate that I can design, architect, and implement complex AI systems from scratch — not just follow tutorials.
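As a rough sketch of that end-to-end flow, here is a dependency-free Python stand-in: the real agent wires these steps as LangGraph nodes, and its router is an LLM classifier rather than the keyword heuristic shown here, so treat every name and function below as illustrative.

```python
# Dependency-free sketch of Aria's turn pipeline. The real agent wires these
# steps as LangGraph nodes; the keyword router below stands in for the LLM
# classifier, and all names here are illustrative.

def retrieve_memories(state: dict) -> dict:
    # Real version: semantic search over a Qdrant vector store.
    state["memories"] = list(state.get("memory_store", []))
    return state

def route_intent(state: dict) -> dict:
    # Real version: an LLM classifies the desired response modality.
    text = state["user_message"].lower()
    if "draw" in text or "picture" in text:
        state["route"] = "image"
    elif "voice" in text or "say it" in text:
        state["route"] = "audio"
    else:
        state["route"] = "conversation"
    return state

def run_turn(state: dict) -> dict:
    # Run the nodes in order; the real graph adds response and memory-save nodes.
    for node in (retrieve_memories, route_intent):
        state = node(state)
    return state

print(run_turn({"user_message": "Can you draw a picture of my goal?"})["route"])  # image
```

The real router also has to handle ambiguous requests, which is why an LLM classifier is used instead of keywords.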

Features in Action

Voice Understanding

Aria can listen to your voice messages and transcribe them in real time using Whisper.

Voice messages are transcribed using Whisper Large v3 Turbo via Groq
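A transcription call of roughly this shape reaches that endpoint through the Groq Python SDK (which is OpenAI-compatible); the file name and extension check are illustrative, and the model id should be verified against Groq's current model list.

```python
# Sketch of voice transcription with the Groq SDK; assumes GROQ_API_KEY is set.
import os

SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".ogg", ".webm"}

def is_supported_audio(path: str) -> bool:
    # Guard against uploading a file type the endpoint will reject.
    return os.path.splitext(path)[1].lower() in SUPPORTED_AUDIO

def transcribe(path: str) -> str:
    from groq import Groq  # pip install groq
    client = Groq()  # reads GROQ_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-large-v3-turbo",
        )
    return result.text

# Usage (needs a real audio file and API key):
#   text = transcribe("voice_note.ogg")
```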

Image Understanding

Aria can analyze and understand images you send, providing detailed descriptions and context-aware responses.

Aria uses Llama 4 Scout Vision model to understand and describe images
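A vision request of this shape goes through Groq's OpenAI-compatible chat endpoint; the model id below matches Groq's published Llama 4 Scout id at the time of writing, and the JPEG assumption in the helper is illustrative.

```python
# Sketch of image understanding via Groq's OpenAI-compatible chat endpoint.
import base64

def to_data_url(image_path: str) -> str:
    # Local images are inlined as base64 data URLs in the message payload
    # (JPEG assumed here for simplicity).
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"

def describe_image(image_path: str, question: str) -> str:
    from groq import Groq  # pip install groq
    client = Groq()  # reads GROQ_API_KEY from the environment
    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": to_data_url(image_path)}},
            ],
        }],
    )
    return response.choices[0].message.content

# Usage (needs a real image and API key):
#   print(describe_image("photo.jpg", "What do you see here?"))
```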


Voice Response

Aria can speak back to you with natural, expressive voice synthesis.

https://github.com/BhimPrasadAdhikari/life-coach/raw/main/public/screenshots/videos/agent_speak.mp4

Natural voice responses powered by ElevenLabs TTS
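With the current ElevenLabs Python SDK, the synthesis call looks roughly like this; the voice id is a placeholder, `split_for_tts` is an illustrative helper, and the exact call shape should be checked against the installed SDK version.

```python
# Sketch of voice synthesis via the ElevenLabs Python SDK (v1-style client).
# Assumes ELEVENLABS_API_KEY is set; the voice id is a placeholder.

def split_for_tts(text: str, max_chars: int = 2500) -> list[str]:
    # Illustrative helper: keep each request comfortably under the
    # endpoint's character limit by splitting on sentence boundaries.
    sentences = text.replace("? ", "?|").replace(". ", ".|").split("|")
    parts, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            parts.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        parts.append(current.strip())
    return parts

def synthesize(text: str, voice_id: str = "YOUR_VOICE_ID") -> bytes:
    from elevenlabs.client import ElevenLabs  # pip install elevenlabs
    client = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment
    chunks = client.text_to_speech.convert(
        voice_id=voice_id,
        text=text,
        model_id="eleven_multilingual_v2",
    )
    # The SDK streams audio as byte chunks; join them into one MP3 payload.
    return b"".join(chunks)

print(split_for_tts("Take a breath. Name one small win from today.", max_chars=20))
```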


AI Image Generation

Aria can generate contextual images based on the conversation. Here are some examples:

Images generated by Stable Diffusion XL based on coaching conversations
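A generation call of this shape is possible through the Hugging Face Inference API; the static prompt "enhancement" below is a stand-in for the LLM-driven enhancement step in the real pipeline, and the output path is illustrative.

```python
# Sketch of image generation through the Hugging Face Inference API.

def enhance_prompt(scenario: str) -> str:
    # Illustrative: the real pipeline asks the LLM to enrich the prompt.
    return f"{scenario}, cinematic lighting, highly detailed, 4k"

def generate_image(scenario: str, out_path: str = "generated.png") -> str:
    import os
    from huggingface_hub import InferenceClient  # pip install huggingface_hub
    client = InferenceClient(token=os.environ["HF_TOKEN"])
    image = client.text_to_image(  # returns a PIL.Image
        enhance_prompt(scenario),
        model="stabilityai/stable-diffusion-xl-base-1.0",
    )
    image.save(out_path)
    return out_path

# Usage (needs a valid HF_TOKEN):
#   generate_image("a runner crossing a marathon finish line at sunrise")
```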


What Can Aria Do?

| Capability | Description | Technologies |
| --- | --- | --- |
| Conversational Coaching | Empathetic, growth-mindset based conversations | LangGraph, Groq Llama 3.3 |
| Voice Input | Transcribes your voice messages in real time | Whisper Large v3 Turbo via Groq |
| Voice Output | Sends voice responses that sound natural | ElevenLabs TTS |
| Image Understanding | Analyzes images you send and responds intelligently | Llama 4 Scout Vision via Groq |
| Image Generation | Creates visualizations based on conversation context | Stable Diffusion XL via HuggingFace |
| Long-Term Memory | Remembers your goals, challenges, and preferences | Vector Store + Semantic Search |
| Smart Routing | Automatically decides when to respond with text, voice, or images | LLM-powered intent classification |
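The long-term memory capability rests on a retrieve-by-similarity idea that can be sketched without any external services; the real project uses Qdrant with real embedding vectors, while the three-dimensional vectors below are toy values.

```python
# Dependency-free sketch of the long-term memory idea. The real project uses
# Qdrant with real embeddings; the tiny 3-d vectors here are toy values.
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self):
        self.items = []  # list of (embedding, fact) pairs

    def save(self, embedding, fact):
        self.items.append((embedding, fact))

    def search(self, query_embedding, top_k=2):
        # Rank stored facts by similarity to the query embedding.
        ranked = sorted(self.items,
                        key=lambda item: cosine(item[0], query_embedding),
                        reverse=True)
        return [fact for _, fact in ranked[:top_k]]

store = MemoryStore()
store.save([1.0, 0.0, 0.0], "Goal: run a marathon")
store.save([0.0, 1.0, 0.0], "Prefers morning sessions")
print(store.search([0.9, 0.1, 0.0], top_k=1))  # ['Goal: run a marathon']
```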

System Architecture

High-Level Workflow

Detailed Node Operations

How Each Node Works

| Node | Purpose | What It Does |
| --- | --- | --- |
| Memory Retrieval | Context Loading | Searches vector store for relevant past memories using semantic similarity |
| Router | Intent Classification | Uses LLM to classify if user wants text, audio, or image response |
| Conversation | Text Response | Generates empathetic coaching response using Llama 3.3 |
| Audio | Voice Response | Generates text response + converts to speech via ElevenLabs |
| Image | Visual Response | Creates scenario prompt → enhances it → generates image via SDXL |
| Memory Saving | Learning | Analyzes conversation for important facts and stores in vector DB |
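Nodes like these are joined with conditional edges. A plausible shape for that wiring with LangGraph's `StateGraph` is sketched below; the `nodes` mapping and the plain-`dict` state schema are illustrative simplifications, and API details should be checked against the installed langgraph version.

```python
# Sketch of the graph wiring; node names match the table above.

def pick_response_node(state: dict) -> str:
    # Conditional edge: map the router's classification to a response node.
    return {"text": "conversation", "audio": "audio", "image": "image"}[state["route"]]

def build_graph(nodes: dict):
    # nodes maps each name below to a callable(state) -> state.
    from langgraph.graph import StateGraph, END  # pip install langgraph
    graph = StateGraph(dict)  # plain dict schema for brevity; versions may require a TypedDict
    for name, fn in nodes.items():
        graph.add_node(name, fn)
    graph.set_entry_point("memory_retrieval")
    graph.add_edge("memory_retrieval", "router")
    graph.add_conditional_edges("router", pick_response_node)
    for response_node in ("conversation", "audio", "image"):
        graph.add_edge(response_node, "memory_saving")
    graph.add_edge("memory_saving", END)
    return graph.compile()
```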

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | LangGraph |
| LLM | Groq (Llama 3.3 70B) |
| Vision | Llama 4 Scout Vision |
| STT | Whisper Large v3 Turbo |
| TTS | ElevenLabs |
| Image Gen | Stable Diffusion XL |
| Vector DB | Qdrant |
| Interface | Chainlit |
| Deployment | Google Cloud Run |

Getting Started

Prerequisites

  • Python 3.13+
  • API Keys for: Groq, ElevenLabs, HuggingFace, Qdrant

Installation

```bash
# Clone the repository
git clone https://github.com/BhimPrasadAdhikari/whatsapp-agent.git
cd whatsapp-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Environment Setup

Copy the example environment file and add your API keys:

```bash
cp .env.example .env
```

Then edit .env with your API keys. 📖 See API Setup Guide for detailed instructions on obtaining all API keys.

```env
GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
HF_TOKEN=your_huggingface_token
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_api_key
```

Run the Application

```bash
chainlit run interfaces/chainlit/app.py -w
```

Let's Connect!

I'm actively looking for internship and full-time opportunities in AI/ML Engineering.

LinkedIn


License

This project is open source and available under the MIT License.


Built with ❤️ by Bhim Prasad Adhikari


Technologies

  • Python
  • Chainlit
  • LangGraph
  • Git
  • Langchain