OPEN SOURCE · WINDOWS 10/11 · PYTHON 3.10–3.12

J.A.R.V.I.S v2

JUST A RATHER VERY INTELLIGENT SYSTEM

A full-screen transparent desktop overlay powered by voice, gesture, and AI. Dual LLM routing, MediaPipe hand tracking, wake-word detection, a PCB-aesthetic HUD, and a sandboxed Gemini coding assistant in one Windows desktop app. Speech recognition and hand tracking run locally; the LLMs are reached through the Groq and Gemini APIs.


VOICE ACTIVE · GESTURE CAM ONLINE · GROQ + GEMINI CONNECTED · MIT LICENSE

CAPABILITIES

FEATURE SET

🎙

VOICE CONTROL

Porcupine wake word ("Jarvis") → Whisper local transcription → Google STT fallback. Always listening; audio is transcribed on-device unless the Google STT fallback is used.
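
The fallback chain can be sketched as a loop over recognizers. This is a minimal illustration, not the project's code: `transcribe_with_fallback` and both stub engines are hypothetical names, and real engines would wrap Whisper and Google STT.

```python
from typing import Callable, Optional, Sequence

# Hypothetical engine type: takes raw audio bytes, returns text or raises.
Engine = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes, engines: Sequence[Engine]) -> Optional[str]:
    """Try each engine in order and return the first non-empty transcript."""
    for engine in engines:
        try:
            text = engine(audio).strip()
        except Exception:
            # Assumption: any engine error means "try the next engine".
            continue
        if text:
            return text
    return None


def local_whisper_stub(audio: bytes) -> str:
    # Stand-in for local Whisper; fails here to demonstrate the fallback.
    raise RuntimeError("model not loaded")


def google_stt_stub(audio: bytes) -> str:
    # Stand-in for the cloud fallback.
    return "open youtube"


result = transcribe_with_fallback(b"\x00\x01", [local_whisper_stub, google_stt_stub])
print(result)
```

Ordering the list local-first keeps audio on the machine whenever the local engine succeeds.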

✋

GESTURE ENGINE

MediaPipe HandLandmarker. Index finger moves cursor. Pinch clicks. Fist right-clicks. V-shape scrolls. No controller needed.
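
A rough sketch of how the four gestures could be told apart from MediaPipe's 21 hand landmarks. The landmark indices (4 = thumb tip, 8 = index tip, and so on) follow the MediaPipe hand model; the threshold, the `classify` function, and the "tip above PIP joint" heuristic are assumptions for illustration and may differ from what the project does.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # normalized (x, y); y grows downward, as in image coordinates

# Landmark indices from the MediaPipe hand model (21 points per hand).
THUMB_TIP = 4
FINGERS = {  # finger name -> (PIP joint index, tip index)
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}
PINCH_THRESHOLD = 0.05  # assumed value in normalized units; tune per camera


def classify(lm: List[Point]) -> str:
    """Map 21 hand landmarks to one of: pinch, point, v_shape, fist, none."""
    index_tip = FINGERS["index"][1]
    if math.dist(lm[THUMB_TIP], lm[index_tip]) < PINCH_THRESHOLD:
        return "pinch"
    # A finger counts as extended when its tip is above its PIP joint
    # (assumes an upright hand facing the camera).
    up = {name: lm[tip][1] < lm[pip][1] for name, (pip, tip) in FINGERS.items()}
    if up["index"] and up["middle"] and not up["ring"] and not up["pinky"]:
        return "v_shape"
    if up["index"] and not (up["middle"] or up["ring"] or up["pinky"]):
        return "point"
    if not any(up.values()):
        return "fist"
    return "none"


def demo_hand(extended: set) -> List[Point]:
    """Build a synthetic hand with the named fingers extended."""
    lm: List[Point] = [(0.5, 0.9)] * 21
    lm[THUMB_TIP] = (0.2, 0.7)
    for i, (name, (pip, tip)) in enumerate(FINGERS.items()):
        x = 0.45 + 0.05 * i
        lm[pip] = (x, 0.6)
        lm[tip] = (x, 0.4 if name in extended else 0.7)
    return lm


print(classify(demo_hand({"index"})))            # point
print(classify(demo_hand({"index", "middle"})))  # v_shape
print(classify(demo_hand(set())))                # fist
```

Checking the pinch first is a design choice in this sketch: a pinch usually keeps the index finger partly extended, so it would otherwise be read as a point.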

🤖

DUAL AI ROUTING

General questions → Groq llama-3.1-8b-instant. Coding requests → Gemini 1.5 Flash with file system access. Automatic routing.
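
One simple way to route requests is a keyword check on the prompt. The pattern and the `route` function below are assumptions made for illustration; the project's actual routing rule is not described here and may be more sophisticated.

```python
import re

# Assumed heuristic: a coding verb followed by a coding noun, or a .py filename.
CODING_PATTERN = re.compile(
    r"\b(write|edit|create|delete|refactor|debug)\b.*"
    r"\b(script|file|function|code|class|program)\b"
    r"|\.py\b",
    re.IGNORECASE,
)


def route(prompt: str) -> str:
    """Return 'gemini' for coding requests and 'groq' for everything else."""
    return "gemini" if CODING_PATTERN.search(prompt) else "groq"


for prompt in (
    "write a Python script that prints primes",
    "edit the file main.py",
    "what is the capital of France",
):
    print(f"{prompt!r} -> {route(prompt)}")
```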

💻

CODE ENGINE

Gemini writes, edits, and deletes files inside a sandboxed workspace. Path traversal blocked. Multi-turn coding context.
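
Path traversal is commonly blocked by resolving the requested path and checking that it still sits under the workspace root. The sketch below shows that general technique with `pathlib`; `resolve_in_workspace` is a hypothetical helper, not necessarily how the project implements its sandbox.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path inside workspace, rejecting anything that escapes it."""
    root = workspace.resolve()
    # Joining an absolute user_path discards root, so the check below catches it too.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{user_path!r} is outside the workspace")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    inside = resolve_in_workspace(ws, "src/main.py")
    print(inside.relative_to(ws.resolve()))
    try:
        resolve_in_workspace(ws, "../outside.txt")
        escaped = True
    except PermissionError:
        escaped = False
    print("traversal allowed:", escaped)
```

Resolving before comparing matters: a plain string prefix check would accept `workspace/../outside.txt`.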

⚡

MACRO SYSTEM

Voice-triggered macros chain any sequence of actions — close all apps, launch Roblox, open YouTube — with configurable delays.
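
A macro of this kind can be modeled as a list of steps, each with an action and a delay. The `Step` and `run_macro` names are hypothetical, and the actions here only record what they would do instead of touching real applications.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: Callable[[], None]  # what to do
    delay_after: float = 0.0    # seconds to wait before the next step


def run_macro(steps: List[Step], sleep: Callable[[float], None] = time.sleep) -> None:
    """Run each step in order, pausing for its configured delay."""
    for step in steps:
        step.action()
        if step.delay_after > 0:
            sleep(step.delay_after)


log: List[str] = []
delays: List[float] = []

gaming_mode = [
    Step(lambda: log.append("close all apps"), delay_after=2.0),
    Step(lambda: log.append("launch Roblox"), delay_after=5.0),
    Step(lambda: log.append("open YouTube")),
]

# Pass a fake sleep so the demo finishes instantly but still records the delays.
run_macro(gaming_mode, sleep=delays.append)
print(log)
print(delays)
```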

🔮

PCB OVERLAY

Full-screen transparent HUD. Real system data — CPU, RAM, disk, network, temp. Radar sweep. Circuit board frame. Edge-darkness vignette.

😴

SLEEP / WAKE

60s idle → overlay fades out, animation drops to 2fps, mic stays active. Wake by voice, F9, or Ctrl+Space. Smooth opacity animation.
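
The idle behavior can be pictured as a small state machine driven by a clock. The 60-second timeout and the 2 fps sleep rate come from the description above; the class name, the 30 fps awake rate, and the method names are assumptions for this sketch.

```python
class SleepController:
    """Track idle time and report the frame rate the overlay should use."""

    def __init__(self, idle_timeout: float = 60.0, awake_fps: int = 30, sleep_fps: int = 2):
        self.idle_timeout = idle_timeout
        self.awake_fps = awake_fps  # assumed value; not stated in the description
        self.sleep_fps = sleep_fps
        self.last_activity = 0.0
        self.asleep = False

    def on_activity(self, now: float) -> None:
        """Call on voice input, F9, or Ctrl+Space to wake the overlay."""
        self.last_activity = now
        self.asleep = False

    def tick(self, now: float) -> int:
        """Update the sleep state and return the target frame rate."""
        if now - self.last_activity >= self.idle_timeout:
            self.asleep = True
        return self.sleep_fps if self.asleep else self.awake_fps


ctl = SleepController()
ctl.on_activity(now=0.0)
print(ctl.tick(now=59.0))  # still awake
print(ctl.tick(now=60.0))  # idle timeout reached
ctl.on_activity(now=61.0)
print(ctl.tick(now=62.0))  # awake again
```

Passing the time in as an argument, rather than reading a clock inside the class, keeps the logic easy to test.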

🎓

SELF-TEACHING

Add new commands by voice: "Teach Jarvis to open Netflix". Writes directly to commands.json. Hot-reloads instantly.
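
A sketch of the teach-then-reload idea, assuming commands.json is a flat mapping from spoken phrase to action string. That schema, the `CommandStore` class, and the example URL are assumptions; the real file layout may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


class CommandStore:
    """Keep an in-memory copy of commands.json and persist new commands."""

    def __init__(self, path: Path):
        self.path = path
        self.commands: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        """Re-read the file so new commands take effect without a restart."""
        if self.path.exists():
            self.commands = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.commands = {}

    def teach(self, phrase: str, action: str) -> None:
        """Add or overwrite a command, write the file, then reload it."""
        updated = dict(self.commands)
        updated[phrase.lower().strip()] = action
        # Write to a temp file and swap it in so a crash cannot leave half a file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(updated, indent=2), encoding="utf-8")
        os.replace(tmp, self.path)
        self.reload()


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "commands.json"
    store = CommandStore(path)
    store.teach("Open Netflix", "https://www.netflix.com")
    taught = dict(store.commands)
    persisted = dict(CommandStore(path).commands)  # a fresh store sees the same data

print(taught)
```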

🛡

SAFE SHUTDOWN

"System Exit Code 0" — graceful full shutdown. "System Exit Code 1" — release camera so other apps can use it. Mic stays active.

COMMAND REFERENCE

WHAT YOU CAN SAY

// SYSTEM

"shutdown": Shuts down Windows in 10 seconds

"restart": Restarts Windows in 10 seconds

"hibernate": Hibernates the PC

"cancel shutdown": Aborts a pending shutdown

"System Exit Code 0": Closes JARVIS gracefully

"System Exit Code 1": Releases camera; mic stays active

// BUILTIN

"volume up / louder": Increases system volume by 5 steps

"mute / silence": Toggles audio mute

"screenshot": Saves PNG to Desktop with timestamp

"lock screen": Locks the workstation

"close Chrome": Kills matching process (fuzzy match)

"search for song.mp3 on C drive": Walks C:\ with fuzzy filename match
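
Fuzzy matching of a spoken name against process or file names can be done with the standard library's `difflib`. This is one plausible approach, not necessarily the matcher the project uses; `fuzzy_pick` and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import Iterable, Optional


def fuzzy_pick(query: str, candidates: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate closest to query, or None if nothing is close enough."""
    # Compare case-insensitively but return the original spelling.
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


running = ["chrome.exe", "Code.exe", "explorer.exe"]
print(fuzzy_pick("Chrome", running))
print(fuzzy_pick("zzzz", running))
```

A cutoff that is too low risks killing the wrong process, so a real implementation would likely confirm ambiguous matches before acting.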

// MACROS

"gaming mode": Close all → Roblox → gaming playlist

"study mode": Close all → NotebookLM → YouTube → Gemini

"work mode": Close all → VS Code → Discord → GitHub

"chill mode": Close all → Spotify → Reddit

// CODING (routes to Gemini)

"write a Python script that…": Creates file in workspace

"edit the file main.py": Modifies file in workspace

"set workspace to C:\my\project": Changes code sandbox directory

HAND TRACKING

GESTURE CONTROL

☝️

INDEX POINT

Move cursor

🤌

PINCH

Left click

✊

FIST

Right click

✌️

V SHAPE

Scroll up / down

INSTALLATION

QUICK START

PYTHON ENVIRONMENT

Requires Python 3.10–3.12. MediaPipe does not support 3.14 yet.

py -3.12 -m venv .venv
.venv\Scripts\activate

INSTALL DEPENDENCIES

Install the packages listed in requirements.txt. If PyAudio fails to install, run pip install pipwin, then pipwin install pyaudio.

pip install -r requirements.txt

INSTALL FFMPEG

Required for Whisper local speech recognition. Download from gyan.dev/ffmpeg/builds, extract to C:\ffmpeg, add C:\ffmpeg\bin to your system PATH.

ffmpeg -version  # verify in a new terminal

CONFIGURE API KEYS

Copy config.example.py to config.py and fill in your keys. config.py is gitignored and never committed.

copy config.example.py config.py
# edit config.py with your keys

RUN

Test the overlay without any keys first, then run the full system.

python gui.py    # overlay preview — no keys needed
python main.py   # full system

BUILT WITH

TECH STACK

Python 3.12 · PySide6 · MediaPipe 0.10+ · OpenCV · Groq API · Google Gemini · Picovoice Porcupine · OpenAI Whisper · pyttsx3 · PyAutoGUI · pynput · psutil · Windows 10/11 · MIT License