
OPEN SOURCE · WINDOWS 10/11 · PYTHON 3.10–3.12

J.A.R.V.I.S v2

JUST A RATHER VERY INTELLIGENT SYSTEM

A full-screen transparent desktop overlay powered by voice, gesture, and AI. Dual LLM routing, MediaPipe hand tracking, wake-word detection, PCB-aesthetic HUD, and a sandboxed Gemini coding assistant. Voice, gesture, and overlay processing run locally on Windows; LLM requests go to the Groq and Gemini APIs.

VOICE ACTIVE · GESTURE CAM ONLINE · GROQ + GEMINI CONNECTED · MIT LICENSE

FEATURE SET


🎙
VOICE CONTROL

Porcupine wake word ("Jarvis") → Whisper local transcription → Google STT fallback. Always listening. Transcription stays on-device unless the Google STT fallback is used.
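The fallback order described above can be sketched as a loop over transcription engines. This is a minimal illustration, not the project's code: `transcribe_with_fallback`, `fake_whisper`, and `fake_google_stt` are hypothetical names, and the stand-in engines only simulate a local failure followed by a successful fallback.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: an engine takes raw audio bytes and returns text.
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes, engines: Sequence[Transcriber]) -> Optional[str]:
    """Try each engine in order and return the first non-empty transcript."""
    for engine in engines:
        try:
            text = engine(audio).strip()
        except Exception:
            # This engine failed; fall through to the next one.
            continue
        if text:
            return text
    return None


# Stand-ins for the real engines (assumptions for the demo only).
def fake_whisper(audio: bytes) -> str:
    raise RuntimeError("local model unavailable")


def fake_google_stt(audio: bytes) -> str:
    return "open chrome"


result = transcribe_with_fallback(b"\x00\x01", [fake_whisper, fake_google_stt])
print(result)  # open chrome
```

Listing the local engine first keeps audio on the machine whenever local transcription succeeds.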

GESTURE ENGINE

MediaPipe HandLandmarker. Index finger moves cursor. Pinch clicks. Fist right-clicks. V-shape scrolls. No controller needed.
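One common way to detect a pinch is to measure the distance between the thumb tip and the index fingertip. The sketch below assumes landmarks arrive as normalized (x, y) pairs in MediaPipe's 21-point hand layout, where index 4 is the thumb tip and index 8 is the index fingertip. The threshold value and the helper names `is_pinch` and `to_screen` are assumptions, not taken from the project.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# MediaPipe hand layout: 21 landmarks, 4 = thumb tip, 8 = index fingertip.
THUMB_TIP = 4
INDEX_TIP = 8

# Assumed threshold in normalized image coordinates; tune per camera.
PINCH_THRESHOLD = 0.05


def is_pinch(landmarks: Sequence[Point], threshold: float = PINCH_THRESHOLD) -> bool:
    """Return True when thumb tip and index fingertip are closer than threshold."""
    tx, ty = landmarks[THUMB_TIP]
    ix, iy = landmarks[INDEX_TIP]
    return math.hypot(tx - ix, ty - iy) < threshold


def to_screen(point: Point, width: int, height: int) -> Tuple[int, int]:
    """Map a normalized landmark to pixel coordinates, clamped to the screen."""
    x = min(max(point[0], 0.0), 1.0)
    y = min(max(point[1], 0.0), 1.0)
    return int(x * (width - 1)), int(y * (height - 1))


# Synthetic hand: every landmark at the center, then move two fingertips.
hand = [(0.5, 0.5)] * 21
hand[THUMB_TIP] = (0.40, 0.50)
hand[INDEX_TIP] = (0.42, 0.51)
print(is_pinch(hand))  # True
print(to_screen(hand[INDEX_TIP], 1920, 1080))
```

A fixed threshold is the simplest choice. Scaling it by hand size would make the gesture less sensitive to how far the hand is from the camera.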

🤖
DUAL AI ROUTING

General questions → Groq llama-3.1-8b-instant. Coding requests → Gemini 1.5 Flash with file system access. Automatic routing.
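How the router decides is not documented here, so the following is only a guess at one simple approach: a keyword pattern that sends coding-style requests to Gemini and everything else to Groq. The pattern and the `route` function are hypothetical.

```python
import re

# Hypothetical rule: a coding verb followed later by a coding noun.
CODING_PATTERN = re.compile(
    r"\b(write|edit|create|delete|refactor|debug)\b.*\b(script|file|function|code|class)\b",
    re.IGNORECASE,
)


def route(prompt: str) -> str:
    """Return which backend should handle the prompt: 'gemini' or 'groq'."""
    return "gemini" if CODING_PATTERN.search(prompt) else "groq"


print(route("write a Python script that renames files"))  # gemini
print(route("what is the capital of France"))             # groq
```

A keyword rule is cheap and predictable, but it misroutes unusual phrasing. A small classifier prompt would be one alternative.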

💻
CODE ENGINE

Gemini writes, edits, and deletes files inside a sandboxed workspace. Path traversal blocked. Multi-turn coding context.
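Path traversal is typically blocked by resolving the requested path and confirming it still sits under the workspace root. The sketch below shows that check with `pathlib`. `resolve_in_workspace` is a hypothetical helper and may differ from how the project enforces its sandbox.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path inside workspace, rejecting anything that escapes it."""
    root = workspace.resolve()
    candidate = (root / user_path).resolve()
    # is_relative_to (Python 3.9+) is False for paths such as root/../secret.
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    print(resolve_in_workspace(workspace, "src/main.py").name)  # main.py
    try:
        resolve_in_workspace(workspace, "../outside.txt")
    except PermissionError as err:
        print("blocked:", err)
```

Resolving before comparing matters: a plain string prefix check would accept `workspace/../outside.txt`.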

MACRO SYSTEM

Voice-triggered macros chain any sequence of actions — close all apps, launch Roblox, open YouTube — with configurable delays.
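A macro of this kind can be modeled as an ordered list of steps, each with an optional delay. The `Step` dataclass and `run_macro` function below are hypothetical names for illustration, and the actions only append to a log instead of launching real apps.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: Callable[[], None]  # what to run
    delay_after: float = 0.0    # seconds to wait before the next step


def run_macro(steps: List[Step], sleep: Callable[[float], None] = time.sleep) -> None:
    """Run each step in order, pausing for its configured delay."""
    for step in steps:
        step.action()
        if step.delay_after > 0:
            sleep(step.delay_after)


log: List[str] = []
gaming_mode = [
    Step(lambda: log.append("close all apps"), delay_after=0.01),
    Step(lambda: log.append("launch Roblox"), delay_after=0.01),
    Step(lambda: log.append("open gaming playlist")),
]
run_macro(gaming_mode)
print(log)
```

Passing `sleep` as a parameter lets tests replace the real delay with a no-op.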

🔮
PCB OVERLAY

Full-screen transparent HUD. Real system data — CPU, RAM, disk, network, temp. Radar sweep. Circuit board frame. Edge-darkness vignette.

😴
SLEEP / WAKE

60s idle → overlay fades out, animation drops to 2fps, mic stays active. Wake by voice, F9, or Ctrl+Space. Smooth opacity animation.
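The idle behavior can be pictured as a two-state controller driven by a clock. This sketch assumes a 60 fps active frame rate, which the description does not state, and `SleepController` is a hypothetical name. The opacity fade is left out.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 60.0  # from the description above
ACTIVE_FPS = 60        # assumed; the active frame rate is not stated
SLEEP_FPS = 2


@dataclass
class SleepController:
    last_activity: float = 0.0
    asleep: bool = False

    def on_activity(self, now: float) -> None:
        """Voice command or hotkey: record activity and wake up."""
        self.last_activity = now
        self.asleep = False

    def tick(self, now: float) -> int:
        """Update the state and return the frame rate to animate at."""
        if not self.asleep and now - self.last_activity >= IDLE_TIMEOUT_S:
            self.asleep = True
        return SLEEP_FPS if self.asleep else ACTIVE_FPS


ctl = SleepController()
print(ctl.tick(now=10.0))  # 60
print(ctl.tick(now=61.0))  # 2
ctl.on_activity(now=62.0)
print(ctl.tick(now=63.0))  # 60
```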

🎓
SELF-TEACHING

Add new commands by voice: "Teach Jarvis to open Netflix". Writes directly to commands.json. Hot-reloads instantly.
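One way to get write-then-hot-reload behavior is to rewrite the JSON file atomically and re-read it when its modification time changes. The sketch below assumes `commands.json` is a flat phrase-to-target map, which may not match the real schema. `CommandStore` and the example target are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


class CommandStore:
    """Loads a JSON command map and reloads it when the file changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._mtime = None
        self._commands = {}

    def teach(self, phrase: str, target: str) -> None:
        """Add or update a command and write the file atomically."""
        commands = dict(self.commands())
        commands[phrase.lower()] = target
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(commands, indent=2), encoding="utf-8")
        os.replace(tmp, self.path)  # atomic swap: readers never see a partial file
        self._mtime = None          # force a re-read on the next lookup

    def commands(self) -> dict:
        """Return the current commands, re-reading the file if its mtime changed."""
        if not self.path.exists():
            return {}
        mtime = self.path.stat().st_mtime_ns
        if mtime != self._mtime:
            self._commands = json.loads(self.path.read_text(encoding="utf-8"))
            self._mtime = mtime
        return self._commands


with tempfile.TemporaryDirectory() as tmp:
    store = CommandStore(Path(tmp) / "commands.json")
    store.teach("Open Netflix", "https://www.netflix.com")
    print(store.commands())
```

Checking the modification time on each lookup also picks up edits made to the file by hand.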

🛡
SAFE SHUTDOWN

"System Exit Code 0" — graceful full shutdown. "System Exit Code 1" — release camera so other apps can use it. Mic stays active.

WHAT YOU CAN SAY


// SYSTEM
"shutdown": Shuts down Windows in 10 seconds
"restart": Restarts Windows in 10 seconds
"hibernate": Hibernates the PC
"cancel shutdown": Aborts a pending shutdown
"System Exit Code 0": Closes JARVIS gracefully
"System Exit Code 1": Releases camera; mic stays active
// BUILTIN
"volume up / louder": Increases system volume by 5 steps
"mute / silence": Toggles audio mute
"screenshot": Saves PNG to Desktop with timestamp
"lock screen": Locks the workstation
"close Chrome": Kills matching process (fuzzy match)
"search for song.mp3 on C drive": Walks C:\ with fuzzy filename match
// MACROS
"gaming mode": Close all → Roblox → gaming playlist
"study mode": Close all → NotebookLM → YouTube → Gemini
"work mode": Close all → VS Code → Discord → GitHub
"chill mode": Close all → Spotify → Reddit
// CODING (routes to Gemini)
"write a Python script that…": Creates file in workspace
"edit the file main.py": Modifies file in workspace
"set workspace to C:\my\project": Changes code sandbox directory

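The fuzzy file search listed above could be built from `os.walk` and a similarity ratio. This sketch uses the standard library's `difflib` as one possible matcher. The project's actual matching method is not documented here, and `fuzzy_find` and the 0.8 cutoff are assumptions.

```python
import difflib
import os
import tempfile
from typing import Iterator


def fuzzy_find(root: str, wanted: str, cutoff: float = 0.8) -> Iterator[str]:
    """Yield paths under root whose filename is similar to wanted."""
    wanted = wanted.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ratio = difflib.SequenceMatcher(None, wanted, name.lower()).ratio()
            if ratio >= cutoff:
                yield os.path.join(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("song.mp3", "sonng.mp3", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    for path in fuzzy_find(tmp, "song.mp3"):
        print(os.path.basename(path))
```

Walking an entire drive can take minutes, so a real implementation would likely cap results or run the search off the UI thread.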
GESTURE CONTROL


☝️ INDEX POINT: Move cursor
🤌 PINCH: Left click
FIST: Right click
✌️ V SHAPE: Scroll up / down

QUICK START


01
PYTHON ENVIRONMENT

Requires Python 3.10–3.12. MediaPipe does not support 3.14 yet.

py -3.12 -m venv .venv
.venv\Scripts\activate
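To confirm that the active interpreter falls in the supported range, a check along these lines can be run inside the virtual environment. `is_supported` is a hypothetical helper, not part of the project.

```python
import sys

SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported(version=sys.version_info) -> bool:
    """Return True when the major.minor version is within 3.10 to 3.12."""
    return SUPPORTED_MIN <= tuple(version[:2]) <= SUPPORTED_MAX


print("Python", sys.version.split()[0], "supported:", is_supported())
```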
02
INSTALL DEPENDENCIES

If the PyAudio install fails, run: pip install pipwin && pipwin install pyaudio

pip install -r requirements.txt
03
INSTALL FFMPEG

Required for Whisper local speech recognition. Download from gyan.dev/ffmpeg/builds, extract to C:\ffmpeg, add C:\ffmpeg\bin to your system PATH.

ffmpeg -version  # verify in a new terminal
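The same PATH check can be done from Python before startup. This is a small sketch using `shutil.which`, and `missing_tools` is a hypothetical helper name.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(["ffmpeg"])
print("missing:", missing if missing else "none")
```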
04
CONFIGURE API KEYS

Copy config.example.py to config.py and fill in your keys. config.py is gitignored and never committed.

copy config.example.py config.py
# edit config.py with your keys
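A startup check for blank keys might look like the sketch below. The key names are hypothetical placeholders; the real ones are in config.example.py. The `SimpleNamespace` stands in for the imported `config` module.

```python
from types import SimpleNamespace
from typing import Iterable, List

# Hypothetical key names; check config.example.py for the real ones.
REQUIRED_KEYS = ("GROQ_API_KEY", "GEMINI_API_KEY", "PORCUPINE_ACCESS_KEY")


def missing_keys(config: object, required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the names of settings that are absent or left blank."""
    return [
        name
        for name in required
        if not str(getattr(config, name, "") or "").strip()
    ]


# Stand-in for `import config`; a real run would pass the imported module.
example = SimpleNamespace(GROQ_API_KEY="gsk_example", GEMINI_API_KEY="")
print(missing_keys(example))  # ['GEMINI_API_KEY', 'PORCUPINE_ACCESS_KEY']
```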
05
RUN

Test the overlay without any keys first, then run the full system.

python gui.py    # overlay preview — no keys needed
python main.py   # full system

TECH STACK


Python 3.12 · PySide6 · MediaPipe 0.10+ · OpenCV · Groq API · Google Gemini · Picovoice Porcupine · OpenAI Whisper · pyttsx3 · PyAutoGUI · pynput · psutil · Windows 10/11 · MIT License