Complete system architecture for writing style LoRA + voice clone TTS on M4 Pro Mac Mini
The system has two independent AI pipelines that share infrastructure:
| Pipeline | Input | Model | Output |
|---|---|---|---|
| Writing Style | Prompt (draft request) | Qwen 2.5 7B + LoRA via Ollama | Styled text |
| Voice Clone | Text + Reference audio | OpenVoice V2 + MeloTTS via FastAPI | Audio waveform (WAV/MP3) |
| Unified Pipeline | Prompt + Reference audio | Both above in sequence | Styled text + spoken audio |
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Client โ
โ "Draft a reply thanking my colleague" โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ POST /api/tts/lora/generate
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ SvelteKit Backend (Deno) โ
โ 1. Validate request โ
โ 2. Build messages array with system prompt + user prompt โ
โ 3. Server-side fetch to Ollama localhost:11434 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ POST /api/chat { model:"thota-writing" }
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Ollama (Metal GPU, M4 Pro) โ
โ Base: Qwen 2.5 7B Q4_KM + LoRA: thota-style-lora.gguf โ
โ System: "You are Thota's writing assistant..." โ
โ Output: "Nice one โ thanks for flagging this..." โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ JSON { message.content }
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ SvelteKit Backend โ
โ Return styled text to client โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Client โ
โ POST /api/tts/pipeline { text, referenceAudio } โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ SvelteKit Backend (Deno) โ
โ 1. Call Ollama โ get styled text โ
โ 2. Base64-encode reference audio (or use stored voice ID) โ
โ 3. Call FastAPI localhost:8000/tts โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโ
โ POST /tts { text, reference } โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ FastAPI Python Server (Mac Mini, port 8000) โ
โ 1. Load reference audio โ
โ 2. MeloTTS: synthesize base audio (text โ waveform) โ
โ 3. OpenVoice: clone tone color from reference โ
โ 4. Return WAV audio โ
โโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ audio/wav
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ SvelteKit Backend โ
โ Stream audio back to client as chunked response โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Mac Mini M4 Pro (Home Network) โ
โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Ollama โ โ FastAPI โ โ SvelteKit/Deno โ โ
โ โ :11434 โ โ :8000 โ โ :3000 โ โ
โ โ (localhost)โ โ (localhost)โ โ (localhost) โ โ
โ โโโโโโโโฌโโโโโโ โโโโโโโโฌโโโโโโโโ โโโโโโโโโโโโฌโโโโโโโโโโโโ โ
โ โ โ โ โ
โ โโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ localhost only โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ outbound tunnel
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Cloudflare Edge Network โ
โ Tunnel: cloudflared (persistent, outbound-only) โ
โ Public URL: voice-api.yourdomain.com โ Mac Mini :3000 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
All services bind to localhost only. Only Cloudflare Tunnel connects outward from the Mac Mini. No inbound ports opened on router.
/Users/
โโโ thota/
โโโ ai-models/ # Model weights (50GB+ free space)
โ โโโ qwen2.5-7b/ # Base Qwen 2.5 7B Instruct
โ โโโ thota-style-lora/ # Trained LoRA adapter
โ โโโ openvoice-v2/ # OpenVoice V2 checkpoints
โ โโโ checkpoints_v2/
โโโ voice-references/ # ๐ FileVault encrypted
โ โโโ neutral/
โ โโโ happy/
โ โโโ authoritative/
โ โโโ ... (by emotion)
โโโ datasets/
โ โโโ email-parsed.jsonl # Parsed email instruction pairs
โ โโโ whatsapp-parsed.jsonl # Parsed WhatsApp pairs
โ โโโ combined-dataset.jsonl # Deduplicated + merged
โโโ scripts/
โโโ parse_emails.py
โโโ parse_whatsapp.py
โโโ train_lora.py
| Model / File | Size | Location | Source |
|---|---|---|---|
| Qwen 2.5 7B Instruct (Q4_KM) | ~4โ5 GB | ~/.cache/huggingface/ | HuggingFace |
| thota-style-lora.gguf | 100โ500 MB | ai-models/thota-style-lora/ | Trained output |
| OpenVoice V2 checkpoints | ~400 MB | openvoice-v2/checkpoints_v2/ | myshell-ai/S3 |
| MeloTTS (EN base) | ~300 MB | openvoice-v2/ | myshell-ai/MeloTTS |
| Reference audio (1hr) | ~1 GB (24kHz) | voice-references/ | Thota's recordings |
| Service | Manager | Restart Policy | Command / Config |
|---|---|---|---|
| Ollama server | launchd or tmux | Restart on crash, start on boot | launchctl start com.ollama.server |
| FastAPI TTS server | launchd or tmux | Restart on crash, start on boot | uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1 |
| SvelteKit/Deno | launchd | Restart on crash, start on boot | deno task start |
| Cloudflare Tunnel | launchd | Restart on crash, reconnect on network | cloudflared tunnel run --token <TOKEN> |