
"I ran a local LLM for a month and my disk filled up so the app froze."
Common incident. The biggest local-LLM trap is gradual disk/RAM accumulation. This post covers four patterns from long-term operations that keep it stable.
Bottom line, four stable-operation patterns
Typical usage and danger lines for four areas, in one view. The RAM gauge reflects post-load with one model held, the disk gauge reflects 6-month accumulation.
Averages target 100~500 member chatrooms. Bigger rooms generate more logs.
1. Disk management
GGUF model file locations
Where Replyer stores models:
- macOS: ~/Library/Application Support/Replyer/models/*.gguf
- Windows: %APPDATA%\Replyer\models\*.gguf
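To see what is taking up space in that folder, a short script can list each model file with its size. This is a minimal sketch, not part of Replyer: the function names `default_models_dir` and `list_gguf` are hypothetical, and only the two folder locations come from this post.

```python
import os
import sys
from pathlib import Path


def default_models_dir() -> Path:
    """Return the Replyer models folder for the current OS (locations as listed in this post)."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Replyer" / "models"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Replyer" / "models"
    raise RuntimeError("Only the macOS and Windows locations are documented here")


def list_gguf(models_dir: Path) -> list[tuple[str, float]]:
    """Return (file name, size in GB) for every .gguf file, largest first."""
    if not models_dir.is_dir():
        return []  # folder missing: nothing downloaded yet
    sizes = [(p.name, p.stat().st_size / 1e9) for p in models_dir.glob("*.gguf")]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Calling `list_gguf(default_models_dir())` returns the largest files first, which is usually the order you want when deciding what to remove.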
Per-model disk vs RAM side-by-side
Disk footprint (GB) on SSD and RAM held (GB) after load for 4 models. Disk is a one-time cost, RAM is a per-operation cost.
A Q4 3B model fits comfortably alongside the OS and browser on an 8GB laptop. A 27B model wants 32GB+ RAM as the safe minimum.
Removing unused models
In Replyer's [Settings → disk usage]:
- View current model / other downloaded models / total disk used
- [Clean up] button or per-model [Delete] for the unused ones
- Confirm free disk immediately after
Manual alternative: delete the unused .gguf files directly from the models/ folder.
2. RAM management
One model loaded at a time
Replyer keeps a single model in RAM. Switching unloads the old model and loads the new one. Caveat: the first reply after a switch costs an extra 5~10s of load time, and frequent switching accumulates latency, so pick a default model and stick with it.
Three RAM pressure signals
- Sudden latency jump (3s → 15s), which points to swapping
- OS alerts "out of memory" or force-closes other apps
- Replyer reply failure "Failed to load model from file" (auto-recovered since R66)
Mitigation: close other heavy apps (Chrome with 50 tabs / video / VMs), or pick a smaller model.
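The first signal, a sudden latency jump, can be checked from your own reply-time measurements. The sketch below is an assumption-laden illustration rather than anything Replyer ships: `looks_like_swapping`, the 3x factor, and the 3-reply window are all hypothetical choices, and the latencies must be collected by you.

```python
from statistics import median


def looks_like_swapping(latencies_s: list[float], factor: float = 3.0, recent: int = 3) -> bool:
    """Flag a sudden jump: recent replies are `factor` times slower than the earlier baseline."""
    if len(latencies_s) < recent * 2:
        return False  # not enough history to compare against
    baseline = median(latencies_s[:-recent])  # typical latency before the recent window
    latest = median(latencies_s[-recent:])    # typical latency of the most recent replies
    return baseline > 0 and latest >= factor * baseline


# Replies that sat around 3s and then moved to around 15s, as in the signal above.
print(looks_like_swapping([3.0, 3.2, 2.9, 3.1, 15.0, 14.0, 16.0]))
```

Medians are used instead of means so that one slow reply (for example the first reply after a model switch) does not trigger the flag on its own.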
3. Logs / cache management
Replyer's log / data folders
- logs/: system / reply / error JSONL
- conversations/: per-chatroom conversation JSONL
- backups/: auto-backup zips
- agent_history/: persona version history
Daily log accumulation by room size
Logs outpace model files once a room grows past 500 members.
2,000+ member rooms can hit 36GB/month, so scheduled cleanup is mandatory.
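To turn a monthly growth figure into a deadline, divide the free space by the daily growth. The sketch below uses the 36GB/month figure from above. The 100GB of free space is a made-up example value, and `days_until_full` is a hypothetical helper that assumes a 30-day month and steady growth.

```python
def days_until_full(free_gb: float, growth_gb_per_month: float, days_per_month: float = 30.0) -> float:
    """Estimate how many days remain before the disk fills, assuming steady growth."""
    if growth_gb_per_month <= 0:
        return float("inf")  # no growth: the disk never fills from logs
    daily_growth_gb = growth_gb_per_month / days_per_month
    return free_gb / daily_growth_gb


# 36 GB/month is about 1.2 GB/day, so 100 GB of free space lasts roughly 83 days.
print(round(days_until_full(free_gb=100, growth_gb_per_month=36)))
```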
Auto-cleanup policy
Replyer's [Settings → log cleanup] applies these defaults:
- Logs older than 30 days are auto-deleted
- Conversations older than 90 days are auto-deleted
- Retention periods can be adjusted by the user
The manual [Clean up] button runs cleanup immediately.
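If you prefer to script the same retention idea yourself, the sketch below deletes .jsonl files past a given age. It is not Replyer's implementation: `expired_files` and `clean_up` are hypothetical names, and using file modification time as the age is an assumption (Replyer may judge age differently). The dry-run default only reports what would be removed.

```python
import time
from pathlib import Path
from typing import Optional


def expired_files(folder: Path, max_age_days: float, pattern: str = "*.jsonl",
                  now: Optional[float] = None) -> list[Path]:
    """Return files under `folder` whose modification time is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(p for p in folder.rglob(pattern)
                  if p.is_file() and p.stat().st_mtime < cutoff)


def clean_up(folder: Path, max_age_days: float, dry_run: bool = True) -> list[Path]:
    """Delete expired files, or only report them when dry_run is True."""
    victims = expired_files(folder, max_age_days)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

With the defaults above, that would mean calling `clean_up(logs_dir, 30)` for logs and `clean_up(conversations_dir, 90)` for conversations, then repeating with `dry_run=False` once the reported list looks right.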
4. Auto-recovery (introduced in R66+)
R66 (v0.12.7) added disk / model auto-recovery. If a GGUF file is corrupt (truncated download / disk full), Replyer's load attempt fails with "Failed to load model from file", and Replyer auto-deletes the corrupt file and retries the download. The user sees a friendly message listing three possible causes: partial download, unsupported build, or out of memory. Before R66 the same problem could loop forever (load fail → re-download → fail again).
config.json corruption auto-banner
If config.json is corrupt, a top-fixed amber banner appears on launch and offers to restore from the nearest .bak file, with one-click restore and auto-reload.
Stale filelock auto-cleanup
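R66 also cleans up stale filelocks. Locks older than 30 min are auto-ignored (the threshold was previously 5 min, which produced too many false positives). A .incomplete file whose mtime is less than 60s old is treated as an active download, so its lock stays protected. Retrying after an interrupted model download is therefore safe.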
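The two thresholds can be read as one decision rule. The sketch below is one interpretation, not Replyer's code: `lock_is_stale` is a hypothetical function, and giving the 60s activity check priority over the 30 min age check is an assumption about how the two rules combine.

```python
from typing import Optional

LOCK_STALE_AFTER_S = 30 * 60  # locks older than 30 minutes are ignored
INCOMPLETE_ACTIVE_S = 60      # a .incomplete file modified within 60s means a live download


def lock_is_stale(lock_age_s: float, incomplete_age_s: Optional[float]) -> bool:
    """Decide whether a download lock can be ignored.

    lock_age_s: seconds since the lock file was created.
    incomplete_age_s: seconds since the .incomplete file was last modified,
                      or None when no .incomplete file exists.
    """
    if incomplete_age_s is not None and incomplete_age_s < INCOMPLETE_ACTIVE_S:
        return False  # recent write activity: protect the lock (assumed precedence)
    return lock_age_s > LOCK_STALE_AFTER_S


print(lock_is_stale(lock_age_s=31 * 60, incomplete_age_s=None))  # old lock, no activity
print(lock_is_stale(lock_age_s=31 * 60, incomplete_age_s=10))    # old lock, active download
```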
Actual 6-month+ usage data
Figures from long-running operations:
- Average disk (Replyer data): 5~15GB
- Active models: 1~2
- RAM use: 4~10GB
- Monthly log accumulation: 0.5~3GB (pre-cleanup)
- Monthly cleanup frequency: 1~4 times
Most operators run stably for 6+ months with the defaults plus a monthly disk check and a quarterly model cleanup.
Frequently asked questions
Q. What happens if disk fills during model download?
Since R66, the download pre-checks disk space. It requires the model size plus 1GB of headroom, otherwise the download is refused with a friendly message. If the disk fills mid-download, the resulting partial download is auto-cleaned and retried (R66+).
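The pre-check rule (model size + 1GB headroom) is easy to reproduce in your own scripts before copying a model file by hand. This is a minimal sketch under assumptions: `has_room_for` and `can_download` are hypothetical names, and treating 1GB as 1024^3 bytes is a guess, since the post does not say which definition Replyer uses. `shutil.disk_usage` is from the Python standard library.

```python
import shutil

HEADROOM_BYTES = 1024 ** 3  # 1GB headroom (binary definition assumed)


def has_room_for(model_bytes: int, free_bytes: int, headroom_bytes: int = HEADROOM_BYTES) -> bool:
    """Apply the rule: free space must cover the model size plus the headroom."""
    return free_bytes >= model_bytes + headroom_bytes


def can_download(model_bytes: int, target_dir: str = ".") -> bool:
    """Check the rule against the real free space of the disk holding `target_dir`."""
    return has_room_for(model_bytes, shutil.disk_usage(target_dir).free)
```

Keeping the rule in `has_room_for` separate from the disk query makes the threshold itself easy to test without touching a real disk.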
Q. What to do when RAM pressure signals appear?
- Immediate: close heavy apps (Chrome tabs / video)
- Short-term: switch to a smaller model (e.g. Qwen 2.5 3B)
- Long-term: upgrade RAM or move to a desktop operating environment
See the RAM matrix in the no-GPU laptop guide.
Q. Can I delete log files directly?
Yes. Delete the .jsonl files under logs/ or conversations/. Operation is unaffected, but past-conversation analytics (Diagnostics keywords / routing) lose that data.
Q. Can I prune backup zips?
Yes. Delete .zip files under backups/ or use Replyer's [Backup → clean old backups]. Keeping 1~2 months of recent backups is enough.
Q. What if auto-recovery doesn't kick in?
- Quit Replyer
- Delete the corrupt .gguf from the models/ folder
- Restart Replyer → fresh model download
Q. Can I monitor disk / RAM automatically?
Yes, through Replyer's [Settings → system resources] or the /api/system/resources endpoint. Custom scripts can poll the endpoint daily. For general operators, a monthly glance at [Settings] is enough.
Next steps
To start auto-replies in your chatroom, download Replyer for your OS and follow the usage manual for the step-by-step guide.