2026-04-14

Local LLM disk + RAM health - GGUF cache cleanup, memory signals, auto-recovery

"I ran a local LLM for a month and my disk filled up so the app froze."

This is a common incident. The biggest trap in running a local LLM is gradual disk and RAM accumulation. This post covers four patterns, drawn from long-term operation, that keep a setup stable.

Bottom line: four stable-operation patterns

Figure: typical usage and danger lines for the four areas, in one view. The RAM gauge shows usage after load with one model held; the disk gauge shows six months of accumulation.

The averages assume chatrooms of 100~500 members. Bigger rooms generate more logs.

1. Disk management

GGUF model file locations

Where Replyer stores models:

  • macOS: ~/Library/Application Support/Replyer/models/*.gguf
  • Windows: %APPDATA%\Replyer\models\*.gguf

Per-model disk vs RAM side-by-side

Figure: disk footprint (GB) on SSD and RAM held (GB) after load, for 4 models. Disk is a one-time cost; RAM is a cost paid every time the model runs.

A Q4-quantized 3B model fits comfortably alongside the OS and a browser on an 8GB laptop. A 27B model needs 32GB+ of RAM to run safely.

Removing unused models

In Replyer's [Settings → disk usage]:

  1. View current model / other downloaded models / total disk used
  2. [Clean up] button or per-model [Delete] for the unused ones
  3. Confirm free disk immediately after

Manual route: delete the unused .gguf files directly from the models/ folder.
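Before deleting by hand, it helps to see which files take the most space. The following is a minimal sketch, not part of Replyer: it lists .gguf files in a models folder, largest first. The function name `list_gguf_models` is made up for illustration, and the example path is the macOS location given above.

```python
from pathlib import Path


def list_gguf_models(models_dir: Path) -> list[tuple[str, float]]:
    """Return (file name, size in GiB) for every .gguf file, largest first."""
    entries = []
    for path in models_dir.glob("*.gguf"):
        if path.is_file():
            entries.append((path.name, path.stat().st_size / 1024**3))
    return sorted(entries, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # macOS location from this post; on Windows use %APPDATA%\Replyer\models.
    models_dir = Path.home() / "Library" / "Application Support" / "Replyer" / "models"
    if models_dir.is_dir():
        for name, size_gb in list_gguf_models(models_dir):
            print(f"{size_gb:7.2f} GiB  {name}")
    else:
        print(f"No models folder at {models_dir}")
```

The script only reads file sizes. Deleting is left as a deliberate manual step so that the currently active model is not removed by accident.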

2. RAM management

One model loaded at a time

Replyer keeps a single model in RAM. Switching models unloads the old one and loads the new one. One caveat: the first reply after a switch costs an extra 5~10s of load time, and frequent switching accumulates latency, so pick a default model and stick with it.

Three RAM pressure signals

  • Sudden latency jump (3s → 15s), which points to swapping
  • The OS reports "out of memory" or force-closes other apps
  • Replyer reply fails with "Failed to load model from file" → auto-recovered since R66

Mitigation: close other heavy apps (a 50-tab Chrome session, video, VMs) or pick a smaller model.
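The first signal, a sudden latency jump, can be checked mechanically if you record reply times yourself. This is a rough sketch under stated assumptions: the `window` and `factor` values are arbitrary illustration values, not Replyer settings, and the function name is hypothetical.

```python
from statistics import median


def looks_like_swapping(latencies_s: list[float], window: int = 3, factor: float = 3.0) -> bool:
    """True when the median of the last `window` replies is at least
    `factor` times slower than the median of all earlier replies."""
    if len(latencies_s) < 2 * window:
        return False  # not enough samples to compare
    baseline = median(latencies_s[:-window])
    recent = median(latencies_s[-window:])
    return baseline > 0 and recent >= factor * baseline


if __name__ == "__main__":
    # Reply times in seconds: steady around 3s, then a jump to 15s.
    print(looks_like_swapping([3.0, 3.1, 2.9, 3.0, 15.0, 14.5, 15.2]))
```

Medians are used instead of means so that a single slow reply does not trigger the warning.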

3. Logs / cache management

Replyer's log / data folders

  • logs/: system, reply, and error JSONL
  • conversations/: per-chatroom conversation JSONL
  • backups/: auto-backup zips
  • agent_history/: persona version history

Daily log accumulation by room size

Logs outpace model files once a room grows past 500 members.

Rooms with 2,000+ members can hit 36GB/month, so scheduled cleanup is mandatory.

Auto-cleanup policy

Under Replyer's [Settings → log cleanup], logs older than 30 days and conversations older than 90 days are deleted automatically. Both values are defaults, and users can adjust the retention period. The manual [Clean up] button triggers an immediate cleanup.
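For anyone who prefers to script the same retention rule outside the app, here is a minimal sketch. It is not Replyer's implementation: it assumes age is judged by file modification time, and the names `find_expired` and `cleanup` are hypothetical. It defaults to a dry run so nothing is deleted until you opt in.

```python
import time
from pathlib import Path
from typing import Optional


def find_expired(folder: Path, max_age_days: int, now: Optional[float] = None) -> list[Path]:
    """Return .jsonl files under `folder` whose mtime is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in folder.rglob("*.jsonl")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def cleanup(folder: Path, max_age_days: int, dry_run: bool = True) -> int:
    """Delete expired files (unless dry_run) and return the bytes involved."""
    total = 0
    for path in find_expired(folder, max_age_days):
        total += path.stat().st_size
        if not dry_run:
            path.unlink()
    return total


if __name__ == "__main__":
    # Retention values mirror the defaults in this post: 30 days for logs/,
    # 90 days for conversations/. The relative paths are placeholders.
    for name, days in (("logs", 30), ("conversations", 90)):
        folder = Path(name)
        if folder.is_dir():
            print(name, cleanup(folder, days, dry_run=True), "bytes would be freed")
```

Run it once with `dry_run=True` and check the reported size before switching to `dry_run=False`.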

4. Auto-recovery (introduced in R66)

R66 (v0.12.7) added auto-recovery for disk and model problems. If a GGUF file is corrupted (by a truncated download or a full disk), Replyer's load attempt fails with "Failed to load model from file". Replyer then deletes the corrupt file and retries the download. The user sees a friendly message listing three possible causes: partial download, unsupported build, or out of memory. Before R66 the same problem could loop forever (load fails → re-download → fails again).
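The recover-then-retry flow can be sketched as below. This is an illustration of the general pattern, not Replyer's code: `ModelLoadError`, `load_with_recovery`, and the `load` / `download` callables are hypothetical, and capping the retries with `max_retries` is an assumption about one way to avoid the endless loop described above.

```python
from pathlib import Path
from typing import Callable


class ModelLoadError(Exception):
    """Hypothetical stand-in for a 'Failed to load model from file' failure."""


def load_with_recovery(
    path: Path,
    load: Callable[[Path], object],
    download: Callable[[Path], None],
    max_retries: int = 1,
) -> object:
    """Try to load; on failure delete the file, re-download, and try again.

    Gives up after `max_retries` re-downloads instead of looping forever.
    """
    for attempt in range(max_retries + 1):
        try:
            return load(path)
        except ModelLoadError:
            if attempt == max_retries:
                raise  # bounded: surface the error to the user
            path.unlink(missing_ok=True)  # drop the corrupt file
            download(path)                # fetch a fresh copy
    raise AssertionError("unreachable")
```

The final attempt re-raises the error, which is the point where an app would show its friendly message rather than starting another download.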

config.json corruption auto-banner

If config.json is corrupted, Replyer shows an amber banner pinned to the top of the window on launch. The banner offers to restore from the nearest .bak file; one click restores the config and reloads automatically.

Stale filelock auto-cleanup

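R66 also cleans up stale filelocks. A lock older than 30 minutes is ignored automatically (the earlier 5-minute threshold produced too many false positives). If the .incomplete file was modified less than 60 seconds ago, the download is treated as active and its lock stays protected. Retrying after an interrupted model download is therefore safe.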
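The two thresholds combine into a single decision. The sketch below shows one way to express that rule, assuming lock age is measured by the lock file's modification time; the function name and the way the two paths are passed in are hypothetical, not Replyer's internals.

```python
import time
from pathlib import Path
from typing import Optional

STALE_LOCK_SECONDS = 30 * 60   # locks older than 30 minutes are ignored
ACTIVE_WRITE_SECONDS = 60      # .incomplete touched within 60s means "still downloading"


def lock_is_stale(
    lock_path: Path,
    incomplete_path: Optional[Path] = None,
    now: Optional[float] = None,
) -> bool:
    """Return True when the lock can be ignored safely."""
    now = time.time() if now is None else now
    if incomplete_path is not None and incomplete_path.exists():
        if now - incomplete_path.stat().st_mtime < ACTIVE_WRITE_SECONDS:
            return False  # a download is actively writing; keep the lock
    return now - lock_path.stat().st_mtime > STALE_LOCK_SECONDS
```

Checking the .incomplete file first means a slow but live download never loses its lock, even if the lock file itself is old.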

Actual 6-month+ usage data

Figures from long-running operations:

  • Average disk use (Replyer data): 5~15GB
  • Active models: 1~2
  • RAM use: 4~10GB
  • Monthly log accumulation: 0.5~3GB (pre-cleanup)
  • Cleanup frequency: 1~4 times per month

Most operators have run stably for 6+ months on the defaults plus a monthly disk check and a quarterly model cleanup.

Frequently asked questions

Q. What happens if disk fills during model download?

Since R66, the download pre-checks disk space. It requires the model size plus 1GB of headroom; otherwise the download is refused with a friendly message. If the disk fills during the download, the result is a partial download, which R66 auto-cleans and retries.
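The pre-check is simple arithmetic: free space must be at least the model size plus the headroom. A minimal sketch, assuming "1GB" means 1 GiB (1024³ bytes) and using hypothetical function names; `shutil.disk_usage` is the standard-library call for reading free space.

```python
import shutil

HEADROOM_BYTES = 1 * 1024**3  # assumption: the 1GB headroom is treated as 1 GiB


def has_room_for(model_size_bytes: int, free_bytes: int) -> bool:
    """True when free space covers the model plus the headroom."""
    return free_bytes >= model_size_bytes + HEADROOM_BYTES


def can_download(model_size_bytes: int, target_dir: str) -> bool:
    """Check the volume that holds `target_dir` before starting a download."""
    return has_room_for(model_size_bytes, shutil.disk_usage(target_dir).free)


if __name__ == "__main__":
    two_gib = 2 * 1024**3
    print("Room for a 2 GiB model here:", can_download(two_gib, "."))
```

Splitting the comparison from the disk query keeps the rule itself easy to verify without touching the filesystem.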

Q. What to do when RAM pressure signals appear?

  1. Immediate: close heavy apps (Chrome tabs, video)
  2. Short-term: switch to a smaller model (e.g. Qwen 2.5 3B)
  3. Long-term: upgrade RAM or move to a desktop machine

See the RAM matrix in the no-GPU laptop guide.

Q. Can I delete log files directly?

Yes. Delete the .jsonl files under logs/ or conversations/. This has no impact on operation, but past-conversation analytics (Diagnostics keywords, routing) lose that data.

Q. Can I prune backup zips?

Yes. Delete .zip files under backups/ or use Replyer's [Backup → clean old backups]. Keeping 1~2 months of recent backups is enough.

Q. What if auto-recovery doesn't kick in?

  1. Quit Replyer
  2. Delete the corrupt .gguf from the models/ folder
  3. Restart Replyer → fresh model download

Q. Can I monitor disk / RAM automatically?

Yes, through Replyer's [Settings → system resources] or the /api/system/resources endpoint. Custom scripts can poll the endpoint daily. For general operators, a monthly glance at [Settings] is enough.
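A polling script could look like the sketch below. Only the endpoint path comes from this post. The base URL (host and port) is a placeholder you must replace, and the assumption that the endpoint returns JSON is unverified, so the script prints the raw payload rather than relying on specific field names.

```python
import json
from urllib.request import urlopen

# Placeholder: replace with the address your Replyer instance listens on.
BASE_URL = "http://127.0.0.1:8000"


def resources_url(base_url: str) -> str:
    """Build the endpoint URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/api/system/resources"


def fetch_resources(base_url: str, timeout: float = 5.0) -> dict:
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urlopen(resources_url(base_url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print(json.dumps(fetch_resources(BASE_URL), indent=2))
    except (OSError, ValueError) as error:
        # OSError covers connection failures; ValueError covers non-JSON bodies.
        print(f"Could not read {resources_url(BASE_URL)}: {error}")
```

Scheduling it once a day with cron or Task Scheduler matches the daily polling mentioned above.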

Next steps

To start auto-replies in your chatroom, download Replyer for your OS and follow the usage manual for the step-by-step guide.