
"There are many local LLMs, which one feels most natural in Korean?"
Qwen 2.5 and Gemma 3 / Gemma 4 are the two pillars of Korean-capable open local LLMs as of May 2026. This post evaluates seven models on chatroom auto-reply scenarios, using a multi-axis radar chart and a response-speed chart.
Charts and scores below are simulated user-perceived ratings from a small operator sample. Actual experience varies with your room, message pattern, and member tone, so validate in your own environment.
Overall recommendation by machine RAM
| RAM | First pick | Second pick | Notes |
|---|---|---|---|
| 8GB | Qwen 2.5 3B Q4 | Gemma 4 E2B | Mind OS headroom |
| 16GB | Qwen 2.5 3B Q4 | Gemma 4 E4B | General-user sweet spot |
| 16GB + GPU | Gemma 4 E4B Vision | Gemma 3 12B Q4 | Photo replies enabled |
| 32GB | Gemma 3 12B Q4 | Qwen 2.5 7B Q4 | Deeper responses |
| 64GB + GPU | Gemma 3 27B Q4 | Gemma 3 12B Q4 | Power user |
Safest default: Qwen 2.5 3B Q4. It runs stably on machines from 8GB to 64GB and has been Replyer's default since R66.
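The table above can be read as a simple lookup. The sketch below is one hypothetical way to encode it; the names `Pick`, `RECOMMENDATIONS`, and `recommend` are illustrative and not part of Replyer. Treating each row as a minimum RAM threshold (so a 24GB machine falls back to the 16GB row) is an assumption, since the post only lists the five tiers.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    first: str
    second: str


# Rows copied from the table above, largest tier first:
# (minimum RAM in GB, row requires a GPU, picks).
RECOMMENDATIONS = [
    (64, True, Pick("Gemma 3 27B Q4", "Gemma 3 12B Q4")),
    (32, False, Pick("Gemma 3 12B Q4", "Qwen 2.5 7B Q4")),
    (16, True, Pick("Gemma 4 E4B Vision", "Gemma 3 12B Q4")),
    (16, False, Pick("Qwen 2.5 3B Q4", "Gemma 4 E2B" if False else "Gemma 4 E4B")),
    (8, False, Pick("Qwen 2.5 3B Q4", "Gemma 4 E2B")),
]


def recommend(ram_gb: int, has_gpu: bool = False) -> Pick:
    """Return the picks of the largest tier this machine satisfies."""
    for min_ram, needs_gpu, pick in RECOMMENDATIONS:
        # Assumption: a row applies to any machine at or above its RAM figure.
        if ram_gb >= min_ram and (has_gpu or not needs_gpu):
            return pick
    raise ValueError("the table starts at 8GB of RAM")


print(recommend(16))                # text-only 16GB laptop
print(recommend(16, has_gpu=True))  # 16GB machine with a GPU
```

A machine with 64GB but no GPU falls through to the 32GB row here, which matches the table's framing of the 27B model as a GPU-backed choice.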
Figure: Multi-axis Korean reply quality (radar chart, scores 1~5)
Figure: Response latency on M2 (seconds)
Per-model Korean reply quality
Average score (1~5) across 50 scenarios:
| Model | Korean naturalness | Honorifics | Tone consistency | Length fit | Total |
|---|---|---|---|---|---|
| Qwen 2.5 3B Q4 | 4.2 | 4.5 | 4.0 | 4.3 | 4.25 |
| Qwen 2.5 7B Q4 | 4.5 | 4.6 | 4.3 | 4.4 | 4.45 |
| Gemma 3 4B Q4 | 3.8 | 4.0 | 3.7 | 3.9 | 3.85 |
| Gemma 3 12B Q4 | 4.6 | 4.7 | 4.5 | 4.5 | 4.58 |
| Gemma 3 27B Q4 | 4.8 | 4.8 | 4.7 | 4.6 | 4.73 |
| Gemma 4 E2B | 3.9 | 4.1 | 3.8 | 4.0 | 3.95 |
| Gemma 4 E4B | 4.4 | 4.5 | 4.2 | 4.4 | 4.38 |
Observation 1: Qwen 2.5 3B beats Gemma 3 4B on Korean naturalness (4.2 vs 3.8), which suggests Qwen's training weighted Korean more heavily.
Observation 2: Gemma 3 12B and Gemma 4 E4B land near each other on Korean quality (4.58 vs 4.38 total). E4B adds multimodal input (photo replies) as a bonus.
Observation 3: Gemma 3 27B reads as nearly human, but it needs 32GB+ RAM and ideally a GPU.
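The Total column in the table above is consistent with a plain average of the four axes, rounded to two decimals. The sketch below recomputes it under that assumption; `SCORES` and `total` are illustrative names. It uses `Decimal` with half-up rounding because binary floats can round values such as 4.575 the wrong way.

```python
from decimal import Decimal, ROUND_HALF_UP

# Axis order: Korean naturalness, honorifics, tone consistency, length fit.
SCORES = {
    "Qwen 2.5 3B Q4": ("4.2", "4.5", "4.0", "4.3"),
    "Qwen 2.5 7B Q4": ("4.5", "4.6", "4.3", "4.4"),
    "Gemma 3 4B Q4": ("3.8", "4.0", "3.7", "3.9"),
    "Gemma 3 12B Q4": ("4.6", "4.7", "4.5", "4.5"),
    "Gemma 3 27B Q4": ("4.8", "4.8", "4.7", "4.6"),
    "Gemma 4 E2B": ("3.9", "4.1", "3.8", "4.0"),
    "Gemma 4 E4B": ("4.4", "4.5", "4.2", "4.4"),
}


def total(axes):
    """Unweighted mean of the axis scores, rounded half-up to 2 decimals."""
    mean = sum(Decimal(a) for a in axes) / len(axes)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Print the models from highest to lowest total.
for model in sorted(SCORES, key=lambda m: total(SCORES[m]), reverse=True):
    print(f"{model:<16} {total(SCORES[model])}")
```

If one axis matters more in your room (honorifics in a formal community, for instance), replacing the plain mean with a weighted one is a small change to `total`.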
Per-model strengths and weaknesses
Why Replyer set Qwen 2.5 3B as the default
From R66 (v0.12.7), the default shifted from Gemma 4 E4B to Qwen 2.5 3B, for four reasons:
- Compatibility: Gemma 4 needs llama.cpp b8746+. Older prebuilt Windows CPU wheels failed to load the model, which triggered an auto-delete and re-download loop.
- Korean quality: At the 3B size class, Qwen 2.5 beats Gemma 3 4B.
- Memory safety: At about 2GB, the Q4 build coexists with the OS, browser, and chat client on an 8GB machine.
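- License: Most Qwen 2.5 sizes are Apache 2.0 and free for commercial use. The 3B weights, however, are published under the Qwen Research License, so check the model card before commercial deployment.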
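The compatibility point above amounts to a build-number comparison. llama.cpp release tags have the form `b` followed by an integer, so a guard can be sketched as below. The helper names are hypothetical, and how the tag of the installed runtime is obtained is left out on purpose, since the post does not describe how Replyer does it.

```python
import re

# From the post: Gemma 4 needs llama.cpp b8746 or newer.
MIN_GEMMA4_BUILD = 8746


def parse_build(tag: str) -> int:
    """Extract the integer build number from a tag such as 'b8746'."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a llama.cpp build tag: {tag!r}")
    return int(match.group(1))


def supports_gemma4(tag: str) -> bool:
    """True if the given build is new enough to load Gemma 4."""
    return parse_build(tag) >= MIN_GEMMA4_BUILD


print(supports_gemma4("b8746"))
print(supports_gemma4("b8000"))
```

Checking the build before downloading a Gemma 4 file is one way to avoid the failed-load and re-download loop: an unsupported runtime can fall back to Qwen 2.5 3B instead.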
Upgrade path
Skipping sizes (for example, jumping from 3B straight to 27B) is not recommended. Tone shifts slightly between models, and members may notice.
Frequently asked questions
Q. Anything besides Qwen and Gemma?
Llama 3 (Meta), Mistral, and Yi-1.5 exist, but they trail Qwen 2.5 and Gemma 3 12B on Korean naturalness.
Q. Does Q4 quantization hurt quality vs Q8 / fp16?
Marginally. Moving from Q4 to Q8 raises Korean naturalness by 0.1~0.2 (out of 5) but roughly doubles the model size. Stick with Q4 unless you have ample RAM.
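The "doubles size" figure follows from bits per weight. As a rough sketch, assume the llama.cpp `Q4_0` and `Q8_0` block layouts (32 weights plus one fp16 scale per block, so 4.5 and 8.5 bits per weight) and a nominal 3 billion parameters. The post does not say which Q4 variant Replyer ships, so these are assumptions, and real files differ somewhat because some tensors are stored at higher precision.

```python
# Assumed bits per weight for the llama.cpp block formats:
# each block holds 32 weights plus one 16-bit scale.
Q4_0_BITS = (32 * 4 + 16) / 32  # 4.5
Q8_0_BITS = (32 * 8 + 16) / 32  # 8.5


def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


q4 = model_size_gb(3, Q4_0_BITS)  # nominal 3B parameters (assumption)
q8 = model_size_gb(3, Q8_0_BITS)
print(f"Q4 ~ {q4:.2f} GB, Q8 ~ {q8:.2f} GB, ratio {q8 / q4:.2f}x")
```

Under these assumptions the Q8 file is a little under twice the Q4 file, which is why the quality gain of 0.1~0.2 points rarely justifies it on an 8GB or 16GB machine.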
Q. 7B vs 12B, which gives better value?
Gemma 3 12B beats Qwen 2.5 7B on total score (4.58 vs 4.45) but uses more RAM (7GB vs 4GB) and is slower (5~8s vs 4~6s). With 8GB free, pick Gemma 3 12B. With 4GB free, pick Qwen 2.5 7B.
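The rule in this answer is "take the highest-scoring model that fits in free RAM". A minimal sketch, using only the figures quoted above and treating the RAM numbers as hard cutoffs with no extra headroom (an assumption); `CANDIDATES` and `best_fit` are illustrative names.

```python
# (model, total score, approximate RAM use in GB), figures from the answer above.
CANDIDATES = [
    ("Gemma 3 12B Q4", 4.58, 7),
    ("Qwen 2.5 7B Q4", 4.45, 4),
]


def best_fit(free_ram_gb: float):
    """Return the best-scoring model whose RAM use fits, or None."""
    fitting = [c for c in CANDIDATES if c[2] <= free_ram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda c: c[1])[0]


print(best_fit(8))
print(best_fit(4))
```

Below 4GB free, neither model fits and the function returns `None`, which is the case where the 3B default is the safer choice.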
Q. Will tone shift when switching models?
Slightly. After switching, regenerate 5~10 test replies in the Sandbox and re-evaluate.
Q. Are photo replies necessary?
Depends on the chatroom. Daily chat / info replies → text-only (Qwen 2.5 3B). Fashion / food / travel / product review → Gemma 4 E4B Vision.
Q. Can I switch Replyer's default model myself?
Yes. Go to Settings → model picker and click another preset. The model downloads and Replyer restarts automatically. Picking a model explicitly turns the auto-tune flag off. See the no-GPU laptop guide for more.
Next steps
To start auto-replies in your chatroom, download Replyer for your OS and follow the step-by-step usage manual.