Jackrong fine-tunes Gemma 4 into Gemopus local models
Pseudonymous developer Jackrong has released Gemopus, a pair of Claude Opus–style fine-tunes of Google’s Gemma 4: a 26B Mixture-of-Experts that activates about 4B parameters, and a 4B edge model for phones and laptops.
Pseudonymous developer Jackrong released Gemopus, a pair of Claude Opus–style fine-tunes built on Google’s open-source Gemma 4. The family includes Gemopus-4-26B-A4B, a 26-billion-parameter Mixture of Experts that activates about 4 billion parameters during inference, and Gemopus-4-E4B, a 4-billion-parameter edge model designed to run on modern phones and thin laptops.
Both models are provided in GGUF format for use with LM Studio and llama.cpp. Jackrong published the full training code and a step-by-step fine-tuning guide on GitHub. The fine-tuning pipeline uses Unsloth and LoRA, the same tools employed in the developer’s earlier Qwopus project, and the materials are reproducible on platforms such as Colab.
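LoRA, the adapter method the pipeline relies on, freezes the base model’s weights and trains only a pair of small low-rank matrices whose product is added back as a scaled update. A minimal plain-Python sketch of that idea follows; the function name and the toy 2×2 matrices are illustrative, not code from the Gemopus repository, and the real pipeline runs through Unsloth on GPU.

```python
# Sketch of the LoRA update rule: instead of touching a frozen weight
# matrix W, train a small A (r x d) and B (d x r) and apply
#   W' = W + (alpha / r) * B @ A.
# Plain-Python illustration with nested lists; names are hypothetical.

def lora_update(W, A, B, alpha=16, r=2):
    """Return W + (alpha / r) * (B @ A) for small square matrices."""
    d = len(W)
    scale = alpha / r
    # B @ A: (d x r) times (r x d) gives a d x d low-rank update.
    BA = [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
          for i in range(d)]
    return [[W[i][j] + scale * BA[i][j] for j in range(d)]
            for i in range(d)]

# Toy example: identity weights plus a small diagonal adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.0], [0.0, 0.1]]   # r x d
B = [[1.0, 0.0], [0.0, 1.0]]   # d x r
print(lora_update(W, A, B, alpha=2, r=2))
```

Because only A and B are trained, the adapter is a tiny fraction of the base model’s size, which is what makes reproducing the run on a platform like Colab practical.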
Independent test results accompanied the release. AI infrastructure engineer Kyle Hessling wrote on X that the 26B MoE handled one-shot requests over long contexts and ran quickly on VRAM-limited systems. Hessling reported that the E4B passed 14 core competence tests, cleared long-context tests at 30,000 and 60,000 tokens, and succeeded on all 13 needle-in-a-haystack retrieval probes, including a stretch retrieval test at one million tokens using YaRN 8× RoPE scaling.
The 26B variant supports a native context length of 131,000 tokens and extends to 524,000 tokens with YaRN. Hessling wrote on X that the 26B “crushed” retrieval tests out to the extended context. Benchmark details and logs are available on the model card and the project repository.
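The context figures above are simple multiplication of the native window by the RoPE scale factor, as this quick sketch shows. The exact power-of-two window (131,072) and the assumption that the E4B shares the 26B’s native window are mine; the article only states the 26B’s native length in rounded form.

```python
# YaRN-style RoPE scaling multiplies the native context window by a
# scale factor. Figures below follow the article's rounded numbers;
# 131,072 (a power of two) is assumed to be the exact native window.

def extended_context(native_ctx: int, yarn_factor: int) -> int:
    """Extended window = native window times the RoPE scale factor."""
    return native_ctx * yarn_factor

# 26B: 131k native tokens with 4x YaRN -> ~524k extended.
print(extended_context(131_072, 4))   # 524288

# E4B stretch test: 8x YaRN lands at the one-million-token probe
# (assumes the E4B also has a 131k native window).
print(extended_context(131_072, 8))   # 1048576
```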
Jackrong published edge performance numbers showing the E4B at roughly 45–60 tokens per second on an iPhone 17 Pro Max and 90–120 tokens per second on a MacBook Air M3/M4 via MLX. The MoE design lets the 26B model run with expert offloading on unified-memory systems or on GPUs with under 10 GB of VRAM.
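A back-of-envelope calculation shows why the sub-10 GB VRAM claim is plausible: only the ~4B active parameters need to sit on the GPU per token, while the rest of the 26B can be offloaded. The ~4.5 bits-per-weight figure below assumes a mid-range GGUF quantization (roughly Q4_K_M territory) and is my illustrative assumption, not a number from the model card.

```python
# Rough MoE memory estimate: weight storage in GB at a given
# quantization level. 4.5 bits/weight is an assumed mid-range GGUF
# quant; real footprints also include KV cache and activations.

def quantized_gb(params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte."""
    return params * bits_per_weight / 8 / 1e9

total = quantized_gb(26e9)    # full 26B model, held in system RAM/disk
active = quantized_gb(4e9)    # ~4B parameters touched per token

print(f"total weights  ~{total:.1f} GB")
print(f"active weights ~{active:.1f} GB")
```

The active working set comes out at only a few gigabytes, which is consistent with the article’s claim that expert offloading makes GPUs with under 10 GB of VRAM workable.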
The fine-tuning approach avoided inserting Claude-style chain-of-thought traces directly into Gemma’s weights. The model card states, “There is no need for excessive imagination or superstitious replication of the Claude-style chain of thought.” The developer focused the tuning on answer quality, structural clarity and conversational tone, citing research that copying a teacher model’s surface reasoning text produces imitation rather than transferred reasoning ability.
The release notes several caveats. Tool calling remains unreliable across the Gemma 4 series in llama.cpp and LM Studio, with failures, format mismatches and looping reported. Jackrong describes Gemopus as “an engineering exploration reference rather than a fully production-ready solution” and recommends the Qwopus 3.5 series for workflows that demand a more validated local model. The developer and other contributors also report that Gemma’s training dynamics show wider loss fluctuations and greater sensitivity to hyperparameters compared with some other bases.
A separate community project, Ornstein, is pursuing reasoning-specific improvements on the same 26B Gemma 4 base. Jackrong has indicated that a denser 31B Gemopus variant is in the works. Both Gemopus models are available for download, and the published materials allow researchers and developers to reproduce the fine-tuning steps.