lemonade

Working Group: Auto-Tune

Overview

Lead: This working group is led by Michele Balistreri, whose handle is @bitgamma on GitHub and @mikkoph on Discord.

Background: Running a model well requires choosing the right backend and tuning performance parameters such as batch size, GPU layer offload, thread count, and context size. Today, users either guess, look up Reddit threads, or ask an LLM — approaches that are error-prone and often outdated due to knowledge cutoff. Lemonade already has a bench command that measures TTFT, TPS, and VRAM usage across backends and parameter combinations, and a recipe system that defines per-model configuration. What is missing is a mechanism to turn benchmark data into actionable, hardware-aware defaults that apply automatically.

Why: A user should install Lemonade and get good performance without needing to understand backend flags or run manual benchmarks. Hardware-aware defaults lower the barrier to entry and make Lemonade competitive with cloud-based alternatives where performance is abstracted away.

Goal: Enable Lemonade instances to self-optimize models and backends by detecting the machine’s hardware profile and applying community-validated performance parameters. The end state is that a user pulls a model, loads it, and Lemonade selects the best backend and tuning flags for their hardware — with the option to override or fine-tune manually.

Contributing

Please see the general contribution guidelines, then contact @mikkoph on Discord before getting started to discuss the roadmap.

Scope

This working group focuses on performance parameters (batch size, GPU layers, threads, context size, backend selection) that are determined by hardware characteristics. Quality parameters such as temperature, top_p, and chat template are model- or use-case-dependent and are explicitly out of scope.

Roadmap

Roadmap items are high-level objectives that may span multiple issues and PRs. Details can also be re-defined.

Phase 1: Archetype Detection & Profile Schema

Phase 2: Auto-Apply at Load Time

Phase 3: Profile Distribution

Phase 4: Community Data Collection & Curation

Phase 5: Adaptive Tuning