Continue provides open-source Integrated Development Environment (IDE) extensions for Visual Studio Code and JetBrains, along with an open-source CLI, that let developers leverage custom AI coding agents.
This guide walks through how to use Lemonade Server with the Continue VS Code extension for code generation, editing, and chat capabilities, all running locally on your AMD PC.
Before you start, make sure you have the following:

- Lemonade Server installed and running at http://localhost:13305. If you change the port in Lemonade Server (e.g., to 8020 or 8040), you’ll need to update the API Base URL in Continue’s configuration to match.
- For best results, a code-tuned model with at least 20B parameters. To run such a model:
Use the Model Manager or the lemonade CLI to download your desired model:

```
lemonade pull <model-name>
```

For example, to download Qwen3-Coder:

```
lemonade pull Qwen3-Coder-30B-A3B-Instruct-GGUF
```
Start Lemonade Server: Ensure Lemonade Server is running at http://localhost:13305. The server starts automatically after installation. You can verify with lemonade status.
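Once the server is up, you can also exercise it outside of Continue with a short script. The sketch below assumes Lemonade Server exposes an OpenAI-compatible chat-completions endpoint under /api/v1 at the address above; the endpoint path is an assumption, so adjust it (and the port) to your setup.

```python
import json
import urllib.request

# Assumed base URL; matches the default address used in this guide.
BASE_URL = "http://localhost:13305/api/v1"

def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build an OpenAI-style chat-completions request for Lemonade Server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Qwen3-Coder-30B-A3B-Instruct-GGUF", "Say hello")
# With the server running, urllib.request.urlopen(req) returns the completion.
```

Sending the request with `urllib.request.urlopen(req)` should return a JSON completion if the server is reachable at that address.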
Install Continue: In VS Code, open the Extensions view and type “Continue” in the search box. Click “Install” on the Continue extension entry.
Example marketplace screen:

Add Lemonade Server Provider: Click the model dropdown menu in the Continue sidebar, then select “Add Chat Model”. Choose “Lemonade Server” from the list of available providers. Continue will set the default address to http://localhost:13305, but it can be changed to match a different setup.
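If you prefer to edit the configuration by hand (for example, after changing the server port), Continue’s config file accepts a model entry along these lines. Treat this as a sketch: the exact provider name, field names, and whether the API base needs an /api/v1 suffix can vary between Continue versions, so check what the extension generated for you.

```yaml
models:
  - name: Qwen3-Coder-30B-A3B-Instruct-GGUF
    provider: openai                        # OpenAI-compatible provider; assumed here
    model: Qwen3-Coder-30B-A3B-Instruct-GGUF
    apiBase: http://localhost:13305/api/v1  # match your Lemonade Server port; /api/v1 suffix is an assumption
```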
Example configuration screen:

Select Your Model: Once Lemonade Server is added, use the drop-down menu to select the model you downloaded earlier (e.g., Qwen3-Coder-30B-A3B-Instruct-GGUF).
Example model selection:

Continue provides three interaction modes (Chat, Plan, and Agent) for different development tasks:

See the Continue Documentation for detailed descriptions.
In this example, we’ll use the Qwen3-Coder-30B-A3B-Instruct-GGUF model to build a Python game.
Input: I want to create an asteroids game using PyGame. What guidelines should I follow in the code to do so?

The model provides a basic framework for an Asteroids game. You can then prompt it for sample code to get started.
Input: Provide me a basic implementation to get started.

In the top-right corner of the code block, click “Create file” to move the code from the chat window into a Python file and save it. To run the game, install pygame (pip install pygame) and execute python main.py.
In this example, we’ll use Plan mode to have the LLM analyze your code and provide feedback. Plan mode reviews your code and suggests improvements, but does not modify your files.
To use Plan mode with large files, increase Lemonade Server’s context size:
Load the model with a higher context size: Open a terminal and run:

```
lemonade load <model-name> --ctx-size 8192
```
For persistent changes, see Server Configuration.
Use Plan mode in VS Code: Select the “Plan” option in Continue, enter your prompt and press Alt+Enter to include the currently active file as context.
Input: What improvements could be made to this game?

Lastly, we’ll use Agent mode, which can act on your codebase directly, to implement the suggested improvements.

Here, we can see that the agent edited the code in main.py to improve the gameplay and add colors.
- Pull: Use lemonade pull <model-name> to install models you want to use. Refer to the supported models list for available options.
- Load: Use lemonade load <model-name> --ctx-size 8192 to load a model with a larger context size.
- Use the @ symbol to add files as context and perform refactoring or changes across multiple files.

Troubleshooting

Model not appearing in Continue: Make sure the model has been downloaded:

```
lemonade pull <model-name>
```
- Slow response times
- Missing error handling in generated code
- Inconsistent code style