Skip to content

LangChain Integration Guide

LangChain is a popular Python framework for building LLM-powered applications — including RAG pipelines, agents, and chatbots. This guide shows how to connect LangChain to Lemonade Server as a fully local, offline alternative to OpenAI.


Prerequisites


Setup (Under 5 Minutes)

Step 1 — Install LangChain

pip install langchain langchain-openai

Step 2 — Configure LangChain to use Lemonade Server

LangChain supports any OpenAI-compatible backend via ChatOpenAI. Point it to Lemonade's local server:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade",           # required by LangChain but unused by Lemonade
    model="Llama-3.2-3B-Instruct-Hybrid",  # any model you have pulled
)

Step 3 — Send your first message

from langchain_core.messages import HumanMessage

response = llm.invoke([HumanMessage(content="What is the capital of France?")])
print(response.content)
# Paris

Example 1 — Simple Chat

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade",
    model="Llama-3.2-3B-Instruct-Hybrid",
)

messages = [
    SystemMessage(content="You are a helpful assistant. Be concise."),
    HumanMessage(content="Explain what a vector database is in one sentence."),
]

response = llm.invoke(messages)
print(response.content)

Example 2 — RAG Pipeline (Chat with Your Documents)

This example builds a full Retrieval-Augmented Generation pipeline using Lemonade as the LLM backend — fully local and offline.

Additional prerequisite: pull the embedding model before running:

lemonade pull nomic-embed-text-v1-GGUF
pip install langchain langchain-openai langchain-community langchain-chroma langchain-text-splitters pypdf
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# ── Connect to Lemonade ──────────────────────────────────
LEMONADE_BASE_URL   = "http://localhost:13305/api/v1"
LEMONADE_API_KEY    = "lemonade"
MODEL_NAME          = "Llama-3.2-3B-Instruct-Hybrid"
EMBEDDING_MODEL     = "nomic-embed-text-v1-GGUF"

llm = ChatOpenAI(
    base_url=LEMONADE_BASE_URL,
    api_key=LEMONADE_API_KEY,
    model=MODEL_NAME,
)

# Requires the embedding model to be pulled first:
#   lemonade pull nomic-embed-text-v1-GGUF
# check_embedding_ctx_length=False disables LangChain's OpenAI-specific
# tokenizer check, which fails against non-OpenAI providers.
embeddings = OpenAIEmbeddings(
    base_url=LEMONADE_BASE_URL,
    api_key=LEMONADE_API_KEY,
    model=EMBEDDING_MODEL,
    check_embedding_ctx_length=False,
)

# ── Load and chunk your PDF ──────────────────────────────
loader = PyPDFLoader("your_document.pdf")
docs   = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks   = splitter.split_documents(docs)

# ── Store in ChromaDB ────────────────────────────────────
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever   = vectorstore.as_retriever(search_kwargs={"k": 3})

# ── Build RAG chain ──────────────────────────────────────
prompt = PromptTemplate.from_template("""
Answer the question using ONLY the context below.
If unsure, say "I don't know based on this document."

Context: {context}
Question: {question}
Answer:
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# ── Ask questions ────────────────────────────────────────
answer = rag_chain.invoke("What is the main topic of this document?")
print(answer)

Example 3 — Prompt Template + Chain

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade",
    model="Llama-3.2-3B-Instruct-Hybrid",
)

prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in 3 bullet points:\n\n{text}"
)

chain = prompt | llm | StrOutputParser()

result = chain.invoke({"text": "Your long text goes here..."})
print(result)

Troubleshooting

Problem Fix
Connection refused Make sure Lemonade Server is running (lemonade status)
Model not found Pull the model first: lemonade pull MODEL_NAME
Embeddings not working Pull the model first: lemonade pull nomic-embed-text-v1-GGUF. Make sure model= and check_embedding_ctx_length=False are set in OpenAIEmbeddings
Slow responses Use a smaller model (e.g. 3B instead of 7B)

Why Use Lemonade with LangChain?

  • 100% offline — no API keys, no internet required after setup
  • Drop-in replacement — change one URL to switch from OpenAI to local
  • Full LangChain ecosystem — chains, prompt templates, and RAG pipelines work out of the box
  • Privacy — your documents never leave your machine

Resources