Like almost everyone else in the tech industry today, I've been looking into LLMs recently. I'm still fairly new to everything, but I feel like I'm starting to get a general sense of how things work. What's inside an LLM is still, at least at this point, a black box to me. Regardless, it's fun to play around with these models and learn more about how they work and how software developers can start using them in their toolkits.
As part of this, I tried to write a simple LangChain program that lets me programmatically work with an LLM running on my machine. The code examples in the LangChain documentation default to using OpenAI's ChatGPT, but for whatever reason my OpenAI API keys keep getting rate limited. I didn't feel like shelling out $20 just yet, and besides, I have an LLM running locally, so why not make use of it?
Fortunately, LangChain can work with Ollama. The following sections describe the steps I took to get everything working.
1. Install Ollama
The first thing to do is, of course, to have an LLM running locally! We'll use Ollama for this. On macOS, the easiest way is to install Ollama with brew install ollama and keep it running with brew services.
~/W/l/llms main ❯ brew services start ollama
==> Successfully started `ollama` (label: homebrew.mxcl.ollama)
At this point Ollama should be listening on port 11434 for incoming requests. You can open up http://localhost:11434/ in the browser to double check.
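If you'd rather check from code than from the browser, a tiny Python snippet like this (standard library only) also works; the root endpoint should answer with a short "Ollama is running" message:

import urllib.request

# Ask Ollama's root endpoint for its status message.
# If the service is up, this should print something like "Ollama is running".
with urllib.request.urlopen("http://localhost:11434/") as response:
    print(response.read().decode())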
Next, browse through the Ollama library and choose which model you want to run locally. In this case we want to run llama2, so let's ask Ollama to make that happen: run ollama pull llama2.
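Before moving on, it can be useful to confirm the pulled model actually responds. Here's a rough sketch that calls Ollama's /api/generate REST endpoint directly, with no extra dependencies (the prompt here is just for illustration, and the response format may vary slightly between Ollama versions):

import json
import urllib.request

# Request a single, non-streamed completion from the llama2 model we just pulled.
payload = json.dumps({
    "model": "llama2",
    "prompt": "Say hello in one short sentence.",
    "stream": False,
}).encode()

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    # The non-streamed reply is a JSON object whose "response" field holds the generated text.
    print(json.loads(response.read())["response"])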
2. Install LangChain
The next step is to have a Python project with all the necessary dependencies installed.
Initialize a Python project somewhere on your machine, using whatever tools you prefer. I personally use poetry to manage Python projects. The following command should take care of installing all the dependencies.
~/W/l/llms main ❯ poetry add fastapi langchain langserve sse-starlette uvicorn
3. Write the code
Now that we have all the dependencies in place, let's focus on the code! Here's the code I used:
from fastapi import FastAPI
from langchain.llms import Ollama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langserve import add_routes
import uvicorn

# The llama2 model served by the local Ollama instance on port 11434.
llama2 = Ollama(model="llama2")

# A prompt template with a single parameter called "topic".
template = PromptTemplate.from_template("Tell me a joke about {topic}.")

# Compose prompt -> model -> output parser using the LangChain Expression Language.
chain = template | llama2 | CommaSeparatedListOutputParser()

app = FastAPI(title="LangChain", version="1.0", description="The first server ever!")

# Mount the LangServe routes (invoke, batch, stream, playground) under /chain.
add_routes(app, chain, path="/chain")

if __name__ == "__main__":
    uvicorn.run(app, host="localhost", port=9001)
As you can see, there are a few things going on in this code. We first build an Ollama model object with the model set to llama2. Next, we write a simple prompt template with a parameter called topic; we'll later see how the user can pass a topic to get back a response from the LLM. We then compose a chain using the LangChain Expression Language, piping the prompt into the model and the model's output into a comma-separated-list parser. Finally, we initialize a new FastAPI application and use langserve.add_routes to mount the LangServe API routes.
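If you want to see what the chain produces before putting a server in front of it, you can also invoke it directly. This is just a sketch of the kind of quick test I'd run in a REPL; the topic value is arbitrary:

from langchain.llms import Ollama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

# Rebuild the same chain as above, without FastAPI or LangServe in the picture.
template = PromptTemplate.from_template("Tell me a joke about {topic}.")
chain = template | Ollama(model="llama2") | CommaSeparatedListOutputParser()

# The output parser splits the model's text on commas, so this prints a list of strings.
print(chain.invoke({"topic": "penguins"}))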
Save the server code to a file and run it with Python; I called my file chain.py. This should start a FastAPI server containing the LangServe endpoint we just defined. On my machine I see the following message when the app starts:
~/W/l/llms main ❯ python chain.py
INFO: Started server process [25442]
INFO: Waiting for application startup.
__ ___ .__ __. _______ _______. _______ .______ ____ ____ _______
| | / \ | \ | | / _____| / || ____|| _ \ \ \ / / | ____|
| | / ^ \ | \| | | | __ | (----`| |__ | |_) | \ \/ / | |__
| | / /_\ \ | . ` | | | |_ | \ \ | __| | / \ / | __|
| `----./ _____ \ | |\ | | |__| | .----) | | |____ | |\ \----. \ / | |____
|_______/__/ \__\ |__| \__| \______| |_______/ |_______|| _| `._____| \__/ |_______|
LANGSERVE: Playground for chain "/chain/" is live at:
LANGSERVE: │
LANGSERVE: └──> /chain/playground/
LANGSERVE:
LANGSERVE: See all available routes at /docs/
LANGSERVE: ⚠️ Using pydantic 2.5.1. OpenAPI docs for invoke, batch, stream, stream_log endpoints will not be generated. API endpoints and playground should work as expected. If you need to see the docs, you can downgrade to pydantic 1. For example, `pip install pydantic==1.10.13`. See https://github.com/tiangolo/fastapi/issues/10360 for details.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:9001 (Press CTRL+C to quit)
And that should be it! You can now visit http://localhost:9001/chain/playground/ to play around with the LLM interface you just built! Here's a screenshot of the LangServe Playground I see on my machine:
It works! And the cool thing is that there are no API keys or anything involved. All the code necessary for this application is running locally.
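The playground isn't the only way in, of course. The routes that add_routes mounts can also be called programmatically; here's a sketch using langserve's RemoteRunnable client, pointed at the chain we exposed at /chain (the topic is again just an example):

from langserve import RemoteRunnable

# Connect to the chain mounted at /chain on the local LangServe app.
remote_chain = RemoteRunnable("http://localhost:9001/chain/")

# The remote chain takes the same input as the local one: a dict with a "topic" key.
print(remote_chain.invoke({"topic": "penguins"}))

Since RemoteRunnable behaves like any other LangChain runnable, you should also be able to compose it into larger chains on the client side.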