Ollama has emerged as the go-to solution for running large language models (LLMs) locally, and its Python library (version 0.4.7 as of 2025) simplifies AI integration for developers. This tutorial walks through installation, a basic chat call, streaming responses, conversation memory, generation parameters, error handling, and asynchronous usage.
Before you start, make sure your machine has enough RAM and disk space for the model you plan to run, and that the Ollama application itself is installed and running, since the Python library talks to its local server. Then install the library and pull a base model:
# Install Python library
pip install ollama
# Download base model
ollama pull llama3.2
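To confirm the library can reach the local Ollama server, listing the models that are already available is a quick sanity check (a minimal sketch):

import ollama

# Lists locally available models; fails with a connection error if the server is not running
print(ollama.list())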
Start with a simple Q&A implementation:
import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[
        {'role': 'system', 'content': 'You are a technical documentation expert'},
        {'role': 'user', 'content': 'Explain gradient descent in simple terms'}
    ]
)
print(response['message']['content'])
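Recent versions of the library (0.4 and later) return typed response objects, so the same field can also be read through attributes; the dictionary-style access shown above keeps working:

# Equivalent attribute-style access on the typed response object
print(response.message.content)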
Handle long outputs efficiently by streaming tokens as they arrive:
# Requires the mistral model to be pulled first: ollama pull mistral
stream = ollama.chat(
    model='mistral',
    messages=[{'role': 'user', 'content': 'Describe quantum entanglement'}],
    stream=True
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
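If you need both live output and the complete text afterwards (for logging or post-processing), accumulate the chunks as they stream in; a minimal sketch reusing the same request:

full_text = ''
stream = ollama.chat(
    model='mistral',
    messages=[{'role': 'user', 'content': 'Describe quantum entanglement'}],
    stream=True
)
for chunk in stream:
    piece = chunk['message']['content']
    full_text += piece                      # keep the full response
    print(piece, end='', flush=True)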
Maintain conversation history:
chat_history = []

def ask(message):
    # Append the user turn, send the full history, then store the model's reply
    chat_history.append({'role': 'user', 'content': message})
    response = ollama.chat(model='llama3.2', messages=chat_history)
    chat_history.append(response['message'])
    return response
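Because ask() appends both the user message and the model's reply to chat_history, follow-up questions can refer back to earlier turns. An illustrative exchange (prompts are examples, answers depend on the model):

print(ask('Who wrote "Pride and Prejudice"?')['message']['content'])
# The follow-up relies on the stored history to resolve "she"
print(ask('What else did she write?')['message']['content'])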
Control model behavior with generation options:
response = ollama.chat(
    model='llama3.2',
    messages=[...],
    options={
        'temperature': 0.7,    # lower values give more deterministic output (default 0.8)
        'num_ctx': 4096,       # context window size in tokens
        'repeat_penalty': 1.2  # penalize repeated tokens
    }
)
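For repeatable output, e.g. in tests or code generation, temperature can be dropped to 0 and a fixed seed supplied; both keys are standard Ollama options (a sketch with an illustrative prompt):

deterministic = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Write a Python one-liner that reverses a string'}],
    options={
        'temperature': 0,  # greedy decoding, no sampling randomness
        'seed': 42         # fixed seed for reproducible results
    }
)
print(deterministic['message']['content'])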
A robust implementation pattern for handling missing models:
try:
    response = ollama.chat(model='unknown-model', messages=[...])
except ollama.ResponseError as e:
    if e.status_code == 404:
        print("Model not found - pulling from registry...")
        ollama.pull('unknown-model')
    else:
        raise
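The same pattern extends to a small helper that pulls a missing model and retries the request once; a sketch assuming the model name exists in the Ollama registry:

def chat_with_autopull(model, messages):
    # Chat with a model, pulling it first if it is not available locally
    try:
        return ollama.chat(model=model, messages=messages)
    except ollama.ResponseError as e:
        if e.status_code != 404:
            raise
        ollama.pull(model)                                   # download the missing model
        return ollama.chat(model=model, messages=messages)   # retry once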
For high-performance or concurrent applications, use the asynchronous client:
import asyncio
from ollama import AsyncClient

async def main():
    client = AsyncClient()
    response = await client.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': 'Explain blockchain'}]
    )
    print(response['message']['content'])

asyncio.run(main())
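The async client also makes it straightforward to fire several independent requests concurrently with asyncio.gather; a sketch with illustrative prompts (the local server schedules them according to its own parallelism settings):

async def batch(prompts):
    client = AsyncClient()
    tasks = [
        client.chat(model='llama3.2', messages=[{'role': 'user', 'content': p}])
        for p in prompts
    ]
    responses = await asyncio.gather(*tasks)   # run the requests concurrently
    return [r['message']['content'] for r in responses]

print(asyncio.run(batch(['Define latency', 'Define throughput'])))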
A few practical tips:
- Use temperature=0 for code generation tasks, where deterministic output matters.
- Use stream=True for responses longer than roughly 100 tokens, so users see output immediately.
- Run ollama pull ahead of time so the models you need are already available locally.
The Ollama Python library enables developers to harness cutting-edge AI while maintaining full data control. With its simple API and local execution model, it is well suited to privacy-sensitive applications, offline environments, rapid prototyping, and experimentation without per-token API costs.
For advanced implementations, explore the official GitHub repo and Ollama documentation.