Small local agents¶

Experimenting with agents does not require access to external services, expensive hardware, or even the cloud. This short tutorial notebook is an introduction to language model-based agents using small language models, and demonstrates their use to query a database.

The accompanying repository is on GitHub: https://github.com/lgautier/local-small-language-agent

Setup¶

Local Ollama server¶

The language models we use will be served by Ollama. The default instructions use podman to run that Ollama server in a rootless container.

If podman is not available, or if you are on an isolated system where running ollama outside of a container is not an issue, alternative instructions follow.

This assumes that podman is installed. If that is not yet the case, the installation instructions are here: https://podman.io/docs/installation

In [1]:
! podman run --replace -d --rm -p 11434:11434 -v dot_ollama:/root/.ollama --name ollama docker.io/ollama/ollama
ed86c9e2787467f1ee8c4c1e8996388bd48bbc189d8792e8e6be7800d06ddccc

If on a Debian-based system where installing ollama with sudo is possible, for example on Google Colab, the following instructions can be used instead. Create a new Jupyter cell with the code below, or replace the podman cell above with it.

# Derived from the Google Cloud doc:
# https://medium.com/google-cloud/gemma-3-ollama-on-colab-a-developers-quickstart-7bbf93ab8fef
! echo $(tput setaf 1)"Installing APT packages."$(tput setaf 0)
! sudo apt-get -qq update && sudo apt-get -qq install -y pciutils lshw
! echo $(tput setaf 1)"Fetching Ollama."$(tput setaf 0)
! curl -fsSL https://ollama.com/install.sh | sh
! echo $(tput setaf 1)"Starting Ollama server."$(tput setaf 0)
! nohup ollama serve > ollama.log 2>&1 &

Required Python packages¶

The Python packages ollama, langchain, langchain_ollama, and jinja2 (used below for HTML rendering) are required. Make sure they are installed before continuing.
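A small convenience check (not part of the original setup) can report which of the required packages, if any, are missing before the imports below fail:

```python
import importlib.util

# List any required package that cannot be found in the current environment.
REQUIRED = ("ollama", "langchain", "langchain_ollama", "jinja2")
missing = [pkg for pkg in REQUIRED if importlib.util.find_spec(pkg) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
```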

In [2]:
import dataclasses
import langchain
import langchain_ollama
import ollama

Utility code¶

In order to minimize code duplication, we encapsulate most of Ollama model management into a class.

In [3]:
@dataclasses.dataclass(frozen=True)
class OllamaModel:
    """Wrapper class for models with Ollama and LangChain."""

    name: str

    def __str__(self) -> str:
        return self.name

    def has_tools(self) -> bool:
        return 'tools' in self.show().capabilities

    def is_pulled(self) -> bool:
        for mdl in ollama.list().models:
            if mdl.model == self.name:
                return True
        return False

    def pull(self):
        """Pull the model to make it available to the local Ollama instance."""
        ollama.pull(self.name)

    def show(self) -> ollama._types.ShowResponse:
        return ollama.show(self.name)

    def new_chat(self, temperature: float = 0) -> langchain_ollama.ChatOllama:
        res = langchain_ollama.ChatOllama(
            model=str(self),
            temperature=temperature
        )
        return res

Object display customization¶

We also create a convenience custom HTML renderer for AI responses.

In [4]:
import jinja2

_aimessage_template = jinja2.Template(    
    """<table>
    <thead><tr><td colspan="2">AI message</td></tr></thead>
    <tbody>
    <tr>
      {% if response.content %}
      <td style="vertical-align: top; border-right: 2px solid rgb(40, 40, 150); width: 6em;"><b>response:</b></td>
      <td style="text-align: left; text-indent: 0px;">
        <div style="white-space: pre-wrap;">
        {{ response.content }}
        </div>
      </td>
      {% endif %}
      {% if response.tool_calls %}
      <td style="vertical-align: top; border-right: 2px solid rgb(150, 40, 40); width: 6em;"><b>tools:</b></td>
      <td style="text-align: left; text-indent: 0px;">
        <div style="white-space: pre-wrap;">
        {{ response.tool_calls }}
        </div>
      </td>
      {% endif %}
  </tr></tbody></table>
  """)


def aimessage_html(obj):
    return _aimessage_template.render(response=obj)
    
html_formatter = get_ipython().display_formatter.formatters['text/html']
html_formatter.for_type_by_name('langchain_core.messages.ai', 'AIMessage', aimessage_html)
In [5]:
def toolmessage_html(obj):
    _toolmessage_template = jinja2.Template(    
        """<table>
        <thead><tr><td colspan="2">Tool message</td></tr></thead>
        <tbody>
        <tr>
          {% if response.content %}
          <td style="vertical-align: top; border-right: 2px solid rgb(40, 40, 150); width: 6em;"><b>response:</b></td>
          <td style="text-align: left; text-indent: 0px;">
            <div style="white-space: pre-wrap;">
            {{ response.content }}
            </div>
          </td>
          {% endif %}
        </tr>
        </tbody></table>
        """)
    return _toolmessage_template.render(response=obj)


html_formatter = get_ipython().display_formatter.formatters['text/html']
html_formatter.for_type_by_name('langchain_core.messages.tool', 'ToolMessage', toolmessage_html)
In [6]:
def highlight(text):
    return f'\n\033[1m{text}\033[0m'

List of language models¶

Below is the list of the models to use here. Be mindful of the capabilities of the system this will run on, in particular the RAM, or the VRAM if a GPU is present.

This notebook was created with gemma3:4b (Gemma3 with 4B parameters) and granite4:1b (Granite4 with 1B parameters).

In [7]:
MODELS = (
    # OllamaModel('deepseek-r1:1.5b'),
    OllamaModel('gemma3:4b'),
    OllamaModel('functiongemma:270m'),
    OllamaModel('granite4:1b'),
    # OllamaModel('phi4-mini:3.8b'),
    # OllamaModel('qwen3:4b')
)

We are ready to start. First, we ensure that the models we need are pulled locally.

In [8]:
for mdl in MODELS:
    if mdl.is_pulled():
        print(f'Model {mdl} already pulled.')
    else:
        print(f'Pulling model {mdl}...', end='', flush=True)
        mdl.pull()
        print('done.')
Model gemma3:4b already pulled.
Pulling model functiongemma:270m...done.
Model granite4:1b already pulled.

Language models and their limitations¶

We'll start with a simple question that demonstrates that LLMs are eerily good at the pattern-matching part of understanding, but might not be able to go beyond that.

In [9]:
from langchain_core.messages.human import HumanMessage
from langchain_core.messages.system import SystemMessage
from langchain_core.tools import tool
In [10]:
human_message = HumanMessage('Make the following calculation: 392373366237 * 562310951975')
for mdl in MODELS:
    print(f'\n\033[1m{str(mdl)}\033[0m\n')
    chat = mdl.new_chat()
    response = chat.invoke([human_message])
    display(response)
gemma3:4b

AI message
response:
Okay, let's calculate 392373366237 * 562310951975. This is a large multiplication, and it's best done using a calculator or a programming language. Here's the result: **2236358838888888888888888888888888888888
functiongemma:270m

AI message
response:
I apologize, but I cannot perform this calculation. The calculation tool I have access to is specialized for calculating mathematical operations related to financial calculations, such as calculating stock market valuations or stock market indices. I cannot access or perform complex financial data like this.
granite4:1b

AI message
response:
The result of multiplying 392373366237 by 562310951975 is: 220,646,204,278,964,000,795

Is this correct? The actual answer is:

In [11]:
print(f'{392373366237 * 562310951975:,}')
220,635,841,098,362,793,468,075

Unless this notebook is running in a future when LLMs have learned to multiply, or this notebook became part of the training data, the LLM answer is likely not correct, OR the LLM declines to answer the question (which is the case with functiongemma).

One should not rely on current LLM technology to correctly perform mathematical operations, or more generally to evaluate expressions or run algorithms. However, LLMs have seen training data that describes steps to solve similar problems. They might be able to generate code (software instructions), for example a function call, to get to the answer. This can already be observed with gemma3:4b: its answer includes Python code that returns the correct answer when evaluated.

If we had a way to know when an LLM answer contains code, we could simply evaluate that code and consider the resulting text the answer in a dialogue. In fact, it does not even have to be a dialogue: the code evaluation simply represents an action.

Additional instructions in the prompt can get us there with this example. We can add that, if evaluating code seems a better path to the correct answer, only code should be returned. To facilitate the triaging of answers (answer is text, i.e. the direct answer, vs. answer is code to get to the answer), we can also indicate the format to use.

In [12]:
answer_as_code = HumanMessage("""
IF this type of question would be better answered by evaluating code, return instead
ONLY Python code in the answer. Use the Markdown format for code snippets:
```python
<code>
```
""")
human_message = HumanMessage("""Make the following calculation: 392373366237 * 562310951975""")
for mdl in MODELS:
    print(highlight(str(mdl)))
    chat = mdl.new_chat()
    response = chat.invoke([human_message, answer_as_code])
    display(response)
gemma3:4b
AI message
response:
```python print(392373366237 * 562310951975) ```
functiongemma:270m
AI message
response:
I apologize, but I cannot assist with generating Python code. My current capabilities are limited to assisting with mathematical calculations and data analysis using the provided tools. I cannot generate programming code.
granite4:1b
AI message
response:
```python result = 392373366237 * 562310951975 result ```

This is looking promising, except for functiongemma, which is adamant about being unable to generate code.

A good check to perform with LLMs is whether a new prompt causes a regression (that is, loses the ability to answer some questions correctly). To do this, we ask a question that should be answerable without code: a question for which the answer was almost certainly in the training data for the model.

In [13]:
human_message = HumanMessage("""What is the name of the natural satellite orbiting around the earth?""")
for mdl in MODELS:
    print(highlight(str(mdl)))
    chat = mdl.new_chat()
    response = chat.invoke([human_message, answer_as_code])
    display(response)
gemma3:4b
AI message
response:
```python print("The name of the natural satellite orbiting around the Earth is the Moon.") ```
functiongemma:270m
AI message
response:
I apologize, but I cannot assist with programming or generating Python code. My current capabilities are limited to assisting with tasks related to natural language processing and text generation. I cannot provide programming advice or code snippets.
granite4:1b
AI message
response:
The name of the natural satellite orbiting around the Earth is called **the Moon**.

While gemma3:4b is technically correct, it may indicate that this model will resort to writing code more often than necessary. functiongemma is consistent in messaging that it cannot generate code.

Adding tools as context¶

Agents are language models to which the availability of tools (software functions), with a description of what they do, is given as context. For gemma3:4b this can be achieved through a system prompt, while for models with a "tools" capability Python functions can be passed directly. This is the case with granite4:1b, for example.

Tools in prompt¶

We showed earlier that a prompt can indicate that code evaluation is available. Similarly, we can indicate that tools are available and that calling a tool can be used as an answer.

If we make our multiplication a tool, a prompt can look as follows:

In [14]:
system_prompt_tools = """
You have access to functions. 
You don't have to use tools if not necessary, but *IF* you decide to invoke any of the function(s),
you MUST put it in a valid JSON expression like:
[
  {"name": function name,
   "parameters": dictionary of argument name and its value}
]

You SHOULD NOT include any other text in the response if you call a function.

Functions:
[
  { "name": "multiply",
    "description": "Multiply two numbers.",
    "parameters": {
      "type": "object",
      "properties": {
          "a": "number",
          "b": "number"
      },
      "required": [
          "a", "b"
      ]
    }
  }
]
"""

Tools as tools¶

The use of tools has become somewhat formalized since the early days of prompt engineering we have seen so far, a mere double-digit number of months in the past.

Many of the most recent language models specifically include a capability called "tools". This frees a user, or developer, from having to work on prompt sections that declare the availability of tools in a way the language model reacts to, and from parsing out whether an answer is text or code. langchain provides a wrapper for this and lets one simply declare any Python function as a tool. For the multiplication we can just plug in the Python multiplication operator.

In [15]:
import operator
tools = [tool(operator.mul)]

Language model opting to answer with a tool call¶

We have shown that context (a system prompt, or a langchain tool declaration when the model has the capability) can lead a language model to answer with code. If we ask our multiplication question again:

In [16]:
human_message = HumanMessage("""Make the following calculation: 392373366237 * 562310951975""")

for model in MODELS:
    chat = model.new_chat()
    if model.has_tools():
        print(f'{highlight(model.name)} (with tools)\n')
        chat = chat.bind_tools(tools)
        messages = [human_message]
    else:
        print(f'{highlight(model.name)}\n')
        messages = [SystemMessage(system_prompt_tools),
                    human_message]
    response = chat.invoke(messages)
    display(response)
gemma3:4b

AI message
response:
```json [ { "name": "multiply", "parameters": { "a": 392373366237, "b": 562310951975 } } ] ```
functiongemma:270m (with tools)

AI message
tools:
[{'name': 'mul', 'args': {'a': 392373366237, 'b': 562310951975}, 'id': '7c9769d1-8531-4868-866f-3d00eb598097', 'type': 'tool_call'}]
granite4:1b (with tools)

AI message
tools:
[{'name': 'mul', 'args': {'a': '392373366237', 'b': '562310951975'}, 'id': '62be38a6-3878-4ad3-839d-31093d5d8d3e', 'type': 'tool_call'}]

All language models understand the assignment and provide correct function calls. Interestingly, creating a function call is creating code, which functiongemma is otherwise convinced it is unable to do.

Agentic linear chaining - an example¶

Database¶

We create a local database for demonstration purposes.

In [17]:
import sqlite3

dbcon = sqlite3.connect(':memory:', check_same_thread=False)
dbcon.execute("""
CREATE TABLE medication(
ndc STRING PRIMARY KEY,
common_name STRING,
quantity INTEGER
);
""")
for values in (
    ('ABCD-DEF', 'Aspirin', 3), ('ABCD-GHI', 'Aspirin', 5), ('GHIK-KLM', 'bacitracin', 10)
):
    dbcon.execute("""
    INSERT INTO medication(ndc, common_name, quantity) VALUES (?,?,?);
    """, values)

Tools to query a database¶

Functions to query the database:

  • get_ndc_code to get all NDC codes for the common name of a medication
  • get_medication_stock to get the stock for a given NDC code
In [18]:
def get_ndc_code(medication: str) -> tuple[str, ...]:
    """Get possible ndc codes for a medication.
    
    Args:
      medication: name of the medication for which NDC code should be returned.
    """
    cursor = dbcon.cursor()
    cursor.execute('SELECT ndc FROM medication WHERE common_name==?',
                   (medication, ))
    return tuple(row[0] for row in cursor.fetchall())


def get_ndc_stock(ndc_code: str) -> int:
    """Get the number of boxes of medication in stock.
    
    Args:
      ndc_code: NDC code to retrieve stock for.
    """
    cursor = dbcon.cursor()
    cursor.execute('SELECT quantity FROM medication WHERE ndc==?',
                   (ndc_code, ))
    return cursor.fetchone()[0]

Prompting an LLM to use tools (when needed)¶

The prompt may declare several available tools. The prompt context will be similar to the one we showed for the multiplication.

In [19]:
system_prompt_tools = """
You have access to functions. 
You don't have to use tools if not necessary, but *IF* you decide to invoke any of the function(s),
you MUST put it in a valid JSON expression like:
[
  {"name": function name,
   "parameters": dictionary of argument name and its value}
]

You SHOULD NOT include any other text in the response if you call a function.

Functions:
[
  {
    "name": "get_ndc_stock",
    "description": "Get the number of boxes of medication in stock.",
    "parameters": {
      "type": "object",
      "properties": {
        "ndc_code": {
          "type": "string"
        }
      },
      "required": [
        "ndc_code"
      ]
    }
  },
  {
    "name": "get_ndc_code",
    "description": "Get possible ndc codes for a medication.",
    "parameters": {
      "type": "object",
      "properties": {
        "medication": {
          "type": "string"
        }
      },
      "required": [
        "medication"
      ]
    },
  }
]
"""

Whenever a model has the "tools" capability, the user or developer is spared the prompt-context engineering. The decorator tool() in langchain can be used instead.

Note: langchain is essentially an abstraction layer over prompts. It can create the equivalent of our engineered prompt contexts from Python function definitions, feeding the LLM with information from the docstring, or from additional declarative structures (see https://docs.langchain.com/oss/python/langchain/tools#advanced-schema-definition).

In [20]:
tools = [tool(get_ndc_code), tool(get_ndc_stock)]
In [21]:
def invoke_model(model, human_message, system_message=SystemMessage('')):
    chat = model.new_chat()
    if model.has_tools():
        print(f'\n{highlight(model.name)} (with tools)\n')
        chat = chat.bind_tools(tools)
    else:
        print(f'\n{highlight(model.name)}\n')
    messages = [system_message,
                human_message]
    response = chat.invoke(messages)
    return response, chat

human_message = HumanMessage('Name the first month of the year.')

for model in MODELS:
    response, chat = invoke_model(model, human_message)
    display(response)

gemma3:4b

AI message
response:
January! Do you want to play a quick game of guessing months? 😊

functiongemma:270m (with tools)

AI message
response:
I apologize, but I cannot assist with retrieving the specific NDC code for the first month of the year. My current tools are designed for retrieving medication-related data and stock information. I cannot query or retrieve specific calendar or stock data for specific dates.

granite4:1b (with tools)

AI message
response:
The first month of the year is **January**.

functiongemma:270m appears challenged by the question. Its size (just over a quarter of granite4:1b's) might be the issue.

In [22]:
tool_map = {tool.name: tool for tool in tools}
if len(tool_map) != len(tools):
    raise ValueError("'tools' contains at least one duplicated tool name.")
In [23]:
human_message = HumanMessage('How many boxes of medication with NDC code ABCD-DEF do we have left?')

for model in MODELS:
    response, chat = invoke_model(model, human_message, SystemMessage(system_prompt_tools))
    display(response)
    if response.tool_calls:
        for tool_call in response.tool_calls:
            print(
                f'  --> calling {tool_call["name"]}'
                f'({", ".join("=".join((k, repr(v))) for k, v in tool_call["args"].items())})'
            )
            selected_tool = tool_map[tool_call['name']]
            res = selected_tool.invoke(tool_call)
            print('      result:')
            print(f'        {res.content}')
    else:
        print('Assessing automatically if there is a tool call is left as an exercise for the reader.')

gemma3:4b

AI message
response:
```json [ { "name": "get_ndc_stock", "parameters": { "ndc_code": "ABCD-DEF" } } ] ```
Assessing automatically if there is a tool call is left as an exercise for the reader.


functiongemma:270m (with tools)

AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'ABCD-DEF'}, 'id': 'a63df468-8c35-4d1c-ba01-f07c58b4a29a', 'type': 'tool_call'}]
  --> calling get_ndc_stock(ndc_code='ABCD-DEF')
      result:
        3


granite4:1b (with tools)

AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'ABCD-DEF'}, 'id': 'fe96a455-02c4-4f10-ab45-e64a4d3ff059', 'type': 'tool_call'}]
  --> calling get_ndc_stock(ndc_code='ABCD-DEF')
      result:
        3
In [24]:
human_message = HumanMessage("What are possible NDC codes for the medication 'Aspirin'?")

for model in MODELS:
    response, chat = invoke_model(model, human_message, SystemMessage(system_prompt_tools))
    display(response)
    if response.tool_calls:
        print('      result:')
        for tool_call in response.tool_calls:
            selected_tool = tool_map[tool_call['name']]
            res = selected_tool.invoke(tool_call)    
            print(f'        {res.content}')
    else:
        print('Assessing automatically if there is a tool call is left as an exercise for the reader.')

gemma3:4b

AI message
response:
```json [ { "name": "get_ndc_code", "parameters": { "medication": "Aspirin" } } ] ```
Assessing automatically if there is a tool call is left as an exercise for the reader.


functiongemma:270m (with tools)

AI message
tools:
[{'name': 'get_ndc_code', 'args': {'medication': 'Aspirin'}, 'id': '5d987bf2-cc49-4058-b0d7-08196877a7d3', 'type': 'tool_call'}]
      result:
        ["ABCD-DEF", "ABCD-GHI"]


granite4:1b (with tools)

AI message
tools:
[{'name': 'get_ndc_code', 'args': {'medication': 'Aspirin'}, 'id': '65211877-5146-4db6-acfd-6c658edb630b', 'type': 'tool_call'}]
      result:
        ["ABCD-DEF", "ABCD-GHI"]

Cool. But what exactly did we gain here? If you are a programmer or a data analyst who knows how to write SQL, arguably not much. However, what we have achieved is akin to a text user interface, or language user interface, that can be an alternative to a graphical user interface.

Reasoning in steps¶

So far we have only shown questions for which either the language model had the answer memorized from its training data, or the answer was one tool call away.

There are questions for which the answer requires a sequence of recalls from training data or tool calls, with the output of one step used as the input to the next. For example, asking how many Aspirin boxes are in stock would require to 1) get the NDC codes for Aspirin products, 2) query the stock for each, and finally 3) sum those values.
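Spelled out by hand, with in-memory dicts standing in for the database tools defined earlier, the chain would look like:

```python
# Stand-ins for get_ndc_code and get_ndc_stock; values match the table above.
NDC_BY_NAME = {"Aspirin": ("ABCD-DEF", "ABCD-GHI"), "bacitracin": ("GHIK-KLM",)}
STOCK = {"ABCD-DEF": 3, "ABCD-GHI": 5, "GHIK-KLM": 10}

codes = NDC_BY_NAME["Aspirin"]                # 1) NDC codes for Aspirin
quantities = [STOCK[code] for code in codes]  # 2) stock for each code
total = sum(quantities)                       # 3) sum the quantities
```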

In [25]:
human_message = HumanMessage('How many boxes of Aspirin do we have in stock altogether?')

for model in MODELS:
    response, chat = invoke_model(model, human_message, SystemMessage(system_prompt_tools))
    display(response)

gemma3:4b

AI message
response:
```json [ { "name": "get_ndc_stock", "parameters": { "ndc_code": "98304-600" } } ] ```

functiongemma:270m (with tools)

AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'Aspirin'}, 'id': '7d0683fa-c668-4640-86c8-2b74bc4f273f', 'type': 'tool_call'}]

granite4:1b (with tools)

AI message
tools:
[{'name': 'get_ndc_code', 'args': {'medication': 'Aspirin'}, 'id': 'bf98a6fe-bc7f-4027-ae38-1a8e8a9b8c2e', 'type': 'tool_call'}]

Odds are that the results are not great. The bare language models used in this notebook generally do not handle this well without guidance from a prompt, for example the ReAct approach described next.

ReAct¶

The ReAct approach prompts a language model to match its answers to the following pattern:

  1. question
  2. thought: the language model's answer to the question. The answer can be the use of a tool.
  3. action: whenever the answer is the use of a tool, a function call, this is selected here.
  4. action input: the input to the action.
  5. observation: observe the result of performing the action.
  6. The sequence of steps 2 to 5 can be repeated N times
  7. final answer

It is derived from the empirical observation that LLMs appear to give more "thoughtful" answers, or better solve reasoning-oriented questions, when asked to think step by step.

ReAct can be independently achieved by adding the description to a prompt and iteratively parsing language model responses and crafting new messages. However, in many scenarios this can be abstracted away through the use of a library. We use langchain again to demonstrate how it can work:
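To illustrate the thought/action/observation loop itself, here is a minimal plain-Python sketch with a scripted stand-in for the language model and in-memory stand-ins for the tools (all simplified and hypothetical, not langchain code):

```python
# Scripted stand-in for the model: three planned tool calls, then a final answer.
SCRIPT = iter([
    ("get_ndc_code", {"medication": "Aspirin"}),
    ("get_ndc_stock", {"ndc_code": "ABCD-DEF"}),
    ("get_ndc_stock", {"ndc_code": "ABCD-GHI"}),
])


def stub_model(messages):
    step = next(SCRIPT, None)
    if step is not None:
        return ("tool_call", step)           # thought -> action + action input
    observations = [m for role, m in messages if role == "tool"]
    return ("final", sum(observations[1:]))  # first observation is the code list


TOOLS = {  # in-memory stand-ins for the database tools defined earlier
    "get_ndc_code": lambda medication: ["ABCD-DEF", "ABCD-GHI"],
    "get_ndc_stock": lambda ndc_code: {"ABCD-DEF": 3, "ABCD-GHI": 5}[ndc_code],
}

messages = [("human", "How many boxes of Aspirin do we have altogether?")]
while True:
    kind, payload = stub_model(messages)
    if kind == "final":
        break
    name, args = payload
    messages.append(("tool", TOOLS[name](**args)))  # act, then observe

payload  # final answer: 8
```

A real agent replaces the scripted stand-in with a language model call; langchain's create_agent, used below, runs an equivalent loop.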

In [29]:
from langchain.agents import create_agent

human_message = HumanMessage("How many boxes of Aspirin do we have in stock altogether?")

for model in MODELS:
    print(highlight(model.name))
    if model.has_tools():
        react = create_agent(model.new_chat(), tools=tools)
        r_stream = react.stream({'messages': [human_message]}, stream_mode='updates')
        try:
            for chunk in r_stream:
                for k in chunk.keys():
                    for m in chunk[k]['messages']:
                        display(m)
        except TypeError as err:
            print('Error:')
            print(err)
    else:
        print('Parsing AI messages into next actions and results is left as an exercise.')
gemma3:4b
Parsing AI messages into next actions and results is left as an exercise.

functiongemma:270m
AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'Aspirin'}, 'id': '0d55848c-3003-408c-846e-2ccb2238bc38', 'type': 'tool_call'}]
Error:
'NoneType' object is not subscriptable

granite4:1b
AI message
tools:
[{'name': 'get_ndc_code', 'args': {'medication': 'Aspirin'}, 'id': '89afdd0e-92bf-4d6e-9270-10ec8dfb20c9', 'type': 'tool_call'}]
Tool message
response:
["ABCD-DEF", "ABCD-GHI"]
AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'ABCD-DEF'}, 'id': '6886023e-3e2e-4ea3-a0f4-5e0a4e1b0c6a', 'type': 'tool_call'}]
Tool message
response:
3
AI message
tools:
[{'name': 'get_ndc_stock', 'args': {'ndc_code': 'ABCD-GHI'}, 'id': '7e28b8c2-e435-4474-86ef-e55bc82bc785', 'type': 'tool_call'}]
Tool message
response:
5
AI message
response:
The total number of boxes in stock for Aspirin is **8**.

When calling tools works without error, the answer is indeed correct.

Conclusion¶

Small language models are able to handle simple queries and to chain sequences of steps to reach an answer. They can run locally, in isolation, and on relatively modest hardware.