Local models

  • This guide shows how to set up local models.
  • You should already be familiar with the quickstart guide.
  • You should also quickly skim the configuration guide to understand the global configuration and configuration files.

Examples

  • Issue #303 has several examples of how to use local models.
  • We also welcome concrete examples of how to use local models; please contribute them to this guide via pull request.

Using litellm

Currently, all models are supported via litellm (but if you have specific needs, we're open to adding more specific model classes in the `models` submodule).

If you use local models, you most likely need to pass some extra keyword arguments to the litellm call. This is done with the `model_kwargs` dictionary, which is passed directly to `litellm.completion`.

In other words, this is how we invoke litellm:

litellm.completion(
    model=model_name,
    messages=messages,
    **model_kwargs
)
Complete model class
import json
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

import litellm
from tenacity import (
    before_sleep_log,
    retry,
    retry_if_not_exception_type,
    stop_after_attempt,
    wait_exponential,
)

from minisweagent.models import GLOBAL_MODEL_STATS

logger = logging.getLogger("litellm_model")


@dataclass
class LitellmModelConfig:
    model_name: str
    model_kwargs: dict[str, Any] = field(default_factory=dict)
    litellm_model_registry: Path | str | None = os.getenv("LITELLM_MODEL_REGISTRY_PATH")


class LitellmModel:
    def __init__(self, **kwargs):
        self.config = LitellmModelConfig(**kwargs)
        self.cost = 0.0
        self.n_calls = 0
        if self.config.litellm_model_registry and Path(self.config.litellm_model_registry).is_file():
            litellm.utils.register_model(json.loads(Path(self.config.litellm_model_registry).read_text()))

    @retry(
        stop=stop_after_attempt(10),
        wait=wait_exponential(multiplier=1, min=4, max=60),
        before_sleep=before_sleep_log(logger, logging.WARNING),
        retry=retry_if_not_exception_type(
            (
                litellm.exceptions.UnsupportedParamsError,
                litellm.exceptions.NotFoundError,
                litellm.exceptions.PermissionDeniedError,
                litellm.exceptions.ContextWindowExceededError,
                litellm.exceptions.APIError,
                litellm.exceptions.AuthenticationError,
                KeyboardInterrupt,
            )
        ),
    )
    def _query(self, messages: list[dict[str, str]], **kwargs):
        try:
            return litellm.completion(
                model=self.config.model_name, messages=messages, **(self.config.model_kwargs | kwargs)
            )
        except litellm.exceptions.AuthenticationError as e:
            e.message += " You can permanently set your API key with `mini-extra config set KEY VALUE`."
            raise e

    def query(self, messages: list[dict[str, str]], **kwargs) -> dict:
        response = self._query(messages, **kwargs)
        cost = litellm.cost_calculator.completion_cost(response)
        self.n_calls += 1
        self.cost += cost
        GLOBAL_MODEL_STATS.add(cost)
        return {
            "content": response.choices[0].message.content or "",  # type: ignore
        }
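
For illustration, here is a minimal usage sketch of this class, assuming an Ollama server running on its default port; the model name and keyword arguments are placeholders, not prescribed by the package:

# assumes the LitellmModel class defined above is in scope
model = LitellmModel(
    model_name="ollama/llama3",  # placeholder; use whatever model your server exposes
    model_kwargs={"api_base": "http://localhost:11434", "temperature": 0.0},
)
result = model.query([{"role": "user", "content": "Say hello."}])
print(result["content"])          # text of the model's reply
print(model.n_calls, model.cost)  # bookkeeping; cost lookup requires a registry entry (see below)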

You can set model_kwargs in an agent config file like the following one:

Default configuration file
agent:
  system_template: |
    You are a helpful assistant that can interact with a computer.

    Your response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||).
    Include a THOUGHT section before your command where you explain your reasoning process.
    Format your response as shown in <format_example>.

    <format_example>
    Your reasoning and analysis here. Explain why you want to perform the action.

    ```bash
    your_command_here
    ```
    </format_example>

    Failure to follow these rules will cause your response to be rejected.
  instance_template: |
    Please solve this issue: {{task}}

    You can execute bash commands and edit files to implement the necessary changes.

    ## Recommended Workflow

    This workflow should be done step by step so that you can iterate on your changes and any possible problems.

    1. Analyze the codebase by finding and reading relevant files
    2. Create a script to reproduce the issue
    3. Edit the source code to resolve the issue
    4. Verify your fix works by running your script again
    5. Test edge cases to ensure your fix is robust
    6. Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.
       Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>

    ## Important Rules

    1. Every response must contain exactly one action
    2. The action must be enclosed in triple backticks
    3. Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
       However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files

    <system_information>
    {{system}} {{release}} {{version}} {{machine}} {{processor}}
    </system_information>

    ## Formatting your response

    Here is an example of a correct response:

    <example_response>
    THOUGHT: I need to understand the structure of the repository first. Let me check what files are in the current directory to get a better understanding of the codebase.

    ```bash
    ls -la
    ```
    </example_response>

    ## Useful command examples

    ### Create a new file:

    ```bash
    cat <<'EOF' > newfile.py
    import numpy as np
    hello = "world"
    print(hello)
    EOF
    ```

    ### Edit files with sed:

    {%- if system == "Darwin" -%}
    <important>
    You are on MacOS. For all the below examples, you need to use `sed -i ''` instead of `sed -i`.
    </important>
    {%- endif -%}

    ```bash
    # Replace all occurrences
    sed -i 's/old_string/new_string/g' filename.py

    # Replace only first occurrence
    sed -i 's/old_string/new_string/' filename.py

    # Replace first occurrence on line 1
    sed -i '1s/old_string/new_string/' filename.py

    # Replace all occurrences in lines 1-10
    sed -i '1,10s/old_string/new_string/g' filename.py
    ```

    ### View file content:

    ```bash
    # View specific lines with numbers
    nl -ba filename.py | sed -n '10,20p'
    ```

    ### Any other command you want to run

    ```bash
    anything
    ```
  action_observation_template: |
    <returncode>{{output.returncode}}</returncode>
    {% if output.output | length < 10000 -%}
    <output>
    {{ output.output -}}
    </output>
    {%- else -%}
    <warning>
    The output of your last command was too long.
    Please try a different command that produces less output.
    If you're looking at a file, you can try using head, tail, or sed to view a smaller number of lines selectively.
    If you're using grep or find and it produced too much output, you can use a more selective search pattern.
    If you really need to see something from the full command's output, you can redirect output to a file and then search in that file.
    </warning>
    {%- set elided_chars = output.output | length - 10000 -%}
    <output_head>
    {{ output.output[:5000] }}
    </output_head>
    <elided_chars>
    {{ elided_chars }} characters elided
    </elided_chars>
    <output_tail>
    {{ output.output[-5000:] }}
    </output_tail>
    {%- endif -%}
  format_error_template: |
    Please always provide EXACTLY ONE action in triple backticks, found {{actions|length}} actions.
    If you want to end the task, please issue the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`
    without any other command.
    Else, please format your response exactly as follows:

    <response_example>
    Here are some thoughts about why you want to perform the action.

    ```bash
    <action>
    ```
    </response_example>

    Note: In rare cases, if you need to reference a similar format in your command, you might have
    to proceed in two steps, first writing TRIPLEBACKTICKSBASH, then replacing them with ```bash.
  step_limit: 0.
  cost_limit: 3.
  mode: confirm
environment:
  env:
    PAGER: cat
    MANPAGER: cat
    LESS: -R
    PIP_PROGRESS_BAR: 'off'
    TQDM_DISABLE: '1'
model:
  model_kwargs:
    temperature: 0.0
    drop_params: true

Updating the default mini configuration file

You can set MSWEA_MINI_CONFIG_PATH to point to your own default mini configuration file. This allows you to override the default configuration file shown above with your own. See the configuration guide for more details.
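
For example, using the same mini-extra config helper mentioned above (the path is a placeholder):

mini-extra config set MSWEA_MINI_CONFIG_PATH /path/to/my_config.yaml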

In the model section (the last section of the configuration file), you can add

model:
  model_name: "my-local-model"
  model_kwargs:
    custom_llm_provider: "openai"
    ...
  ...
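
As a slightly fuller sketch, assuming a local OpenAI-compatible server (e.g., vLLM or llama.cpp) listening on localhost:8000, this section might look as follows; the model name, URL, and key are placeholders:

model:
  model_name: "my-local-model"
  model_kwargs:
    custom_llm_provider: "openai"          # treat the endpoint as OpenAI-compatible
    api_base: "http://localhost:8000/v1"   # placeholder address of your local server
    api_key: "dummy"                       # many local servers accept any non-empty key
    temperature: 0.0
    drop_params: true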

The other thing that you most likely need to figure out is cost tracking. There are two ways to do this with litellm:

  1. You set up a litellm proxy server (which gives you a lot of control over all the LM calls)
  2. You update the model registry (next section)

Updating the model registry

LiteLLM gets its cost and model metadata from this file. If that file is outdated or missing your desired model, you can override or add entries by including a custom registry file.

The model registry JSON file should follow LiteLLM's format:

{
  "my-custom-model": {
    "max_tokens": 4096,
    "input_cost_per_token": 0.0001,
    "output_cost_per_token": 0.0002,
    "litellm_provider": "openai",
    "mode": "chat"
  },
  "my-local-model": {
    "max_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "ollama",
    "mode": "chat"
  }
}
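
To check that a custom registry is actually picked up, you can register it manually with the same litellm call that the model class above uses on startup (the file path is a placeholder):

import json
from pathlib import Path

import litellm

# Merge the custom entries into litellm's model metadata, as the model class does
litellm.utils.register_model(json.loads(Path("/path/to/model_registry.json").read_text()))

# The new entry should now appear in litellm's cost table
print(litellm.model_cost["my-local-model"])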

There are two ways of setting the path to the model registry:

  1. Set LITELLM_MODEL_REGISTRY_PATH (e.g., mini-extra config set LITELLM_MODEL_REGISTRY_PATH /path/to/model_registry.json)
  2. Set litellm_model_registry in the agent config file, for example:
model:
  litellm_model_registry: "/path/to/model_registry.json"
  ...
...

Concrete examples

Help us fill this section!

We welcome concrete examples of how to use local models. Please add your example to this guide via pull request.