Local models

Examples

  • Issue #303 has several examples of how to use local models.
  • We also welcome pull requests that add concrete examples of using local models to this guide.

Using litellm

By default, models are supported via litellm.

There are typically two steps to using local models:

  1. Editing the agent config file to add settings like custom_llm_provider and api_base.
  2. Either ignoring errors from cost tracking or updating the model registry to include your local model.

Setting API base/provider

If you use local models, you most likely need to pass extra keyword arguments to the litellm call. This is done via the model_kwargs dictionary, which is passed directly to litellm.completion.

In other words, this is how we invoke litellm:

litellm.completion(
    model=model_name,
    messages=messages,
    **model_kwargs
)
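For illustration, here is what the unpacking means in practice. The model name and endpoint below are hypothetical placeholders, not values from this guide:

```python
# Illustrative only: how model_kwargs entries end up in the litellm.completion call.
model_name = "openai/my-local-model"  # hypothetical model name
model_kwargs = {
    "custom_llm_provider": "openai",         # tells litellm which API dialect to speak
    "api_base": "http://localhost:8000/v1",  # hypothetical local endpoint
}

# After **model_kwargs unpacking, the call receives these top-level arguments:
call_kwargs = {"model": model_name, "messages": [], **model_kwargs}
print(sorted(call_kwargs))  # ['api_base', 'custom_llm_provider', 'messages', 'model']
```

Any key you put into model_kwargs therefore becomes a top-level argument of the API call.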

You can set model_kwargs in an agent config file like the following one:

Default configuration file
agent:
  system_template: |
    You are a helpful assistant that can interact with a computer.
  instance_template: |
    Please solve this issue: {{task}}

    You can execute bash commands and edit files to implement the necessary changes.

    ## Recommended Workflow

    This workflow should be done step-by-step so that you can iterate on your changes and any possible problems.

    1. Analyze the codebase by finding and reading relevant files
    2. Create a script to reproduce the issue
    3. Edit the source code to resolve the issue
    4. Verify your fix works by running your script again
    5. Test edge cases to ensure your fix is robust
    6. Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.
       Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>

    ## Command Execution Rules

    You are operating in an environment where

    1. You issue at least one command
    2. The system executes the command(s) in a subshell
    3. You see the result(s)
    4. You write your next command(s)

    Each response should include:

    1. **Reasoning text** where you explain your analysis and plan
    2. At least one tool call with your command

    **CRITICAL REQUIREMENTS:**

    - Your response SHOULD include reasoning text explaining what you're doing
    - Your response MUST include AT LEAST ONE bash tool call
    - Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
    - However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files
    - Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.
      Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>

    Example of a CORRECT response:
    <example_response>
    I need to understand the structure of the repository first. Let me check what files are in the current directory to get a better understanding of the codebase.

    [Makes bash tool call with {"command": "ls -la"} as arguments]
    </example_response>

    <system_information>
    {{system}} {{release}} {{version}} {{machine}}
    </system_information>

    ## Useful command examples

    ### Create a new file:

    ```bash
    cat <<'EOF' > newfile.py
    import numpy as np
    hello = "world"
    print(hello)
    EOF
    ```

    ### Edit files with sed:

    {%- if system == "Darwin" -%}
    <important>
    You are on MacOS. For all the below examples, you need to use `sed -i ''` instead of `sed -i`.
    </important>
    {%- endif -%}

    ```bash
    # Replace all occurrences
    sed -i 's/old_string/new_string/g' filename.py

    # Replace only first occurrence
    sed -i 's/old_string/new_string/' filename.py

    # Replace first occurrence on line 1
    sed -i '1s/old_string/new_string/' filename.py

    # Replace all occurrences in lines 1-10
    sed -i '1,10s/old_string/new_string/g' filename.py
    ```

    ### View file content:

    ```bash
    # View specific lines with numbers
    nl -ba filename.py | sed -n '10,20p'
    ```

    ### Any other command you want to run

    ```bash
    anything
    ```
  step_limit: 0
  cost_limit: 3.
  mode: confirm
environment:
  env:
    PAGER: cat
    MANPAGER: cat
    LESS: -R
    PIP_PROGRESS_BAR: 'off'
    TQDM_DISABLE: '1'
model:
  observation_template: |
    {%- if output.output | length < 10000 -%}
    {
      "returncode": {{ output.returncode }},
      "output": {{ output.output | tojson }}
      {%- if output.exception_info %}, "exception_info": {{ output.exception_info | tojson }}{% endif %}
    }
    {%- else -%}
    {
      "returncode": {{ output.returncode }},
      "output_head": {{ output.output[:5000] | tojson }},
      "output_tail": {{ output.output[-5000:] | tojson }},
      "elided_chars": {{ output.output | length - 10000 }},
      "warning": "Output too long."
      {%- if output.exception_info %}, "exception_info": {{ output.exception_info | tojson }}{% endif %}
    }
    {%- endif -%}
  format_error_template: |
    Tool call error:

    <error>
    {{error}}
    </error>

    Here is general guidance on how to submit correct toolcalls:

    Every response needs to use the 'bash' tool at least once to execute commands.

    Call the bash tool with your command as the argument:
    - Tool: bash
    - Arguments: {"command": "your_command_here"}

    If you want to end the task, please issue the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`
    without any other command.
  model_kwargs:
    drop_params: true

In the model section (the last section of the file above), you can add:

model:
  model_name: "my-local-model"
  model_kwargs:
    custom_llm_provider: "openai"
    api_base: "https://..."
    ...
  ...

Updating the default mini configuration file

You can set the MSWEA_MINI_CONFIG_PATH setting to set the path to the default mini configuration file. This allows you to override the default configuration file with your own. See the global configuration guide for more details.
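For example (the path below is a placeholder for wherever you keep your config file):

```shell
# Point mini at your own default config file (placeholder path).
export MSWEA_MINI_CONFIG_PATH="$HOME/.config/mini-swe-agent/my_mini.yaml"
echo "$MSWEA_MINI_CONFIG_PATH"
```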

If this is not enough, our model class should be simple to modify:

Complete model class
import json
import logging
import os
import time
from collections.abc import Callable
from pathlib import Path
from typing import Any, Literal

import litellm
from pydantic import BaseModel

from minisweagent.models import GLOBAL_MODEL_STATS
from minisweagent.models.utils.actions_toolcall import (
    BASH_TOOL,
    format_toolcall_observation_messages,
    parse_toolcall_actions,
)
from minisweagent.models.utils.anthropic_utils import _reorder_anthropic_thinking_blocks
from minisweagent.models.utils.cache_control import set_cache_control
from minisweagent.models.utils.openai_multimodal import expand_multimodal_content
from minisweagent.models.utils.retry import retry

logger = logging.getLogger("litellm_model")


class LitellmModelConfig(BaseModel):
    model_name: str
    """Model name. Highly recommended to include the provider in the model name, e.g., `anthropic/claude-sonnet-4-5-20250929`."""
    model_kwargs: dict[str, Any] = {}
    """Additional arguments passed to the API."""
    litellm_model_registry: Path | str | None = os.getenv("LITELLM_MODEL_REGISTRY_PATH")
    """Model registry for cost tracking and model metadata. See the local model guide (https://mini-swe-agent.com/latest/models/local_models/) for more details."""
    set_cache_control: Literal["default_end"] | None = None
    """Set explicit cache control markers, for example for Anthropic models"""
    cost_tracking: Literal["default", "ignore_errors"] = os.getenv("MSWEA_COST_TRACKING", "default")
    """Cost tracking mode for this model. Can be "default" or "ignore_errors" (ignore errors/missing cost info)"""
    format_error_template: str = "{{ error }}"
    """Template used when the LM's output is not in the expected format."""
    observation_template: str = (
        "{% if output.exception_info %}<exception>{{output.exception_info}}</exception>\n{% endif %}"
        "<returncode>{{output.returncode}}</returncode>\n<output>\n{{output.output}}</output>"
    )
    """Template used to render the observation after executing an action."""
    multimodal_regex: str = ""
    """Regex to extract multimodal content. Empty string disables multimodal processing."""


class LitellmModel:
    abort_exceptions: list[type[Exception]] = [
        litellm.exceptions.UnsupportedParamsError,
        litellm.exceptions.NotFoundError,
        litellm.exceptions.PermissionDeniedError,
        litellm.exceptions.ContextWindowExceededError,
        litellm.exceptions.AuthenticationError,
        KeyboardInterrupt,
    ]

    def __init__(self, *, config_class: Callable = LitellmModelConfig, **kwargs):
        self.config = config_class(**kwargs)
        if self.config.litellm_model_registry and Path(self.config.litellm_model_registry).is_file():
            litellm.utils.register_model(json.loads(Path(self.config.litellm_model_registry).read_text()))

    def _query(self, messages: list[dict[str, str]], **kwargs):
        try:
            return litellm.completion(
                model=self.config.model_name,
                messages=messages,
                tools=[BASH_TOOL],
                **(self.config.model_kwargs | kwargs),
            )
        except litellm.exceptions.AuthenticationError as e:
            e.message += " You can permanently set your API key with `mini-extra config set KEY VALUE`."
            raise e

    def _prepare_messages_for_api(self, messages: list[dict]) -> list[dict]:
        prepared = [{k: v for k, v in msg.items() if k != "extra"} for msg in messages]
        prepared = _reorder_anthropic_thinking_blocks(prepared)
        return set_cache_control(prepared, mode=self.config.set_cache_control)

    def query(self, messages: list[dict[str, str]], **kwargs) -> dict:
        for attempt in retry(logger=logger, abort_exceptions=self.abort_exceptions):
            with attempt:
                response = self._query(self._prepare_messages_for_api(messages), **kwargs)
        cost_output = self._calculate_cost(response)
        GLOBAL_MODEL_STATS.add(cost_output["cost"])
        message = response.choices[0].message.model_dump()
        message["extra"] = {
            "actions": self._parse_actions(response),
            "response": response.model_dump(),
            **cost_output,
            "timestamp": time.time(),
        }
        return message

    def _calculate_cost(self, response) -> dict[str, float]:
        try:
            cost = litellm.cost_calculator.completion_cost(response, model=self.config.model_name)
            if cost <= 0.0:
                raise ValueError(f"Cost must be > 0.0, got {cost}")
        except Exception as e:
            cost = 0.0
            if self.config.cost_tracking != "ignore_errors":
                msg = (
                    f"Error calculating cost for model {self.config.model_name}: {e}, perhaps it's not registered? "
                    "You can ignore this issue from your config file with cost_tracking: 'ignore_errors' or "
                    "globally with export MSWEA_COST_TRACKING='ignore_errors'. "
                    "Alternatively check the 'Cost tracking' section in the documentation at "
                    "https://klieret.short.gy/mini-local-models. "
                    " Still stuck? Please open a github issue at https://github.com/SWE-agent/mini-swe-agent/issues/new/choose!"
                )
                logger.critical(msg)
                raise RuntimeError(msg) from e
        return {"cost": cost}

    def _parse_actions(self, response) -> list[dict]:
        """Parse tool calls from the response. Raises FormatError if unknown tool."""
        tool_calls = response.choices[0].message.tool_calls or []
        return parse_toolcall_actions(tool_calls, format_error_template=self.config.format_error_template)

    def format_message(self, **kwargs) -> dict:
        return expand_multimodal_content(kwargs, pattern=self.config.multimodal_regex)

    def format_observation_messages(
        self, message: dict, outputs: list[dict], template_vars: dict | None = None
    ) -> list[dict]:
        """Format execution outputs into tool result messages."""
        actions = message.get("extra", {}).get("actions", [])
        return format_toolcall_observation_messages(
            actions=actions,
            outputs=outputs,
            observation_template=self.config.observation_template,
            template_vars=template_vars,
            multimodal_regex=self.config.multimodal_regex,
        )

    def get_template_vars(self, **kwargs) -> dict[str, Any]:
        return self.config.model_dump()

    def serialize(self) -> dict:
        return {
            "info": {
                "config": {
                    "model": self.config.model_dump(mode="json"),
                    "model_type": f"{self.__class__.__module__}.{self.__class__.__name__}",
                },
            }
        }

The other part that you most likely need to figure out is costs. There are two ways to do this with litellm:

  1. You set up a litellm proxy server (which gives you a lot of control over all the LM calls)
  2. You update the model registry (next section)

Cost tracking

If you run with the above, you will most likely get an error about missing cost information.

If you do not need cost tracking, you can ignore these errors, ideally by editing your agent config file to add:

model:
  cost_tracking: "ignore_errors"
  ...
...

Alternatively, you can set the global setting:

export MSWEA_COST_TRACKING="ignore_errors"

However, note that this is a global setting, and will affect all models!

The best way to handle the cost issue, however, is to add a model registry to litellm that includes your local model.

LiteLLM gets its cost and model metadata from a bundled registry file. If that file is outdated or missing your desired model, you can override or add entries by including a custom registry file.

The model registry JSON file should follow LiteLLM's format:

{
  "my-custom-model": {
    "max_tokens": 4096,
    "input_cost_per_token": 0.0001,
    "output_cost_per_token": 0.0002,
    "litellm_provider": "openai",
    "mode": "chat"
  },
  "my-local-model": {
    "max_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "ollama",
    "mode": "chat"
  }
}
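To sanity-check what such an entry implies, the per-token prices multiply out as plain arithmetic. This is a hedged illustration using the example values above, not litellm's actual implementation:

```python
# Cost of one completion under the "my-custom-model" entry above.
entry = {"input_cost_per_token": 0.0001, "output_cost_per_token": 0.0002}

def completion_cost(entry: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollars charged for one request: tokens in each direction times the per-token price."""
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )

print(completion_cost(entry, 1000, 500))  # 1000 * 0.0001 + 500 * 0.0002 ≈ 0.2
```

Entries with zero costs (like the local model above) register the model without charging anything, which also silences the missing-cost errors.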

Model names

Model names are case sensitive. Please make sure you have an exact match.

Model provider

If you use the custom_llm_provider or have a provider prefixed to the model name (e.g., openai/...), then this must also match litellm_provider in the config!
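For instance, with this hypothetical agent config:

```yaml
# Hypothetical agent config: the "openai/" prefix and custom_llm_provider
# must agree with litellm_provider in the registry entry.
model:
  model_name: "openai/my-local-model"
  model_kwargs:
    custom_llm_provider: "openai"
```

the matching registry entry must set "litellm_provider": "openai", and the model name must match character for character.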

There are two ways of setting the path to the model registry:

  1. Set LITELLM_MODEL_REGISTRY_PATH (e.g., mini-extra config set LITELLM_MODEL_REGISTRY_PATH /path/to/model_registry.json)
  2. Set litellm_model_registry in the agent config file:

model:
  litellm_model_registry: "/path/to/model_registry.json"
  ...
...

Concrete examples

Generating SWE-bench trajectories with vLLM

This example shows how to generate SWE-bench trajectories using vLLM as the local inference engine.

First, launch a vLLM server with your chosen model. For example:

vllm serve ricdomolm/mini-coder-1.7b &

By default, the server will be available at http://localhost:8000.

Second, edit the mini-swe-agent SWE-bench config file located in src/minisweagent/config/benchmarks/swebench.yaml to include your local vLLM model:

model:
  model_name: "hosted_vllm/ricdomolm/mini-coder-1.7b"  # or hosted_vllm/path/to/local/model
  model_kwargs:
    api_base: "http://localhost:8000/v1"  # adjust if using a non-default port/address

If you need a custom registry, as detailed above, create a registry.json file:

cat > registry.json <<'EOF'
{
  "ricdomolm/mini-coder-1.7b": {
    "max_tokens": 40960,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "hosted_vllm",
    "mode": "chat"
  }
}
EOF
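Typos in the registry only surface later as cost errors, so it can pay off to parse the file up front and confirm every entry carries the fields litellm expects. This is a minimal, hypothetical check; the JSON is inlined here so the snippet is self-contained, but in practice you would read registry.json from disk:

```python
import json

# Inlined copy of the registry above; in practice:
#   registry = json.loads(Path("registry.json").read_text())
registry_text = '''
{
  "ricdomolm/mini-coder-1.7b": {
    "max_tokens": 40960,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "hosted_vllm",
    "mode": "chat"
  }
}
'''
registry = json.loads(registry_text)

# Fields used by the registry entries shown in this guide.
required = {"max_tokens", "input_cost_per_token", "output_cost_per_token", "litellm_provider", "mode"}
for name, entry in registry.items():
    missing = required - entry.keys()
    assert not missing, f"{name} is missing {sorted(missing)}"
print("registry ok:", sorted(registry))  # registry ok: ['ricdomolm/mini-coder-1.7b']
```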

Now you’re ready to generate trajectories! Let's solve the django__django-11099 instance of SWE-bench Verified:

LITELLM_MODEL_REGISTRY_PATH=registry.json mini-extra swebench \
    --output test/ --subset verified --split test --filter '^(django__django-11099)$'

You should now see the generated trajectory in the test/ directory.