
Running Spring AI Ollama with Testcontainers

The Spring AI libraries offer a framework for integrating various Large Language Model (LLM) services. Ollama, in turn, is a platform for running LLMs locally, including on laptops. Running models locally minimizes reliance on hosted LLM services, reduces costs, and can potentially speed up development.

The Spring Boot Testcontainers API helps manage container life cycles. It also allows us to write and run test cases that communicate with the container.

In this article, let's explore integrating Spring AI with an LLM running locally on the Ollama platform.

Prerequisites

First, we must install Ollama locally by following the instructions on the Ollama website. We can either use the binaries for our operating system or run Ollama in a container from its Docker image. Once installed, we can pull the open-source LLMs of our choice using the Ollama CLI:

ollama pull <model_name>
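
For example, to pull the Llama 3.2 model that we'll use later in this tutorial:

ollama pull llama3.2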

In this tutorial, we'll showcase testing a Spring AI application with the Ollama module from the Testcontainers library. Testcontainers makes integration testing easier across various programming languages.

We always advise using the online Spring Initializr tool to set up all the Maven dependencies correctly. Here, we'll cover only the most pertinent libraries. Let's start with the necessary Spring AI Ollama starter library:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-ollama</artifactId>
  <version>1.0.0-M8</version>
</dependency>

Next, let's include the Spring Boot Testcontainers, Spring AI Testcontainers, and Testcontainers JUnit Jupiter and Ollama libraries:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-testcontainers</artifactId>
  <version>3.3.3</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
  <version>1.0.0-M8</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>1.21.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>ollama</artifactId>
  <version>1.21.1</version>
  <scope>test</scope>
</dependency>

Key Components of the Spring AI Ollama Library

Moving on, let's take a look at the key Spring AI components for integrating with the Ollama platform:

Spring AI Ollama

The OllamaChatModel is the client class responsible for making calls to the backend LLM service running on the Ollama platform. The query is wrapped in a Prompt object and passed as an argument to the OllamaChatModel#call() method. Before the call, we create the Prompt object through its constructor or its static builder, passing the prompt text as a string, and optionally supply LLM runtime configurations built with the OllamaOptions.Builder class. Finally, we receive the response from the LLM service wrapped in a ChatResponse object.
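
As a minimal sketch of this flow (assuming an OllamaChatModel bean is already available, for example via Spring Boot auto-configuration, and using a hypothetical question string):

// build the prompt with the query text and optional runtime options
Prompt prompt = new Prompt("What is Spring AI?",
  OllamaOptions.builder()
    .model(OllamaModel.LLAMA3_2)
    .temperature(0.4)
    .build());

// call the LLM service running on Ollama and read the generated text
ChatResponse response = ollamaChatModel.call(prompt);
String answer = response.getResult().getOutput().getText();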

Next, we will develop a chat client using the Spring AI library to connect with the Ollama platform.

Configuration and Testing

In this section, we'll configure the OllamaContainer module from Testcontainers and the OllamaChatModel from Spring AI to prepare for integration testing. Finally, we'll run the test and examine the result.

First, let's set up the TestContainers Ollama module in the Spring Boot test configuration file:

@TestConfiguration(proxyBeanMethods = false)
@Profile("ollama")
class OllamaTestContainersDefaultConfig {
    @Bean
    @ServiceConnection
    OllamaContainer ollamaContainer() {
        return new OllamaContainer(DockerImageName.parse("ollama/ollama:latest"))
          // clear the GPU device requests so the container runs on the CPU only
          .withCreateContainerCmdModifier(
            cmd -> cmd.getHostConfig().withDeviceRequests(null)
          );
    }
}

In the OllamaTestContainersDefaultConfig#ollamaContainer() method, we instantiate an OllamaContainer bean with the latest Ollama Docker image. The OllamaContainer class is the Testcontainers module responsible for deploying and running the Ollama platform. Finally, the OllamaContainer#withCreateContainerCmdModifier() method customizes the container creation command to clear the GPU device requests, forcing the container to run on the CPU instead of the GPU. We can remove this call if we want to run the Ollama service on a GPU.

Further, the @ServiceConnection annotation on the bean method registers the container's connection details with the Spring Boot application. This allows the OllamaAutoConfiguration class to instantiate the OllamaChatModel with the correct connection properties.
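
For reference, without @ServiceConnection we would have to register the connection details ourselves, for example by mapping the container's endpoint to the spring.ai.ollama.base-url property. Here's a minimal sketch, assuming we manage the container directly in an illustrative test class with @Container and @DynamicPropertySource:

@SpringBootTest
@Testcontainers
class SpringAiOllamaManualConfigLiveTest {
    @Container
    static OllamaContainer ollama =
      new OllamaContainer(DockerImageName.parse("ollama/ollama:latest"));

    // point the Spring AI Ollama starter at the container's HTTP endpoint
    @DynamicPropertySource
    static void ollamaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.ai.ollama.base-url", ollama::getEndpoint);
    }
    //..test methods
}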

Now, let's use a test class to demonstrate connecting the chat client to a local Ollama instance running in a Docker container. We'll begin by reviewing the setup that runs before the test method:

@SpringBootTest
@Import(OllamaTestContainersDefaultConfig.class)
@Testcontainers
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@ActiveProfiles("ollama")
class SpringAiOllamaDefaultConfigLiveTest {
    @Autowired
    private OllamaContainer ollamaContainer;

    @Autowired
    private OllamaChatModel ollamaChatModel;

    @BeforeAll
    public void setup() throws IOException, InterruptedException {
        Container.ExecResult execResult = ollamaContainer.execInContainer(
          "ollama", "pull", OllamaModel.LLAMA3_2.getName());
        if (execResult.getExitCode() != 0) {
          throw new IOException("Failed to pull model: " + execResult.getStderr());
        }
    }
    //..test method
}

The @Testcontainers annotation from the Testcontainers JUnit Jupiter library enables automatic lifecycle management of the Ollama container. Additionally, the @Import annotation includes the OllamaTestContainersDefaultConfig class and helps autowire the OllamaContainer bean at runtime.

The setup() method executes after the Ollama container is ready. In it, we invoke OllamaContainer#execInContainer() to run the ollama CLI and pull Meta's Llama 3.2 model into the container. After this, the model is available for serving requests locally.

Moving on, let's implement the chat client that connects to the Ollama platform service in the test method:

@Test
void givenDefaultOllamaConnection_whenInvokedWithPrompt_thenResponds() {
    String prompt = """
        Context:
        The Amazon rainforest is the largest tropical rainforest in the world, 
        spanning several countries in South America. It is home to a vast 
        diversity of plant and animal species, many of which are not found
        anywhere else on Earth. The rainforest plays a crucial role in regulating
        the global climate by absorbing large amounts of carbon dioxide.
        Question: Why is the Amazon rainforest important for the Earth's climate?
        Instructions:
        Please answer strictly from the context provided in the prompt and do not
        include any additional information. Keep the answer short and concise.
        """;

    ChatResponse response = ollamaChatModel.call(
      new Prompt(prompt, OllamaOptions.builder()
        .model(OllamaModel.LLAMA3_2)
        .temperature(0.4)
        .build()));
    
    assertThat(response.getResult().getOutput())
      .isNotNull()
      .extracting(output -> output.getText().toLowerCase())
      .asString()
      .contains("carbon dioxide");
    logger.info("Response: {}", response.getResult().getOutput().getText());
}

In the method, we define a prompt whose context describes the benefits of the Amazon rainforest to the Earth's climate. Next, we invoke the OllamaChatModel#call() method with the prompt, configuring a few LLM settings, such as the model and the temperature, through the OllamaOptions.builder() method. Towards the end, we verify the result with the fluent assertThat() method from the AssertJ library.

Finally, we get the output from the underlying LLM service, wrapped in the ChatResponse object:

Response: The Amazon rainforest absorbs large amounts of carbon dioxide, playing a crucial
role in regulating the global climate.

The test took nearly 15 minutes on a Windows 11 laptop (8GB RAM, 2.30 GHz CPU). Clearly, with limited computing resources, the response time is far from ideal.

Conclusion

In this article, we discussed performing integration testing between a Spring AI application and an LLM running on the Ollama platform, managed by Testcontainers.

It's a cost-effective approach compared to using a hosted LLM service. However, running an LLM service locally is highly resource-intensive, so it might respond slowly if the local system has limited computing resources, slowing down the development cycle. Moreover, performance degrades as the size of the LLM grows.

Visit our GitHub repository to access the article's source code.
