
Running Spring AI Ollama with Testcontainers

The Spring AI libraries offer a framework for integrating various Large Language Model (LLM) services. Ollama, in turn, is a platform for running LLMs locally, including on laptops. Running models locally minimizes reliance on hosted LLM services, reduces costs, and can potentially speed up development.

The Spring Boot Testcontainers API helps manage container life cycles. It also allows us to write and run test cases that communicate with the container.

In this article, let's explore integrating Spring AI with an LLM running locally on the Ollama platform.

Prerequisites

First, we must install Ollama locally by following the instructions on the Ollama website. We can either use the binaries for our operating system or run Ollama in a container from its Docker image. Once installed, we can pull the open-source LLMs of our choice using the Ollama CLI:

ollama pull <model_name>
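
For example, to pull the Llama 3.2 model that we'll use later in this tutorial:

ollama pull llama3.2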

In this tutorial, we'll showcase testing a Spring AI application with the Ollama module from the Testcontainers library. Testcontainers makes integration testing easier across various programming languages.

We always advise using the online Spring Initializr tool to set up all the Maven dependencies correctly. Here, we'll cover only the most pertinent libraries. Let's start with the necessary Spring AI Ollama starter library:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-ollama</artifactId>
  <version>1.0.0-M8</version>
</dependency>

Next, let's include the Spring Boot Testcontainers, Spring AI Testcontainers, and Testcontainers JUnit Jupiter and Ollama libraries:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-testcontainers</artifactId>
  <version>3.3.3</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
  <version>1.0.0-M8</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>1.21.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>ollama</artifactId>
  <version>1.21.1</version>
  <scope>test</scope>
</dependency>

Key Components of the Spring AI Ollama Library

Moving on, let's take a look at the key Spring AI components for integrating with the Ollama platform:

Spring AI Ollama

The OllamaChatModel is the client class responsible for making calls to the backend LLM service running on the Ollama platform. The query is wrapped in a Prompt object and passed as an argument to the OllamaChatModel#call() method. Before the call, we create the Prompt object through its constructor or its static builder, passing the prompt text as a string, and optionally supply LLM runtime configurations built with the OllamaOptions.Builder class. Finally, we receive the response from the LLM service wrapped in a ChatResponse object.
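
As a minimal sketch of this flow (assuming an OllamaChatModel bean is already available, for example via Spring Boot auto-configuration, and using a hypothetical question string):

// build the prompt with the query text and optional runtime options
Prompt prompt = new Prompt("What is Spring AI?",
  OllamaOptions.builder()
    .model(OllamaModel.LLAMA3_2)
    .temperature(0.4)
    .build());

// call the LLM service running on Ollama and read the generated text
ChatResponse response = ollamaChatModel.call(prompt);
String answer = response.getResult().getOutput().getText();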

Next, we will develop a chat client using the Spring AI library to connect with the Ollama platform.

Configuration and Testing

In this section, we'll configure the OllamaContainer module from Testcontainers and the OllamaChatModel from Spring AI to prepare for integration testing. Finally, we'll run the test and examine the result.

First, let's set up the TestContainers Ollama module in the Spring Boot test configuration file:

@TestConfiguration(proxyBeanMethods = false)
@Profile("ollama")
class OllamaTestContainersDefaultConfig {
    @Bean
    @ServiceConnection
    OllamaContainer ollamaContainer() {
        return new OllamaContainer(DockerImageName.parse("ollama/ollama:latest"))
          // clear the GPU device requests so the container runs on the CPU only
          .withCreateContainerCmdModifier(
            cmd -> cmd.getHostConfig().withDeviceRequests(null)
          );
    }
}

In the OllamaTestContainersDefaultConfig#ollamaContainer() method, we instantiate an OllamaContainer bean with the latest Ollama Docker image. The OllamaContainer class is the Testcontainers module responsible for deploying and running the Ollama platform. Finally, the OllamaContainer#withCreateContainerCmdModifier() method customizes the container creation command to clear the GPU device requests, forcing the container to run on the CPU instead of the GPU. We can remove this call if we want to run the Ollama service on a GPU.

Further, the @ServiceConnection annotation on the bean method registers the container's connection details with the Spring Boot application. This allows the OllamaAutoConfiguration class to instantiate the OllamaChatModel with the correct connection properties.
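
For reference, without @ServiceConnection we would have to register the connection details ourselves, for example by mapping the container's endpoint to the spring.ai.ollama.base-url property. Here's a minimal sketch, assuming we manage the container directly in an illustrative test class with @Container and @DynamicPropertySource:

@SpringBootTest
@Testcontainers
class SpringAiOllamaManualConfigLiveTest {
    @Container
    static OllamaContainer ollama =
      new OllamaContainer(DockerImageName.parse("ollama/ollama:latest"));

    // point the Spring AI Ollama starter at the container's HTTP endpoint
    @DynamicPropertySource
    static void ollamaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.ai.ollama.base-url", ollama::getEndpoint);
    }
    //..test methods
}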

Now, let's use a test class to demonstrate connecting the chat client to a local Ollama instance running in a Docker container. We'll begin by reviewing the setup that runs before the test method:

@SpringBootTest
@Import(OllamaTestContainersDefaultConfig.class)
@Testcontainers
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@ActiveProfiles("ollama")
class SpringAiOllamaDefaultConfigLiveTest {
    @Autowired
    private OllamaContainer ollamaContainer;

    @Autowired
    private OllamaChatModel ollamaChatModel;

    @BeforeAll
    public void setup() throws IOException, InterruptedException {
        Container.ExecResult execResult = ollamaContainer.execInContainer(
          "ollama", "pull", OllamaModel.LLAMA3_2.getName());
        if (execResult.getExitCode() != 0) {
          throw new IOException("Failed to pull model: " + execResult.getStderr());
        }
    }
    //..test method
}

The @Testcontainers annotation from the Testcontainers JUnit Jupiter library enables automatic lifecycle management of the Ollama container. Additionally, the @Import annotation includes the OllamaTestContainersDefaultConfig class and helps autowire the OllamaContainer bean at runtime.

The setup() method executes after the Ollama container is ready. In it, we invoke OllamaContainer#execInContainer() to run the ollama CLI and pull Meta's Llama 3.2 model into the container. After this, the model is available for serving requests locally.

Moving on, let's implement the chat client that connects to the Ollama platform service in the test method:

@Test
void givenDefaultOllamaConnection_whenInvokedWithPrompt_thenResponds() {
    String prompt = """
        Context:
        The Amazon rainforest is the largest tropical rainforest in the world, 
        spanning several countries in South America. It is home to a vast 
        diversity of plant and animal species, many of which are not found
        anywhere else on Earth. The rainforest plays a crucial role in regulating
        the global climate by absorbing large amounts of carbon dioxide.
        Question: Why is the Amazon rainforest important for the Earth's climate?
        Instructions:
        Please answer strictly from the context provided in the prompt and do not
        include any additional information. Keep the answer short and concise.
        """;

    ChatResponse response = ollamaChatModel.call(
      new Prompt(prompt, OllamaOptions.builder()
        .model(OllamaModel.LLAMA3_2)
        .temperature(0.4)
        .build()));
    
    assertThat(response.getResult().getOutput())
      .isNotNull()
      .extracting(output -> output.getText().toLowerCase())
      .asString()
      .contains("carbon dioxide");
    logger.info("Response: {}", response.getResult().getOutput().getText());
}

In the method, we define a prompt whose context describes the benefits of the Amazon rainforest to the Earth's climate. Next, we invoke the OllamaChatModel#call() method with the prompt, configuring a few LLM settings, such as the model and the temperature, through the OllamaOptions.builder() method. Towards the end, we verify the result with the fluent assertThat() method from the AssertJ library.

Finally, we get the output from the underlying LLM service, wrapped in the ChatResponse object:

Response: The Amazon rainforest absorbs large amounts of carbon dioxide, playing a crucial
role in regulating the global climate.

The test took nearly 15 minutes on a Windows 11 laptop (8GB RAM, 2.30 GHz CPU). Clearly, with limited computing resources, the response time is far from ideal.

Conclusion

In this article, we discussed performing integration testing between a Spring AI application and an LLM running on the Ollama platform, managed by Testcontainers.

It's a cost-effective approach compared to using a hosted LLM service. However, running an LLM service locally is highly resource-intensive, so it might respond slowly if the local system has limited computing resources, slowing down the development cycle. Moreover, performance degrades as the size of the LLM grows.

Visit our GitHub repository to access the article's source code.
