Two recent articles reminded me of how pervasive the ‘agents are going to kill SaaS’ meme is becoming. The argument goes that centralised software services are too constraining with their one-size-fits-all feature sets, and that LLM-based agents will soon be both faster to deploy and more adaptable to the ever-changing dynamics of your local environment.
There is some truth to that: agentic systems can handle the messiness of our probabilistic world in a way traditional software can’t. But the details are critical; SaaS can be agentic. In fact, many of the personal assistants emerging from big tech and challengers are delivered via calls to remote agent chains using LLMs that are too big to run locally. Smaller models are getting smarter and migrating to local devices, so local assistants, along with a wide array of other agentic processes, may still move behind the firewall eventually. But it is very unlikely that remote software services are going to disappear altogether. Many of them will simply become agentic systems delivered as a remote service, which is still a form of software as a service.
Agentic system primer
First, for the uninitiated, here is a very brief primer on agentic systems. Skip to the next section if you’re already comfortable with this.
A large language model takes text in and produces some text as output. Multimodal models extend the inputs and outputs to audio and visuals. This very general capability to generate large documents of sensible-sounding text from your inputs means you can ask an LLM to come up with almost anything you can represent as text. That could be a simple marketing message or a very detailed plan of action.
The chain of thought is a technique that has now pierced the public veil. You append ‘think step by step’ to the end of your request, and the LLM will return a list of intermediate steps that lead to the final answer. There are many variants of this, such as the tree of thought, which generates multiple potential paths to an answer and selects the best one. Recently, people have started doing additional post-training of models to teach them to perform this extra ‘reasoning’ automatically. These are the ‘thinking’ models like o1, o3, Gemini Thinking and DeepSeek R1, though a dedicated ‘thinking’ model is not strictly required to use the technique. The chains of steps toward some goal that LLMs produce can also be used as plans for interacting with the world to find a solution.
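As a concrete illustration, here is a minimal sketch in Python. The `llm` function is a hypothetical stand-in for whatever client call sends text to your model provider and returns its text reply; no real API is assumed.

```python
# A minimal sketch of chain-of-thought prompting. `llm` is a hypothetical
# placeholder, not a real provider API.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider's client call here")

question = "How many mayors are there in Brazil?"
cot_prompt = question + "\n\nThink step by step."
# answer = llm(cot_prompt)  # the reply would now include intermediate steps
```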
If you can get an LLM to produce a list of steps to achieve something, you can get it to orchestrate complex processes. For example, let’s say you have a question like, ‘How many mayors are there in Brazil?’. You’re looking for the most up-to-date answer, and while the very large models were probably trained on all the website data that has that information, they were also probably trained many months before you asked the question, and the world may have changed since then. To get up-to-date information, we can leverage this step-by-step thinking with additional calls to what are generally referred to as ‘tools’.
We want the LLM to search the web and add up all the mayors it finds in search results, but LLMs don’t natively search the web and have historically been bad at doing math. Instead of just asking the question ‘how many mayors are there in Brazil?’, you want to get the model to use tools such as the following (sketched as hypothetical Python stubs after this list):
- To search the web for mayors, a separate software service you wrote called ‘web_search’ that takes in a search query and the number of results you want, then sends it to Google and scrapes the results page, returning the list of URLs and the brief descriptions of each URL that Google provides.
- To read the contents of each search result, another software service you wrote called ‘scraper’ that takes in some URL, crawls the site and returns all pages, perhaps as PDFs, to conveniently capture the text and images.
- To add up all the mayors it finds, a calculator which takes in numbers and the normal calculator operations (addition, subtraction, etc.), and returns the resulting number.
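A rough sketch of those stubs, with names matching the descriptions above but purely illustrative signatures and return shapes:

```python
# Hypothetical stubs for the three tools described above; the signatures
# and return shapes are illustrative, not a real API.
def web_search(query: str, num_results: int) -> list[dict]:
    """Send `query` to Google, scrape the results page, and return a list
    of {"url": ..., "description": ...} dicts."""
    raise NotImplementedError

def scraper(url: str) -> str:
    """Crawl the site at `url` and return the text of its pages."""
    raise NotImplementedError

def calculator(operation: str, operands: list[float]) -> float:
    """Apply a basic calculator operation to `operands`."""
    if operation == "add":
        return sum(operands)
    raise NotImplementedError(operation)
```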
To make this work, you create a prompt that has the original question, and you append a description of each of these tools to the end of the prompt, including what the tool is called and what its inputs and outputs are. Then you ask the LLM to come up with a step-by-step plan that uses these tools as necessary to come up with an answer. You also tell the LLM to output the plan text as JSON with a JSON schema you give it.
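A rough sketch of that prompt assembly might look like the following; the tool-description format and schema here are made up for illustration, not any standard:

```python
import json

# Illustrative tool descriptions to append to the prompt.
TOOLS = [
    {"name": "web_search", "inputs": ["query", "num_results"],
     "output": "list of {url, description}"},
    {"name": "scraper", "inputs": ["url"], "output": "page text"},
    {"name": "calculator", "inputs": ["operation", "operands"],
     "output": "number"},
]

# A simple JSON schema for the plan we want back.
PLAN_SCHEMA = {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": {"type": "object", "properties": {
            "tool": {"type": "string"}, "inputs": {"type": "object"}}}},
    },
}

question = "How many mayors are there in Brazil?"
prompt = (
    f"{question}\n\n"
    f"You may use these tools:\n{json.dumps(TOOLS, indent=2)}\n\n"
    "Respond with a step-by-step plan as JSON matching this schema:\n"
    f"{json.dumps(PLAN_SCHEMA, indent=2)}"
)
```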
In this case, the plan it generates might loosely include steps such as:
1. Search the web for ‘Mayors in Brazil’
2. Fetch the web pages for the first fifty results using the scraper tool
3. Extract any names of mayors
4. Remove duplicate names
5. Send +1 to the calculator for every unique name
Some of these steps, like ‘extract any names of mayors’, are themselves prompts for an LLM that get submitted along with a copy of the website content, so the model can reuse itself as a tool or call a different LLM as a tool.
The LLM’s output will be structured like JSON because that’s what you asked for. At this point it is still plain text that looks like JSON rather than a JSON object, but you can convert the JSON-looking text into an actual JSON object using standard libraries, and then use traditional code in any programming language you like to orchestrate the calls to each of the tools, because you now have the names of the associated services, the order in which to call them, and their inputs, in well-structured JSON.
You can then take the output you receive from each call to one of these tools and pass it back into the prompt iteratively until the LLM has the results of all steps in the plan and can generate the final answer.
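Putting those pieces together, here is a rough sketch of that orchestration loop under the same assumptions, with the tool registry and `llm` helper passed in rather than tied to any real service:

```python
import json
from typing import Callable

def run_plan(plan_text: str, question: str,
             tools: dict[str, Callable], llm: Callable[[str], str]) -> str:
    """Parse the JSON-looking plan, call each named tool in order, feed the
    outputs back into the prompt, then ask the LLM for the final answer."""
    plan = json.loads(plan_text)          # JSON-looking text -> JSON object
    context = question
    for step in plan["steps"]:
        tool = tools[step["tool"]]        # look up the named service
        output = tool(**step["inputs"])   # call it with the planned inputs
        context += f"\n\nResult of {step['tool']}: {output}"
    return llm(context + "\n\nUsing these results, give the final answer.")
```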
Modern LLM services often hide some of the details and might call tools like search in the background, but this process works with smaller models and larger models, using any tool you like. That tool can be a service that performs a query on an internal database, a Python interpreter, a Rust compiler, or a custom internal API that generates specs for some widget that goes to a production line in a factory.
Of course, LLMs sometimes make mistakes. To deal with these, you can use techniques like reflection, where you ask one LLM to double-check the outputs of another; you can execute the code the LLM generates and feed any errors back into the LLM so it can search for fixes, perhaps using the web search tool; and there are many other such error-correction techniques. Code generators like Replit and Bolt.new, which can build whole applications from English instructions, offer a good demonstration of how effective LLM agent chains can be at performing complex tasks. They appear to have an error rate that can compound in a way that makes them unsuitable for large enterprise system engineering, but the field is improving daily. The opportunities for automation are vast; there is no doubt the future is full of agentic systems.
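To give a flavour of the simplest of these loops, here is a rough sketch of a generate-execute-repair cycle, again assuming a hypothetical `llm` callable rather than any real provider:

```python
from typing import Callable

def generate_and_repair(task: str, llm: Callable[[str], str],
                        max_attempts: int = 3) -> str:
    """Ask the LLM for code, execute it, and feed any error back so the
    model can attempt a fix; reflection follows the same retry pattern."""
    prompt = f"Write Python code to {task}. Reply with code only."
    for _ in range(max_attempts):
        code = llm(prompt)
        try:
            exec(code, {})                # run the generated code
            return code                   # success: keep this version
        except Exception as err:
            # append the error so the next attempt can correct it
            prompt += f"\n\nThat code raised: {err!r}. Please fix it."
    raise RuntimeError("no working code after several attempts")
```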
A SaaSy fly in the ointment
There is a case for putting quite a lot of these intelligent agents behind private boundaries: hedge funds and military contractors obviously need to protect their secrets, every business needs to be mindful of protecting something to preserve pricing power, and there will likely always be security and regulatory considerations for protecting user data. This has always been true: some software should live behind the firewall.
Agentic frameworks, powered by the increasingly impressive capabilities of language models to ‘reason’ and program, are certainly well-positioned to do more of the integration of disparate data sources and existing APIs behind private firewalls. Their ability to interpret the unpredictable humans in the loop and generate intricate plans from imprecise and sometimes ambiguous input also makes it possible to automate processes that traditional software engineering could never handle.
The natural language interface of agentic systems will almost certainly democratise the ability to automate many business processes, and their incredible breadth of applicability may well help us achieve levels of automation that usher in the radical abundance the AI community envisions, assuming, of course, we can figure out the socio-economic mechanisms for distributing the spoils.
However, the ‘SaaS is dead’ meme glosses over some important details. The value in SaaS is not just in the writing of the software, it is also in the centralisation of the service. Search services such as Google are effectively SaaS for the masses. If you are extremely paranoid, you might worry about the information you leak from a firewalled domain in the search terms you send Google. You could build your own search engine that crawls and indexes the entire Web regularly, but that is an incredibly expensive service to run even if you get an LLM to write all the code. The energy costs of running that process continuously in every organisation in the world would be enormous.
There are many examples like this. If you asked an all-powerful global superintelligence to design a system, it would probably design a distributed system that aggregates and specialises functionality in modules that interact with each other over a network, because that is far more efficient than everything doing everything. There are clear arguments for hosting a subset of capabilities behind a firewall where the information a process operates on is necessarily private, and in these cases you may well end up duplicating some functionality in the name of privacy.
But for SaaS to be dead, you have to believe that decentralisation is always the best solution for every problem, when it is very often an unreasonably costly option, even if you discount the cost of writing and maintaining the code itself to nil. It is likely that SaaS offerings will increasingly incorporate agentic systems that enable them to deal with more of the variability of real-world data and different client needs.
What about AGI? The simple answer is that nobody can prepare for true AGI; it would likely upend our entire economic system, assuming we survive it. I don’t have a confident view on whether that will happen next year or not in my lifetime. With such incredible progress over the last few years, it would be naive not to consider the possibility. Yet, with every new discovery, we also learn new limitations. François Chollet, for example, has recently done a great job of encoding some of the limitations of LLM reasoning in the ARC challenge, which in turn has motivated several improvements in reasoning models, but has also uncovered some of the very subtle ways we can mistake memory for reasoning. To quote Sayash Kapoor, ‘every exponential is a sigmoid looking backwards’.
We simply have no idea whether the current exponential increase in LLM capabilities is going to flatten out before it overtakes us, and there is no real way to plan for that anyway. It may be the case that AI needs to operate in the real world to become truly competitive with humans. Several decades ago, an eon in AI, Gerald Edelman, a biologist turned robotics researcher, discovered that different types of intelligence emerged with different physical forms of his machines.
And in the physical world, there are still plenty of limitations. Large LLM providers are buying up old nuclear power stations to train and serve ever larger models, but the most prominent models are too energy intensive for many industrial and other real-world applications. A drone might need millisecond response times. Even if you could get a transformer-based large language model to respond that fast, you simply can’t fit enough GPUs into a large drone to make it work. Here too, though, progress is emerging from veterans such as Sepp Hochreiter, co-inventor of the LSTM, who has recently developed xLSTM to achieve much lower latency with less energy and incredible predictive power.
So, other than the unknowable threat of AGI, the biggest threat to software service providers may well be the flow of funds to the even greater opportunity of intelligent robotics. But I’m an optimist; as intelligent agents seep into every crevice of the physical world, they may simply blur the lines between the physical and virtual worlds even further. We can already check the weather on ski slopes hundreds of miles away, made possible by models built on signals from a globally distributed network of sensors. Most organisations wouldn’t consider building weather models a core competency and would spend a lot more trying to replicate them than consuming a service, even if they managed to get agents to write all the code.
Even though LLMs are likely to enable us all to achieve a lot more with less, there will always be some opportunity cost and sometimes prohibitive processing costs that can only be justified when distributed over larger numbers of users. When we scale up to billions of intelligent devices, we may even be entering a new golden age for software providers that can synthesise all of this streamed intelligence into new super-flexible agent-driven products.
This blog was first published here on 18 February 2025. You can follow Robin on LinkedIn.