“As (companies) move from rented to owned intelligence, they’ll not only own the quality of the model, but they’ll own the performance of the model, the utilization and costs of the model, and the different tools the model needs to run.”
This Episode’s Guest
Tuhin Srivastava is the co-founder & CEO of Baseten, an AI inference platform designed to give engineering teams the tooling, expertise, and hardware needed to bring AI products to market. Baseten provides production-grade inference infrastructure to some of the fastest-growing companies in the AI space.
Should you be post-training your own AI models – and if you do, how does that change your inference needs? In this episode, learn what inference infrastructure leader Baseten is observing with their own customers, and how they’re thinking about the inference stack for the next generation of AI.
Read on for the key takeaways from this episode, or watch the full discussion.
Key Takeaways
Companies will move from rented to owned intelligence
Most AI companies start out “renting” their models, paying for tokens to use a model that someone else has trained. That’s fine for early-stage AI initiatives, but it has considerable downsides as the business starts to scale: you’re not accruing value, and you don’t have much control over where and how the model runs, how fast it runs, or what it’s good at.
The next step, Srivastava argues, is for companies to “own” their intelligence instead: “the idea of owned intelligence is that you are post-training your own models. You are using data from your applications and workflows to make models very good at the specific thing you are trying to get out of them. And then with that comes the ability to control all the SLAs and performance requirements and the regional requirements that come with that.”
Rented intelligence can allow an AI product to prove value; owned intelligence lets companies dial in that value to become indispensable.
With agents, inference changes from a narrow solution to a set of tools for continuous learning
Agents are set to take on more and more AI workloads, which has implications not just for how work gets done, but also for how models learn to do the work (and how to keep doing it).
Designing agents for what Srivastava calls “long-horizon tasks” requires continuous learning and a toolset that lets the agents carry out their assignments as circumstances change. This calls for a much more expansive view of what inference requires and what it can do.
“The big difference we have seen now is that as these models went from non-reasoning to thinking now to agentic workflows, they just need a whole ecosystem of tools,” says Srivastava. “For running tools, for running other models, for routing between different models, for how to make it very fast. How do you spin things up and down to execute code? And that’s like sandboxes. Inference goes from, ‘I need to run this model,’ to a set of tools to run these agents. And that’s where we think the inference stack is going over time: where there is this core inference runtime that is very reliable and very performant and has a great developer experience.”
Using distributed capacity can help mitigate capacity constraints
For virtually all growth-minded AI companies, capacity constraints are the elephant in the room. Hard limits on global supply and aggressive moves by the largest players to reserve capacity mean that smaller up-and-comers face the greatest risk of having their growth slowed or restricted by limited access to capacity. The hard truth is that the situation is unlikely to change anytime soon – instead, startups need to learn to work around it.
One way Baseten has worked around the issue, Srivastava explains, is by embracing distributed capacity. Rather than having all of their capacity in a single location, Baseten currently sits on “between 50 and 20 clouds, and between 80 and 100 regions around the world…and what we have built over many years is the ability to stitch this all together.”
This was a choice that Baseten made early on in constructing their software, and it’s paid off – by opening up where their capacity could be located, they could take advantage of capacity that became available worldwide. Building in flexibility in this area can help young AI companies better access the capacity they need to grow and scale, even if the wider global shortage takes much longer to resolve.
Watch the episode below for the full discussion, including how Baseten stays close to their customers, why now is the right time to post-train your models, and what their most discerning customers optimize for in their models. Thank you to Tuhin Srivastava for joining us!