AI startup DeepInfra announced it has completed a $107 million Series B funding round led by 500 Global and early Google engineer Georges Harik, with strategic investors including NVIDIA, Samsung Next, and Supermicro participating. According to the company, the new capital will be used to expand global data center capacity, addressing the cost and efficiency bottlenecks that arise as AI applications shift from “model training” to “large-scale inference.”
AI inference demand surges, becoming a key bottleneck for enterprise deployment
As AI moves toward commercialization, the focus of enterprise workloads has shifted markedly. DeepInfra reports that since its Series A round, the volume of tokens processed on its platform has grown 25-fold, indicating that inference has become the main driver of enterprise AI workloads. With open-source models now matching proprietary systems in performance, the barrier to innovation has fallen sharply. At the same time, the rise of agentic systems means a single task may trigger hundreds of model runs, as the sketch below illustrates. Because general-purpose cloud platforms were not designed around inference requirements, enterprises face high operating costs and unpredictable latency, making inference the binding constraint on these workloads.
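To see why one agentic task multiplies inference volume, consider a minimal sketch of an agent loop. The `call_model` and `run_tool` functions here are hypothetical stubs, not any real API; they simulate a model that needs several tool-using steps before producing an answer, so every loop iteration is one more inference request:

```python
# Minimal sketch of why one agentic task fans out into many model runs.
# call_model and run_tool are hypothetical stubs for illustration only.

def call_model(messages: list[dict]) -> dict:
    """Stand-in for one inference request (one model run)."""
    steps_so_far = sum(1 for m in messages if m["role"] == "tool")
    if steps_so_far < 5:  # pretend the model needs 5 tool calls first
        return {"tool_call": {"name": "search", "step": steps_so_far}}
    return {"tool_call": None, "content": "final answer"}

def run_tool(tool_call: dict) -> str:
    """Stand-in for executing a tool (search, code, retrieval, ...)."""
    return f"result of {tool_call['name']} #{tool_call['step']}"

def run_agent(task: str, max_steps: int = 200) -> str:
    messages = [{"role": "user", "content": task}]
    model_runs = 0
    for _ in range(max_steps):
        reply = call_model(messages)  # each iteration is a model run
        model_runs += 1
        if reply["tool_call"] is None:
            print(f"{model_runs} model runs for one task")
            return reply["content"]
        messages.append({"role": "tool", "content": run_tool(reply["tool_call"])})
    return "step budget exhausted"

print(run_agent("compile a market report"))
```

With a realistic step budget and multiple tools, the same loop structure readily produces hundreds of inference requests per user task, which is the volume pattern the article describes.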
A vertically integrated stack optimized for token economics
DeepInfra pursues a vertical integration strategy, arguing that high-performance inference requires the coordinated design of hardware, networking, and software. The team previously built and operated the global communications app imo, running a distributed system that served 200 million users, and has now established eight GPU infrastructure sites in the United States. Unlike providers that rent capacity from third parties, DeepInfra controls the full stack end to end, from the chip level to the API interface. This lets it optimize for “always-on” token generation and deliver more predictable latency for agentic AI workloads than a general-purpose cloud environment.
DeepInfra, a long-term partner in NVIDIA’s open AI ecosystem
DeepInfra is an early infrastructure partner in NVIDIA’s open AI ecosystem, supporting the Nemotron models, the NemoClaw agentic architecture, and the NVIDIA Dynamo inference software. Its early deployment of Blackwell GPUs, together with the upcoming integration with Vera Rubin and Dynamo, is projected to improve inference cost-effectiveness by up to 20 times.
DeepInfra offers competitively priced open-source models
On cost control, DeepInfra runs more than 190 open-source models on optimized hardware, aiming to offer highly competitive pricing. Taking the open-source reasoning model GLM-5 as an example, its blended price is $1.24 per million tokens, roughly 20% below the industry average. For “thinking” models that consume large numbers of internal reasoning tokens, the platform has built a caching mechanism that discounts repeated static input text, cutting the cost of multi-turn conversations and retrieval-augmented generation (RAG) pipelines. To meet enterprise security requirements, DeepInfra provides an OpenAI-compatible API, commits to zero data retention, and holds SOC 2 and ISO 27001 certifications, so developers can use the models directly in production.
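As a minimal sketch of what “OpenAI-compatible” means in practice, the snippet below calls a model through the standard `openai` Python client with only the base URL swapped out. The endpoint URL and model ID are assumptions for illustration (verify them against DeepInfra’s documentation), not values confirmed by this article:

```python
# Sketch: calling an open-source model via an OpenAI-compatible endpoint.
# The base_url and model ID below are assumptions for illustration;
# check the provider's documentation for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",
)

# Keeping the system prompt and earlier turns byte-identical across
# requests is what lets a provider-side prompt cache serve the repeated
# static prefix at a discount in multi-turn or RAG pipelines.
response = client.chat.completions.create(
    model="zai-org/GLM-4.5",  # illustrative model ID
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the token economics of inference."},
    ],
)
print(response.choices[0].message.content)
```

Because the request and response shapes match the OpenAI API, existing applications can switch providers by changing the base URL, API key, and model name, which is the portability the article’s “OpenAI-compatible” claim refers to.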
The importance of dedicated inference infrastructure for the next stage of AI
Investor support for DeepInfra reflects how the importance of AI infrastructure increasingly rivals that of the models themselves. Tony Wang, Managing Partner at 500 Global, said that in an agent-driven development environment, developers need a dedicated platform that is more flexible, faster, and more reliable. With this round, DeepInfra’s total funding reaches $133 million; the proceeds will go toward expanding global compute capacity, deepening developer tooling, and supporting next-generation agentic model development. With weekly token throughput nearing 5 trillion, DeepInfra aims to build a highly efficient “token factory” that supplies sustainable compute to enterprises as AI applications scale up.