AI startup DeepInfra announced it has completed a $107 million Series B funding round led by 500 Global and early Google engineer Georges Harik, with strategic investors including NVIDIA, Samsung Next, and Supermicro participating. According to the company, the new capital will be used to expand global data center capacity, addressing the cost and efficiency bottlenecks that arise as AI applications shift from “model training” to “large-scale inference.”
AI inference demand surges, becoming a key bottleneck for enterprise deployment
As AI moves toward commercialization, the focus of enterprise workloads has shifted markedly. DeepInfra reports that since its Series A round, the volume of tokens processed on its platform has grown 25-fold, indicating that inference has become the main driver of enterprise AI workloads. With open-source models now matching proprietary systems in performance, the barrier to innovation has fallen sharply. At the same time, the rise of agentic systems means a single task may trigger hundreds of model runs, as the sketch below illustrates. Because general-purpose cloud platforms were not designed around inference requirements, enterprises face high operating costs and unpredictable latency, making inference the binding constraint on these workloads.
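To see why one agentic task multiplies inference volume, consider a minimal sketch of an agent loop. The `call_model` and `run_tool` functions here are hypothetical stubs, not any real API; they simulate a model that needs several tool-using steps before producing an answer, so every loop iteration is one more inference request:

```python
# Minimal sketch of why one agentic task fans out into many model runs.
# call_model and run_tool are hypothetical stubs for illustration only.

def call_model(messages: list[dict]) -> dict:
    """Stand-in for one inference request (one model run)."""
    steps_so_far = sum(1 for m in messages if m["role"] == "tool")
    if steps_so_far < 5:  # pretend the model needs 5 tool calls first
        return {"tool_call": {"name": "search", "step": steps_so_far}}
    return {"tool_call": None, "content": "final answer"}

def run_tool(tool_call: dict) -> str:
    """Stand-in for executing a tool (search, code, retrieval, ...)."""
    return f"result of {tool_call['name']} #{tool_call['step']}"

def run_agent(task: str, max_steps: int = 200) -> str:
    messages = [{"role": "user", "content": task}]
    model_runs = 0
    for _ in range(max_steps):
        reply = call_model(messages)  # each iteration is a model run
        model_runs += 1
        if reply["tool_call"] is None:
            print(f"{model_runs} model runs for one task")
            return reply["content"]
        messages.append({"role": "tool", "content": run_tool(reply["tool_call"])})
    return "step budget exhausted"

print(run_agent("compile a market report"))
```

With a realistic step budget and multiple tools, the same loop structure readily produces hundreds of inference requests per user task, which is the volume pattern the article describes.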
A vertically integrated stack optimized for token economics
DeepInfra pursues a vertical integration strategy, arguing that high-performance inference requires the coordinated design of hardware, networking, and software. The team previously built and operated the global communications app imo, running a distributed system that served 200 million users, and has now established eight GPU infrastructure sites in the United States. Unlike providers that rent capacity from third parties, DeepInfra controls the full stack end to end, from the chip level to the API interface. This lets it optimize for “always-on” token generation and deliver more predictable latency for agentic AI workloads than a general-purpose cloud environment.
DeepInfra, a long-term partner in NVIDIA’s open AI ecosystem
DeepInfra is an early infrastructure partner in NVIDIA’s open AI ecosystem, supporting the Nemotron models, the NemoClaw agentic architecture, and the NVIDIA Dynamo inference software. Its early deployment of Blackwell GPUs, together with the upcoming integration with Vera Rubin and Dynamo, is projected to improve inference cost-effectiveness by up to 20 times.
DeepInfra offers competitively priced open-source models
On cost control, DeepInfra runs more than 190 open-source models on optimized hardware, aiming to offer highly competitive pricing. Taking the open-source reasoning model GLM-5 as an example, its blended price is $1.24 per million tokens, roughly 20% below the industry average. For “thinking” models that consume large numbers of internal reasoning tokens, the platform has built a caching mechanism that discounts repeated static input text, cutting the cost of multi-turn conversations and retrieval-augmented generation (RAG) pipelines. To meet enterprise security requirements, DeepInfra provides an OpenAI-compatible API, commits to zero data retention, and holds SOC 2 and ISO 27001 certifications, so developers can use the models directly in production.
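As a minimal sketch of what “OpenAI-compatible” means in practice, the snippet below calls a model through the standard `openai` Python client with only the base URL swapped out. The endpoint URL and model ID are assumptions for illustration (verify them against DeepInfra’s documentation), not values confirmed by this article:

```python
# Sketch: calling an open-source model via an OpenAI-compatible endpoint.
# The base_url and model ID below are assumptions for illustration;
# check the provider's documentation for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",
)

# Keeping the system prompt and earlier turns byte-identical across
# requests is what lets a provider-side prompt cache serve the repeated
# static prefix at a discount in multi-turn or RAG pipelines.
response = client.chat.completions.create(
    model="zai-org/GLM-4.5",  # illustrative model ID
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the token economics of inference."},
    ],
)
print(response.choices[0].message.content)
```

Because the request and response shapes match the OpenAI API, existing applications can switch providers by changing the base URL, API key, and model name, which is the portability the article’s “OpenAI-compatible” claim refers to.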
The importance of dedicated inference infrastructure for the next stage of AI
Investor support for DeepInfra reflects how the importance of AI infrastructure increasingly rivals that of the models themselves. Tony Wang, Managing Partner at 500 Global, said that in an agent-driven development environment, developers need a dedicated platform that is more flexible, faster, and more reliable. With this round, DeepInfra’s total funding reaches $133 million; the proceeds will go toward expanding global compute capacity, deepening developer tooling, and supporting next-generation agentic model development. With weekly token throughput nearing 5 trillion, DeepInfra aims to build a highly efficient “token factory” that supplies sustainable compute to enterprises as AI applications scale up.