Startup Gimlet Labs is addressing the AI inference bottleneck in an unexpectedly sophisticated manner.

Zain Asgar, an adjunct professor at Stanford and a successful entrepreneur, has raised an $80 million Series A for a startup that takes an insightful approach to the AI inference bottleneck. Menlo Ventures led the round.

The startup, Gimlet Labs, claims to have developed the first and only “multi-silicon inference cloud”: software that lets AI workloads run simultaneously across different types of hardware. It can distribute an AI application’s tasks among conventional CPUs, AI-optimized GPUs, and high-memory architectures.

“In essence, we operate across all available hardware types,” Asgar told TechCrunch.

A single agent may chain together several steps, each requiring different hardware: Inference is compute-bound; decoding is memory-bound; and tool calls are network-bound, explains lead investor Tim Tully of Menlo in a blog post about the funding.

No single chip currently does it all, but as new hardware is introduced and older GPUs are repurposed, “the multi-silicon fleet is prepared — it merely needs the software layer to function.” This is what Tully believes Gimlet Labs provides.

If the current trend of deploying ever more computing resources continues, McKinsey predicts that spending on data centers will reach nearly $7 trillion by 2030. Asgar says that existing applications use the hardware already deployed “only between 15 to 30 percent” of the time.

“Another perspective is that you’re wasting hundreds of billions of dollars by permitting resources to sit idle,” he said. “Our goal was essentially to determine how to make AI workloads 10x more efficient than ever before, today.”

So he and his co-founders, Michelle Nguyen, Omid Azizi, and Natalie Serrino, set out to build orchestration software that breaks agentic workloads into pieces and distributes them concurrently across different kinds of hardware.

Gimlet Labs asserts that it can enhance AI inference speed by 3x to 10x without increasing cost or power consumption. Gimlet claims it can even partition the underlying model to run across different architectures, selecting the optimal chip for each segment of the model.

The firm has established partnerships with chip manufacturers NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix.

Gimlet’s product, available as software or via an API to its Gimlet Cloud, is not aimed at the typical AI application developer. It targets the largest AI model labs and data centers.

The company officially launched in October, reporting eight-figure revenue (at least $10 million) from the start. Asgar said its customer base has more than doubled in the past four months and now includes a major model maker and an extremely large cloud computing company, though he declined to name them.

The co-founders previously worked together at Pixie, a startup that developed an open-source observability tool for Kubernetes. New Relic acquired Pixie in 2020, just two months after it launched with a $9 million Series A led by Benchmark. (Pixie’s technology is now part of the open-source organization that manages Kubernetes.)

After Asgar happened to meet Tully about a year ago and also secured angel investments from Stanford faculty, venture capitalists began reaching out. Following the launch, a term sheet arrived on Asgar’s desk. When VCs discovered that Asgar was evaluating offers, “we received a significant influx of funding,” and the round was quickly oversubscribed, he said.

Including its earlier seed funding, the startup has now raised a total of $92 million, with backing from numerous angels, including Sequoia’s Bill Coughran, Stanford Professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan. The company currently employs 30 people.

Other investors include Factory, which led the seed round, Eclipse Ventures, Prosperity7, and Triatomic.
