{"id":3488636,"date":"2026-03-23T16:00:00","date_gmt":"2026-03-23T16:00:00","guid":{"rendered":"https:\/\/techingeek.com\/index.php\/2026\/03\/23\/startup-gimlet-labs-is-addressing-the-ai-inference-bottleneck-in-an-unexpectedly-sophisticated-manner\/"},"modified":"2026-03-23T16:00:00","modified_gmt":"2026-03-23T16:00:00","slug":"startup-gimlet-labs-is-addressing-the-ai-inference-bottleneck-in-an-unexpectedly-sophisticated-manner","status":"publish","type":"post","link":"https:\/\/techingeek.com\/index.php\/2026\/03\/23\/startup-gimlet-labs-is-addressing-the-ai-inference-bottleneck-in-an-unexpectedly-sophisticated-manner\/","title":{"rendered":"Startup Gimlet Labs is addressing the AI inference bottleneck in an unexpectedly sophisticated manner."},"content":{"rendered":"<div><img decoding=\"async\" src=\"https:\/\/techingeek.com\/wp-content\/uploads\/2026\/03\/startup-gimlet-labs-is-addressing-the-ai-inference-bottleneck-in-an-unexpectedly-sophisticated-manner.jpg\" class=\"ff-og-image-inserted\"><\/div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Zain Asgar, an adjunct professor at Stanford and a successful entrepreneur, has raised an $80 million Series A for a startup that takes a clever approach to the AI inference bottleneck. Menlo Ventures led the round.<\/p>\n<p class=\"wp-block-paragraph\">The startup, Gimlet\u202fLabs, claims to have built the first and only \u201cmulti-silicon inference cloud,\u201d software that runs AI workloads simultaneously across different types of hardware. 
It can distribute an AI application\u2019s tasks among conventional CPUs, AI-optimized GPUs, and high-memory architectures.<\/p>\n<p class=\"wp-block-paragraph\">\u201cIn essence, we operate across all available hardware types,\u201d Asgar told TechCrunch.<\/p>\n<p class=\"wp-block-paragraph\">A single agent may chain together several steps, each requiring different hardware: inference is compute-bound, decoding is memory-bound, and tool calls are network-bound, explains lead investor Tim Tully of Menlo in a blog post about the funding.<\/p>\n<p class=\"wp-block-paragraph\">No single chip currently does it all, but as new hardware is introduced and older GPUs are repurposed, \u201cthe multi-silicon fleet is prepared \u2014 it merely needs the software layer to function.\u201d This is what Tully believes Gimlet\u202fLabs provides.<\/p>\n<p class=\"wp-block-paragraph\">If the current pace of compute deployment continues, McKinsey predicts that spending on data centers will reach nearly $7 trillion by 2030. Asgar says existing applications use the hardware already deployed \u201conly between 15 to 30 percent\u201d of the time.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAnother perspective is that you\u2019re wasting hundreds of billions of dollars by permitting resources to sit idle,\u201d he said. 
\u201cOur goal was essentially to determine how to make AI workloads 10x more efficient than ever before, today.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">As a result, he and his co-founders, Michelle Nguyen, Omid Azizi, and Natalie Serrino, began to develop orchestration software that breaks down agentic workloads, allowing them to be concurrently distributed across various hardware infrastructures.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Gimlet Labs asserts that it can enhance AI inference speed by 3x to 10x without increasing cost or power consumption. Gimlet\u202fclaims it can even partition the underlying model to run across different architectures, selecting the optimal chip for each segment of the model.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The firm has established partnerships with chip manufacturers NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Gimlet\u2019s offering, available as software or via an API to its Gimlet Cloud, is not intended for the general AI application developer. It targets the largest AI model laboratories and data centers.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The company officially launched in October, reporting eight-figure revenues right from the start (at least $10 million). 
Asgar said the customer base has more than doubled in the past four months and now includes a major model maker and a very large cloud computing company, though he declined to name them.<\/p>\n<p class=\"wp-block-paragraph\">The co-founders previously worked together at Pixie, a startup that developed an open-source observability tool for Kubernetes. Pixie was acquired by New Relic in 2020, just two months after it launched with a $9 million Series A led by Benchmark. (Pixie\u2019s technology is now part of the open-source organization that manages Kubernetes.)<\/p>\n<p class=\"wp-block-paragraph\">After Asgar happened to meet Tully about a year ago and secured angel investments from Stanford faculty, venture capitalists began reaching out. Following the launch, a term sheet arrived on Asgar\u2019s desk. When VCs discovered that Asgar was evaluating offers, \u201cwe received a significant influx of funding,\u201d and the round was quickly oversubscribed, he said.<\/p>\n<p class=\"wp-block-paragraph\">Including its earlier seed funding, the startup has now raised $92 million in total, with backing from numerous angels, including Sequoia\u2019s Bill Coughran, Stanford Professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan. 
The company currently has a workforce of 30 people.<\/p>\n<p class=\"wp-block-paragraph\">Other investors consist of Factory, which led the seed funding, Eclipse Ventures, Prosperity7, and Triatomic.<\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":2,"featured_media":3488637,"comment_status":"open","ping_status":"closed","sticky":false,"template":"Default","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3488636","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/posts\/3488636"}],"collection":[{"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/comments?post=3488636"}],"version-history":[{"count":0,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/posts\/3488636\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/media\/3488637"}],"wp:attachment":[{"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/media?parent=3488636"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/categories?post=3488636"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techingeek.com\/index.php\/wp-json\/wp\/v2\/tags?post=3488636"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}