TL;DR
- Preview Launch: Thinking Machines Lab unveiled an AI preview on May 11 that is designed to keep listening while it speaks.
- System Design: The company says the model uses micro-turn processing and a 0.40-second response target, but outside users have not tested those claims.
- Access Test: A limited research preview in the next few months, ahead of a wider release expected later in 2026, should show whether the system stays responsive during real interruptions and multimodal exchanges.
Mira Murati’s Thinking Machines Lab moved into public view on May 11 when it unveiled its interaction models, a full-duplex system built for overlapping, real-time conversation rather than rigid turn-taking. Outside users still cannot test the latency and quality claims because the preview remains closed.
Thinking Machines is trying to move from startup anticipation to a concrete argument about how conversational AI should behave. Instead of treating interruptions, pauses, and changing context as edge cases, it is pitching them as the normal state a voice system should handle. Developers and enterprise buyers are not being asked only to admire a smoother demo. They are being asked to judge whether the model can keep its rhythm once real users start talking over it.
Outside testing will begin with a limited research preview in the next few months, ahead of a wider release expected later in 2026. Early testers will be the first people outside the company able to see whether interruption handling stays smooth in ordinary conditions. They will also be the first to test whether switching among audio, video, and text remains coherent once network delay and messy human timing enter the loop.
The closed rollout also gives the lab a controlled way to collect the failure data it needs for deployment. Testers should be able to expose where the model loses context, how quickly it recovers after an interruption, and whether tool use slows the exchange while the background reasoning layer tries to keep pace. Buyers evaluating voice deployments need answers on all three points before a research preview can become a product plan.
How the System Is Supposed to Work
At the system level, the preview is built around a model that takes in audio, video, and text while it is thinking, responding, and acting in real time. In plain language, the design is meant to keep absorbing new information after it has already started speaking. If the approach works as presented, conversation would move away from strict back-and-forth turns and closer to a live exchange where interruptions do not automatically break the flow.
Thinking Machines Lab describes a model trained from scratch for this mode with a micro-turn design. Processing in 200-millisecond chunks instead of waiting for a speaker to finish a full turn is part of that pitch. A model that waits too long to process new input will sound hesitant even if its underlying reasoning is strong. That is why the preview focuses so heavily on overlap, timing, and recovery instead of treating them as cosmetic polish.
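The micro-turn idea can be illustrated with a minimal sketch that slices an audio buffer into 200-millisecond chunks so that each slice can be processed as it arrives. The 16 kHz sample rate and the function name `micro_turns` are assumptions made for illustration; the company has not published its implementation.

```python
from typing import Iterator, List

SAMPLE_RATE_HZ = 16_000  # assumed sample rate; not stated in the announcement
MICRO_TURN_MS = 200      # chunk length described for the preview


def micro_turns(
    samples: List[int],
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    chunk_ms: int = MICRO_TURN_MS,
) -> Iterator[List[int]]:
    """Yield consecutive fixed-length slices of an audio buffer.

    Hypothetical helper: the final slice may be shorter than the rest.
    """
    chunk_len = sample_rate_hz * chunk_ms // 1000
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# One second of silence at the assumed rate splits into five micro-turns.
one_second = [0] * SAMPLE_RATE_HZ
chunks = list(micro_turns(one_second))
```

In this sketch a downstream consumer sees new input every 200 milliseconds instead of once per completed utterance, which is the property the micro-turn pitch depends on.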
The architecture also splits the job in two. A stream built around time-aligned micro-turns handles immediate interaction, while a separate background model is meant to take on deeper reasoning, tool use, and longer-horizon work. Together those layers are supposed to let the system react quickly without collapsing into shallow answers once a conversation becomes complicated. Separating fast interaction from slower reasoning also points to the engineering tradeoff behind the preview: speed on the front end only matters if the deeper layer does not fall behind.
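One way to picture the split is a small asyncio sketch in which a fast loop reacts to every micro-turn and hands heavier requests to a background task without waiting for it. Everything here is an illustrative assumption, including the names `interaction_loop` and `background_reasoner`, the `task:` prefix, and the simulated delays; it is not the lab's architecture.

```python
import asyncio
from typing import List


async def background_reasoner(request: str) -> str:
    """Stand-in for the slower reasoning and tool-use layer (hypothetical)."""
    await asyncio.sleep(0.05)  # simulated slow work
    return f"result for {request!r}"


async def interaction_loop(turns: List[str]) -> List[str]:
    """Fast layer: respond to each micro-turn, delegate heavy work."""
    events: List[str] = []
    pending: List[asyncio.Task] = []
    for turn in turns:
        if turn.startswith("task:"):
            # Delegate without blocking the conversation.
            request = turn[len("task:"):]
            pending.append(asyncio.create_task(background_reasoner(request)))
            events.append(f"ack {turn}")
        else:
            events.append(f"heard {turn}")
        # Surface any background work that has finished in the meantime.
        for task in [t for t in pending if t.done()]:
            pending.remove(task)
            events.append(task.result())
        await asyncio.sleep(0.01)  # simulated pacing between micro-turns
    # Collect whatever is still running once input stops.
    for task in pending:
        events.append(await task)
    return events


events = asyncio.run(
    interaction_loop(["hello", "task:lookup", "wait", "actually", "never mind"])
)
```

The fast loop keeps logging new input while the delegated request is still running, and the background result is merged whenever it becomes available. If the background layer is much slower than the conversation, its results arrive late, which is the tradeoff the paragraph above describes.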
Short slices matter because they let the system react before a speaker has fully finished. Pressure around that problem has already appeared in adjacent work on conversational adaptation, which helps frame this as a broader product problem rather than a one-company quirk. Voice quality also depends on everything around the model. Audio has to move with little delay. Speech has to resume cleanly after a cut-in. Users also need the system to stay coherent when a conversation changes direction mid-sentence.
Latency is what users will notice first. OpenAI’s 2024 multimodal launch of GPT-4o established an earlier expectation for live multimodal systems. OpenAI later explained how low and stable media round-trip time shapes natural voice interaction at scale, and the same discussion described delayed barge-in and clipped interruptions as common signs that a voice system cannot keep up.
Thinking Machines presents TML-Interaction-Small as a model that responds in 0.40 seconds. In the company’s benchmark framing, it is faster than comparable OpenAI and Google models. Outside users have not yet shown whether the system can hold that pace once conversations wander off-script. They also have not shown whether the same behavior survives quick interruptions or speech mixed with text and video over less predictable connections.
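When outside testers do get access, a claim like this can be checked by comparing measured response times against the 0.40-second figure. The sketch below shows one simple way to summarize such measurements. The latency values are invented purely to exercise the function and are not results from any real system; the function name `summarize_latency` is likewise hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence

TARGET_SECONDS = 0.40  # response figure cited by the company


def summarize_latency(
    samples: Sequence[float], target: float = TARGET_SECONDS
) -> Dict[str, float]:
    """Return median, nearest-rank p95, and the share of samples within target."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    within = sum(1 for s in ordered if s <= target)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "within_target": within / len(ordered),
    }


# Hypothetical measurements in seconds, made up for illustration only.
made_up_samples = [0.38, 0.41, 0.39, 0.72, 0.40, 0.37, 0.95, 0.40, 0.36, 0.44]
summary = summarize_latency(made_up_samples)
```

A median near the target can coexist with a slow tail, so the p95 value and the share of responses within target say more about how a conversation feels than a single headline number.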
In OpenAI’s account, natural voice systems also need fast connection setup and infrastructure that can support the company’s more than 900 million weekly active users. The number does not prove Thinking Machines can meet the same demands, but it does show why a convincing voice assistant is as much a networking and systems problem as a model problem. Capacity, routing stability, and media transport can break a conversation even when the model itself sounds capable in a lab test.
Competition and Next Availability
Thinking Machines does not frame the preview as a category of one. Its post points to Moshi, PersonaPlex, Nemotron VoiceChat, and GPT-Realtime-Translate as smaller-scale or specialized full-duplex systems. Rival names like those place the preview in a broader assistant-platform contest instead of a one-off demo lane.
The release also enters a push to make voice interaction feel more fluid and natural than turn-taking allows. Google, OpenAI, and a widening set of startups are all trying to reduce the friction between listening and responding. Even a closed research preview lands in an active market race where responsiveness, reliability, and interruption handling are starting to matter as much as raw model quality.
Enterprise buyers add a second layer to that competition. Teams exploring voice interfaces are comparing responsiveness, tool invocation in mid-conversation, multimodal switching, reliability under load, and the amount of extra orchestration they still need to build around the model. A fast demo is not enough if buyers still have to patch around unstable handoffs, tool delays, or context loss. Spoken interaction turns silence, clipping, and timing mistakes into visible defects as soon as they happen, which is why this preview raises deployment questions before broad access begins. Procurement teams will also want to know how much custom work remains once the preview leaves the lab’s own controlled environment. Security reviews, logging requirements, and support for enterprise tools could matter almost as much as latency once buyers move from a demo to a pilot.
Murati, Funding, and the Company Backdrop
Thinking Machines Lab was founded in 2025 by former OpenAI chief technology officer Mira Murati, and its launch quickly made the company one of the more closely watched projects in the post-OpenAI startup wave. That history helps explain why a closed research preview is drawing attention beyond a normal early-stage product teaser.
Murati left OpenAI in September 2024 before building the new lab. She said at the time: “I want to create the time and space to do my own exploration.” Her explanation gives the company story a human motive, but the product case still depends on whether the system performs outside a controlled demo.
A $2 billion seed round in 2025 raised expectations around what the lab should be able to ship. A system that has to stay responsive under real load is not only a model problem. It is also a networking, orchestration, and deployment problem, which is why a research preview can carry business weight even before outside users can touch it.
One concrete checkpoint now sits ahead of the company: the limited research preview expected in the next few months has to show that TML-Interaction-Small can keep its claimed 0.40-second pace when outsiders interrupt it in real time.


