The Missing Data Layer for Physical AI | James Kujareevanich, Vision Lab

James Kujareevanich is the co-founder and CEO of Vision Lab, a company building the data infrastructure for physical AI - going into factories around the world, capturing first-person video of real operators performing real tasks, and turning that into structured training data for frontier labs and robotics companies. They recently closed a $6M round to scale.

Watch the episode here:

Summary

I asked James why robotics doesn't have its equivalent of the internet - the massive existing corpus that LLMs were trained on. His answer was simple: you have to be intentional about collecting this data. It doesn't exist in the wild. YouTube and TikTok have billions of hours of video, but almost none of it is structured in a way that helps a robot learn to do a task. The camera angles are wrong, the hands are out of frame, the process context is missing, and the footage is edited into clips that break the continuous task episodes models need.

That's the gap Vision Lab sits in. And the way they got there is one of the better origin stories I've come across.

A PhD, a camera, and an accidental business

James's co-founder was an MIT PhD student who strapped a camera to his head to automate his own lab documentation. This was back in 2024, before egocentric data was a thing anyone was talking about. He'd capture his lab experiments on video and let vision models auto-populate the documentation. James, coming from McKinsey Consulting in Bangkok, saw the commercial potential and convinced him to take it into factories.

Their first product was a human training tool - an iPad app where new factory operators could watch first-person video of experienced workers and ask questions about procedures. They were selling it to factories as a training platform. Then around Christmas, they met an AI researcher who told them something that changed the trajectory of the company: whatever you're teaching humans is going to be very valuable for teaching robots as well.

They started sharing data with that lab. The demand turned out to be massive. They pivoted entirely. And in a neat twist, their former customers - the factories - became their data suppliers. Instead of factories paying Vision Lab for training tools, Vision Lab now pays factories for access.

What the data product looks like

I asked James what I'd get if I came to him tomorrow as a frontier lab. His answer surprised me with how un-standardized the space still is.

Some labs want egocentric first-person video. Others want side cameras, 360 cameras, or depth cameras. Some want teleoperation data with full rig setups. Others want tactile pressure sensor data. And the annotation requirements vary wildly - some want hand pose extraction with keypoint identification on every frame, others want coarser action labeling every ten seconds.

Vision Lab has to run what amounts to a custom project for each client. They've made a bet on where the consensus is heading - the research approach that roughly 80% of labs are converging on - and built their hardware and pipeline around that. But there's no industry standard for robotics training data. Not yet.

Boots on the ground, cameras in the mail

The factory capture process started the way YC tells you to start anything - doing things that don't scale. James's dad runs a factory, so that's where they ran their first captures. They showed up personally, strapped cameras to operators' heads, transferred footage to laptops, and uploaded everything manually.

Now they've built a playbook. They ship cameras to a factory, run a two-to-three-week sprint with a project manager on the client side, and handle all the downstream processing - cleaning, annotation, hand pose extraction - from their end. When the sprint is done, the factory ships the cameras back.

The hardest part isn't the logistics. It's getting in the door. Factory owners are understandably wary of letting outsiders film their operations. James described building trust through strict data consent processes and what he calls "industrial influencers" - people embedded in manufacturing networks who can vouch for the team and help open doors. They've scaled this contractor network across India, Thailand, Vietnam, Indonesia, and the Philippines.

Labs are still vibe-checking the data

This was one of the more striking things James said. I asked him what he learned from pilot engagements with multiple frontier labs. His answer: nobody knows what the gold standard of data actually is.

Every lab has a different definition of "good." Some want atomic, frame-by-frame action labeling. Others are fine with rough annotations every ten seconds. When James asked one lab how they evaluate data quality, the answer was essentially vibes. They eyeball it.

That tells you something about where the field is. The models are hungry for data, the labs are spending real money on it, but the evaluation methodology is still remarkably informal. It's a project-by-project business right now, with a separate data pipeline and a different set of requirements for each client.

Why synthetic data isn't the replacement people think it is

I put the synthetic data question to James directly - companies are building simulation environments and arguing you don't need real-world footage. His response was sharp.

If synthetic data could replace real data, he said, he wouldn't be sitting here talking to me. The frontier labs are buying real data. That tells you what you need to know.

He made a chaos theory argument that I found compelling. Even if each simulated frame is 99.99% physically accurate, the errors compound across frames. The first frame might be 0.01% off. The second frame inherits that error and adds its own. By the fifth minute, the synthetic environment has drifted into something that no longer represents reality. How many nines of fidelity do you need before the compounding stops mattering? Nobody has a good answer yet.
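To put rough numbers on that argument, here is a minimal sketch of my own, not something James presented. It assumes a deliberately simple model: every frame independently keeps a fixed fraction of the previous frame's fidelity, so the fidelity left after n frames is that fraction raised to the power n. The 30 fps frame rate and the function names are assumptions for illustration, and real simulators will not drift this neatly.

```python
import math

FPS = 30  # assumed frame rate, not a figure from the conversation


def retained_fidelity(per_frame_fidelity: float, frames: int) -> float:
    """Fidelity left after `frames` steps under simple multiplicative compounding."""
    return per_frame_fidelity ** frames


def frames_until(per_frame_fidelity: float, threshold: float) -> int:
    """Smallest frame count at which retained fidelity falls to `threshold` or below."""
    return math.ceil(math.log(threshold) / math.log(per_frame_fidelity))


if __name__ == "__main__":
    five_minutes = 5 * 60 * FPS  # 9000 frames at the assumed frame rate
    for fidelity in (0.99, 0.9999, 0.999999):
        left = retained_fidelity(fidelity, five_minutes)
        print(f"per-frame {fidelity}: {left:.4f} retained after five minutes")
```

Under these assumptions, four nines per frame leaves only about 41% after five minutes, and it takes six nines to stay above 99%. That is the "how many nines" question in concrete form, though the exact figures depend entirely on the toy model.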

His position is pragmatic: he's following the money. Real data is where the value is today. If synthetic data becomes genuinely useful in three years, Vision Lab will go there too. But right now, the demand signal from labs is clear.

80% of raw footage is unusable

When I asked about defensibility, James brought up an interesting number: of the raw footage they source from outside their own operations, 80% isn't usable. Devices aren't up to standard. The camera is pointed the wrong way. Hands leave the frame. The capture doesn't cover a complete task episode.

And that's just the video. The real work is downstream - cleaning, annotating, extracting hand poses accurately, making sure the action labels are correct. People underestimate how hard this is and jump in thinking they can win on price. They sell cheap data. But cheap data isn't what labs want. Labs want good data, and they'll pay a premium for it.

James drew a parallel to the LLM data space, where you don't see one dominant vendor. Scale AI exists alongside multiple other providers, each carving out a niche. Frontier labs don't want to bet everything on one supplier - they want options, and they want quality. The bar for being a credible supplier is high enough that most entrants wash out.
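The capture-quality checks James listed (device quality, camera direction, hands in frame, complete task episode) can be pictured as a first-pass filter over clip metadata. The sketch below is purely illustrative: the `Clip` fields, the 0.9 hands-in-frame threshold, and the function names are all hypothetical, and nothing here describes Vision Lab's actual pipeline.

```python
from dataclasses import dataclass
from typing import List

MIN_HANDS_RATIO = 0.9  # hypothetical threshold: share of frames with hands visible


@dataclass
class Clip:
    """Hypothetical per-clip metadata; every field name is an assumption."""
    clip_id: str
    device_ok: bool            # capture device meets the required spec
    camera_on_task: bool       # camera is pointed at the work area
    hands_in_frame_ratio: float
    complete_episode: bool     # clip covers a task from start to finish


def rejection_reasons(clip: Clip) -> List[str]:
    """Return the reasons a clip fails the first-pass checks (empty if usable)."""
    reasons = []
    if not clip.device_ok:
        reasons.append("device below standard")
    if not clip.camera_on_task:
        reasons.append("camera pointed the wrong way")
    if clip.hands_in_frame_ratio < MIN_HANDS_RATIO:
        reasons.append("hands leave the frame")
    if not clip.complete_episode:
        reasons.append("incomplete task episode")
    return reasons


def usable_fraction(clips: List[Clip]) -> float:
    """Fraction of clips that pass every check (0.0 for an empty batch)."""
    if not clips:
        return 0.0
    usable = sum(1 for clip in clips if not rejection_reasons(clip))
    return usable / len(clips)
```

A filter like this only covers the video itself. The annotation and hand pose work described above sits downstream of it.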

The Siemens of robotics

James's ten-year vision goes beyond data collection. The way he sees it, data creation is the short-term business. Once robots get trained on Vision Lab's data and need to be deployed in real environments, they'll need physical RL - reinforcement learning in the real world, not in simulation. Vision Lab's factory network, the same infrastructure they built for data collection, becomes the deployment and testing layer.

He wants to be for physical AI what Scale AI is for LLMs, but extended into the physical world. Labs focus on what they do best - the tech, the model building - and Vision Lab handles the real-world execution. He used the phrase "the Siemens of robotics" to describe the endgame: the company that connects frontier AI to the physical environments where robots need to work.

It's an ambitious frame. But the network of factories is real, the lab relationships are real, and the $6M round suggests others see the same bet.

This was a fun conversation. James is unusually candid about what's working, what isn't, and what he doesn't know yet. The fact that he's a businessman first, not a roboticist, gives him a different lens on the space. He's not attached to any particular technical approach - he's reading the demand signals and building the infrastructure to serve them.

If you're building in physical AI, investing in it, or thinking about where the robotics data supply chain is heading, this one is worth your time.

Give it a listen.

Full episode available on:

Learn more about Vision Lab: https://thevisionlab.ai
