From data collection to operational intelligence: How deployed robots become your best training ground

In robotics, collecting training data dominates discussions about building truly autonomous machines. Companies build expensive custom infrastructure (domain randomization pipelines, synthetic data generation, elaborate sensor rigs) to produce training data in simulation or in controlled environments. There are use cases where this level of investment makes sense: autonomous vehicles testing dangerous scenarios, surgical robots, or robotic systems in space that can't (or shouldn't) be validated in orbit.

At the same time, companies often underinvest in operational data, gathered from live robots deployed with real customers, despite it being cheaper to collect and more relevant to real-world performance. Most teams understand that learning from deployed robots is valuable: the challenge is how to collect, query, and act on that data without building an entire data infrastructure from scratch to support it.

Operational data: learning from robots in the field

Unlike training data collected in controlled lab environments or generated through simulation, operational data comes from your robots during their normal operation in the real world. This includes performance metrics under varying conditions, failure modes you didn't anticipate, usage patterns from actual customers, and environmental variables that affect behavior.

Because this data comes from machines already in the field, it's cheaper to collect than data from specialized training pipelines. More importantly, it's directly relevant to the deployment scenarios you care about (real customers, real environments, real edge cases), which makes it immediately applicable to improving product performance, providing efficient support, and executing predictive maintenance.

What 'good' looks like: infrastructure that gets out of your way

The infrastructure that enables operational intelligence has specific characteristics:

Automatic synchronization from edge to cloud. Data flows from deployed robots to your analysis environment without manual intervention. No SSH-ing into individual machines. No physical media retrieval. The system handles intermittent connectivity gracefully—robots operate offline, and data syncs automatically when the connection is restored.
Fleet-wide query capabilities. You need to ask questions across your entire deployment, such as: Show me all instances where motor temperature exceeded X in the past week. Or: What's the correlation between firmware version and collection success rate? You should be able to answer these with familiar tools such as SQL rather than custom scripts, and visualize the results with out-of-the-box tools so they are interpretable and shareable across the team.
Correlation across hardware, software, and environmental variables. The most valuable insights emerge from understanding relationships: How does battery voltage affect performance degradation? Which environmental conditions correlate with specific failure modes? Your infrastructure should make these explorations natural rather than requiring custom data pipeline work.
Low engineering overhead. If you need dedicated engineers to collect and analyze operational data, you'll never do it consistently. The best infrastructure is configured once and works automatically in the background while your team focuses on product development.
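To make the first characteristic concrete, here is one common store-and-forward pattern for intermittent connectivity: readings are queued locally and uploaded oldest-first, and a record is only dropped from the queue once an upload reports success. This is a minimal Python sketch under stated assumptions, not Viam's implementation; `StoreAndForwardBuffer` and the `upload` callback are hypothetical names, and the in-memory queue stands in for the durable on-disk storage a real robot would need to survive reboots.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, Dict, List

Record = Dict[str, object]


class StoreAndForwardBuffer:
    """Hypothetical edge-side buffer: queue records locally, upload when online."""

    def __init__(self, upload: Callable[[List[Record]], bool], batch_size: int = 50) -> None:
        self._upload = upload  # returns True only if the batch was accepted
        self._batch_size = batch_size
        self._pending: Deque[Record] = deque()

    def capture(self, record: Record) -> None:
        """Record a reading; never blocks on the network."""
        self._pending.append(record)

    def flush(self) -> int:
        """Upload pending records oldest-first; return how many were sent."""
        sent = 0
        while self._pending:
            batch = list(islice(self._pending, self._batch_size))
            if not self._upload(batch):
                break  # offline or rejected: keep the data, retry on the next flush
            for _ in batch:
                self._pending.popleft()  # drop only what was acknowledged
            sent += len(batch)
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage with a fake uploader that simulates a link going down and coming back.
online = False
received: List[Record] = []


def fake_upload(batch: List[Record]) -> bool:
    if not online:
        return False
    received.extend(batch)
    return True


buf = StoreAndForwardBuffer(fake_upload, batch_size=2)
for seq in range(5):
    buf.capture({"seq": seq, "motor_temp_c": 40 + seq})

print(buf.flush(), buf.pending)  # prints: 0 5 (offline, everything still queued)
online = True
print(buf.flush(), buf.pending)  # prints: 5 0 (sent in batches of 2, 2, 1)
```

Because records leave the queue only after a successful upload, a dropped connection mid-sync loses nothing; the trade-off is that a batch may be re-sent if an acknowledgment is lost, which a server-side deduplication key (here, `seq`) can absorb.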

Case study: Tennibot’s weekly development cycles

Tennibot is a robotics and AI company managing thousands of robotic ball machines and ball collectors for tennis, padel, and pickleball. The team, under the direction of CEO and founder Haitham Eletrabi, captures operational data from every machine to refine their ball collection and shooting algorithms, optimize robot behavior for different court surfaces and weather conditions, and identify performance patterns across thousands of training sessions.

Tennibot uses Viam's data capture and synchronization capabilities to automatically collect sensor data, performance metrics, and operational logs from their fleet. Data flows from robots to cloud storage without manual intervention. Viam's query interface lets the team ask fleet-wide questions using SQL: At X RPM, what's the success rate for ball collection on clay courts?
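A question like the one above maps naturally onto a single SQL aggregate. The sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The `collection_attempts` table, its columns, the RPM band standing in for "X RPM", and the sample rows are all made up for illustration; they are not Tennibot's data or schema, nor Viam's query API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection_attempts ("
    "robot_id TEXT, surface TEXT, rpm INTEGER, success INTEGER)"
)
# Made-up sample rows: one row per ball-collection attempt; success is 0 or 1.
conn.executemany(
    "INSERT INTO collection_attempts VALUES (?, ?, ?, ?)",
    [
        ("tb-001", "clay", 1200, 1),
        ("tb-001", "clay", 1210, 0),
        ("tb-002", "clay", 1190, 1),
        ("tb-002", "hard", 1200, 1),
        ("tb-003", "clay", 1500, 0),
    ],
)

# "At X RPM, what's the success rate for ball collection on clay courts?"
# X is treated as a band (1150 to 1250 RPM) because measured RPM varies.
query = """
SELECT COUNT(*)     AS attempts,
       AVG(success) AS success_rate
FROM collection_attempts
WHERE surface = ? AND rpm BETWEEN ? AND ?
"""
attempts, success_rate = conn.execute(query, ("clay", 1150, 1250)).fetchone()
print(attempts, round(success_rate, 3))  # prints: 3 0.667
```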

They can correlate hardware performance metrics (motor temperature, battery voltage, wheel speed) with software behavior (algorithm decisions, trajectory calculations) and environmental variables (court surface, lighting conditions, player skill level) to identify optimization opportunities and predict performance across different scenarios.
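One simple way to start exploring such relationships is a correlation coefficient computed per group. The Python sketch below computes Pearson's r between battery voltage and per-session collection success rate, separately for each court surface. The `Session` fields and sample values are hypothetical, and a correlation only flags a relationship worth investigating; it does not establish cause. On Python 3.10+, `statistics.correlation` computes the same coefficient; it is written out here to show the arithmetic.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Session(NamedTuple):
    """Hypothetical per-session summary joining hardware, software, and environment data."""
    surface: str         # environmental variable
    battery_v: float     # hardware metric
    success_rate: float  # outcome of the collection algorithm


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


sessions = [
    Session("clay", 11.8, 0.70),
    Session("clay", 12.0, 0.75),
    Session("clay", 12.2, 0.80),
    Session("clay", 12.4, 0.85),
    Session("hard", 11.8, 0.90),
    Session("hard", 12.0, 0.88),
    Session("hard", 12.2, 0.91),
    Session("hard", 12.4, 0.89),
]

# Group (voltage, success) pairs by surface, then correlate within each group.
by_surface: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
for s in sessions:
    volts, rates = by_surface[s.surface]
    volts.append(s.battery_v)
    rates.append(s.success_rate)

correlations = {surface: pearson(v, r) for surface, (v, r) in by_surface.items()}
for surface, r in sorted(correlations.items()):
    print(f"{surface}: r = {r:+.2f}")
```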

When they identify an optimization, such as adjusting ball collection timing based on court surface type, they can deploy the algorithm change as an over-the-air (OTA) update through Viam's fleet management, validate the improvement by querying performance metrics across all affected robots, and iterate again. “With Viam, you flip a switch to capture data in real time,” says Haitham. “That gives our team the insights needed to roll out updates weekly instead of monthly.”
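The validation step described here can be as simple as grouping outcomes by firmware version. The sketch below uses Python's standard `sqlite3` module with an in-memory database to compare clay-court success rates before and after a hypothetical update. The table layout, version strings, and rows are invented for illustration and do not reflect Viam's or Tennibot's actual data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection_attempts ("
    "robot_id TEXT, firmware_version TEXT, surface TEXT, success INTEGER)"
)
# Invented rows: the same two robots before (1.4.0) and after (1.5.0) an OTA update.
conn.executemany(
    "INSERT INTO collection_attempts VALUES (?, ?, ?, ?)",
    [
        ("tb-001", "1.4.0", "clay", 1),
        ("tb-001", "1.4.0", "clay", 0),
        ("tb-002", "1.4.0", "clay", 0),
        ("tb-002", "1.4.0", "clay", 1),
        ("tb-001", "1.5.0", "clay", 1),
        ("tb-001", "1.5.0", "clay", 1),
        ("tb-002", "1.5.0", "clay", 0),
        ("tb-002", "1.5.0", "clay", 1),
    ],
)

# ORDER BY on version strings is lexicographic; fine for these values,
# but "1.10.0" would sort before "1.5.0".
query = """
SELECT firmware_version,
       COUNT(*)                 AS attempts,
       COUNT(DISTINCT robot_id) AS robots,
       AVG(success)             AS success_rate
FROM collection_attempts
WHERE surface = ?
GROUP BY firmware_version
ORDER BY firmware_version
"""
results = conn.execute(query, ("clay",)).fetchall()
for version, attempts, robots, rate in results:
    print(f"{version}: {attempts} attempts on {robots} robots, success {rate:.0%}")
```

On a real fleet, a difference like this over a handful of attempts could easily be noise; comparing many sessions, ideally on similar courts and in similar conditions, makes a before/after comparison more trustworthy.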

Tennibot can complete this entire cycle on a weekly cadence because its data infrastructure is in place by default, rather than being an engineering project the team has to take on every week.

Most existing data stacks aren’t built for robotics

If operational data is so valuable, then why aren't more robotics companies leveraging it effectively? In short, most robotics stacks weren't designed with fleet-scale operational intelligence in mind. ROS (Robot Operating System), widely adopted in robotics, poses timing-synchronization challenges: its publish-subscribe architecture introduces network latency and does not guarantee the order in which messages from different topics arrive. This makes it difficult to reliably correlate events across distributed nodes without additional tooling.
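One piece of that additional tooling is aligning streams from different nodes by their source timestamps rather than by arrival order. The Python sketch below sorts out-of-order arrivals and pairs each sample in one stream with the nearest sample from another, within a tolerance. It assumes both nodes stamp messages using clocks that are already synchronized (for example, via NTP or PTP) and does nothing to correct clock skew. The function name and sample values are hypothetical, and this is not the ROS `message_filters` API, which provides its own approximate-time synchronization policy.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (source timestamp in seconds, value)


def align_nearest(
    primary: Sequence[Sample], secondary: Sequence[Sample], tolerance_s: float
) -> List[Tuple[float, float, Optional[float]]]:
    """Attach to each primary sample the secondary value closest in time, if within tolerance."""
    ordered = sorted(secondary)  # messages may have arrived out of order
    times = [t for t, _ in ordered]
    aligned = []
    for t, value in sorted(primary):
        i = bisect.bisect_left(times, t)  # index of first secondary timestamp >= t
        match: Optional[float] = None
        best_dt = tolerance_s
        for j in (i - 1, i):  # nearest neighbor is just before, or at/after, t
            if 0 <= j < len(times) and abs(times[j] - t) <= best_dt:
                match, best_dt = ordered[j][1], abs(times[j] - t)
        aligned.append((t, value, match))
    return aligned


# Commanded motor RPM from one node, measured RPM from another (arrived out of order).
commanded = [(10.00, 1200.0), (10.10, 1250.0), (10.20, 1300.0)]
measured = [(10.11, 1238.0), (10.02, 1190.0), (10.45, 1310.0)]

for row in align_nearest(commanded, measured, tolerance_s=0.03):
    print(row)
```

Samples with no partner inside the tolerance get `None` instead of a distant, misleading match, which keeps gaps visible when the data is later correlated.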

If you've spent late nights debugging timestamp mismatches or building custom bag file parsers, you know this pain. Some builders have turned to generic cloud platforms as an alternative, but these weren't built for the specific challenges of hardware operations, such as intermittent connectivity, time-series sensor data, and correlating physical and software states, so they can be just as difficult to work with. Custom solutions may seem viable, but they require substantial engineering resources that early-stage hardware companies typically don't have.

Five requirements for operational intelligence

If you're building a hardware company that will deploy robots in the field, evaluate your data infrastructure against these requirements from day one:

Automatic data synchronization from deployed robots
No manual file retrieval. No SSH-ing into individual machines. Data should flow from robot to cloud automatically, even when connectivity is intermittent.

Fleet-wide query capabilities
You need to ask questions across your entire deployment: What percentage of robots experienced error code X last week? How does performance vary by geographic region? Querying with SQL or a similar language should be straightforward. Viam, for example, provides SQL interfaces for fleet-wide queries.
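Both example questions reduce to one grouped SQL query. The sketch below uses Python's standard `sqlite3` module with an in-memory database to compute, per region, the share of robots that reported a given error code in the last seven days. The `robots` and `error_events` tables, the error code `E42`, the fixed `NOW` timestamp, and the rows are all hypothetical; this is not Viam's schema or query interface.

```python
import sqlite3

NOW = 1_700_000_000  # fixed Unix time so the example is reproducible
WEEK_S = 7 * 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE robots (robot_id TEXT PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE error_events (robot_id TEXT, code TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO robots VALUES (?, ?)",
    [("r1", "EU"), ("r2", "EU"), ("r3", "US"), ("r4", "US"), ("r5", "US")],
)
conn.executemany(
    "INSERT INTO error_events VALUES (?, ?, ?)",
    [
        ("r1", "E42", NOW - 1_000),        # counted
        ("r1", "E42", NOW - 2_000),        # same robot again: counted once
        ("r3", "E42", NOW - 10 * 86_400),  # older than a week: ignored
        ("r4", "E17", NOW - 500),          # different code: ignored
        ("r5", "E42", NOW - 3 * 86_400),   # counted
    ],
)

# LEFT JOIN keeps robots with no matching events in the denominator;
# COUNT(DISTINCT e.robot_id) skips the NULLs those robots produce.
query = """
SELECT r.region,
       100.0 * COUNT(DISTINCT e.robot_id) / COUNT(DISTINCT r.robot_id) AS pct
FROM robots AS r
LEFT JOIN error_events AS e
  ON e.robot_id = r.robot_id AND e.code = ? AND e.ts >= ?
GROUP BY r.region
ORDER BY r.region
"""
results = conn.execute(query, ("E42", NOW - WEEK_S)).fetchall()
for region, pct in results:
    print(f"{region}: {pct:.1f}% of robots reported E42 in the last 7 days")
```

Putting the code and time filters in the `ON` clause rather than in `WHERE` matters: filtering in `WHERE` would discard robots with no matching events and inflate every percentage to 100%.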

Correlation of hardware, software, and environmental variables
The most valuable insights come from understanding relationships: How does motor temperature affect performance? Does firmware version correlate with failure rates? Your infrastructure should make these correlations easy to explore without custom data pipeline work.

Low overhead for your engineering team
If collecting and analyzing data requires dedicated infrastructure engineers, you'll never do it consistently. The best infrastructure requires configuration work upfront, then operates automatically while your team focuses on product development.

Offline operation with automatic sync
Robots often work in environments with poor or intermittent connectivity. Your infrastructure should handle offline operation gracefully and sync data when the connection is restored, without manual intervention.
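Syncing "when the connection is restored" usually means retrying until an upload succeeds, without hammering the server when many robots reconnect at once. One common approach, sketched below in Python, is capped exponential backoff with full jitter. The function names, default values, and the injected `try_sync` and `sleep` callables are illustrative assumptions, not part of any particular platform.

```python
import random
import time
from typing import Callable, Iterator, List, Optional


def backoff_delays(
    base_s: float = 1.0, cap_s: float = 300.0, rng: Optional[random.Random] = None
) -> Iterator[float]:
    """Yield retry delays drawn uniformly from [0, min(cap_s, base_s * 2**n)]."""
    rng = rng if rng is not None else random.Random()
    attempt = 0
    while True:
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng.uniform(0, ceiling)
        if ceiling < cap_s:
            attempt += 1  # stop growing once capped so 2**attempt stays small


def sync_with_retry(
    try_sync: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call try_sync() until it returns True; return the number of attempts used."""
    delays = backoff_delays(rng=rng)
    attempts = 0
    while True:
        attempts += 1
        if try_sync():
            return attempts
        sleep(next(delays))  # still offline: wait before trying again


# Simulate a link that returns on the fourth attempt; record delays instead of sleeping.
outcomes = iter([False, False, False, True])
slept: List[float] = []
used = sync_with_retry(lambda: next(outcomes), sleep=slept.append, rng=random.Random(0))
print(used, [round(d, 2) for d in slept])
```

The random jitter spreads retries out so that robots coming back online together, for example after a venue's network outage, don't all hit the server in lockstep.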

These requirements will help you tell whether the platform you're evaluating is true operational intelligence infrastructure or just a basic data collection tool that will require a dedicated team to maintain.

The robots you've already deployed are your best teachers

The question facing hardware teams isn't whether to collect data from deployed robots. Every hardware company collects some data. The question is whether you can turn that data into product improvements faster than your competition—and whether you're spending engineering time building data infrastructure or building your actual product.

Setting up purpose-built data infrastructure early—before you have hundreds of robots in the field—saves you from retrofitting data collection onto systems that weren't designed for it. It means your team can focus on analyzing insights and iterating on product improvements rather than maintaining custom scripts to retrieve log files from deployed machines.

Your competitors can copy your hardware. They can reverse-engineer your algorithms. But they can't replicate the operational intelligence you've built from hundreds or thousands of robots learning in real-world conditions, provided you have the right infrastructure in place from day one to collect, analyze, and learn from that data.