The AI Behind the Robots: Physical AI, VLA Models, and What They Mean for Safety

By Injured by Robots

The robots getting attention in headlines — humanoid machines walking through warehouses, surgical arms performing delicate procedures, autonomous vehicles navigating city streets — all depend on a layer of technology most people never see. The artificial intelligence powering these machines is advancing at a pace that has significant implications for human safety. Understanding the AI behind the robots is essential for anyone who works near them, is treated by them, or shares public spaces with them.

This article breaks down the major AI technologies now driving robotics forward, what each one does, and why they matter from a safety perspective.

Physical AI: NVIDIA’s Vision for Robots That Understand the Real World

Physical AI is a term popularized by NVIDIA to describe artificial intelligence that understands and operates in the physical world. Unlike traditional AI that processes text or images on a screen, Physical AI is designed to perceive three-dimensional environments, understand the physics of objects, and control machines that interact with real spaces and real people.

NVIDIA CEO Jensen Huang has been vocal about the significance of this moment. At CES 2026, Huang stated that the ChatGPT moment for robotics had arrived. At GTC 2026, he went further, predicting that every industrial company would become a robotics company.

NVIDIA’s Physical AI stack combines several components: world foundation models (Cosmos) that help robots understand how the physical world works, robot foundation models (GR00T) that drive robot behavior, simulation tools (Isaac Lab) for testing robot actions in virtual environments, and edge hardware (Jetson) that runs AI processing directly on the robot. We examine this broader infrastructure layer — including digital twins, edge AI, and Robot-as-a-Service models — in our guide to the infrastructure powering the robot revolution.

One notable aspect of this approach is its simulation-first safety paradigm. Rather than testing robots exclusively in the real world where failures can cause injuries, NVIDIA’s framework allows developers to test edge cases and failure scenarios at scale in simulated environments without physical risk. In theory, this means robots can encounter thousands of dangerous situations virtually before they ever operate near a human. However, simulation has inherent limitations. Virtual environments cannot perfectly replicate every variable in the real world, and a robot that performs safely in simulation may still behave unpredictably in conditions its training did not anticipate.
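The simulation-first idea can be pictured as a toy scenario battery: run many randomized virtual episodes and clear a control policy for real-world trials only if it never violates a safety check. The sketch below is purely illustrative — the scenario generator, toy physics, and safety gate are invented for this example and bear no relation to NVIDIA's actual Isaac tooling.

```python
import random

def simulate_episode(policy, seed):
    """Toy simulated episode: returns the robot's closest clearance to a human (meters)."""
    rng = random.Random(seed)
    human_distance = rng.uniform(0.0, 3.0)   # randomized scenario condition
    commanded_speed = policy(human_distance)
    # Toy physics: commanding more speed near a human reduces effective clearance.
    return human_distance - 0.1 * commanded_speed

def cautious_policy(human_distance):
    """Slow down as a human gets closer; stop entirely inside 0.5 m."""
    return 0.0 if human_distance < 0.5 else min(1.0, human_distance)

def passes_virtual_safety_gate(policy, episodes=1000, min_clearance=0.0):
    """Clear a policy for real-world testing only if no simulated episode
    drives clearance below the threshold."""
    return all(simulate_episode(policy, seed) >= min_clearance
               for seed in range(episodes))
```

A policy that ignores nearby humans fails this gate in simulation, where the failure costs nothing — which is the safety argument for virtual testing. The paragraph's caveat still applies: a thousand toy episodes prove nothing about conditions the simulator never generates.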

Agentic AI: Autonomous Decision-Making Without a Human in the Loop

Agentic AI refers to artificial intelligence that can plan, decide, and act with minimal human oversight. Rather than waiting for a specific command, an agentic AI system perceives its environment, reasons about what to do, and executes actions through a continuous perception-reasoning-action loop.
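The perception-reasoning-action loop described above can be sketched in simplified Python. The `sense`, `plan`, and `act` functions here are hypothetical placeholders standing in for real sensor, planner, and motor-control code — this is the shape of the loop, not any vendor's implementation.

```python
import time

def sense():
    """Hypothetical placeholder: read sensor data into a world state."""
    return {"obstacle_distance_m": 1.2, "task": "move_pallet"}

def plan(state, memory):
    """Decide the next action from the current state and past outcomes."""
    if state["obstacle_distance_m"] < 0.5:
        return "stop"            # safety condition overrides the task
    if memory and memory[-1] == "stop":
        return "resume_slowly"   # persistent memory shapes the next choice
    return "continue_task"

def act(action):
    """Hypothetical placeholder: send the chosen action to the motor controller."""
    print(f"executing: {action}")

def agent_loop(cycles=3):
    memory = []  # persistent record of past actions and outcomes
    for _ in range(cycles):
        state = sense()
        action = plan(state, memory)
        act(action)
        memory.append(action)
        time.sleep(0)  # real systems run this loop many times per second
    return memory
```

Note that no human appears anywhere in the loop: the system senses, decides, and acts on its own, which is exactly why predicting an agentic robot's behavior is harder than predicting a scripted one.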

This capability is what enables a robot to do more than follow a scripted set of instructions. An agentic robot can dynamically decompose complex tasks into steps, collaborate with other robots or AI agents, and maintain persistent memory of past actions and outcomes. The market for agentic AI is projected to reach $52.62 billion by 2030, growing at a 46.3% compound annual growth rate, reflecting how rapidly this technology is being adopted across industries.

From a safety perspective, agentic AI introduces new layers of complexity. When a robot can make autonomous decisions in real time, predicting its behavior becomes more difficult. Multi-agent collaboration — where several robots or AI systems coordinate without direct human supervision — adds further unpredictability. If one agent in a multi-robot system makes an error in judgment, the cascading effects on other agents and nearby humans can be difficult to anticipate or contain.

Large Behavioral Models: Predicting the Next Action

Large Behavioral Models, or LBMs, are the robotics equivalent of the Large Language Models (LLMs) that power tools like ChatGPT. Where an LLM predicts the next word in a sentence, an LBM predicts the next physical action a robot should take.
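The next-word / next-action analogy can be made concrete with a toy sketch. Here the "model" is just a hand-written lookup table standing in for a trained network: it maps a short history of recent actions to a predicted next action, mirroring how an LLM maps a text prefix to a next token. Nothing below reflects TRI's actual architecture.

```python
# Toy analogy: an LLM predicts the next token from a text prefix;
# an LBM predicts the next motor action from recent actions/observations.
NEXT_ACTION = {
    ("approach", "grasp"): "lift",
    ("grasp", "lift"): "carry",
    ("lift", "carry"): "place",
}

def predict_next_action(history):
    """Return the predicted next action given the last two actions."""
    key = tuple(history[-2:])
    return NEXT_ACTION.get(key, "halt")  # unseen situations fall back to a halt

def rollout(start, steps):
    """Autoregressively generate an action sequence, like LLM text generation."""
    history = list(start)
    for _ in range(steps):
        history.append(predict_next_action(history))
    return history
```

The fallback branch hints at the core safety question for learned behaviors: a real LBM does not return a tidy "halt" for situations outside its training data — it produces *some* action, and that action may be unpredictable.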

The concept was pioneered by Toyota Research Institute (TRI), which announced the approach in September 2023. A key technique underlying LBMs is Diffusion Policy, developed in collaboration with Columbia University and MIT, which allows robots to learn smooth, continuous behaviors from demonstration data rather than rigid programmed sequences.

In October 2024, Boston Dynamics and TRI announced a partnership to apply Large Behavioral Models to the Atlas humanoid robot. By August 2025, a single LBM was demonstrated controlling the entire body of Atlas — a significant milestone showing that one unified AI model could coordinate all of a complex humanoid robot’s movements.

One of the practical safety benefits of LBMs is that new capabilities can be added to a robot without writing new code from scratch. This reduces the risk of introducing programming errors when expanding what a robot can do. Instead of a software engineer manually coding every movement sequence, the robot learns from demonstrations, and the model generalizes to new situations.

The risk, however, is that learned behaviors can be unpredictable in ways that hand-coded behaviors are not. A robot operating on a Large Behavioral Model may perform a task successfully hundreds of times, then respond in an unexpected way to a novel situation its training did not adequately cover.

VLA Models: Connecting Vision, Language, and Action

Vision-Language-Action models, or VLA models, represent one of the most active areas of robotics AI research. A VLA model takes in visual input from cameras, receives a natural language instruction from a human, and outputs the physical actions the robot should perform. In essence, you can tell a VLA-powered robot what to do in plain English, and it translates that instruction into movement.
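In code terms, a VLA model's interface can be pictured as a single function from an image and an instruction to an action. The sketch below shows only that input/output contract — real VLA models are large neural networks that fuse vision and language, and this keyword-matching stand-in (with an invented `Action` type) is not how any of them work internally.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A simplified robot action: joint velocities plus a gripper command."""
    joint_velocities: tuple
    gripper: str  # "open" or "close"

def vla_step(image_pixels, instruction):
    """Illustrative stand-in for a VLA forward pass: (vision, language) -> action.

    A real model processes the image and text jointly through a neural
    network; this keyword check only demonstrates the interface shape.
    """
    text = instruction.lower()
    if "stop" in text:
        return Action(joint_velocities=(0.0, 0.0, 0.0), gripper="open")
    if "pick up" in text:
        return Action(joint_velocities=(0.1, -0.2, 0.05), gripper="close")
    # An instruction the model cannot confidently interpret defaults to no motion.
    return Action(joint_velocities=(0.0, 0.0, 0.0), gripper="open")
```

The final branch is the idealized case. A real model has no such clean "I don't understand" path: every instruction, however ambiguous, is mapped to some action.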

Several major VLA models have emerged in recent years. Physical Intelligence developed π0 (pi-zero) and π0.5 (pi-zero-point-five). Figure AI built Helix for its humanoid robots. NVIDIA created GR00T N1. Google DeepMind developed Gemini Robotics. Hugging Face released SmolVLA as a lightweight option, and the open-source community produced OpenVLA.

VLA models offer a significant potential safety advantage: they enable natural language safety instructions. A supervisor could, in principle, tell a robot to avoid a certain area, slow down near workers, or stop a task entirely using ordinary speech. But this capability comes with a corresponding risk. Natural language is inherently ambiguous. A command that seems clear to a human may be misinterpreted by a VLA model, leading the robot to take an action the instructor did not intend. The gap between what a person means and what a robot does remains a meaningful source of potential harm.

Gemini Robotics: Google DeepMind’s Entry Into Physical AI

Gemini Robotics is Google DeepMind’s robotics AI platform, built on top of Gemini 2.0. What distinguishes this system is that it treats physical actions as a new output modality alongside text, images, and code. The model does not just understand the world — it is designed to act in it.

The platform includes three tiers: Gemini Robotics, the flagship VLA model for direct robot control; Gemini Robotics-ER (Embodied Reasoning), which provides higher-level planning and reasoning for complex tasks; and Gemini Robotics On-Device, optimized for running directly on robot hardware with lower latency.

Gemini Robotics is designed to work across multiple robot platforms, including the ALOHA research robot, Franka robotic arms, and the Apptronik Apollo humanoid. Google DeepMind has established a Trusted Tester program with partners including Agile Robots, Agility, Boston Dynamics, and Enchanted Tools.

On the safety front, Google DeepMind developed the ASIMOV safety benchmark for evaluating robot behavior. In one specific category — bias-inducing pointing queries — fine-tuned models achieved a 96% rejection rate, up from a 20% baseline. While this is a strong result in that narrow category, the broader challenge of ensuring robot safety across all possible instructions remains an active area of research, particularly as deployments scale.

GR00T N2: NVIDIA’s Next-Generation Robot Foundation Model

GR00T N2, previewed at GTC 2026 (March 16-19, 2026), is NVIDIA’s latest robot foundation model. It introduces a new “world action model” architecture derived from research called DreamZero. This approach allows the model to reason about the physical world and plan actions in a more integrated way than prior architectures.

GR00T N2 currently ranks first on the MolmoSpaces and RoboArena benchmarks, and NVIDIA reports that it helps robots succeed at new tasks twice as often as leading VLA models. The breadth of its industry partnerships is notable: ABB, AGIBOT, Agility, FANUC, Figure, KUKA, Universal Robots, and YASKAWA are all working with the platform. These are major names in industrial and collaborative robotics, meaning GR00T N2 technology could reach factory floors and warehouses at significant scale.

The performance improvements that GR00T N2 offers are meaningful for safety. A robot that succeeds at tasks more reliably is, in many scenarios, a safer robot — fewer failures mean fewer unexpected movements and fewer opportunities for injury. But higher capability also means robots will be trusted with more complex and potentially more dangerous tasks, which raises the stakes when failures do occur.

Rho-Alpha: Microsoft’s Multimodal Robotics Model

Rho-alpha (stylized with the Greek letters ρα) is Microsoft’s first robotics AI model, announced in January 2026. It extends the VLA model concept by adding tactile sensing to the standard combination of vision, language, and action. Microsoft describes it as a VLA+ model.

The addition of touch sensing enables rho-alpha to handle precision bimanual manipulation tasks — actions requiring two hands working together with fine control, such as inserting a plug into a socket or turning a knob. These are tasks that have historically been difficult for robots and are common in manufacturing, assembly, and maintenance work.

One identified risk with rho-alpha is catastrophic forgetting, a phenomenon where an AI model loses previously learned capabilities when trained on new tasks. In a robotics context, this means a robot that learns a new skill could simultaneously become worse at a skill it previously performed safely. If a robot that was reliable at one task becomes unreliable after a software update that adds new capabilities, workers who trusted the robot based on its prior track record could be put at risk.
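Catastrophic forgetting can be illustrated in miniature. The toy below uses a single shared dictionary as a stand-in for a network's weights: naively "fine-tuning" on new demonstrations overwrites entries the old skill depended on. Real neural networks fail more subtly than a dictionary update, but the effect — new learning degrading old, previously safe behavior — is the same in miniature.

```python
def train(model, demonstrations):
    """Naive fine-tuning: overwrite shared parameters with only the new data."""
    model.update(demonstrations)
    return model

model = {}
train(model, {("see", "bolt"): "tighten", ("see", "spill"): "stop_and_alert"})
assert model[("see", "spill")] == "stop_and_alert"   # safe behavior learned

# Fine-tune on a new task whose data reuses the same input condition...
train(model, {("see", "spill"): "wipe"})
# ...and the previously safe response is silently gone:
assert model[("see", "spill")] == "wipe"             # forgot "stop_and_alert"
```

The unsettling part for workers is the second assertion: nothing in the update process announces that the old behavior was lost. Mitigations such as replaying old training data alongside new data exist, but they add cost and are not guaranteed.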

Why This Matters for Safety and Liability

Each of these AI technologies brings genuine improvements to robot capability and, in many cases, to robot safety. Simulation-based testing, natural language safety commands, and more reliable task execution all have the potential to reduce injuries. But each technology also introduces new categories of risk that existing safety standards and regulations were not designed to address.

When a robot powered by a VLA model misinterprets a spoken instruction and injures a worker, determining liability involves questions that product liability law is only beginning to grapple with. Was the fault in the AI model, the training data, the hardware, the instruction given, or the employer’s decision to deploy the system? When a Large Behavioral Model exhibits an unexpected behavior in a situation outside its training distribution, who bears responsibility?

These questions are not hypothetical. As the technologies described in this article move from research labs into workplaces, hospitals, and public spaces, the number of people interacting with AI-powered robots will grow rapidly. The legal and regulatory frameworks governing these interactions are still catching up.

What Should You Do Next?

If you or someone you know has been injured by a robot or automated system in the workplace, in a medical setting, or in a public space, understanding the technology involved is an important first step. The AI systems controlling modern robots are complex, and establishing what went wrong often requires specialized knowledge.

You may have legal options beyond workers’ compensation, including product liability claims against the robot manufacturer, the AI developer, or the company that deployed the system. An attorney experienced in robotics injury cases can evaluate your situation and explain what remedies may be available.

To find out whether you have a case, request a free case review. There is no obligation, and the consultation can help you understand your rights.


This article is for informational purposes only and does not constitute legal advice. Injured By Robots LLC is not a law firm. Laws vary by state and may have changed since publication. Consult a licensed attorney in your state for advice about your specific situation.

Ready to Find Out If You Have a Case?

If you or a loved one was injured, disabled, or killed, submit your information for a free case review. We connect you with an attorney who can help. No cost, no obligation.
