Robotics pilots need incident playbooks before scale

Robotics pilots are exciting because they make AI feel physical.

A robot moves through a warehouse. A humanoid carries something across a lab. A mobile system navigates a facility. A demo shows perception, planning, motion, speech, and autonomy folded into one body that looks like it might actually be useful.

That is the point where teams should slow down.

Not forever. Just long enough to ask the boring question:

What happens when this thing behaves badly?

If the answer is vague, the pilot is not ready to scale.

The problem

Robotics pilots often get judged by task success.

Can the robot pick the object? Can it navigate the hallway? Can it respond to a person? Can it work for an hour? Can it operate outside the lab? Can it reduce one slice of labor?

Those questions matter, but they are not enough.

A pilot that succeeds nine times and fails once in a confusing, undocumented, unrecoverable way is not production-ready. It is a demonstration with an unpriced failure mode.

Physical systems need incident playbooks because failures do not stay inside a chat window.

They can affect people, equipment, property, schedules, trust, insurance, regulatory posture, and the public story around the deployment.

That does not mean robotics teams should freeze.

It means the pilot needs a response system before the pilot becomes a rollout.

The rule of thumb

Do not scale a robotics pilot until the incident playbook is more specific than the demo script.

A demo script says what should happen.

An incident playbook says what to do when it does not.

The playbook does not need to predict every possible failure. It does need to define authority, evidence, stop conditions, rollback, communication, and review.

It is the same pattern behind the case for rollback plans in autonomous agents and behind the practical safety checklist for coding agents. Useful autonomy creates momentum. Momentum needs brakes that people can actually reach.

Robots make that lesson literal.

What the playbook needs

First, define stop rules.

What makes the operator stop the system immediately? Contact with a person? Repeated navigation uncertainty? Lost localization? Unexpected object handling? Entering a restricted zone? A failed sensor? A command outside the pilot scope?

If stop rules are not written down, the operator has to negotiate with social pressure in real time. That is not fair to the operator, and it is not safe for the pilot.
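
One way to make stop rules concrete is to write them as data that the operator console checks. A match then means stop, and there is nothing left to negotiate. The Python sketch below is minimal and assumes a hypothetical `RobotStatus` snapshot. The field names and the threshold of three uncertain navigation events are illustrative and do not come from any real robot stack:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RobotStatus:
    """Hypothetical snapshot of what the operator can observe right now."""
    person_contact: bool
    localization_ok: bool
    nav_uncertain_count: int  # consecutive uncertain navigation events
    in_restricted_zone: bool
    failed_sensors: list[str]
    command_in_scope: bool


@dataclass
class StopRule:
    name: str
    triggered: Callable[[RobotStatus], bool]


# Illustrative rules mirroring the questions above; the threshold of 3 is an assumption.
STOP_RULES = [
    StopRule("contact with a person", lambda s: s.person_contact),
    StopRule("lost localization", lambda s: not s.localization_ok),
    StopRule("repeated navigation uncertainty", lambda s: s.nav_uncertain_count >= 3),
    StopRule("entered a restricted zone", lambda s: s.in_restricted_zone),
    StopRule("failed sensor", lambda s: bool(s.failed_sensors)),
    StopRule("command outside pilot scope", lambda s: not s.command_in_scope),
]


def triggered_stop_rules(status: RobotStatus) -> list[str]:
    """Return every written stop rule the current status violates; any match means stop."""
    return [rule.name for rule in STOP_RULES if rule.triggered(status)]


if __name__ == "__main__":
    status = RobotStatus(
        person_contact=False,
        localization_ok=False,
        nav_uncertain_count=1,
        in_restricted_zone=False,
        failed_sensors=[],
        command_in_scope=True,
    )
    print(triggered_stop_rules(status))  # ['lost localization']
```

Each rule is evaluated independently, and the sketch never weighs a stop against schedule pressure. That is deliberate.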

Second, assign authority.

Who can pause the robot? Who can restart it? Who can override the schedule? Who can declare the pilot unsafe for the day? If the answer is “the team will decide,” the answer is not ready.
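
An authority matrix can be as simple as a table that maps each action to named roles. The sketch below uses hypothetical role names. It deliberately makes pausing broad and restarting narrow, but that asymmetry is an assumed default, not a requirement from any standard:

```python
from enum import Enum


class Action(Enum):
    PAUSE = "pause the robot"
    RESTART = "restart the robot"
    OVERRIDE_SCHEDULE = "override the schedule"
    DECLARE_UNSAFE = "declare the pilot unsafe for the day"


# Hypothetical roles. Stopping is easy to reach; restarting is deliberate.
AUTHORITY: dict[Action, set[str]] = {
    Action.PAUSE: {"operator", "floor_lead", "safety_officer"},
    Action.RESTART: {"safety_officer"},
    Action.OVERRIDE_SCHEDULE: {"floor_lead"},
    Action.DECLARE_UNSAFE: {"operator", "safety_officer"},
}


def is_authorized(role: str, action: Action) -> bool:
    """True only if the role is explicitly named for this action."""
    return role in AUTHORITY.get(action, set())


def unassigned_actions() -> list[Action]:
    """Actions nobody owns: the places where 'the team will decide' hides."""
    return [action for action in Action if not AUTHORITY.get(action)]
```

If `unassigned_actions()` returns anything, the playbook is not ready.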

Third, capture evidence.

The playbook should say what gets preserved after an incident: logs, video, sensor state, model outputs, operator notes, environment conditions, task instructions, timestamps, and the exact version of the software or configuration.
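
A fixed evidence record turns preservation into something the team can check rather than something it has to remember. The sketch below is minimal. It assumes evidence is stored as file paths and short text fields, and the class and field names are hypothetical:

```python
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone


@dataclass
class IncidentEvidence:
    """Hypothetical evidence bundle: one field per item the playbook says to preserve."""
    incident_id: str
    occurred_at: datetime  # incident timestamp, recorded in UTC
    software_version: str = ""
    config_version: str = ""
    task_instructions: str = ""
    log_paths: list[str] = field(default_factory=list)
    video_paths: list[str] = field(default_factory=list)
    sensor_state_path: str = ""
    model_outputs_path: str = ""
    operator_notes: str = ""
    environment_conditions: str = ""

    def missing(self) -> list[str]:
        """Names of evidence fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    evidence = IncidentEvidence("INC-001", datetime.now(timezone.utc), software_version="1.4.2")
    print(evidence.missing())
```

A review could refuse to open until `missing()` is empty. Alternatively, it could require a note explaining why a given item could not be recovered.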

Fourth, separate recovery from blame.

The first job is to make the scene safe, preserve evidence, and restore a known state. The root-cause review comes after that. If teams jump straight to blame, they destroy the evidence they need.
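
Tooling can enforce this ordering so that it does not depend on discipline alone. The sketch below assumes a hypothetical four-phase response in which the root-cause review cannot start until the earlier phases are marked done:

```python
from enum import IntEnum


class Phase(IntEnum):
    SCENE_SAFE = 1
    EVIDENCE_PRESERVED = 2
    KNOWN_STATE_RESTORED = 3
    ROOT_CAUSE_REVIEW = 4


class IncidentResponse:
    """Enforces the order: make safe, preserve evidence, restore state, then review."""

    def __init__(self) -> None:
        self.completed: list[Phase] = []

    def complete(self, phase: Phase) -> None:
        if len(self.completed) == len(Phase):
            raise ValueError("all phases are already complete")
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            # Refuse to skip ahead, e.g. starting the blame conversation first.
            raise ValueError(f"cannot mark {phase.name} before {expected.name}")
        self.completed.append(phase)
```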

Fifth, define rollback.

Can the pilot return to manual work? Can the robot be removed from the route? Can the task queue be paused? Can the previous software/configuration be restored? Can the site continue without the robot?
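
Each rollback question above maps to one state change. The sketch below assumes the pilot tracks a simple list of deployed configuration versions. `PilotState` and `roll_back` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PilotState:
    """Hypothetical pilot state; config_history lists deployed versions, oldest first."""
    config_history: list[str]
    robot_on_route: bool = True
    task_queue_paused: bool = False
    manual_mode: bool = False


def roll_back(state: PilotState) -> Optional[str]:
    """Return the site to manual work, then restore the previous configuration.

    The manual fallback happens first so the site keeps running even when no
    earlier configuration exists. Returns the restored version, or None.
    """
    state.task_queue_paused = True
    state.robot_on_route = False
    state.manual_mode = True
    if len(state.config_history) < 2:
        return None
    state.config_history.pop()  # discard the current, suspect version
    return state.config_history[-1]
```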

Sixth, set communication rules.

Who tells the floor team? Who tells leadership? Who tells the client? Who tells the vendor? Who writes the internal incident note? Who decides whether anything public must be said?
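
Communication rules can be written as a routing table that records who is told, who owns telling them, and at what severity. The severity levels and role names below are assumptions for illustration. The last row assigns the decision about a public statement, not the statement itself:

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    SERIOUS = 2
    CRITICAL = 3


# (audience, owner role, minimum severity that triggers it); all values hypothetical.
COMMS_PLAN = [
    ("floor team", "floor_lead", Severity.MINOR),
    ("internal incident note", "pilot_lead", Severity.MINOR),
    ("vendor", "pilot_lead", Severity.SERIOUS),
    ("leadership", "program_owner", Severity.SERIOUS),
    ("client", "account_owner", Severity.SERIOUS),
    ("decide on any public statement", "program_owner", Severity.CRITICAL),
]


def notifications(severity: Severity) -> list[tuple[str, str]]:
    """Audiences and owners that must act for an incident of this severity."""
    return [(audience, owner) for audience, owner, minimum in COMMS_PLAN if severity >= minimum]
```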

Seventh, require a review before scaling.

Every serious incident should produce a written decision: continue as-is, continue with constraints, change the system, change the environment, retrain operators, narrow the scope, or stop.

No quiet drift.
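
The written decision can be a required record, with expansion blocked until every serious incident has one. The sketch below assumes a hypothetical `IncidentReview` record. The decision options are taken from the list above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONTINUE_AS_IS = "continue as-is"
    CONTINUE_WITH_CONSTRAINTS = "continue with constraints"
    CHANGE_SYSTEM = "change the system"
    CHANGE_ENVIRONMENT = "change the environment"
    RETRAIN_OPERATORS = "retrain operators"
    NARROW_SCOPE = "narrow the scope"
    STOP = "stop"


@dataclass
class IncidentReview:
    incident_id: str
    serious: bool
    decision: Optional[Decision] = None
    rationale: str = ""  # the written reasoning behind the decision


def blocking_reviews(reviews: list[IncidentReview]) -> list[str]:
    """Serious incidents that block expansion: no decision, no written rationale, or STOP."""
    return [
        r.incident_id
        for r in reviews
        if r.serious
        and (r.decision is None or not r.rationale.strip() or r.decision is Decision.STOP)
    ]
```

An empty result means every serious incident has a named outcome and a reason on file.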

A practical pilot checklist

Before a robotics pilot scales, the team should be able to answer:

What is the robot allowed to do?
What is explicitly out of scope?
What are the stop conditions?
Who has stop authority?
How does the operator stop it?
What logs and video are retained?
How long are they retained?
What privacy rules apply?
What is the manual fallback?
Who restarts the system?
What requires vendor review?
What requires leadership review?
What evidence must be reviewed before expansion?

This is not bureaucracy.

This is the minimum structure that lets a pilot learn without pretending every surprise is acceptable.
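
The checklist works as a scaling gate if every question needs a written answer. The sketch below is minimal. It assumes answers are kept as plain text keyed by the questions above, and the function names are hypothetical:

```python
# The questions from the checklist above, verbatim.
CHECKLIST = [
    "What is the robot allowed to do?",
    "What is explicitly out of scope?",
    "What are the stop conditions?",
    "Who has stop authority?",
    "How does the operator stop it?",
    "What logs and video are retained?",
    "How long are they retained?",
    "What privacy rules apply?",
    "What is the manual fallback?",
    "Who restarts the system?",
    "What requires vendor review?",
    "What requires leadership review?",
    "What evidence must be reviewed before expansion?",
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Checklist questions with no written (non-blank) answer yet."""
    return [q for q in CHECKLIST if not answers.get(q, "").strip()]


def ready_to_scale(answers: dict[str, str]) -> bool:
    return not unanswered(answers)
```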

The traps

The first trap is demo confidence.

A good demo makes the robot feel mature, even when the operating environment around it is not. The pilot needs the environment to be designed too.

The second trap is operator theater.

Putting a person nearby is not the same as giving them authority, training, stop tools, and permission to interrupt the rollout.

The third trap is missing evidence.

If an incident happens and nobody can reconstruct the robot state, operator action, environment, and configuration, the team is left arguing from vibes.

The fourth trap is silent scope creep.

The robot starts with one route, one task, and one environment. Then the team adds another shift, another object, another room, another condition, and another exception. The playbook has to move with the scope.
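
One way to keep the playbook moving with the scope is to record both as data and compare them. The sketch below assumes scope can be described as sets of routes, tasks, rooms, and shifts. `Scope` and `uncovered` are hypothetical names:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Scope:
    """Hypothetical description of where and how the robot operates."""
    routes: frozenset[str]
    tasks: frozenset[str]
    rooms: frozenset[str]
    shifts: frozenset[str]


def uncovered(operating: Scope, playbook: Scope) -> dict[str, set[str]]:
    """Everything the robot now does that the playbook was not written for."""
    gaps: dict[str, set[str]] = {}
    for f in fields(Scope):
        extra = getattr(operating, f.name) - getattr(playbook, f.name)
        if extra:
            gaps[f.name] = set(extra)
    return gaps
```

A non-empty result means the robot is operating somewhere the playbook has not yet been written for.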

The fifth trap is public-story lag.

Physical AI deployments can become public fast if something odd happens. If the internal explanation is not disciplined, the external explanation will be worse.

Verdict

Robotics pilots need incident playbooks before scale because useful robots create real-world consequences.

That is not a reason to avoid robotics.

It is a reason to make the pilot honest.

Define stop rules. Assign authority. Preserve evidence. Plan rollback. Review incidents before expanding scope.

If the robot is ready to leave the lab, the team should be ready to explain what happens when the lab leaves the robot.

— Cara

Related field notes

AI evaluations need the harness

A practical safety checklist for coding agents

AI browser agents need a safe browsing budget