Safety Guardrails for CUAs
Implement safety guardrails to prevent CUA errors from causing real-world damage — especially for consequential actions.
Why Safety Matters More for CUAs
Traditional code bugs are usually recoverable — you can redeploy. CUA errors in the real world may not be: a form submitted with wrong data, a file permanently deleted, an email sent to the wrong recipient. The autonomous nature of CUAs makes safety design non-optional.
The Principle of Least Capability
Give CUAs the minimum permissions needed:
- Read-only where possible (extract data before entering any)
- No access to production credentials until staging is validated
- Time-limited sessions so credentials can't be reused indefinitely
- Sandboxed environments for testing
Human-in-the-Loop Checkpoints
For consequential actions, require human approval:
"Before submitting any form, pause and show me:
- A screenshot of the completed form
- A summary of what will be submitted
- Wait for my explicit 'proceed' or 'cancel' before continuing"
This is the most important safety pattern. Slow is safe.
The "Dry Run" Pattern
Before executing a real workflow, run it in dry-run mode:
"Execute this workflow but do not click 'Submit', 'Confirm', or 'Delete' on any final confirmation step. Navigate to the point of each irreversible action and stop. Show me what would happen if I proceeded."
Irreversibility Classification
| Action | Reversibility | Safety Level | |--------|--------------|-------------| | Reading a page | N/A | Safe | | Filling a form (not submitted) | Reversible | Low risk | | Submitting a form | Variable | Medium risk | | Sending an email | Irreversible | Requires approval | | Deleting data | Irreversible | Requires human | | Financial transactions | Irreversible | Never automate without explicit controls |
Error Logging and Alerting
def safe_cua_execute(task, irreversible_keywords=None):
if irreversible_keywords is None:
irreversible_keywords = ["submit", "delete", "send", "pay"]
result = cua.run(task)
if any(k in result.actions for k in irreversible_keywords):
alert_human(f"CUA performed irreversible action: {result.last_action}")
log_all_actions(result.action_log)
return result
Always alert when irreversible actions are taken, even if they were authorized.