When AI Agents in ServiceNow Stop Assisting and Start Acting

The conversation companies are still having

The current enterprise AI security debate centers on data leakage: employees uploading confidential files to consumer LLMs, sensitive information reaching third-party systems. Reuters reported that courts are already treating some AI sharing as disclosure to a third party — which shows how fast the legal framing is catching up with operational reality.

Source: Reuters — Artificial intelligence tools: A third party by any other name?

This is a real problem. It is also a problem that organizations can address with access controls, approved tooling, and policy enforcement.

The harder problem is already inside the building.

OpenClaw: the most-watched AI agent of 2026

OpenClaw is an open-source autonomous AI agent created by Austrian developer Peter Steinberger. It was first released as "Clawdbot" in November 2025, renamed twice following trademark issues, and reached 247,000 GitHub stars and 47,700 forks by March 2026.

Source: OpenClaw Wikipedia

It executes tasks through large language models — Claude, DeepSeek, GPT — and integrates with messaging platforms (WhatsApp, Telegram, Discord, Signal, WeChat) as the primary interface. It can clear email inboxes, send messages, manage calendars, check in for flights, browse the web, and handle multi-step tasks without human input for each step.

Nvidia CEO Jensen Huang called it "definitely the next ChatGPT" and positioned it as "the operating system for personal AI." Nvidia backed that statement by launching NemoClaw — an enterprise-ready stack built on OpenClaw with privacy controls, policy-based guardrails, and Nemotron model support.

Sources: Nvidia NemoClaw announcement, Next Platform — Nvidia says OpenClaw is to agentic AI what GPT was to chattybots

In China, the phenomenon was dubbed "lobster fever" — referring to the claw imagery — and mass adoption followed almost immediately, with government support accelerating enterprise use. Alibaba launched Wukong, an enterprise agent platform built on OpenClaw principles. Baidu released agents for desktop, cloud, mobile, and smart-home. Tencent integrated OpenClaw with WeChat, giving it access to over 1 billion monthly active users through a contact called ClawBot.

Sources: SCMP — Lobster fever grips China, Reuters — Tencent integrates WeChat with OpenClaw AI agent amid China tech battle

Meta, Google, Microsoft, and Amazon banned OpenClaw from corporate hardware. Employees deployed it locally anyway. Shadow IT — agents running on personal machines with full user-level permissions — is now a documented, active phenomenon across major technology firms.

Source: Lyzr — Why OpenClaw is the #1 enterprise wake-up call of 2026

The signal here is not about OpenClaw specifically. It is about the category of capability it represents — and how fast that capability is moving into production environments, with or without IT approval.

Tier 1: autonomous incident management is already in production

ServiceNow is the market leader in IT service management — and it is actively building autonomous agents into its own platform. This is not a third-party integration or an experimental pilot. ServiceNow's documentation describes self-executing agentic workflows that activate once triggered, built and shipped by ServiceNow itself.

An AI agent embedded in the platform today can:

• Receive an incoming incident at 2am, then read and classify it by category, priority, and urgency
• Identify the relevant configuration item in the CMDB without human input
• Search the knowledge base and the last 90 days of similar incidents, and surface a resolution path
• Detect whether the pattern matches a known recurring issue, and flag a likely major incident before volume confirms it
• Draft and send the first response to the affected user
• Route to the correct resolver group based on classification logic
• Prepare work notes, resolution summaries, and post-incident documentation
• Page the on-call engineer only if escalation criteria are met

No one woke up for any of those steps. That is not a productivity helper. That is an operator inside the process.

Sources: ServiceNow — Incident Management, ServiceNow — Platform agentic workflows
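
The triage steps listed above can be sketched as a small pipeline. This is a minimal illustration, not ServiceNow's implementation: the `Incident` fields, the keyword rules, the resolver group names, the priority formula, and the paging rule are all hypothetical stand-ins for the platform's own classification and escalation logic.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    short_description: str
    impact: int   # 1 = high, 3 = low
    urgency: int  # 1 = high, 3 = low
    category: str = "unclassified"
    priority: int = 5
    assignment_group: str = "service-desk"
    work_notes: list = field(default_factory=list)


# Assumed keyword rules standing in for the agent's classification step.
CATEGORY_RULES = {"vpn": "network", "password": "access", "invoice": "erp"}
# Assumed routing table from category to resolver group.
ROUTING = {"network": "network-ops", "access": "identity-team", "erp": "erp-support"}


def triage(incident: Incident) -> bool:
    """Classify, prioritize, and route an incident; return True if on-call should be paged."""
    text = incident.short_description.lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in text:
            incident.category = category
            break
    # Simplified, assumed priority formula: lower number means more urgent.
    incident.priority = max(1, incident.impact + incident.urgency - 1)
    incident.assignment_group = ROUTING.get(incident.category, "service-desk")
    incident.work_notes.append(
        f"Auto-triage: category={incident.category}, priority=P{incident.priority}"
    )
    # Assumed escalation criterion: page a human only for P1.
    return incident.priority == 1
```

Calling `triage(Incident("VPN tunnel down for whole site", impact=1, urgency=1))` classifies, routes, and returns `True` to page the on-call engineer. Every line before the `return` is an action taken without a human, which is the point the list makes.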

The next layer in ServiceNow's roadmap adds bounded remediation: creating and updating tickets, notifying stakeholders, triggering standard runbooks, handling routine access and account requests. Each of these is a business action, not a recommendation. Each carries direct operational liability if it runs incorrectly.

The knowledge base dimension is worth a separate note. An agent that not only searches knowledge base articles (KBAs) but also evaluates their accuracy, flags outdated ones, and eventually proposes updates based on what it observed while resolving incidents is a short step from what is already deployed. Knowledge management, which currently depends heavily on analyst discipline to stay current, becomes something the agent helps maintain.

Tier 2: where this pattern leads — and why it matters to address now

What OpenClaw demonstrated at the consumer level — and what makes Nvidia's investment significant — is not just autonomy inside a single application. It is autonomy across systems: an agent that can navigate interfaces, read what it finds, make decisions based on external context, and act — without being handed a predefined script for every situation.

This capability does not yet exist in enterprise IT management platforms as a production-ready feature. But the underlying technology is proven, the trajectory is clear, and the vendors are actively moving in this direction. This is the scenario that directors should be thinking about today, before it arrives as a pilot decision.

Imagine an agent that does not wait to receive an incident. Instead, it monitors an ERP environment continuously — navigating through transaction logs, configuration tables, and system health indicators the same way an experienced consultant would during an audit. It detects an anomaly: a configuration mismatch between two dependent parameters introduced during a recent transport. It cross-references the issue against the known SAP note database, confirms the fix, creates a transport request with the corrected values, runs it through a test system, validates the output against the expected behavior, and promotes it to production during the next maintenance window — logging every step with a human-readable rationale.

No ticket was opened. No analyst was paged. The system corrected itself.

This is hypothetical today. It is not implausible within the next three to five years. Most ERP maintenance work is not greenfield development — it is analysis, correction, configuration, and controlled change. It is structured, repetitive, and governed by rules that are already documented somewhere. That is exactly the environment where automation has historically expanded, incrementally, until one day the human's role had fundamentally changed without anyone making a single large decision to change it.

The progression follows a clear arc: first the agent supports users (answering questions, surfacing documentation). Then it assists operators (drafting changes, flagging anomalies). Then it acts within defined boundaries (creating tickets, triggering runbooks). Then it navigates, diagnoses, and proposes fixes. Then, eventually, it executes.
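
One way to keep that arc visible is to encode the stages as an explicit, ordered ceiling that someone has to raise on purpose. The sketch below is an assumption about how a governance team might model it; the names `AutonomyLevel` and `is_permitted` are hypothetical and do not come from any vendor's product.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The five stages described above, ordered by increasing autonomy."""
    SUPPORT_USERS = 1      # answer questions, surface documentation
    ASSIST_OPERATORS = 2   # draft changes, flag anomalies
    ACT_BOUNDED = 3        # create tickets, trigger runbooks
    DIAGNOSE_PROPOSE = 4   # navigate, diagnose, propose fixes
    EXECUTE = 5            # apply fixes without confirmation


def is_permitted(requested: AutonomyLevel, approved_ceiling: AutonomyLevel) -> bool:
    """Allow an action only if it does not exceed the explicitly approved level."""
    return requested <= approved_ceiling
```

With the ceiling set to `AutonomyLevel.ACT_BOUNDED`, a request at `AutonomyLevel.EXECUTE` is refused. Moving up a stage then requires changing one recorded value, so the step is a decision someone made rather than something that accumulated.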

The distance between the first step and the last is shorter than most governance models assume — and it advances through small, reasonable decisions, not a single dramatic deployment.

The spectacular and the tragic — two sides of the same capability

The OpenClaw adoption wave in China produced some genuinely impressive results before the warnings arrived. Users reported agents that cleared weeks of backlogged email, rescheduled complex travel itineraries across multiple systems, extracted and synthesized data from dozens of documents overnight, and handled multi-step administrative workflows that previously required hours of human coordination. In enterprise pilots, teams described agents that operated continuously during off-hours, handled routine requests without queuing, and surfaced context that analysts had been too busy to find manually.

Huang called it "the next ChatGPT" for a reason. The category represents a real and meaningful productivity shift. China's "lobster fever" was not irrational; it reflected genuine capability.

Then China's Ministry of Industry and Information Technology issued formal warnings. The enthusiasm and the caution arrived in the same month.

Source: NBC News — In China, a rush to 'raise lobsters' quickly leads to second thoughts

The security picture behind the hype is difficult:

• More than 18,000 OpenClaw instances are currently exposed to internet attacks
• Approximately 12% of packages in the OpenClaw registry contain malicious instructions
• Testing across 47 adversarial scenarios found OpenClaw has an average defense rate of only 17%, meaning that in more than four out of five attack attempts the agent was successfully manipulated into doing something it should not have
• Oasis Security documented a vulnerability allowing any website to silently take full control of a developer's agent with no user interaction required
• OpenClaw stores API keys and session data in plaintext; active exploitation campaigns targeting those credential files are documented

Sources: Bitdefender — Technical Advisory: OpenClaw exploitation in enterprise networks, Oasis Security — ClawJacked: OpenClaw vulnerability enables full agent takeover, arXiv — Don't let the claw grip your hand: security analysis of OpenClaw

WIRED reported that researchers found OpenClaw agents could be guilt-tripped into self-sabotage and information leakage through adversarial prompting. Business Insider reported a user spending hours correcting fabricated financial data after running the agent on sensitive documents — data that looked plausible enough that the errors were not obvious until manual review.

Source: WIRED — OpenClaw agents can be guilt-tripped into self-sabotage

The tragic failure modes in enterprise environments are rarely the dramatic breach. They are the plausible errors that no one caught in time. An agent that misclassifies a P1 incident as P3 at 3am — no escalation fires, the SLA breach accumulates for hours, the customer calls the CEO at 8am. An agent that triggers the correct runbook against the wrong environment. An agent that produces a convincing resolution summary for an incident that was not actually resolved — closing the ticket, stopping the clock, and leaving the problem in place.

In an ERP context, the stakes are asymmetric. A fabricated resolution note in a service desk is an embarrassment. A fabricated validation record in a financial posting run is a compliance event. The same capability that makes these agents impressive — speed, plausibility, autonomy — is exactly what makes an undetected error expensive.
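
Two of the failure modes above, the runbook against the wrong environment and the ticket closed on a convincing but false summary, can be guarded by checks that do not rely on the agent's own account. The following is a hedged sketch of that idea only; `ClosureRequest`, its fields, and `closure_blockers` are hypothetical names, and a real control would draw these values from monitoring and change records.

```python
from dataclasses import dataclass


@dataclass
class ClosureRequest:
    """Hypothetical request by an agent to close an incident."""
    ticket_id: str
    agent_summary: str         # kept for audit, deliberately not used as evidence
    target_environment: str    # environment the runbook actually ran against
    expected_environment: str  # environment named on the ticket
    health_check_passed: bool  # result of a check the agent did not author


def closure_blockers(req: ClosureRequest) -> list:
    """Return the reasons a closure must go to a human; an empty list means it may proceed."""
    blockers = []
    if req.target_environment != req.expected_environment:
        blockers.append("runbook ran against a different environment than the ticket names")
    if not req.health_check_passed:
        blockers.append("independent health check did not confirm the fix")
    return blockers
```

The design choice is that the summary text never appears in a condition: plausibility is exactly what the agent is good at producing, so it cannot count as proof that the problem is gone.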

IBM's analysis of the 2026 Global AI Safety Report states the structural point clearly: the biggest risks in AI deployments come from the systems built around the model, not the model alone.

Source: IBM — What a new global AI safety report means for enterprise

The management question is not "do we trust the model?" It is: do we trust the workflow when the agent is inside it — and what is the cost when it is wrong?

The decision framework directors need

The principle that holds across all of this is straightforward:

Allow bounded autonomy only for tasks where the cost of a wrong automated action is lower than the cost of waiting for human confirmation.
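
That principle can be written down as a single comparison. The sketch below is one possible reading, not a standard formula: the function name, the optional `error_rate` weighting, and the cost units are assumptions added for illustration. With the default `error_rate` of 1.0 it reduces to the literal rule stated above.

```python
def allow_autonomy(cost_if_wrong: float, cost_of_waiting: float, error_rate: float = 1.0) -> bool:
    """Return True when the expected cost of acting wrongly is below the cost of waiting.

    error_rate is an assumed probability that the automated action is wrong.
    Leaving it at 1.0 compares the two costs directly.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return error_rate * cost_if_wrong < cost_of_waiting
```

With invented figures: a misrouted ticket costing 50 against a delay costing 200 passes the test, while a wrong production change costing 500,000 fails it even at an assumed error rate of 0.1%, because 500 still exceeds 200.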

Where to Allow Autonomous Agent Actions

	Good Candidates for Autonomy	Require Human Confirmation
Pros	Cost of a wrong action is lower than the cost of human delay; actions are reversible or low-stakes.	Prevents high-cost, irreversible errors; maintains accountability for sensitive actions.
Cons	Still carries risk of error accumulation.	Slows down processes that could be automated.

Good candidates for autonomous execution: incident categorization, enrichment, routing, draft communications, trend detection, knowledge search, first-pass resolution plans. These are reversible or low-stakes enough to absorb occasional errors.

Actions that should remain human-confirmed: production-impacting remediation, access changes, customer-facing commitments, anything involving sensitive records, any action that is difficult or impossible to reverse.
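
The two lists above translate naturally into an explicit policy table. This is a minimal sketch assuming a default-deny stance, which the article does not state outright; the action identifiers are hypothetical labels for the categories named in the two paragraphs.

```python
# Actions the article names as good candidates for autonomous execution.
AUTONOMOUS = frozenset({
    "categorize", "enrich", "route", "draft_communication",
    "detect_trend", "search_knowledge", "draft_resolution_plan",
})
# Actions the article says should remain human-confirmed.
HUMAN_CONFIRMED = frozenset({
    "remediate_production", "change_access", "commit_to_customer",
    "touch_sensitive_record", "irreversible_action",
})


def decide(action: str) -> str:
    """Return how an action may run; anything unlisted falls back to human confirmation."""
    if action in AUTONOMOUS:
        return "autonomous"
    if action in HUMAN_CONFIRMED:
        return "human_confirmation"
    # Assumed default: an action nobody classified is treated as high-stakes.
    return "human_confirmation (unlisted action)"
```

The fallback matters more than either list. New agent capabilities arrive faster than policies are updated, so an unlisted action should be the one that waits for a person.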

The challenge with ERP is that the asymmetry is steep. A wrong configuration in a production SAP environment is not the same as a misrouted support ticket. The cost of a wrong action does not scale with the sophistication of the agent; it scales with the criticality of the system the agent is operating inside.

The organizations that will handle this well are not the ones blocking all agentic tooling out of caution, and not the ones deploying without governance because the pilots look impressive. They are the ones that treat process selection and control design as the first deliverable — before the agent goes into production.

My read

The OpenClaw phenomenon is worth taking seriously as a signal, not just as a product. The fact that employees at Meta, Google, Microsoft, and Amazon are running banned agents locally to stay productive tells you where the pressure is coming from. The fact that Nvidia positioned it as "the operating system for personal AI" tells you where the investment is going. The fact that China went from "lobster fever" to formal government warnings in the same month tells you how fast the consequences arrive.

ServiceNow is already building autonomous incident management into its platform — bounded, native, and in production. OpenClaw shows what the broader agent category is capable of when the guardrails are removed or misapplied. The two are separate tracks, but they are converging toward the same question: what happens when an agent is wrong inside a system that matters?

The practical sequence: map your incident workflows against one test — what is the cost of a wrong automated action at each step? Build the human confirmation layer in parallel with the technical pilot, not after. Start the governance conversation about ERP now, before the first autonomous agent is deployed against it.

The wrong question is "should we use AI in ServiceNow?" The right question is "which parts of our incident workflow are we willing to let an agent act on directly — and what do we need to verify before we find out the hard way?"