Problem statement
Current AIOps prototypes excel in demos and break in production. They hallucinate commands, correlate text without grounding, and collapse into consensus when run as multi-agent debates. SentinelCloud is a closed-loop system that treats every step as a measurable contract.
Approach
A six-agent state machine separates analysis, dissent, planning, validation, safety, and outcome prediction. Actions pass a deterministic policy gate, a calibrated confidence gate, and a blast-radius gate before execution. Every run is logged as an episode for memory recall.
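The three gates can be pictured with a minimal sketch. All identifiers, thresholds, and the deny-list below are illustrative assumptions, not the repo's actual policy constitution:

```python
from dataclasses import dataclass

@dataclass
class Action:
    command: str
    confidence: float      # calibrated confidence from the outcome-prediction agent
    blast_radius: int      # number of resources the action could touch

# Hypothetical deny-list of destructive verbs (assumption, not the real constitution)
POLICY_DENYLIST = {"delete", "drop", "terminate"}

def policy_gate(action: Action) -> bool:
    """Deterministic check: reject commands containing denied verbs."""
    return not any(verb in action.command for verb in POLICY_DENYLIST)

def confidence_gate(action: Action, threshold: float = 0.8) -> bool:
    """Calibrated confidence must clear a fixed threshold."""
    return action.confidence >= threshold

def blast_radius_gate(action: Action, max_resources: int = 5) -> bool:
    """Cap how many resources a single action may affect."""
    return action.blast_radius <= max_resources

def approve(action: Action) -> bool:
    """An action executes only if every gate passes."""
    return all(gate(action) for gate in (policy_gate, confidence_gate, blast_radius_gate))

safe = Action("kubectl rollout restart deploy/api", confidence=0.92, blast_radius=1)
risky = Action("kubectl delete namespace prod", confidence=0.95, blast_radius=40)
print(approve(safe), approve(risky))  # prints: True False
```

The point of chaining the gates with `all` is that a high-confidence action can still be vetoed deterministically: `risky` clears the confidence gate but fails both the policy and blast-radius gates.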
Reproducibility
Seven scenarios are seeded fixtures. Same input, same orchestrator, same KPIs. The LLM gateway has a deterministic stub fallback so the demo runs offline. Source code, prompts, and the policy constitution all live in the repo.
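One way to picture the offline stub fallback, assuming hypothetical class and method names rather than the gateway's real API:

```python
import hashlib

class StubLLM:
    """Deterministic fallback: the same prompt always yields the same canned reply."""
    CANNED = [
        "Symptom correlates with recent deploy; recommend rollback.",
        "Metrics within baseline; no action required.",
        "Pod restart loop detected; suggest raising the memory limit.",
    ]

    def complete(self, prompt: str) -> str:
        # Hash the prompt so the choice is stable across runs: no RNG, no network.
        idx = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % len(self.CANNED)
        return self.CANNED[idx]

class Gateway:
    """Routes to a real provider when available, else to the deterministic stub."""

    def __init__(self, provider=None):
        self.provider = provider   # real LLM client, or None when offline
        self.stub = StubLLM()

    def complete(self, prompt: str) -> str:
        if self.provider is None:
            return self.stub.complete(prompt)
        try:
            return self.provider.complete(prompt)
        except Exception:
            # Provider unreachable: degrade to the stub so the demo still runs.
            return self.stub.complete(prompt)

gw = Gateway()  # offline: no provider configured
assert gw.complete("disk full on node-3") == gw.complete("disk full on node-3")
```

Hashing the prompt rather than sampling keeps the "same input, same KPIs" property even in stub mode.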
Baselines we compare against
- Single-LLM zero-shot: no tools, no debate, free-form chain-of-thought.
- Single-LLM with tools: tool calling but no critic, no policy gate.
- Naive debate (no devil): multiple agents but no contractually pinned dissenter.
- SentinelCloud (this work): adversarial debate, blast radius, semantic policy, calibration, memory.
- Oracle upper bound: cheats with the ground-truth scenario answer; reports the ceiling.
Honest limitations
- Demo runs against simulated topologies. Connector mode against a real GCP / K8s project is implemented but disabled by default.
- LLM cost depends on provider; the gateway falls back to a stub when no provider is reachable.
- The Process Reward Model is heuristic; a learned PRM is future work.
- Evaluation set is seven scenarios, not hundreds. Scale-out is straightforward but out of scope for the capstone window.
Cite this work
@misc{kumar2026sentinelcloud,
  title  = {SentinelCloud: A Closed-Loop Multi-Agent System for Autonomous Cloud DevOps},
  author = {Kumar, Rohit},
  year   = {2026},
  note   = {BTech CSE Cloud Computing capstone, Shoolini University},
  url    = {https://sentinelcloud.dmj.one}
}