What did the AI agent actually do? Real failures, examined. Open-source code that records what should have been logged.