Context
Accenture Federal Services wants to extend AFS AI Refinery with an internal agentic assistant that helps software teams turn requirements into implementation plans, draft code changes, generate tests, and summarize pull-request risk. The goal is to improve developer throughput without allowing the agent to invent requirements, misuse tools, or leak sensitive program data.
Constraints
- p95 end-to-end latency: < 8 seconds for a single-turn task such as test generation or PR review summary
- Cost ceiling: < $0.18 per task and < $45K/month at 250K tasks/month
- Hallucination ceiling: < 2% on a labeled offline set for requirement-grounded outputs
- Prompt injection success rate from tool outputs or retrieved artifacts: < 0.5%
- Must preserve human approval before any code merge, ticket update, or deployment action
- All outputs must remain within approved AFS environments and respect repo/document access controls
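The numeric constraints above can be expressed as a single release gate for offline and online evaluation runs. This is a minimal sketch, and it rests on three assumptions. First, the per-task cost ceiling applies to mean cost. Second, p95 uses the nearest-rank definition. Third, the names `Budget`, `p95`, and `check_release` are hypothetical. At the stated volume, 250,000 tasks × $0.18 = $45,000, so both cost limits bind at the same point. The monthly check only adds information if task volume grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Strict upper bounds taken from the Constraints list."""
    p95_latency_s: float = 8.0
    cost_per_task_usd: float = 0.18
    monthly_cost_usd: float = 45_000.0
    hallucination_rate: float = 0.02
    injection_success_rate: float = 0.005


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list (integer math avoids float rounding)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_release(
    latencies_s: list[float],
    costs_usd: list[float],
    hallucinated: int,
    grounded_total: int,
    injections_succeeded: int,
    injection_attempts: int,
    tasks_per_month: int = 250_000,
    budget: Budget = Budget(),
) -> dict[str, bool]:
    """Return pass/fail per constraint; every value must be True to ship."""
    mean_cost = sum(costs_usd) / len(costs_usd)  # assumption: ceiling applies to the mean
    return {
        "latency": p95(latencies_s) < budget.p95_latency_s,
        "cost_per_task": mean_cost < budget.cost_per_task_usd,
        "monthly_cost": mean_cost * tasks_per_month < budget.monthly_cost_usd,
        "hallucination": hallucinated / grounded_total < budget.hallucination_rate,
        "prompt_injection": injections_succeeded / injection_attempts
        < budget.injection_success_rate,
    }
```

A release would ship only if `all(check_release(...).values())` holds. The last two constraints, human approval and access control, are not rates. They are better enforced as hard checks in the execution path than as measured thresholds.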
Available Resources
- 120K internal artifacts: Jira-style requirements, ADRs, API specs, design docs, code review comments, test plans, and runbooks
- Read-only tools for repository search, issue lookup, CI test results, static-analysis findings, and document retrieval inside AFS AI Refinery
- Approved LLMs: one higher-quality model for planning/review and one lower-cost model for simple transformations
- 40 senior engineers and tech leads available to label a golden set of tasks and failure cases
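The resources above suggest two simple control points. One is a router that sends each task to one of the two approved models. The other is an allowlist that admits only the read-only tools. This is a minimal sketch, and every task-type and tool identifier in it is hypothetical. The mapping of task types to model tiers is illustrative and would need validation against the golden set.

```python
from enum import Enum


class ModelTier(Enum):
    # Placeholder labels; the source does not name the approved models.
    PLANNING_REVIEW = "higher-quality"
    TRANSFORM = "lower-cost"


# Hypothetical task types. Multi-step reasoning goes to the stronger model;
# mechanical rewrites go to the cheaper one.
ROUTES: dict[str, ModelTier] = {
    "requirements_analysis": ModelTier.PLANNING_REVIEW,
    "implementation_plan": ModelTier.PLANNING_REVIEW,
    "pr_review_summary": ModelTier.PLANNING_REVIEW,
    "test_scaffold": ModelTier.TRANSFORM,
    "docstring_update": ModelTier.TRANSFORM,
}

# Hypothetical identifiers for the read-only tools listed under Available Resources.
READ_ONLY_TOOLS = frozenset(
    {"repo_search", "issue_lookup", "ci_test_results", "static_analysis", "doc_retrieval"}
)


def route(task_type: str) -> ModelTier:
    """Unknown task types fall back to the stronger model: fail toward quality, not cost."""
    return ROUTES.get(task_type, ModelTier.PLANNING_REVIEW)


def authorize_tool(tool: str) -> None:
    """Reject any tool call outside the read-only allowlist before it is dispatched."""
    if tool not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the read-only allowlist")
```

Defaulting unknown tasks to the stronger model trades cost for safety. The $0.18 per-task ceiling is what limits how much traffic can take that path.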
Task
- Design an agentic workflow for standard software development tasks (requirements analysis, code assistance, test generation, PR review) and explain where autonomy should stop and human approval should begin.
- Define the evaluation plan first: offline golden-set metrics, adversarial testing for prompt injection, and online success/guardrail metrics.
- Write a system prompt that constrains tool use, enforces grounded reasoning, and specifies refusal behavior when requirements or evidence are insufficient.
- Propose the architecture, including retrieval/tool orchestration, model routing, and controls for latency and cost.
- Identify major failure modes and mitigations, especially hallucinated requirements, unsafe code suggestions, prompt injection from artifacts, and permission boundary violations.
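Several of the failure modes listed above have mechanical mitigations that can sit outside the model. This minimal sketch assumes a hypothetical `REQ-<n>` requirement-ID format, a group-based ACL on each artifact, and hypothetical action names. It shows four checks:

- Filter artifacts by the caller's permissions before retrieval results reach the model.
- Wrap artifact text as untrusted data.
- Flag requirement IDs in a draft that no retrieved artifact supports.
- Refuse gated actions that lack a recorded approver.

Delimiting untrusted text reduces prompt-injection risk but does not eliminate it. The adversarial test set is still what measures that risk.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL representation


# Actions that must never run without a recorded human approval (from Constraints).
GATED_ACTIONS = frozenset({"code_merge", "ticket_update", "deployment"})
REQ_ID = re.compile(r"\bREQ-\d+\b")  # hypothetical requirement-ID format


def visible_artifacts(artifacts: list[Artifact], user_groups: set[str]) -> list[Artifact]:
    """Permission boundary: drop anything the requesting user cannot read."""
    return [a for a in artifacts if a.allowed_groups & user_groups]


def as_untrusted(a: Artifact) -> str:
    """Wrap artifact text as data; escape the closing tag so content cannot break out."""
    body = a.text.replace("</artifact>", "&lt;/artifact&gt;")
    return f"<artifact id={a.artifact_id!r} trust=untrusted>\n{body}\n</artifact>"


def ungrounded_requirements(draft: str, evidence: list[Artifact]) -> set[str]:
    """Requirement IDs cited in the draft that no retrieved artifact contains."""
    cited = set(REQ_ID.findall(draft))
    supported: set[str] = set()
    for a in evidence:
        supported.add(a.artifact_id)
        supported.update(REQ_ID.findall(a.text))
    return cited - supported


def can_execute(action: str, approved_by: Optional[str]) -> bool:
    """Gated actions require a named human approver; read-only actions do not."""
    return action not in GATED_ACTIONS or approved_by is not None
```

A non-empty result from `ungrounded_requirements` is a natural trigger for the refusal behavior the system prompt should specify. It should not be something the draft silently patches over.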