Scout Agent, agentic recommendations

LangGraph, Neon pgvector, Lambda Function URL, SSE, FastAPI, CloudFront.

Figure: Scout Agent architecture. The browser opens an SSE connection to CloudFront, CloudFront forwards to a Lambda Function URL with a shared-secret header, and the Lambda runs a four-node LangGraph state machine (planner, tool_executor, reflector, recommender) over Neon pgvector and curated S3 data.

A conversational transfer-recommendation agent on top of the FPL Pulse pipeline, live at fpl.isseikuzuki.co.uk/chat since April 2026. The user pastes their FPL team ID; the agent reads curated enrichment data, queries pgvector for similar players, weighs upcoming fixtures, and streams a reasoned transfer recommendation back to the browser. It is interactive, multi-step and branching, and it loops back on itself when the first plan doesn't survive contact with the tool results. A typical conversation costs around 10p per turn end-to-end.

The shape

Four nodes. planner turns the user query into a sequence of tool calls. tool_executor runs them (pgvector similarity search, player history lookup, fixture difficulty, head-to-head comparison). reflector reads the results and decides whether the plan needs another pass. recommender emits the final answer. The reflector-to-planner edge is conditional and capped at three iterations, which is enough room to recover from a bad first plan without letting cost run away.
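The control flow above can be sketched in plain Python. This is an illustration rather than the project's code: the node bodies are stand-ins, the state keys are assumptions, and the driver loop only mimics what a graph runtime does. In LangGraph, a function like route_after_reflection is what would be handed to add_conditional_edges on the reflector node.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
MAX_ITERATIONS = 3  # cap on planner passes, as described in the text


def planner(state: State) -> State:
    # Stand-in: a real planner would ask the model for a list of tool calls.
    return {**state, "plan": ["similar_players"], "iterations": state["iterations"] + 1}


def tool_executor(state: State) -> State:
    # Stand-in: record that each planned tool ran.
    return {**state, "results": state["results"] + [f"ran {t}" for t in state["plan"]]}


def reflector(state: State) -> State:
    # Stand-in rule: ask for another pass until two batches of results exist.
    return {**state, "needs_replan": len(state["results"]) < 2}


def recommender(state: State) -> State:
    return {**state, "answer": f"recommendation after {state['iterations']} pass(es)"}


def route_after_reflection(state: State) -> str:
    """Conditional edge: loop back to the planner only while the cap allows it."""
    if state["needs_replan"] and state["iterations"] < MAX_ITERATIONS:
        return "planner"
    return "recommender"


def run(query: str, reflect: Callable[[State], State] = reflector) -> State:
    nodes = {
        "planner": planner,
        "tool_executor": tool_executor,
        "reflector": reflect,
        "recommender": recommender,
    }
    fixed_edges = {"planner": "tool_executor", "tool_executor": "reflector"}
    state: State = {"query": query, "plan": [], "results": [],
                    "iterations": 0, "needs_replan": False, "answer": ""}
    node = "planner"
    while node != "END":
        state = nodes[node](state)
        if node == "reflector":
            node = route_after_reflection(state)
        elif node == "recommender":
            node = "END"
        else:
            node = fixed_edges[node]
    return state
```

Even with a reflector that always asks for another pass, run() stops after three planner passes, because the router checks the iteration count before it checks anything else.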

Embeddings are all-MiniLM-L6-v2 at 384 dimensions, stored in Neon pgvector. Cheap, small, runs on the always-free Neon tier; the quality lift from anything bigger would be invisible at this corpus size (around 700 players times a handful of facets each).
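For a sense of what the similarity lookup involves, here is a rough sketch. The table and column names (player_embeddings, player_id, facet, embedding) are hypothetical, as is the psycopg-style parameter syntax. The <=> operator is pgvector's cosine distance, and the helper below computes the same quantity in pure Python so the ordering can be checked offline.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2

# Hypothetical schema: player_embeddings(player_id, facet, embedding vector(384)).
# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
SIMILAR_PLAYERS_SQL = """
SELECT player_id, facet, embedding <=> %(query)s::vector AS distance
FROM player_embeddings
WHERE player_id <> %(player_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Render an embedding in pgvector's text format, e.g. '[0.1,0.2,...]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"
```

At roughly 700 players times a handful of facets, a sequential scan with this ORDER BY is fast enough that an approximate index is unlikely to matter.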

The framework choice

Six weeks before building this, I'd written an ADR on another project choosing Pydantic AI over LangGraph for an orchestration layer. Same engineer, opposite call. The framing that made both right (use-case shape, not capability count, picks the framework) is the spine of the LangGraph and Pydantic AI post. Short version: parallel-fan-out with deterministic joins doesn't need a graph DSL; stateful conditional loops with iteration caps do.

The CloudFront landmine

Putting the Function URL behind CloudFront with the textbook OAC plus AWS_IAM pattern returned 403 on 100% of POST requests. The cause is that SigV4 needs the SHA256 of the request body in the canonical request, and CloudFront cannot compute that on a streaming POST in time to sign the request. Every AWS example for OAC plus Function URL uses GET, where the issue doesn't surface. The fix is to set the Function URL to auth_type = NONE and inject a shared-secret header on every CloudFront origin request, validated by a FastAPI middleware on the Lambda. The full evening is on the blog.
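A minimal sketch of both halves of that story, using only the standard library. payload_hash is the value SigV4 expects in the canonical request, which is why the whole body has to be known before signing. is_from_cloudfront is the kind of check a FastAPI middleware could run on each request. The header name x-origin-secret is an assumption, not the one the project uses.

```python
import hashlib
import hmac
from typing import Mapping

ORIGIN_SECRET_HEADER = "x-origin-secret"  # hypothetical header name


def payload_hash(body: bytes) -> str:
    """Hex SHA-256 of the full request body, as SigV4's canonical request requires."""
    return hashlib.sha256(body).hexdigest()


def is_from_cloudfront(headers: Mapping[str, str], expected_secret: str) -> bool:
    """True only if the request carries the secret CloudFront injects at the origin."""
    if not expected_secret:
        return False  # refuse to run open if the secret was never configured
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get(ORIGIN_SECRET_HEADER, "")
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

A middleware built on this would return 403 whenever is_from_cloudfront is false, so direct hits on the Function URL are rejected before any agent code runs.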

Streaming and safety

The agent streams thinking tokens back to the browser over Server-Sent Events. Lambda Function URLs support response streaming up to 20 MB of body and a 15-minute response duration, which is generous. Lambda Web Adapter sits in front of the FastAPI app and translates each Lambda invocation into an ordinary HTTP request to the app server, so FastAPI runs unmodified. CloudFront's AllViewerExceptHostHeader origin-request policy forwards everything except the Host header (the Function URL rejects requests whose Host doesn't match its own domain, so the viewer's Host header has to be stripped).
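On the wire, Server-Sent Events are just text frames separated by a blank line. The sketch below shows the framing with a plain generator. The event names thinking and done are assumptions for illustration, and in a FastAPI app a generator like this would typically be wrapped in a streaming response with the text/event-stream media type.

```python
import json
from typing import Dict, Iterable, Iterator


def sse_event(event: str, payload: Dict[str, object]) -> str:
    """Format one SSE frame: an event name, one data line, then a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def stream_turn(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a final frame so the client can close."""
    for token in tokens:
        yield sse_event("thinking", {"text": token})
    yield sse_event("done", {})
```

In the browser, an EventSource listener for each event name receives the JSON string from the data line.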

Safety-wise: the agent's tools are fixed paths to Neon and S3, with no user-controlled URLs, no shell, no eval. Parameters are validated through Pydantic before being assembled into the SQL or HTTP call. DynamoDB holds a per-request £1 budget cap and a per-IP rate limiter, as defence in depth in case the three-iteration cap fails to bind. At a typical 10p per turn the cap is theatre; the cost discipline is in the planner deciding when the loop has converged, not in the runtime stopping it.
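The budget cap itself is simple bookkeeping. Here is a rough in-memory sketch; the class and method names are hypothetical, and the real counter lives in DynamoDB rather than on a Python object.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would take a request past its spending cap."""


class RequestBudget:
    """Tracks spend for one request, in pence, against a fixed cap."""

    def __init__(self, cap_pence: int = 100) -> None:  # 100p = the £1 cap
        self.cap_pence = cap_pence
        self.spent_pence = 0

    def charge(self, pence: int) -> int:
        """Record a cost and return the remaining budget, or refuse the charge."""
        if pence < 0:
            raise ValueError("cost cannot be negative")
        if self.spent_pence + pence > self.cap_pence:
            raise BudgetExceeded(
                f"charge of {pence}p would exceed the {self.cap_pence}p cap"
            )
        self.spent_pence += pence
        return self.cap_pence - self.spent_pence
```

A typical 10p turn leaves 90p of headroom, which is why the cap rarely binds in practice.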

How I measure it

A 29-case golden eval set lives next to the agent at services/agent/src/fpl_agent/evaluation/. Each case declares the tools the planner must call, the tools it must not, the players the response has to mention (word-boundary regex), and the output-shape constraints (a comparison question must populate ScoutReport.comparison; an unknown-player question must leave ScoutReport.players empty). Hard checks are pass-fail; on top of them a Haiku judge scores per-case rubric bullets 1 to 5 for narrative quality. The runner uses a pandas-mirrored fixture tool registry over a committed parquet snapshot of the Neon player table, so scores are reproducible run-over-run irrespective of Premier League results. Regenerating the snapshot is gated by a pre-write assertion that refuses to commit one that would silently break the eval set.
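The hard checks described above reduce to set arithmetic plus a regex. The sketch below is an illustration under assumed names (GoldenCase, hard_check, and the tool identifiers used with it are hypothetical) and leaves out the output-shape constraints and the judge.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GoldenCase:
    question: str
    must_call: Set[str] = field(default_factory=set)
    must_not_call: Set[str] = field(default_factory=set)
    must_mention: List[str] = field(default_factory=list)


def mentions(name: str, text: str) -> bool:
    """Word-boundary match, so 'Son' does not match inside 'Sonny'."""
    return re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE) is not None


def hard_check(case: GoldenCase, tools_called: Set[str], response: str) -> List[str]:
    """Return a list of failures; an empty list means the case passes."""
    failures: List[str] = []
    for tool in sorted(case.must_call - tools_called):
        failures.append(f"missing tool call: {tool}")
    for tool in sorted(case.must_not_call & tools_called):
        failures.append(f"forbidden tool call: {tool}")
    for name in case.must_mention:
        if not mentions(name, response):
            failures.append(f"player not mentioned: {name}")
    return failures
```

Because these checks never read the prose for quality, fluent wording cannot rescue a case that called the wrong tool or omitted a required player.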

First baseline came back at 24/29 hard-check pass (82.8%), mean judge 4.36/5. The fail cases that mattered most were two unknown-player questions where the agent wrote excellent prose around a fabricated PlayerAnalysis; a judge-only eval would have scored those as marginal passes at 4.25 and 3.70, which is exactly what the dual-grading layer is for. A full run is around £1.50 on Anthropic end-to-end; the writeup is at docs/evals/baseline.md and the framework shipped in PR #146.

What I'd do differently

The first iteration of the graph had a fetch_user_squad tool that the planner could call to load the user's current FPL team. Logically clean: the agent decides what data it needs, the tool provides it. In practice every conversation needs the squad (a transfer recommendation makes no sense without knowing what's already on the team), so every planner emitted that tool call as its first step. The "agent decides" branch was always taken. I moved squad-loading out of the tool surface and into the HTTP layer: the /chat endpoint resolves the squad before the graph runs, and the squad is passed in as ChatRequest context. The planner now starts with information it always needs, not a decision it always makes.
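The shape of that change, as a framework-free sketch. ChatRequest is named in the text, but its fields here are assumptions, and fetch_squad and run_graph are hypothetical callables standing in for the FPL lookup and the compiled graph.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ChatRequest:
    team_id: int
    message: str
    squad: List[str]  # resolved before the graph runs, never fetched by a tool


def handle_chat(
    team_id: int,
    message: str,
    fetch_squad: Callable[[int], List[str]],
    run_graph: Callable[[ChatRequest], str],
) -> str:
    """Resolve the squad in the HTTP layer, then hand the graph a complete request."""
    squad = fetch_squad(team_id)  # always needed, so there is no decision to make
    request = ChatRequest(team_id=team_id, message=message, squad=squad)
    return run_graph(request)
```

With the squad in the request, the planner's first step goes to something that actually varies between conversations.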

Tools are for decisions, not retrieval. If the agent always fetches X, X belongs in the request context and the planner shouldn't have to think about it. If the agent sometimes fetches Y depending on what it's seen, Y belongs on the tool surface. The mistake on the first pass was treating "the agent should be able to" as the same question as "the agent should decide whether to".