This hands-on exercise walks through debugging and refining a simplified agentic workflow that isn't performing as expected: identifying the issues, proposing changes to the prompts, and discussing how those changes lead to better outcomes.

Imagine we have an agent, ReportCraftAI, designed to help create summaries of recent news articles on a specific topic. Its intended workflow is:

1. Accept a topic from the user.
2. Use a web search tool to find three relevant news articles published in the last week.
3. For each article, extract the main headline and a brief summary.
4. Compile these into a single report.

However, users have reported that ReportCraftAI often returns only one article, sometimes irrelevant ones, and that the summaries can be poorly extracted.

Our task is to diagnose the problems by examining its prompts and refine them to improve its performance. We'll assume the agent has access to a search_web(query: str, time_period: str, num_results: int) tool and an extract_summary(article_text: str) tool.

Initial Problematic Workflow and Prompts

Let's assume the core system prompt for ReportCraftAI includes something like this when tasked with a topic, say "advancements in renewable energy":

Initial System Prompt Segment:

```
You are ReportCraftAI, a helpful assistant. Your goal is to create a report on recent news.
Current Task: User wants a report on "advancements in renewable energy".
Available Tools:
1. search_web(query: str, time_period: str, num_results: int): Searches the web for news. time_period can be "past_week", "past_month".
2. extract_summary(article_text: str): Extracts a concise summary from an article.
Plan:
1. Search for news on the topic.
2. Extract information.
3. Compile the report.
Begin.
```

The agent then uses this to generate further prompts for itself or to decide on tool use. A likely first internal prompt or thought process for step 1 (Search) might be influenced by:

Initial Agent "Thought" for Search: "I need to find news on 'advancements in renewable energy'. I should use the search_web tool."

Resulting Tool Call Prompt (Implicit or Explicit): search_web(query="advancements in renewable energy news", time_period="recent", num_results=1)

Observed Problems:

- Problem 1 (Number of Articles): The agent often defaults to num_results=1 or another low number if not explicitly guided, and the term "recent" may be too vague for the time_period parameter if the tool expects specific values.
- Problem 2 (Relevance): The query "advancements in renewable energy news" might be too broad.
- Problem 3 (Summary Extraction): If the prompt for extract_summary is just "Summarize this article," the quality can be inconsistent.

Analyzing Agent Action Sequences (Simulated)

If we had logs, we might see something like this (simplified):

```
Thought: User wants a report on "advancements in renewable energy". I need to find articles.
Action: search_web(query="advancements in renewable energy news", time_period="recent", num_results=1)
Observation: [Article1: "Solar Power Breakthrough...", Content: "Detailed text..."]
Thought: I have one article. Now I need to summarize it.
Action: extract_summary(article_text="Detailed text...")
Observation: "Solar power is advancing."
Thought: I have the summary. I will compile the report.
Output: Report: Article 1: Solar Power Breakthrough. Summary: Solar power is advancing.
```

This trace clearly shows the agent fetching only one article and producing a very brief summary.
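To make the failure mode concrete before we fix it, here is a minimal Python sketch of this problematic flow. The search_web and extract_summary stubs and their return shapes are hypothetical stand-ins for the real tools, not an actual implementation:

```python
# Hypothetical stand-ins for the real tools, for illustration only.
def search_web(query: str, time_period: str, num_results: int) -> list[dict]:
    """Pretend search: returns up to num_results canned articles as {headline, content} dicts."""
    return [{"headline": "Solar Power Breakthrough...", "content": "Detailed text..."}][:num_results]

def extract_summary(article_text: str) -> str:
    """Pretend summarizer: returns whatever brief summary the underlying model produces."""
    return "Solar power is advancing."

# Nothing in the initial prompt pins down num_results or time_period,
# so the agent falls back to weak defaults and a vague time filter.
articles = search_web(
    query="advancements in renewable energy news",
    time_period="recent",  # vague; the tool expects "past_week" or "past_month"
    num_results=1,         # only one article is fetched
)

report = [(a["headline"], extract_summary(a["content"])) for a in articles]
print(report)  # a single, very thin entry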
Step 1: Refining the Search Guidance

The initial system prompt is too high-level and doesn't provide enough constraints for the search task. We need to be more specific.

Revised System Prompt Segment (Focus on Task Definition):

```
You are ReportCraftAI, a helpful assistant. Your goal is to create a report on recent news.
Current Task: User wants a report on "advancements in renewable energy".
Specific Instructions:
- Find exactly three (3) relevant news articles.
- Articles must be published within the "past_week".
- Focus on significant developments or announcements.
Available Tools:
1. search_web(query: str, time_period: str, num_results: int): Searches the web for news. time_period must be "past_week" or "past_month".
2. extract_summary(article_text: str, desired_length_words: int): Extracts a concise summary from an article to a desired word length.
Plan:
1. Formulate a precise search query based on the topic and instructions.
2. Use search_web to find 3 articles from the past_week.
3. For each article, use extract_summary to get a 50-word summary.
4. Compile the headlines and summaries into a report.
Begin.
```

Reasoning for Changes:

- Specificity in Instructions: We explicitly state "Find exactly three (3) relevant news articles" and "Articles must be published within the 'past_week'." This directly addresses the num_results and time_period parameters for the search_web tool.
- Guidance on Query Formulation: "Focus on significant developments or announcements" nudges the agent to create a better search query than just the topic + "news".
- Tool Description Update: We made extract_summary more controllable by adding desired_length_words.

With these changes, the agent's internal "thought" process for the search becomes more constrained, leading to a better tool call:

Improved Agent "Thought" for Search: "I need to find 3 recent articles (past_week) on 'significant advancements in renewable energy'. I will use search_web."

Resulting Tool Call Prompt (Implicit or Explicit): search_web(query="significant advancements in renewable energy", time_period="past_week", num_results=3)

This is a significant improvement: it directly addresses Problem 1 and helps with Problem 2.

Step 2: Improving Summary Extraction

Previously, the extract_summary tool might have been called with minimal instruction. The revised system prompt now guides the agent to use the new desired_length_words parameter.

Original Implicit Prompt to extract_summary (derived from "Extract information"): "Summarize this article text: [article content]"

Revised Prompt for extract_summary (derived from "use extract_summary to get a 50-word summary"): "Extract a summary of approximately 50 words from the following text, focusing on the main findings: [article content]"

Or, if the agent directly calls the tool based on the plan: extract_summary(article_text="[article content]", desired_length_words=50)

Reasoning for Changes:

- Explicit Length Control: Requesting a "50-word summary" gives the LLM behind the tool a clear target.
- Focus Instruction: Adding "focusing on the main findings" (if we prompt the summarization step itself) helps the LLM prioritize important information.

This directly addresses Problem 3, leading to more consistent and useful summaries.
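Putting Steps 1 and 2 together, the refined plan translates into a tool-call sequence roughly like the Python sketch below. The stubs are again hypothetical stand-ins for the real tools; here extract_summary is assumed to accept the new desired_length_words parameter from the revised tool description:

```python
# Hypothetical stand-ins for the real tools, for illustration only.
def search_web(query: str, time_period: str, num_results: int) -> list[dict]:
    """Pretend search: returns num_results articles as {headline, content} dicts."""
    return [{"headline": f"Article {i}", "content": f"Detailed text {i}..."}
            for i in range(1, num_results + 1)]

def extract_summary(article_text: str, desired_length_words: int) -> str:
    """Pretend summarizer: would target roughly desired_length_words words."""
    return f"~{desired_length_words}-word summary of: {article_text[:40]}..."

topic = "advancements in renewable energy"

# Step 1: a constrained search call with a sharper query, a valid time_period,
# and exactly three results.
articles = search_web(
    query=f"significant {topic}",
    time_period="past_week",
    num_results=3,
)

# Step 2: a controlled summary for each article, then compile the report.
report_lines = []
for article in articles:
    summary = extract_summary(article["content"], desired_length_words=50)
    report_lines.append(f"{article['headline']}: {summary}")

print("Report on " + topic + ":\n" + "\n".join(report_lines))
```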
Step 3: Iteration and Testing with Variations

Let's say after these changes, the agent now reliably gets three articles and the summaries are better, but sometimes one of the articles is an opinion piece rather than a news report. The initial instruction "Focus on significant developments or announcements" was a good start, but we can refine the prompt for query generation further. We could try a variation in the system prompt's instructions for search:

System Prompt Variation (Search Instruction): "...Focus on factual news reports about significant developments or announcements, avoiding opinion pieces or blog posts."

Comparing Variations: To test this, you would run the agent with both the previous prompt and this new variation on several different topics. You'd then compare the outputs:

- Metric 1: Number of relevant news articles (target: 3).
- Metric 2: Number of opinion pieces/blogs returned (target: 0).
- Metric 3: Subjective quality of summaries.

This is where techniques like A/B testing prompts become useful. If this variation consistently reduces irrelevant articles without harming other aspects, it's a good candidate for adoption.

A simple diagram can illustrate the shift in the agent's process:

```dot
digraph G {
  rankdir=TB;
  node [shape=box, style="rounded,filled", fillcolor="#e9ecef", fontname="sans-serif"];
  edge [fontname="sans-serif"];

  subgraph cluster_before {
    label = "Problematic Workflow";
    bgcolor="#ffc9c9";
    u0 [label="User Topic:\n'Renewable Energy'"];
    p0_sys [label="Initial System Prompt:\nVague instructions"];
    p0_search [label="Agent thought:\n'Find news'"];
    p0_tool [label="Tool Call:\nsearch_web(query=..., num_results=1)"];
    p0_sum [label="Agent thought:\n'Summarize'"];
    p0_out [label="Output:\n1 poor summary", fillcolor="#ffa8a8"];
    u0 -> p0_sys -> p0_search -> p0_tool -> p0_sum -> p0_out;
  }

  subgraph cluster_after {
    label = "Refined Workflow";
    bgcolor="#b2f2bb";
    u1 [label="User Topic:\n'Renewable Energy'"];
    p1_sys [label="Refined System Prompt:\nSpecific instructions (3 articles, past_week, 50-word summary)"];
    p1_search [label="Agent thought:\n'Find 3 specific articles'"];
    p1_tool [label="Tool Call:\nsearch_web(query=..., time_period='past_week', num_results=3)"];
    p1_sum_loop [label="For each article:\nAgent thought:\n'Summarize to 50 words'"];
    p1_out [label="Output:\n3 good summaries", fillcolor="#8ce99a"];
    u1 -> p1_sys -> p1_search -> p1_tool -> p1_sum_loop -> p1_out;
  }
}
```

This diagram shows the contrast between the initial, less effective workflow and the refined workflow achieved by improving the agent's guiding prompts.

Logging for Continuous Improvement

Throughout this debugging process, detailed logging would be invaluable. Imagine logs capturing:

- The exact prompt used for each tool call.
- The raw output from each tool.
- The agent's "internal monologue" or reasoning steps (if your architecture supports this, like in ReAct).

For example, if the search_web tool returned an error message or unexpected data, logs would help pinpoint if the issue was the tool itself or how the agent prompted it. If extract_summary consistently produced summaries that were too short despite the 50-word request, logs would help investigate if the tool respected the parameter or if the input text was too brief.

By reviewing these logs, you can systematically identify which parts of your prompt chain are weak and require further refinement. Organizing your prompts with version control (e.g., using Git for prompt files or a dedicated prompt management system) allows you to track changes, revert if a new prompt performs worse, and manage different versions for A/B testing.
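As a concrete sketch of the kind of tool-call logging described above, here is a minimal wrapper using only the Python standard library. The wrapper name, log format, and truncation limit are illustrative choices, not a specific framework's API:

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("reportcraft")

def logged_tool(fn):
    """Wrap a tool function so every call records its exact arguments and raw output."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info("TOOL CALL %s args=%s kwargs=%s",
                 fn.__name__, args, json.dumps(kwargs, default=str))
        start = time.time()
        result = fn(*args, **kwargs)
        log.info("TOOL RESULT %s (%.2fs): %s",
                 fn.__name__, time.time() - start,
                 json.dumps(result, default=str)[:500])  # truncate large payloads
        return result
    return wrapper

# Usage: wrap the tools once, then review the log to see exactly how the agent
# prompted each tool and what came back.
# search_web = logged_tool(search_web)
# extract_summary = logged_tool(extract_summary)
```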
This hands-on exercise simulated a common scenario in developing agentic workflows. The key takeaway is not just to write prompts, but to treat them as a core part of your system that requires testing, analysis, and iterative refinement. By applying the principles from this chapter, you can significantly enhance the reliability and performance of your AI agents.