This hands-on exercise applies iterative design and evaluation to a practical prompt engineering scenario. A first attempt at a prompt often won't yield ideal results. The process involves taking a suboptimal prompt, analyzing its shortcomings, and systematically refining it to improve the quality, consistency, and structure of the Large Language Model's (LLM) output.

## The Challenge: Extracting Structured Information

Imagine you have customer feedback emails and need to extract specific pieces of information: the customer's main sentiment (Positive, Negative, Neutral), the product mentioned (if any), and a brief summary of the feedback's core issue or compliment.

**Initial Scenario:** You start with a large block of text containing multiple customer emails. Here's a snippet representing one email:

```
Subject: Loving the new Xylos feature!

Hi team,

Just wanted to say the recent update to the Xylos platform, especially the dashboard customization, is fantastic! It makes my workflow so much smoother. I did notice a small glitch where the date filter sometimes resets unexpectedly, but overall, a huge improvement. Keep up the great work!

Best,
Alex Chen
```

**Suboptimal Prompt (Attempt 1):**

```
Here is some customer feedback:

[Insert Email Text Here]

What is this feedback about?
```

**Typical Output (Attempt 1):** The LLM might respond with something like:

```
This feedback is about the Xylos platform update. The customer likes the dashboard customization but found a glitch with the date filter. They think it's a big improvement overall.
```

**Analysis of Attempt 1:** While the LLM understood the basic content, the output has several problems relative to our goal:

- **Lack of Structure:** It's a free-form text summary. We wanted specific fields (Sentiment, Product, Summary).
- **Implicit Information:** The sentiment ("Positive") is implied but never stated explicitly, and the product ("Xylos") is mentioned but not extracted as a distinct data point.
- **Missing Constraints:** The prompt gave the LLM no guidance on how to present the information.

## Iteration 1: Adding Instructions and Structure

Let's refine the prompt to be more specific about the task and the desired output format. We'll use clear instructions and request a structured format like JSON, which is easier for applications to parse.

**Improved Prompt (Attempt 2):**

```
Analyze the following customer feedback email. Extract the main sentiment (Positive, Negative, or Neutral), the specific product mentioned (if any, otherwise use "None"), and a concise summary (1-2 sentences) of the core feedback point. Format the output as a JSON object with keys: "sentiment", "product", and "summary".

Feedback Email:
'''
Subject: Loving the new Xylos feature!

Hi team,

Just wanted to say the recent update to the Xylos platform, especially the dashboard customization, is fantastic! It makes my workflow so much smoother. I did notice a small glitch where the date filter sometimes resets unexpectedly, but overall, a huge improvement. Keep up the great work!

Best,
Alex Chen
'''

Output:
```

**Expected Output (Attempt 2):**

```json
{
  "sentiment": "Positive",
  "product": "Xylos",
  "summary": "Customer appreciates the dashboard customization in the Xylos platform update but reported a minor bug with the date filter resetting."
}
```
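Because the output is now machine-readable JSON, a downstream application can parse and validate it directly. Here is a minimal Python sketch of that step; `parse_feedback_extraction` is an illustrative helper written for this exercise, not part of any library:

```python
import json

def parse_feedback_extraction(llm_reply: str) -> dict:
    """Parse the model's JSON reply and check that the expected fields are present."""
    data = json.loads(llm_reply)  # raises json.JSONDecodeError if the reply isn't valid JSON
    missing = {"sentiment", "product", "summary"} - data.keys()
    if missing:
        raise ValueError(f"Reply is missing expected keys: {missing}")
    return data

# Example using the Attempt 2 output shown above:
reply = '''{
  "sentiment": "Positive",
  "product": "Xylos",
  "summary": "Customer appreciates the dashboard customization in the Xylos platform update but reported a minor bug with the date filter resetting."
}'''
record = parse_feedback_extraction(reply)
print(record["sentiment"])  # -> Positive
```

In practice, a model may occasionally wrap the JSON in extra prose or Markdown fences, which is one reason the Evaluation section below treats format adherence as its own criterion.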
**Analysis of Attempt 2:** This is significantly better:

- **Structured Output:** The JSON format is exactly what we need for downstream processing.
- **Explicit Extraction:** Sentiment and product are clearly identified.
- **Conciseness:** The summary focuses on the core points.
- **Clear Instructions:** The prompt clearly defines the task, the required fields, and the output format.

## Iteration 2: Handling Ambiguity and Edge Cases (Optional Refinement)

What if the feedback were more ambiguous or didn't mention a product? Consider feedback like this:

```
Subject: Problem logging in

I can't seem to access my account this morning. It just keeps spinning. Is there an issue?

- Sam
```

Using Prompt 2 might produce:

```json
{
  "sentiment": "Negative",
  "product": "None",
  "summary": "Customer is unable to log into their account, encountering an indefinite loading issue."
}
```

This is good, but perhaps we want to ensure the summary always captures the primary problem for negative feedback or the primary highlight for positive feedback. We can refine the instructions slightly.

**Further Improved Prompt (Attempt 3):**

```
You are a customer support assistant analyzing feedback. Analyze the following customer feedback email. Determine the main sentiment (classify as strictly "Positive", "Negative", or "Neutral"). Identify the specific product mentioned (use "None" if no specific product is named). Create a concise summary (1-2 sentences) focusing on the core issue if sentiment is Negative/Neutral, or the main compliment if Positive. Output the result as a JSON object with keys: "sentiment", "product", and "summary".

Feedback Email:
'''
[Insert Email Text Here]
'''

JSON Output:
```

This version adds a role (customer support assistant) and refines the summary instruction based on the sentiment. This adds robustness, guiding the LLM more precisely for different feedback types.

## Evaluation

How do we know our prompts are getting better?

- **Accuracy:** Does the extracted information correctly reflect the email content? Is the sentiment classification accurate? Is the correct product identified?
- **Completeness:** Are all requested fields present in the output?
- **Format Adherence:** Does the output consistently match the requested JSON structure?
- **Conciseness:** Is the summary brief and to the point?
- **Consistency:** Does the prompt work reliably across different example emails (positive, negative, neutral, product mentioned, no product mentioned)?

You can create a small test suite of diverse emails and run each prompt version against them, comparing the outputs against a manually created "ideal" extraction. This forms the basis for systematic evaluation, which is essential for building reliable LLM applications.
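Here is a minimal sketch of such a test harness in Python. The test cases, the `expected` gold labels, and the `run_prompt` function (a stand-in for a call to whatever LLM client you use) are all illustrative assumptions, not a prescribed implementation:

```python
import json

# Hypothetical gold labels: each case pairs an email with the extraction
# a human reviewer considers ideal.
TEST_SUITE = [
    {
        "email": "Subject: Problem logging in\n\nI can't seem to access my account "
                 "this morning. It just keeps spinning. Is there an issue?\n\n- Sam",
        "expected": {"sentiment": "Negative", "product": "None"},
    },
    # ... add positive, neutral, and no-product cases here ...
]

def run_prompt(prompt_template: str, email: str) -> str:
    """Stand-in for an actual LLM call; wire this up to your client of choice."""
    raise NotImplementedError

def evaluate(prompt_template: str) -> None:
    """Score one prompt version against the gold labels."""
    passed = 0
    for case in TEST_SUITE:
        reply = run_prompt(prompt_template, case["email"])
        try:
            data = json.loads(reply)  # format adherence
        except json.JSONDecodeError:
            print("FAIL: reply was not valid JSON")
            continue
        fields_ok = {"sentiment", "product", "summary"} <= data.keys()  # completeness
        labels_ok = all(data.get(k) == v for k, v in case["expected"].items())  # accuracy
        if fields_ok and labels_ok:
            passed += 1
    print(f"{passed}/{len(TEST_SUITE)} cases passed")
```

Free-text fields like `summary` are harder to score automatically, which is why the gold labels above pin down only sentiment and product; a human spot check, a string-similarity metric, or a second LLM acting as a judge are common options for the rest.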
The refinement cycle can be visualized as follows (Graphviz source for the diagram):

```dot
digraph G {
    rankdir=LR;
    node [shape=box, style=rounded, fontname="Arial", fontsize=10, margin=0.2];
    edge [fontname="Arial", fontsize=9];

    subgraph cluster_0 {
        label = "Prompt Refinement Cycle";
        bgcolor = "#e9ecef";

        P1 [label="Attempt 1:\nVague Prompt", style="filled", fillcolor="#ffc9c9"];
        O1 [label="Output 1:\nUnstructured Summary", style="filled", fillcolor="#ffc9c9"];
        A1 [label="Analyze:\n- Lack of Structure\n- Implicit Info\n- No Constraints", style="filled", fillcolor="#ffe066"];
        P2 [label="Attempt 2:\nAdd Instructions,\nRequest JSON", style="filled", fillcolor="#b2f2bb"];
        O2 [label="Output 2:\nStructured JSON", style="filled", fillcolor="#b2f2bb"];
        A2 [label="Analyze:\n- Handles Ambiguity?\n- Edge Cases?", style="filled", fillcolor="#ffe066"];
        P3 [label="Attempt 3:\nRefine Summary Logic,\nAdd Role", style="filled", fillcolor="#a5d8ff"];
        O3 [label="Output 3:\nJSON", style="filled", fillcolor="#a5d8ff"];
        E [label="Evaluate:\n- Accuracy\n- Format\n- Consistency", style="filled", fillcolor="#74c0fc"];

        P1 -> O1 [label="Generates"];
        O1 -> A1 [label="Review"];
        A1 -> P2 [label="Refine"];
        P2 -> O2 [label="Generates"];
        O2 -> A2 [label="Review"];
        A2 -> P3 [label="Refine (Optional)"];
        P3 -> O3 [label="Generates"];
        O3 -> E [label="Final Check"];
        E -> P1 [style=dashed, label="Repeat if needed"];
    }
}
```

*The iterative process of prompt optimization: generate output, analyze its weaknesses, refine the prompt based on that analysis, and evaluate the new output.*

This hands-on process demonstrates that prompt engineering isn't always about finding a single "magic" prompt immediately. It's often a methodical cycle of crafting, testing, analyzing, and refining to steer the LLM towards generating the precise output your application requires. Keep these principles of iterative refinement and careful evaluation in mind as you build your own prompts.