This practical exercise focuses on building a cohesive LLM application by combining models, prompts, and parsers. The application demonstrates a common and powerful pattern in LLM development: taking unstructured text as input and producing structured, machine-readable data as output. The goal is to build an application that can read a short biography and extract specific details such as a person's name, title, and company.

This process follows the standard invocation sequence you've learned about: a prompt guides a model, and a parser structures the model's response.

```dot
digraph G {
  rankdir=TB;
  graph [fontname="Arial"];
  node [shape=box, style="rounded,filled", fillcolor="#e9ecef", fontname="Arial"];
  edge [fontname="Arial"];
  input [label="Unstructured Text\n(e.g., 'Sarah is a lead...')", fillcolor="#a5d8ff"];
  prompt [label="Prompt Template\n+ Format Instructions"];
  model [label="LLM\n(ChatOpenAI)", fillcolor="#bac8ff"];
  parser [label="PydanticOutputParser"];
  output [label="Structured Data\n(PersonProfile Object)", shape=ellipse, fillcolor="#b2f2bb"];
  input -> prompt [label="query"];
  prompt -> model;
  model -> parser [label="JSON String"];
  parser -> output;
}
```

The data extraction workflow. Unstructured text is combined with format instructions in a prompt, processed by the model, and then parsed into a structured Python object.

Step 1: Define the Data Schema with Pydantic

Before we can extract information, we must first define the structure of the data we want. A schema acts as a contract for our output, ensuring consistency and predictability. The Pydantic library is the standard for data validation in Python and integrates cleanly with LangChain.

Let's define a PersonProfile schema that includes a name, job title, and company.
We can also add descriptions to guide the LLM in correctly identifying each piece of information.

```python
from typing import Optional

from pydantic import BaseModel, Field


class PersonProfile(BaseModel):
    """A structured representation of a person's professional profile."""

    name: str = Field(description="The full name of the person.")
    title: str = Field(description="The professional title or role of the person.")
    company: str = Field(description="The name of the company the person works for.")
    years_of_experience: Optional[int] = Field(
        None, description="The total number of years of professional experience."
    )
```

By creating this class, we have established a clear target format for our LLM. The descriptions within Field are not just for our own reference; the output parser will use them to generate more effective instructions for the model.

Step 2: Configure the Model, Parser, and Prompt

With our data schema defined, we can now set up the three core components of our extractor.

First, we create an output parser from our Pydantic model. The PydanticOutputParser will inspect our PersonProfile class and generate instructions on how to format the output as a JSON object that matches the schema.

Second, we define a prompt template. This template will instruct the LLM on its task, taking two inputs:

- query: the raw, unstructured text we want to process.
- format_instructions: the auto-generated instructions from our parser.

Finally, we instantiate our chat model.
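Before wiring the schema into a chain, it helps to see Pydantic's validation behavior on its own, since the parser relies on it. The sketch below is a standalone illustration (assuming Pydantic v2; the sample values such as "Sarah Chen" are made up): valid data parses into a typed object, and missing required fields fail loudly instead of passing silently.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class PersonProfile(BaseModel):
    """A structured representation of a person's professional profile."""

    name: str = Field(description="The full name of the person.")
    title: str = Field(description="The professional title or role of the person.")
    company: str = Field(description="The name of the company the person works for.")
    years_of_experience: Optional[int] = Field(
        None, description="The total number of years of professional experience."
    )


# Valid data parses into a typed object; the optional field defaults to None.
profile = PersonProfile(name="Sarah Chen", title="Lead Engineer", company="Acme")
print(profile.years_of_experience)  # None

# Missing required fields raise a ValidationError rather than producing bad data.
try:
    PersonProfile(name="Sarah Chen")
except ValidationError as e:
    print(f"{len(e.errors())} validation errors")  # title and company are missing
```

This is exactly the safety net the chain will rely on later: if the model returns JSON that violates the schema, parsing fails visibly instead of handing you malformed data.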
For this example, we'll use an OpenAI chat model.

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Set up the parser
parser = PydanticOutputParser(pydantic_object=PersonProfile)

# Define the prompt template
prompt = PromptTemplate(
    template="Extract information from the following text.\n{format_instructions}\nText: {query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Initialize the model
model = ChatOpenAI(temperature=0, model="gpt-4o-mini")
```

Notice the use of partial_variables in our PromptTemplate. This is a useful technique that pre-fills part of the prompt. Since the format instructions from the parser are static for our defined schema, we can inject them directly into the template, simplifying the final chain invocation.

Step 3: Combine Components into a Chain

Now we connect our components into a processing pipeline using the LangChain Expression Language (LCEL). The pipe operator (|) connects the elements, creating a sequence where the output of one step becomes the input to the next.

Our chain follows the logical flow we designed: the prompt formats the input, the model generates a response, and the parser structures the final output.

```python
# Create the chain
extractor_chain = prompt | model | parser

# Optional: inspect the final prompt that will be sent to the model.
# formatted_prompt = prompt.format(query="Some example text.")
# print(formatted_prompt)
```

Running this chain is now straightforward. We only need to provide the query variable; the format_instructions are already handled.

Step 4: Run the Extractor

Let's test our extractor with a sample piece of text.
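The pipe-based flow above is easy to reason about even without LangChain installed or an API key. The following stdlib-only sketch mimics the same three stages: a template fills in the query, a stand-in "model" returns a JSON string, and a parse step turns that string into a typed object. The names `fake_model`, `parse`, and the canned JSON are hypothetical stand-ins, not LangChain APIs.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonProfile:
    name: str
    title: str
    company: str
    years_of_experience: Optional[int] = None


TEMPLATE = "Extract information from the following text.\n{format_instructions}\nText: {query}\n"
# A simplified analogue of what parser.get_format_instructions() provides.
FORMAT_INSTRUCTIONS = "Return a JSON object with keys: name, title, company, years_of_experience."


def fake_model(prompt_text: str) -> str:
    # Stand-in for the LLM call: always returns a canned JSON string.
    return ('{"name": "Alex Thompson", "title": "Senior Data Scientist", '
            '"company": "InnovateCorp", "years_of_experience": 5}')


def parse(json_string: str) -> PersonProfile:
    # The parser's core job: JSON string -> validated Python object.
    return PersonProfile(**json.loads(json_string))


# prompt -> model -> parser, spelled out as plain function composition.
prompt_text = TEMPLATE.format(
    format_instructions=FORMAT_INSTRUCTIONS,
    query="Alex Thompson is the Senior Data Scientist at InnovateCorp...",
)
result = parse(fake_model(prompt_text))
print(result.name, result.company)  # Alex Thompson InnovateCorp
```

LCEL's `|` operator does essentially this composition, with the added benefits of streaming, batching, and async support for free.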
We will invoke the chain and inspect the output.

```python
# Input text
text_input = """
Alex Thompson is the Senior Data Scientist at InnovateCorp, where he has been
leading the AI research division for the past 5 years.
"""

# Invoke the chain
result = extractor_chain.invoke({"query": text_input})

# Print the structured output
print(result)
print(f"\nType of result: {type(result)}")
```

The expected output is a PersonProfile object, not a simple string or dictionary:

```
name='Alex Thompson' title='Senior Data Scientist' company='InnovateCorp' years_of_experience=5

Type of result: <class '__main__.PersonProfile'>
```

Success. The chain correctly parsed the unstructured sentence and returned a Pydantic object. We can now access the data reliably using standard object attributes, such as result.name or result.company. This structured output is immediately usable in any downstream application logic, such as saving to a database or feeding into another system, without needing fragile string manipulation or regular expressions. This example highlights how combining prompts, models, and parsers creates a dependable bridge from unstructured language to structured data.
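To make the downstream-use claim concrete, here is a short sketch of what happens after extraction (assuming Pydantic v2; the `result` object is constructed by hand so the snippet runs without an API key, standing in for `extractor_chain.invoke(...)`):

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class PersonProfile(BaseModel):
    name: str = Field(description="The full name of the person.")
    title: str = Field(description="The professional title or role of the person.")
    company: str = Field(description="The name of the company the person works for.")
    years_of_experience: Optional[int] = Field(
        None, description="The total number of years of professional experience."
    )


# Hand-built stand-in for the chain's output.
result = PersonProfile(
    name="Alex Thompson",
    title="Senior Data Scientist",
    company="InnovateCorp",
    years_of_experience=5,
)

# Typed attribute access -- no string munging or regexes.
print(result.name)  # Alex Thompson

# Serialize for a database row or an API payload (Pydantic v2 methods).
row = result.model_dump()           # plain dict
payload = result.model_dump_json()  # JSON string
print(row["company"])               # InnovateCorp
print(json.loads(payload)["years_of_experience"])  # 5
```

Because the schema travels with the data, every consumer of `row` or `payload` can rely on the same field names and types that the prompt enforced.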