Let's put the theory into practice. In the previous sections, we discussed how LlamaIndex helps connect your LLM applications to external data by loading, indexing, and providing interfaces for querying. Now, we'll walk through a concrete example of indexing text documents and retrieving information from them.
This practical exercise assumes you have a working Python environment set up (as covered in Chapter 2) and have installed the necessary LlamaIndex library. If you haven't already, install llama-index and a foundational LLM integration like llama-index-llms-openai:
pip install llama-index llama-index-llms-openai python-dotenv
You will also need an API key from an LLM provider like OpenAI. Remember to set this up securely using environment variables (e.g., in a .env file), as discussed in Chapter 2.
OPENAI_API_KEY="sk-..."
First, let's create some simple text files to use as our data source. Create a directory named data in your project folder. Inside this data directory, create two text files:
File 1: data/llm_features.txt
Large Language Models (LLMs) possess several notable capabilities.
They excel at natural language understanding, allowing them to process and interpret human text.
Generation is another core strength, enabling them to produce coherent and contextually relevant text.
LLMs can also perform translation between languages and summarize long documents effectively.
Some advanced models show emergent abilities in reasoning and problem-solving.
File 2: data/rag_systems.txt
Retrieval-Augmented Generation (RAG) enhances LLM performance by integrating external knowledge.
The core idea is to retrieve relevant information from a specified dataset before generating a response.
This process helps ground the LLM's output in factual data, reducing hallucinations.
RAG systems typically involve a retriever component (often using vector search) and a generator component (the LLM).
Building effective RAG requires careful consideration of data indexing and retrieval strategies.
With our data files ready, we can use LlamaIndex's SimpleDirectoryReader to load them. This reader automatically detects files in a specified directory and parses their content.
import os
from dotenv import load_dotenv
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # or your preferred LLM integration
# Load environment variables (for API keys)
load_dotenv()
# Configure the LLM globally - ensure OPENAI_API_KEY is set in your environment.
# Adjust the model and temperature as needed. In recent LlamaIndex versions you
# can also pass the LLM explicitly to individual components instead of setting
# it globally via Settings.
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
print("Loading documents...")
# Point SimpleDirectoryReader to the directory containing our text files
documents = SimpleDirectoryReader("./data").load_data()
print(f"Loaded {len(documents)} document(s).")
# Optionally, inspect the loaded documents
# print(documents[0].get_content()[:100] + "...") # Print start of first doc
Executing this code will load the text content from both llm_features.txt and rag_systems.txt into a list of Document objects. Each Document object contains the text content and associated metadata.
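If you want to see what a loaded Document looks like, you can print its ID, metadata, and a slice of its text. This is a small sketch; SimpleDirectoryReader typically populates metadata keys such as the file path and file name:
# Inspect the first loaded Document
doc = documents[0]
print(doc.doc_id)               # unique identifier assigned to this document
print(doc.metadata)             # reader-populated metadata (e.g. file path and name)
print(doc.get_content()[:200])  # first 200 characters of the document text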
Now that the documents are loaded, the next step is to index them. Indexing transforms the raw text into a structure optimized for fast retrieval. The most common index type for question-answering is the VectorStoreIndex, which creates vector embeddings for chunks of your text.
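Before building the index, you can optionally adjust how documents are split into chunks. The values below are illustrative rather than recommendations; chunk_size and chunk_overlap are exposed on the global Settings object in recent LlamaIndex versions:
from llama_index.core import Settings

# Optional chunking knobs (illustrative values)
Settings.chunk_size = 512     # approximate tokens per chunk
Settings.chunk_overlap = 50   # tokens shared between consecutive chunks
With the defaults (or your adjustments) in place, building the index is a single call: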
print("Creating index...")
# Create a VectorStoreIndex from the loaded documents
# This process involves chunking the documents, generating embeddings,
# and storing them in a vector store (in-memory by default).
index = VectorStoreIndex.from_documents(documents)
print("Index created successfully.")
Behind the scenes, LlamaIndex performs several steps:
1. It splits each Document into smaller text chunks (Nodes).
2. It generates a vector embedding for each chunk using the configured embedding model.
3. It stores the embeddings and their associated text in a vector store, which is held in memory by default.
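Because the vector store is in-memory, the index disappears when the process exits, and rebuilding it re-generates (and re-bills) the embeddings. A common refinement, sketched below with a hypothetical ./storage directory, is to persist the index to disk and reload it on later runs:
from llama_index.core import StorageContext, load_index_from_storage

PERSIST_DIR = "./storage"  # hypothetical location for the saved index

# Save the freshly built index to disk
index.storage_context.persist(persist_dir=PERSIST_DIR)

# In a later run, reload it without re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)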
With the index built, we can now ask questions related to the content of our documents. LlamaIndex provides a query_engine interface for this purpose.
print("Setting up query engine...")
# Create a query engine from the index
query_engine = index.as_query_engine()
print("Query engine ready.")
# Define your query
query_text = "What are the core ideas behind RAG systems?"
print(f"\nQuerying: {query_text}")
# Execute the query
response = query_engine.query(query_text)
# Print the response
print("\nResponse:")
print(response)
# You can also inspect the source nodes used for the response
# print("\nSource Nodes:")
# for node in response.source_nodes:
# print(f"Node ID: {node.node_id}, Score: {node.score:.4f}")
# print(f"Content: {node.get_content()[:150]}...") # Print snippet of source text
When you run the query:
1. The query_engine takes your query text (query_text).
2. It converts the query into a vector embedding and searches the VectorStoreIndex to find the text chunks (Nodes) whose vectors are most similar to the query vector. This is the "retrieval" step.
3. The retrieved chunks are passed to the LLM along with the original query, and the LLM synthesizes the final answer. This is the "generation" step.
The response object contains the LLM's generated answer. You can also access response.source_nodes to see which specific chunks of text from your original documents were retrieved and used to generate the answer. This is useful for understanding the basis of the response and for debugging.
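If you want to examine retrieval on its own, without the generation step, you can use a retriever directly instead of a full query engine. A minimal sketch:
# Retrieve chunks without calling the LLM
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve(query_text)
for node in nodes:
    print(f"{node.score:.4f}  {node.get_content()[:80]}...")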
Let's try another query:
query_text_2 = "What are some capabilities of LLMs mentioned?"
print(f"\nQuerying: {query_text_2}")
response_2 = query_engine.query(query_text_2)
print("\nResponse:")
print(response_2)
This query should primarily retrieve information from the llm_features.txt document, demonstrating the index's ability to surface the relevant source material for each query.
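The query engine itself can also be tuned. As one sketch, assuming a recent LlamaIndex release, you can retrieve more chunks per query and stream the answer token by token:
# Retrieve more chunks and stream the generated answer
streaming_engine = index.as_query_engine(similarity_top_k=3, streaming=True)
streaming_response = streaming_engine.query("Summarize the notable capabilities of LLMs.")
streaming_response.print_response_stream()  # prints tokens as they arrive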
The diagram below outlines the basic indexing and querying workflow we just implemented.
Basic LlamaIndex workflow: Documents are loaded, processed into an index, and then queried via a query engine which retrieves relevant context before generating a response with an LLM.
This hands-on example demonstrates the fundamental cycle of using LlamaIndex: loading data, creating a searchable index, and querying that index to get context-aware answers. As you build more complex applications, you'll explore different loaders, index types, retrievers, and query engine configurations, but this core pattern remains central.