Let's put the theory into practice. In the previous sections, we discussed how LlamaIndex helps connect your LLM applications to external data by loading, indexing, and providing interfaces for querying. Now, we'll walk through a concrete example of indexing text documents and retrieving information from them.
This practical exercise assumes you have a working Python environment set up (as covered in Chapter 2) and have installed the necessary LlamaIndex library. If you haven't already, install llama-index and a foundational LLM integration like llama-index-llms-openai:
pip install llama-index llama-index-llms-openai python-dotenv
You will also need an API key from an LLM provider like OpenAI. Remember to set this up securely using environment variables (e.g., in a .env file), as discussed in Chapter 2.
OPENAI_API_KEY="sk-..."
First, let's create some simple text files to use as our data source. Create a directory named data in your project folder. Inside this data directory, create two text files:
File 1: data/llm_features.txt
Large Language Models (LLMs) possess several notable capabilities.
They excel at natural language understanding, allowing them to process and interpret human text.
Generation is another core strength, enabling them to produce coherent and contextually relevant text.
LLMs can also perform translation between languages and summarize long documents effectively.
Some advanced models show emergent abilities in reasoning and problem-solving.
File 2: data/rag_systems.txt
Retrieval-Augmented Generation (RAG) enhances LLM performance by integrating external knowledge.
The core idea is to retrieve relevant information from a specified dataset before generating a response.
This process helps ground the LLM's output in factual data, reducing hallucinations.
RAG systems typically involve a retriever component (often using vector search) and a generator component (the LLM).
Building effective RAG requires careful consideration of data indexing and retrieval strategies.
With our data files ready, we can use LlamaIndex's SimpleDirectoryReader to load them. This reader automatically detects files in a specified directory and parses their content.
import os
from dotenv import load_dotenv
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # or your preferred LLM integration
# Load environment variables (for API keys)
load_dotenv()
# Configure the LLM globally - ensure OPENAI_API_KEY is set in your environment.
# Adjust the model and temperature as needed. In recent LlamaIndex versions you
# can also pass the LLM explicitly to individual components instead of setting
# it globally via Settings.
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
print("Loading documents...")
# Point SimpleDirectoryReader to the directory containing our text files
documents = SimpleDirectoryReader("./data").load_data()
print(f"Loaded {len(documents)} document(s).")
# Optionally, inspect the loaded documents
# print(documents[0].get_content()[:100] + "...") # Print start of first doc
Executing this code will load the text content from both llm_features.txt and rag_systems.txt into a list of Document objects. Each Document object contains the text content and associated metadata.
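If you want to see what a loaded Document looks like, you can print its ID, metadata, and a slice of its text. This is a small sketch; SimpleDirectoryReader typically populates metadata keys such as the file path and file name:
# Inspect the first loaded Document
doc = documents[0]
print(doc.doc_id)               # unique identifier assigned to this document
print(doc.metadata)             # reader-populated metadata (e.g. file path and name)
print(doc.get_content()[:200])  # first 200 characters of the document text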
Now that the documents are loaded, the next step is to index them. Indexing transforms the raw text into a structure optimized for fast retrieval. The most common index type for question-answering is the VectorStoreIndex, which creates vector embeddings for chunks of your text.
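Before building the index, you can optionally adjust how documents are split into chunks. The values below are illustrative rather than recommendations; chunk_size and chunk_overlap are exposed on the global Settings object in recent LlamaIndex versions:
from llama_index.core import Settings

# Optional chunking knobs (illustrative values)
Settings.chunk_size = 512     # approximate tokens per chunk
Settings.chunk_overlap = 50   # tokens shared between consecutive chunks
With the defaults (or your adjustments) in place, building the index is a single call: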
print("Creating index...")
# Create a VectorStoreIndex from the loaded documents
# This process involves chunking the documents, generating embeddings,
# and storing them in a vector store (in-memory by default).
index = VectorStoreIndex.from_documents(documents)
print("Index created successfully.")
Behind the scenes, LlamaIndex performs several steps:
1. It splits each Document into smaller text chunks (Nodes).
2. It generates a vector embedding for each chunk using the configured embedding model.
3. It stores the embeddings and their associated text in a vector store, which is held in memory by default.
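Because the vector store is in-memory, the index disappears when the process exits, and rebuilding it re-generates (and re-bills) the embeddings. A common refinement, sketched below with a hypothetical ./storage directory, is to persist the index to disk and reload it on later runs:
from llama_index.core import StorageContext, load_index_from_storage

PERSIST_DIR = "./storage"  # hypothetical location for the saved index

# Save the freshly built index to disk
index.storage_context.persist(persist_dir=PERSIST_DIR)

# In a later run, reload it without re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)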
With the index built, we can now ask questions related to the content of our documents. LlamaIndex provides a query_engine interface for this purpose.
print("Setting up query engine...")
# Create a query engine from the index
query_engine = index.as_query_engine()
print("Query engine ready.")
# Define your query
query_text = "What are the core ideas behind RAG systems?"
print(f"\nQuerying: {query_text}")
# Execute the query
response = query_engine.query(query_text)
# Print the response
print("\nResponse:")
print(response)
# You can also inspect the source nodes used for the response
# print("\nSource Nodes:")
# for node in response.source_nodes:
# print(f"Node ID: {node.node_id}, Score: {node.score:.4f}")
# print(f"Content: {node.get_content()[:150]}...") # Print snippet of source text
When you run the query:
1. The query_engine takes your query text (query_text).
2. It converts the query into a vector embedding and searches the VectorStoreIndex to find the text chunks (Nodes) whose vectors are most similar to the query vector. This is the "retrieval" step.
3. The retrieved chunks are passed to the LLM along with the original query, and the LLM synthesizes the final answer. This is the "generation" step.
The response object contains the LLM's generated answer. You can also access response.source_nodes to see which specific chunks of text from your original documents were retrieved and used to generate the answer. This is useful for understanding the basis of the response and for debugging.
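If you want to examine retrieval on its own, without the generation step, you can use a retriever directly instead of a full query engine. A minimal sketch:
# Retrieve chunks without calling the LLM
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve(query_text)
for node in nodes:
    print(f"{node.score:.4f}  {node.get_content()[:80]}...")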
Let's try another query:
query_text_2 = "What are some capabilities of LLMs mentioned?"
print(f"\nQuerying: {query_text_2}")
response_2 = query_engine.query(query_text_2)
print("\nResponse:")
print(response_2)
This query should primarily retrieve information from the llm_features.txt document, demonstrating the index's ability to surface the relevant source material for each query.
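The query engine itself can also be tuned. As one sketch, assuming a recent LlamaIndex release, you can retrieve more chunks per query and stream the answer token by token:
# Retrieve more chunks and stream the generated answer
streaming_engine = index.as_query_engine(similarity_top_k=3, streaming=True)
streaming_response = streaming_engine.query("Summarize the notable capabilities of LLMs.")
streaming_response.print_response_stream()  # prints tokens as they arrive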
The diagram below outlines the basic indexing and querying workflow we just implemented.
Basic LlamaIndex workflow: Documents are loaded, processed into an index, and then queried via a query engine which retrieves relevant context before generating a response with an LLM.
This hands-on example demonstrates the fundamental cycle of using LlamaIndex: loading data, creating a searchable index, and querying that index to get context-aware answers. As you build more complex applications, you'll explore different loaders, index types, retrievers, and query engine configurations, but this core pattern remains central.