As discussed earlier in this chapter, managing conversation history effectively becomes challenging as interactions lengthen. Basic buffer memory eventually overflows the context window, and summarizing memory can lose important details. Vector Store Memory offers a compelling alternative: it stores past interactions as embeddings in a vector database and retrieves the most semantically relevant ones when generating a new response. This allows the model to recall pertinent information from potentially very long histories, even if it wasn't mentioned recently.

In this practical section, we will implement `VectorStoreRetrieverMemory` using FAISS, a popular library for efficient similarity search, along with OpenAI's embedding models.

### Prerequisites and Setup

First, ensure you have the necessary libraries installed. We'll need `langchain`, the OpenAI integration (`langchain-openai`), a vector store implementation (`faiss-cpu` or `faiss-gpu`), and `tiktoken` for token counting.

```bash
pip install langchain langchain-openai faiss-cpu tiktoken
```

You will also need an OpenAI API key configured in your environment, typically as `OPENAI_API_KEY`.

Now, let's import the required components:

```python
import os

from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate

# Ensure your OPENAI_API_KEY is set in your environment variables
# Example: os.environ["OPENAI_API_KEY"] = "your_api_key_here"

# Check if the API key is available
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY environment variable not set.")
```

### Implementing Vector Store Memory with FAISS

The core idea is to use a vector store to hold the conversation history. Each turn of the conversation (input and output) is embedded and stored. When generating the next response, we use the current input to query the vector store for relevant past exchanges.

**Initialize components.** We need an embedding model and a FAISS vector store built from it.

```python
# 1. Initialize the embedding model
embedding_model = OpenAIEmbeddings()

# 2. Initialize a FAISS vector store
# The dimensionality (1536 for the default OpenAIEmbeddings model) is inferred
# from the embedding model, so it does not need to be passed explicitly.
index = FAISS.from_texts(["_initial_"], embedding_model)
```

Note: We initialize FAISS with a dummy text `_initial_` because `from_texts` cannot be called with an empty list of texts. This initial entry won't significantly impact retrieval. By default, the LangChain FAISS wrapper scores results with L2 (Euclidean) distance; because OpenAI embeddings are normalized to unit length, this produces the same ranking as cosine similarity, so no extra distance configuration is needed here. (If you prefer inner-product scoring, the wrapper also accepts a `distance_strategy` argument.)

**Create the retriever.** The memory module doesn't interact with the vector store directly; it uses a LangChain Retriever. We create a retriever from our FAISS index. The `search_kwargs={'k': 2}` parameter tells the retriever to fetch the top 2 most relevant documents (conversation snippets) based on semantic similarity to the current input.

```python
# 3. Create the retriever
# We'll retrieve the top 2 most relevant conversation snippets
retriever = index.as_retriever(search_kwargs=dict(k=2))
```

Choosing the right value for `k` is important. A larger `k` brings more context but increases token usage and the risk of including irrelevant information. A smaller `k` is more concise but might miss useful context. Experimentation is often required.
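One way to ground this experimentation is to query the retriever directly and inspect what it returns for a sample input, before wiring it into a memory object. A minimal sketch; the query string is purely illustrative:

```python
# Ask the retriever directly which stored snippets it considers most similar.
# Right now only the dummy "_initial_" entry exists, so that is all it can return;
# once conversation turns are stored, the closest exchanges will show up here.
docs = retriever.get_relevant_documents("What programming language do I prefer?")
for doc in docs:
    print(doc.page_content)
```

Printing the raw snippets like this is a quick way to compare different `k` values (or retriever configurations) before committing to one.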
**Instantiate `VectorStoreRetrieverMemory`.** Now we create the memory object itself, passing in the retriever.

```python
# 4. Instantiate the memory module
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")
```

The `memory_key="history"` argument specifies the variable name that will hold the retrieved context within the prompt.

### Integrating with a Conversation Chain

Let's integrate this memory into a standard `ConversationChain`. We need an LLM and a prompt template that includes the `history` variable (managed by our memory module) and the `input` variable (the user's current message).

```python
# 5. Initialize the LLM
llm = OpenAI(temperature=0)  # Use a deterministic setting for predictability

# 6. Define the Prompt Template
# Note the "{history}" variable, which will be populated by VectorStoreRetrieverMemory
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Relevant pieces of previous conversation:
{history}

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: {input}
AI:"""

PROMPT = PromptTemplate(
    input_variables=["history", "input"],
    template=_DEFAULT_TEMPLATE
)

# 7. Create the ConversationChain
conversation_with_vectorstore_memory = ConversationChain(
    llm=llm,
    prompt=PROMPT,
    memory=memory,
    verbose=True  # Set to True to see the internal steps
)
```

### Running the Conversation

Now, let's simulate a conversation. Notice how the memory automatically saves each input/output pair and retrieves relevant history for subsequent turns.

```python
# First interaction
response = conversation_with_vectorstore_memory.predict(
    input="My favorite programming language is Python because it's versatile."
)
print(response)

# Second interaction - unrelated
response = conversation_with_vectorstore_memory.predict(input="The weather today is sunny.")
print(response)

# Third interaction - refers back to the first statement implicitly
response = conversation_with_vectorstore_memory.predict(input="Why did I mention I liked Python?")
print(response)
```

If you run this with `verbose=True`, you'll see output similar to the following (simplified) for the third interaction:

```
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. ...

Relevant pieces of previous conversation:
Human: My favorite programming language is Python because it's versatile.
AI: That's great! Python is indeed known for its versatility, readability, and extensive libraries. It's used in web development, data science, AI, scripting, and much more.

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: Why did I mention I liked Python?
AI:

> Finished chain.
You mentioned you liked Python because of its versatility.
```

Notice how the "Relevant pieces of previous conversation:" section was populated by `VectorStoreRetrieverMemory` retrieving the first interaction based on the semantic content of the third input ("Why did I mention I liked Python?"). The second, unrelated interaction about the weather was likely not retrieved (or ranked lower) because it was semantically dissimilar.
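You can also reproduce what the chain does by calling the memory object directly, outside any chain. A minimal sketch, assuming the conversation above has already been run; the extra turn added at the end is purely illustrative:

```python
# Peek at what the memory would inject into the prompt for a given input.
# load_memory_variables embeds the query, runs the retriever, and returns the
# retrieved snippets under the configured memory_key ("history").
retrieved = memory.load_memory_variables({"input": "Why did I mention I liked Python?"})
print(retrieved["history"])

# This is also how turns get stored: the chain calls save_context after each
# response, embedding the input/output pair and adding it to the FAISS index.
memory.save_context(
    {"input": "I also enjoy hiking on weekends."},
    {"output": "Hiking is a great way to stay active and enjoy nature."},
)
```

Calling `save_context` manually like this can also be handy for preloading the memory with facts you want the assistant to remember from the start.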
### How Vector Store Memory Works: Retrieval Flow

The process within the chain when using `VectorStoreRetrieverMemory` can be visualized as follows:

```dot
digraph G {
    rankdir=LR;
    node [shape=box, style=rounded, fontname="Arial", fontsize=10, margin="0.2,0.1"];
    edge [fontname="Arial", fontsize=9];

    UserInput [label="User Input"];
    SaveMemory [label="Save Input/Output\n(Embed & Store in Vector DB)"];
    RetrieveMemory [label="Retrieve Relevant History\n(Query Vector DB with Current Input)"];
    FormatPrompt [label="Format Prompt\n(Inject Retrieved History)"];
    LLM [label="LLM Call"];
    Output [label="Generate Response"];
    FinalOutput [label="Final Output", shape=ellipse];

    UserInput -> SaveMemory [label=" After response generation"];
    UserInput -> RetrieveMemory [label=" Before prompt formatting"];
    RetrieveMemory -> FormatPrompt;
    UserInput -> FormatPrompt;
    FormatPrompt -> LLM;
    LLM -> Output;
    Output -> SaveMemory;
    Output -> FinalOutput;
}
```

Flow diagram illustrating the steps involved when using Vector Store Memory in a `ConversationChain`. User input triggers retrieval before prompt formatting, and the input/output pair is saved after the response is generated.

### Tuning and Persistence

**Retrieval parameter (`k`):** The `k` value in `as_retriever(search_kwargs=dict(k=k))` is a primary tuning parameter. Increasing `k` provides more context but increases prompt size and cost. Decreasing it saves tokens but might omit relevant information. You might also explore other `search_type` options such as `"mmr"` (Maximal Marginal Relevance) to balance relevance and diversity in retrieved documents; a short sketch of this option appears just before the complete example script below.

**Persistence:** The FAISS index in our example is in-memory and will be lost when the script ends. For production use, you'd typically want persistence. You can save and load a FAISS index locally:

```python
# To save the index
index.save_local("my_faiss_index")

# To load the index later (requires the embedding model)
loaded_index = FAISS.load_local(
    "my_faiss_index", embedding_model, allow_dangerous_deserialization=True
)
retriever = loaded_index.as_retriever(search_kwargs=dict(k=2))
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")
# ... re-create the chain using this memory
```

**Security note:** Loading FAISS indexes saved with `save_local` can be a security risk if the index file comes from an untrusted source, hence the `allow_dangerous_deserialization=True` flag. For production systems interacting with potentially untrusted data, consider more secure serialization methods or managed vector database services. Alternatively, use the cloud-based vector stores discussed earlier (Pinecone, Weaviate, etc.), which handle persistence and scaling automatically.
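Here is the MMR option referred to in the tuning notes above, as a minimal sketch; the `fetch_k` value is an illustrative choice, not a recommendation:

```python
# Maximal Marginal Relevance: fetch a larger candidate pool (fetch_k), then pick
# k results that are relevant to the query but not redundant with each other.
mmr_retriever = index.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 10},
)
mmr_memory = VectorStoreRetrieverMemory(retriever=mmr_retriever, memory_key="history")
```

This can help when the history contains many near-duplicate exchanges that would otherwise crowd out other relevant context.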
### Complete Example Script

Here is the full script combining the steps:

```python
import os

from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate

# Ensure your OPENAI_API_KEY is set
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY environment variable not set.")

# 1. Initialize Embeddings
embedding_model = OpenAIEmbeddings()

# 2. Initialize FAISS Vector Store
# from_texts requires at least one text, so a dummy entry is used when creating
# a new index; an existing index is loaded from disk if one is found.
try:
    # Try loading if it exists
    index = FAISS.load_local(
        "my_faiss_index", embedding_model, allow_dangerous_deserialization=True
    )
    print("Loaded existing FAISS index.")
except Exception:
    print("Creating new FAISS index.")
    # embedding_size = 1536  # Usually inferred from the embedding model
    index = FAISS.from_texts(["_initial_"], embedding_model)

# 3. Create Retriever (retrieve top 2 relevant snippets)
retriever = index.as_retriever(search_kwargs=dict(k=2))

# 4. Instantiate Memory
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")

# 5. Initialize LLM
llm = OpenAI(temperature=0)

# 6. Define Prompt Template
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Relevant pieces of previous conversation:
{history}

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: {input}
AI:"""

PROMPT = PromptTemplate(
    input_variables=["history", "input"],
    template=_DEFAULT_TEMPLATE
)

# 7. Create Conversation Chain
conversation_with_vectorstore_memory = ConversationChain(
    llm=llm,
    prompt=PROMPT,
    memory=memory,
    verbose=False  # Set to True to see detailed logs
)

# --- Run Conversation ---
print("Starting conversation (type 'quit' to exit):")
while True:
    user_input = input("Human: ")
    if user_input.lower() == 'quit':
        break
    response = conversation_with_vectorstore_memory.predict(input=user_input)
    print(f"AI: {response}")

# --- Save the index before exiting ---
try:
    index.save_local("my_faiss_index")
    print("Saved FAISS index.")
except Exception as e:
    print(f"Error saving FAISS index: {e}")

print("Conversation ended.")
```

### Memory Considerations

- **Embedding cost:** Every input/output pair saved incurs the cost of generating its embedding. This can add up over long conversations.
- **Retrieval relevance:** The quality of the memory depends heavily on the effectiveness of the retrieval step. Poor retrieval (due to a suboptimal `k`, a weak embedding model, or a noisy history) will lead to irrelevant context being fed to the LLM. Techniques like re-ranking or query transformation (discussed in Chapter 4) can sometimes help.
- **Context size:** While vector store memory helps select relevant history, the retrieved snippets still need to fit within the LLM's context window along with the current input and prompt instructions.

This practical exercise demonstrated how to implement `VectorStoreRetrieverMemory`, providing a powerful mechanism for maintaining long-term, semantically relevant context in conversational applications. By storing history in a vector store, you overcome the limitations of simple buffers and enable more coherent and knowledgeable interactions over extended periods. Remember to tune the retrieval parameters and consider persistence strategies based on your application's requirements.
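Finally, to make the context-size consideration concrete, here is a small sketch of estimating how many tokens the retrieved history adds to each prompt using `tiktoken`; the `cl100k_base` encoding name is an assumption, so choose the encoding that matches your model:

```python
import tiktoken

# Rough token count for whatever the memory would inject for a given input.
encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding; adjust per model

retrieved_history = memory.load_memory_variables(
    {"input": "Why did I mention I liked Python?"}
)["history"]
history_tokens = len(encoding.encode(retrieved_history))
print(f"Retrieved history uses ~{history_tokens} tokens of the prompt budget.")
```

If this number grows uncomfortably large, reducing `k` or trimming what you pass to `save_context` are the simplest levers.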