> ## Documentation Index
> Fetch the complete documentation index at: https://staging.docs.vocode.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Vector databases

> Give your agents knowledge via embeddings

## Introduction

For many practical use cases it may be necessary to give bots information; but, due to limited
context windows, there may not be enough room in the prompt. Thus, Vocode allows you to plug into
vector databases that contain embeddings.

Each time the bot receives a message, it can query for the most similar embeddings; these embeddings will
be shown to the agent to guide its responses.

Currently, we support [Pinecone](https://www.pinecone.io/). Under the hood, we use an approach similar
to Langchain to store the documents in Pinecone. Each vector in Pinecone must have two pieces of metadata
to be compatible with Vocode:

* `text`: The text that will be shown to the agent.
* `source`: The name of the document where the text comes from. This could be the title of an article,
  for example.

## Connecting to Pinecone

To add your vector database to your agent, you need to add a vector database config to your agent:

<CodeGroup>
  ```python update_agent.py theme={null}
  from vocode import PineconeVectorDatabaseUpdateParams

  agent = vocode_client.agents.update_agent(
      id="AGENT_ID",
      request=PineconeVectorDatabaseUpdateParams(
          type="vector_database_pinecone",
          index="PINECONE_INDEX",
          api_key="PINECONE_API_KEY",
          api_environment="PINECONE_API_ENVIRONMENT",
      ),
  )
  ```

  ```javascript updateAgent.js theme={null}
  const agent = await vocode.agents.updateAgent({
    id: "AGENT_ID",
    body: {
      vector_database: {
        type: "vector_database_pinecone",
        index: "PINECONE_INDEX",
        api_key: "PINECONE_API_KEY",
        api_environment: "PINECONE_API_ENVIRONMENTr",
      },
    },
  });

  main();
  ```
</CodeGroup>

Now your agent will query the vector database for each user message.

## Document loading script

You can manaully add documents to Pinecone any way you like as long as you include the required metadata.
If you have a folder of PDFs, docx files, text files, etc. that you want to add to pinecone, you can use
the below script which uses [Unstructured](https://github.com/Unstructured-IO/unstructured) to parse
many kinds of files types, extract the text, and add it to pinecone.

The script was tested with these package versions:

```
python = "^3.10"
langchain = "^0.0.237"
spacy = "^3.6.0"
unstructured = {extras = ["local-inference"], version = "^0.8.1"}
layoutparser = {extras = ["layoutmodels", "tesseract"], version = "^0.3.4"}
pinecone-client = "^2.2.2"
openai = "^0.27.8"
torch = ">=2.0.0, !=2.0.1"
tiktoken = "^0.4.0"
```

```python theme={null}
import os
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import SpacyTextSplitter
from langchain.vectorstores import Pinecone
from langchain.document_loaders import DirectoryLoader, UnstructuredFileLoader

PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_ENVIRONMENT = os.environ["PINECONE_ENVIRONMENT"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]


loader = DirectoryLoader('./docs', glob="**/*.*", show_progress=True, loader_cls=UnstructuredFileLoader)
print("Loading documents...")
documents = loader.load()
text_splitter = SpacyTextSplitter(chunk_size=1000)
print("Splitting documents...")
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENVIRONMENT,
)

index_name = "your_index_name"

print("Creating index...")
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
```
