chat.html

Out of Date
The steps in this article are out of date. An updated version is coming soon.

Using an LLM to enhance the discoverability of your content is quickly becoming a baseline requirement for documentation. Thankfully, Hugo's ability to output your site as JSON makes this relatively easy to set up.

At a high level, you'll need some server-side code in Python or JavaScript that retrieves the most relevant chunks of your docs (via embeddings created from your docs JSON) and passes them to the OpenAI API as context for answering the user's question.

How it works

This partial sends an API request to a GCP Cloud Function that you'll need to set up. The function uses Flask (provided by the Functions Framework) to do the following; a condensed sketch of the flow appears after the list:

  1. Search a Pinecone vector database filled with embeddings created from your documentation.
  2. Perform a similarity search and return the 4 most relevant chunks.
  3. Forward those chunks to the OpenAI API via LangChain to perform RAG services.
  4. Return an answer based on the question and content provided.
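
To make those steps concrete, here's a condensed sketch of the flow using the same libraries as the full scripts later in this article. The index name, namespace, and question are placeholders, and it assumes you've already populated the index (the setup steps below cover that). The k=4 argument is what produces the "4 most relevant chunks"; it's also LangChain's default.

import os
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore

# 1. connect to the Pinecone index that holds your doc embeddings
vectorstore = PineconeVectorStore(
    index_name="milodocs",  # placeholder: use your own index and namespace
    namespace="milodocs",
    embedding=OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"]),
    pinecone_api_key=os.environ["PINECONE_API_KEY"],
)

# 2. similarity search returning the 4 most relevant chunks
question = "How do I configure the sidebar?"  # example question
chunks = vectorstore.similarity_search(question, k=4)

# 3 & 4. forward those chunks to OpenAI and return an answer grounded in them
context = "\n\n".join(doc.page_content for doc in chunks)
llm = ChatOpenAI(model="gpt-4", api_key=os.environ["OPENAI_API_KEY"])
answer = llm.invoke(f"Answer based only on this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
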
have it your way
There are several ways to implement a RAG LLM UX; this is just the one that currently works for me. It seems like, in the future, people may shift from LangChain to OpenAI's official Assistants API. Hopefully sharing this implementation helps you achieve yours!

Setup

To use this feature, you're going to need to do the following (an example environment setup follows the list):

  1. Set up a Vector DB (doesn’t have to be Pinecone, LangChain supports multiple options).
  2. Convert your site index.json into embeddings and save them to the DB.
  3. Deploy a cloud function that can accept and route questions.
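
Before either script below will run, the relevant packages and API keys need to be in place. Roughly, that means a requirements.txt along these lines (versions unpinned here; pin whatever works for you) and a .env file with the keys both scripts read (set the same values as environment variables on the deployed function):

# requirements.txt (embedding script + cloud function, roughly)
functions-framework
python-dotenv
langchain
langchain-community
langchain-core
langchain-openai
langchain-pinecone
pinecone-client
jq          # used by JSONLoader
tiktoken    # token counting for OpenAI embeddings

# .env (example values; use your own)
PINECONE_API_KEY=your-pinecone-key
OPENAI_API_KEY=your-openai-key
PINECONE_REGION=us-east-1
PINECONE_CLOUD=aws
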
Python 3.12
The tiktoken requirement runs into issues on Python 3.12; for now, I recommend using 3.10 if deploying with a GCP function.

Create & store embeddings

import os
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import JSONLoader
from langchain_pinecone import Pinecone as PC_Pinecone
from langchain_openai import OpenAIEmbeddings
from pinecone import Pinecone, ServerlessSpec, PodSpec
import time
from pprint import pprint

load_dotenv()

api_key = os.environ.get('PINECONE_API_KEY')
openai_key = os.environ.get('OPENAI_API_KEY')
pinecone_region = os.environ.get('PINECONE_REGION')
pinecone_cloud = os.environ.get('PINECONE_CLOUD')

pc = Pinecone(api_key=api_key)
use_serverless = True

if use_serverless:
    spec = ServerlessSpec(cloud=pinecone_cloud, region=pinecone_region)
else:
    spec = PodSpec()

# check for and delete the index if it already exists
# note: the index name and namespace must match INDEX_NAME and NAMESPACE in the cloud function below
index_name = 'milodocs'
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)

# create a new index
pc.create_index(
    index_name,
    dimension=1536,  # dimensionality of text-embedding-ada-002
    metric='dotproduct',
    spec=spec
)

# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

index = pc.Index(index_name)
pprint(index.describe_index_stats())

# Format and Chunk the Data 
embeddings = OpenAIEmbeddings(openai_api_key=openai_key)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0,)

def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["title"] = record.get("title")
    metadata["relURI"] = record.get("relURI")
    metadata["description"] = record.get("description")
    metadata["productPath"] = record.get("productPath")
    return metadata

loader = JSONLoader(
    file_path="../../../public/index.json",
    jq_schema=".[]",
    metadata_func=metadata_func,
    content_key="body"
) 

data = loader.load()
pprint(data)
texts = text_splitter.split_documents(data) 
doc_search = PC_Pinecone.from_documents(texts, embeddings, index_name=index_name, namespace="milodocs")

pprint(doc_search)

pprint(index.describe_index_stats())
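
With the index populated, a quick similarity search against doc_search is a useful sanity check that the chunks and their metadata actually landed (the question here is just an example):

results = doc_search.similarity_search("How do I install the theme?", k=4)
for doc in results:
    print(doc.metadata.get("relURI"), "|", doc.page_content[:80])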

Deploy cloud function

import os 
import functions_framework
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore as LC_Pinecone
from pinecone import Pinecone

load_dotenv()

PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
INDEX_NAME = "milodocs"
NAMESPACE = "milodocs"
PC = Pinecone(api_key=PINECONE_API_KEY)
INDEX = PC.Index(INDEX_NAME)
EMBEDDINGS = OpenAIEmbeddings(api_key=OPENAI_API_KEY)


TEMPLATE = """You are a technical writer and Hugo Site Generator expert named Milo. You help people answer questions about the Milo Docs theme. Answer the question based only on the following context:
{context}

Question: {question}
"""
PROMPT = ChatPromptTemplate.from_template(TEMPLATE)
MODEL = ChatOpenAI(model="gpt-4", api_key=OPENAI_API_KEY)
OUTPUT_PARSER = StrOutputParser()

@functions_framework.http
def start(request):
    HEADERS = {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET, POST',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Max-Age': '3600'
    }
    if request.method == 'OPTIONS':
        return ('', 204, HEADERS)

    request_json = request.get_json(silent=True)
    request_args = request.args

    question = request_json.get('query') if request_json else request_args.get('query', 'What is Hugo?')
    productFilter = request_json.get('productFilter') if request_json else request_args.get('productFilter')

    # Create a retriever instance for each request with the current productFilter
    VS = LC_Pinecone(
        index_name=INDEX_NAME,
        embedding=EMBEDDINGS,
        namespace=NAMESPACE,
        pinecone_api_key=PINECONE_API_KEY,
    )
    if productFilter:
        # If a productFilter is provided, use it to filter the results based on that metadata
        RETRIEVER = VS.as_retriever(
            search_kwargs={'filter': {'productPath': productFilter}}
        )
    else:
        # Search all documents if no productFilter is provided
        RETRIEVER = VS.as_retriever()

    # Setup the chain dynamically
    SETUP_AND_RETRIEVAL = RunnableParallel(
        {"context": RETRIEVER, "question": RunnablePassthrough()}
    )
    CHAIN = (SETUP_AND_RETRIEVAL | PROMPT | MODEL | OUTPUT_PARSER)

    answer = CHAIN.invoke(question)
    response = { "answer": answer }

    return (response, 200, HEADERS)
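
Once deployed, the function expects a JSON body containing query (and, optionally, productFilter) and responds with a JSON object containing the answer. Here's a quick way to exercise it from Python; the URL is a placeholder for whatever trigger URL GCP assigns your function:

import requests

resp = requests.post(
    "https://REGION-PROJECT.cloudfunctions.net/chat",  # placeholder URL
    json={"query": "How do I enable the chat partial?", "productFilter": None},
    timeout=60,
)
print(resp.json()["answer"])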

Source code

Help Wanted
If you know how to successfully separate this JS into its own file in assets/js, please submit a PR. It doesn’t work for me!
<div id="chatContainer" class="hidden sticky top-16 h-[calc(100vh-5rem)] flex flex-col flex justify-end">
    <div id="chat-messages" class="flex flex-col overflow-y-auto text-base">
    </div>
    <div id="chat-controls" class="flex flex-row text-xs mt-2">
        <form id="chat-form" class="flex flex-row">
            <input id="question" type="text" aria-label="Question Input" placeholder="Ask the docs" class="h-10 border rounded-lg p-1 mr-1 focus:outline-none focus:ring-2 focus:ring-brand" />
            <button id="sendButton" aria-label="Send" class="flex items-center bg-brand my-1  hover:bg-black text-white p-1 mr-1 rounded-lg shadow-lg transition duration-300"><img src="/icons/send.svg" alt="Send" class="w-5 h-5"></button>
        </form>
        <button id="clearAll" aria-label="Delete All"  class="flex items-center bg-black my-1 hover:bg-red-600 text-white p-1 rounded-lg shadow-lg transition duration-300"><img src="/icons/delete.svg" alt="Delete" class="w-5 h-5"></button>
    </div>
</div>