
I created her as part of our company's hackweek and I'm gonna enter her into the hackweek competition; I'm aiming to win!

I ended up training her on our codebase so she can answer specific questions about our code. One of my friends explained to me a low-tech way to do this using retrieval-augmented generation (RAG), and it basically works like this:

  • Take your corpus of documents and chop it up into chunks small enough to embed

    In this case the corpus of documents I trained her on is our code, but it could also be Slack messages, git history, or Notion documents. Anything you want, really. The chunk size you use depends on the embedding model you pick for the next step (there's a rough sketch of these first steps right after this list).

  • Map each document onto a high-dimensional vector space using an embedding model

    I used text-embedding-3-large for Senko.

  • Map the user's query onto the same high-dimensional vector space

    … using the same embedding model

  • Find the N documents in the corpus that are "nearest" to the user's query

    For example, you can use a nearest-neighbors algorithm to find the documents closest to the user's query in this high-dimensional vector space (see the second sketch after this list).

  • Add the N nearest documents to the prompt
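
To make the first few steps concrete, here's a minimal sketch in Python of the chunk-and-embed half, using the OpenAI client; the helper names and the chunk sizes are my own placeholders, not Senko's actual code:

```python
from openai import OpenAI

client = OpenAI()

def chunk_corpus(text, size=2000, overlap=200):
    """Naively chop a document into overlapping fixed-size chunks.

    Real chunkers often split on paragraph or function boundaries
    instead; these sizes are just placeholders.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(texts):
    """Map each text onto the embedding model's vector space."""
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
    )
    return [item.embedding for item in response.data]
```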
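Steps 3 and 4 then boil down to embedding the query with the same model and doing a similarity search. A brute-force version (reusing the `embed` helper above; a real deployment might use a vector database instead) could look like this:

```python
import numpy as np

def nearest_documents(query, chunks, chunk_vectors, n=5):
    """Return the n chunks whose embeddings are nearest to the query's."""
    corpus = np.array(chunk_vectors)   # shape: (number of chunks, dimensions)
    [query_vector] = embed([query])    # the same embedding model as the corpus

    # text-embedding-3 vectors are normalized to unit length, so a plain
    # dot product is the cosine similarity (higher = nearer)
    scores = corpus @ np.array(query_vector)

    best = np.argsort(scores)[::-1][:n]
    return [chunks[i] for i in best]
```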

This gives you a prompt that looks something like this:

You are Ada, a foxgirl modeled after Senko from The Helpful Fox Senko-san (Sewayaki Kitsune no Senko-san)…

Here are some documents we've retrieved that might be helpful for answering the question:

{insert the N nearest documents here verbatim, separated by headers}

Here is the user's question:

{insert the user's question}

… and then the model can give really good answers that make use of specific knowledge about our corpus of documents (e.g. our codebase).

So it ends up being a giant prompt, but this is fine because newer models can actually work with pretty enormous prompts (~128K tokens), and extending the prompt this way tends to work much better than fine-tuning the model.
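
For completeness, here's roughly how assembling that giant prompt and handing it to the model might look, continuing the sketch above; the document-header format and the gpt-4o model name are my assumptions (any chat model with a large context window would do):

```python
def build_prompt(question, documents):
    """Splice the N nearest documents into the prompt, separated by headers."""
    sections = "\n\n".join(
        f"## Document {i + 1}\n\n{doc}" for i, doc in enumerate(documents)
    )
    return (
        "You are Ada, a foxgirl modeled after Senko from "
        "The Helpful Fox Senko-san (Sewayaki Kitsune no Senko-san)…\n\n"
        "Here are some documents we've retrieved that might be helpful "
        "for answering the question:\n\n"
        f"{sections}\n\n"
        "Here is the user's question:\n\n"
        f"{question}"
    )

def answer(question, chunks, chunk_vectors):
    documents = nearest_documents(question, chunks, chunk_vectors)
    response = client.chat.completions.create(
        # assumption: any model with a large (~128K token) context works here
        model="gpt-4o",
        messages=[{"role": "user", "content": build_prompt(question, documents)}],
    )
    return response.choices[0].message.content
```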



in reply to @fullmoon's post:

Thanks for sharing this, Gabby! It’s funny, I’ve been looking for ways to do this, but all the guides I’ve found online have been of the “draw the rest of the owl” variety. Would be really interesting to read more updates!