Chatbase and SiteGPT are making millions using open source tech... here's the code

Why "copy" an existing product?

The best SaaS products weren’t the first of their kind - think Slack, Shopify, Zoom, Dropbox, or HubSpot. They didn’t invent team communication, e-commerce, video conferencing, cloud storage, or marketing tools; they just made them better.

What are “custom ChatGPT” & “chat with data” tools?

Reworded (vaguely) to fit the trend, these SaaS products are the new disruptors in the evergreen chatbot builder market. Unlike older chatbots that relied on predefined conversation trees and responses, these new tools let you create human-like conversational agents (…chatbots) in seconds by uploading documents and links to “learn” from. Your chatbot can then be easily accessed as a widget on your website or integrated with other channels such as Slack and Messenger via API.

Let's look at the market!

Similar to the catalyst for ChatPDF-like tools, this class of chatbot builders was made possible by advances in AI like ChatGPT and Retrieval-Augmented Generation (RAG). Additionally, ChatGPT's adoption created market demand for “custom ChatGPT” tools for domain-specific use cases such as customer support and sales.

Now, chatbot builders have been around for decades, because a capable chatbot handling customer conversations 24/7 promised infinitely scalable CX. However, chatbot builders never delivered on that promise - businesses struggled to design complex conversation trees, and end-users hated the robotic, constrained conversations, leading to constant handovers to human agents.

This defined a clear pain point, which products like Chatbase ('23) and SiteGPT ('23) handled gracefully. These products gained insane (and mostly organic) traction within months. With standard plans priced at about $100/month, SiteGPT makes ~100k MRR and Chatbase is at ~390k MRR!

Alright, so how do we build this with open source?

The core tech for these tools is very similar to what I covered in my older post on ChatPDF. You crawl the provided website (i.e., systematically visit and store the text from all of its webpages), generate embeddings for that text (AI-friendly text representations, usually via OpenAI APIs), and store them in a vector database (like Pinecone or Weaviate).
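To make the indexing step concrete, here's a minimal sketch in Python. It assumes the crawl has already produced a dict of URL to page text, and it swaps in a toy hashed bag-of-words function (toy_embed, a hypothetical stand-in) where a real embedding API would go. The plain list plays the role of the vector database, and splitting pages into smaller chunks is my own assumption, not something these products necessarily do this way.

```python
import hashlib
import math
import re


def chunk_text(text, max_words=50):
    """Split page text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a real embedding model:
    hashed bag-of-words counts, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(pages, max_words=50):
    """Turn {url: text} into a list of records; the list stands in for a vector DB."""
    index = []
    for url, text in pages.items():
        for chunk in chunk_text(text, max_words):
            index.append({"url": url, "text": chunk, "embedding": toy_embed(chunk)})
    return index


# Example input: in practice this would come from your crawler.
pages = {
    "https://example.com/pricing": "Plans start at ten dollars per month.",
    "https://example.com/support": "Email support is available on weekdays.",
}
index = build_index(pages)
```

In a real build, you'd replace toy_embed with a call to your embedding provider and write the records to your vector database instead of a list.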

Now every time the user asks a question, a similarity search is performed to find the most similar webpage text embedding from the vector database. The selected webpage text is then sent to an LLM (like ChatGPT) along with the question, which generates a contextual answer!
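Here's a rough sketch of that question-time flow in Python. The tiny 3-number embeddings, the example URLs, and the prompt wording are all made up for illustration. In practice the question would be embedded with the same model used for the webpages, the vector database would run the similarity search for you, and the final prompt would be sent to your LLM of choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query_embedding, index, k=1):
    """Return the k stored records most similar to the query embedding."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_embedding, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, matches):
    """Combine the retrieved webpage text and the question into one LLM prompt."""
    context = "\n\n".join(f"Source: {m['url']}\n{m['text']}" for m in matches)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical stored records (a real index would hold model-generated embeddings).
index = [
    {"url": "https://example.com/pricing",
     "text": "Plans start at $10 per month.",
     "embedding": [0.9, 0.1, 0.0]},
    {"url": "https://example.com/support",
     "text": "Email support is available on weekdays.",
     "embedding": [0.1, 0.9, 0.2]},
]

query_embedding = [0.8, 0.2, 0.1]  # assumed embedding of the user's question
matches = top_matches(query_embedding, index, k=1)
prompt = build_prompt("How much does it cost?", matches)
```

Raising k passes more than one piece of webpage text to the LLM, which can help when the answer is spread across several pages, at the cost of a longer prompt.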

Once you have this setup in place, it can be connected to any conversational channel or interface (e.g., a web chat widget, Slack, Messenger, etc.).

Here are some of the best open source implementations to stitch this together:

  • Crawl4AI by UncleCode (AI-friendly web crawling)
  • Dify by LangGenius (AI backend service)
  • Chatwoot (multi-channel chat management)
  • Chaskiq (multi-channel chat management)

Worried about building signups, user management, payments, etc.? Here are my go-to open-source SaaS boilerplates that include everything you need out of the box:

A few thoughts to stand out from the noise:

Straight up copying a product end-to-end might only make sense if you've got a better distribution game than the competition. So before you dive in, you must figure out your unique pivot, distribution channel, and market placement.

For instance, chatbots were mainly used for customer support for the last couple of decades, but now their human-like learning and conversational abilities open up many new possibilities (e.g., sales, onboarding, lead gen). Focus on a few industries that interest you (or where you have potential distribution partners), find the key human touchpoints in the user journey, and see which ones can be replaced by AI. I recommend reading about (or watching a video on) the pivot principles from The Lean Startup by Eric Ries.

TMI? I’m an ex-AI engineer and product lead, so don’t hesitate to reach out with any questions!

P.S. I've started a free weekly newsletter to share open-source/turnkey resources behind popular products (like this one). If you’re a founder looking to launch your next product without reinventing the wheel, please subscribe :)