We wanted an AI assistant on our own website that could answer questions about our work, services, and products - without sending visitor data to third-party services. Here is how we built it, what we learned, and what we would change next time.
Requirements and Constraints
The assistant needed to do three things well: answer questions about TANGISON's services and products, direct visitors to the right pages, and acknowledge when it doesn't know something. It also needed to run on infrastructure we control, respond within a few seconds, and work on mobile connections with high latency.
We ruled out embedding a third-party chatbot early. The data flows through external servers, the customization is limited, and the pricing scales poorly when you want to maintain context across conversations. More importantly, we build AI infrastructure for a living - using someone else's hosted chatbot felt inconsistent with the work we do.
The architecture needed to be simple enough to maintain with a small team, but flexible enough to improve over time as we learn how visitors actually use it.
Architecture Overview
The assistant runs on our own infrastructure using the Hermes Agent framework, which we developed internally for building production AI agents. When a visitor sends a message, it flows through three stages: retrieval, reasoning, and response.
In the retrieval stage, the system pulls relevant context from a knowledge base that contains our service descriptions, product documentation, and published articles. We use semantic search over embeddings stored in a vector database that runs alongside our application server.
The reasoning stage passes the retrieved context and the conversation history to a language model. We currently use a hosted model through OpenRouter, which gives us access to capable models without maintaining GPU infrastructure ourselves. The system prompt constrains the model to answer based on the retrieved context and to say when it doesn't have sufficient information.
The response stage formats the output and delivers it to the frontend widget. We render a small amount of markdown for structure, and include links to relevant pages when the context suggests them.
The entire round-trip - from user message to displayed response - typically completes in 2-4 seconds on a standard connection.
What We Learned
The retrieval stage matters more than the model. Early on, we spent time tweaking the system prompt and evaluating different models. But the biggest improvements in answer quality came from improving the knowledge base: adding more specific content, removing redundant entries, and structuring documents so that semantic search returns the right passages.
Saying "I don't know" is a feature, not a limitation. We deliberately tuned the assistant to acknowledge uncertainty rather than guess. Visitors trust it more when it admits the limits of its knowledge, and it reduces the risk of providing incorrect information about our services.
Mobile latency is the real performance constraint. Server-side response time is important, but on mobile connections in Namibia, the network round-trip adds significant overhead. We mitigated this by keeping the payload sizes small and using streaming responses so the first token appears quickly.
Analytics drove iteration. We log anonymized conversation topics (not content) to understand what visitors ask about. This data directly informs which knowledge base articles we write or improve next.
What We Would Change
We would start with the knowledge base, not the model. Our initial approach was to get a working prototype with a general system prompt and then refine the knowledge base. In retrospect, investing more in the knowledge base upfront would have produced better results faster.
We would build the analytics layer sooner. Understanding what visitors actually ask - not what we assumed they would ask - changed our priorities. If we had started with basic topic logging, we could have focused the knowledge base on the right content from the beginning.
We would make the widget more accessible from the start. The initial design worked well on desktop but had interaction issues on smaller screens. A mobile-first design for the widget would have saved a round of revisions.
“The retrieval stage matters more than the model. The biggest improvements came from improving the knowledge base, not tweaking the prompt.”
Want to discuss this topic?
Talk to our team about how these ideas apply to your organization.
Get in Touch