How Our Support Team Uses Honeycomb to Debug Honeycomb

Our support team leverages our own tools, Canvas and the Honeycomb MCP, to navigate vast and complex internal telemetry when debugging customer issues. This shift means we spend far less time on query mechanics and detective work, and customers get answers faster.

By: Sara Cave

June 2, 2026

Incident Response

Debugging

Dogfooding

How Honeycomb's Support Team Uses Honeycomb to Debug Honeycomb

You'd think that working at an observability company means everyone knows exactly where to find everything in the data. It doesn't. Especially not on the support team.

We're the ones who get the tickets. We're in the telemetry every day trying to figure out what went wrong for a customer, and we do that by pointing Honeycomb at itself. Here's how that actually works, and how it's changed.

Everyone else's telemetry

If you're on a development team, you generally know your part of the system. You named the fields, you know the datasets, you've got a feel for where to look when something breaks.

Support is different because a ticket could be about literally anything, and the reports we get aren't usually detailed. Someone sends a screenshot of a graph that looks off. Or they write "something seems wrong with my queries" and that's the entire bug report. We have to take whatever that is and turn it into something we can investigate.

That translation step was, for a long time, the hardest part of the work.

Honeycomb's internal telemetry has close to 200 datasets. The system has been around for years, and naming isn't always consistent. You go to look up a team ID, and it could be app.team_id or app.team.id or team-id. All three exist as separate fields. Multiply that across a decade of telemetry decisions and it adds up. Each decision was probably fine at the time, but when your job requires navigating all of it (not just your team's slice), it gets heavy.
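To make the naming problem concrete, here is a minimal sketch of what reading a team ID looks like when it can live under any of those three field names. The function name, the lookup order, and the sample events are assumptions made up for illustration; this is not code from our stack.

```python
from typing import Any, Mapping, Optional

# The three field-name variants mentioned above.
# The lookup order is an arbitrary assumption for this sketch.
TEAM_ID_FIELDS = ("app.team_id", "app.team.id", "team-id")


def team_id_from_event(event: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty team ID found under any known field name."""
    for field in TEAM_ID_FIELDS:
        value = event.get(field)
        if value not in (None, ""):
            # Normalize to a string so 42 and "42" compare equal downstream.
            return str(value)
    return None


# Hypothetical events, each using a different naming convention.
events = [
    {"app.team_id": 42, "name": "query"},
    {"app.team.id": "42", "name": "trigger_eval"},
    {"team-id": "7", "name": "ingest"},
    {"name": "healthcheck"},
]

team_ids = [team_id_from_event(e) for e in events]
print(team_ids)
```

Every investigation that touched team IDs needed some version of this lookup table, except it lived in people's heads rather than in code.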

Before Canvas, the blank page problem was constant. You'd open the query builder, genuinely not know which dataset to start with, and ping a teammate. Sometimes they'd send a dogfood link. Sometimes they'd walk you through a query they remembered using for something similar. We'd even built shared boards so people had somewhere to start. Query URLs flew back and forth in Slack all day, and it worked well enough, but we spent a lot of time figuring out the mechanics of how to query before we could get to the why.

Canvas killed the blank page

Canvas was the first thing that changed this. You don't need to know the exact dataset or remember the right column name anymore. You describe what you're looking for, and Canvas builds the query and runs it. You get a visualization back that you can refine as you go. Need to BubbleUp a time window where things looked bad compared to when they were healthy? Just ask.
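If you haven't used BubbleUp, the underlying idea is a comparison: take the events in the window that looked bad, take a healthy baseline, and see which field values are over- or under-represented. The sketch below is a toy version of that idea only. It is not how BubbleUp is implemented, and the function names and sample events are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def value_shares(events: List[Event], field: str) -> Dict[Any, float]:
    """Fraction of events carrying each value of `field`."""
    if not events:
        return {}
    counts = Counter(e.get(field) for e in events)
    return {value: n / len(events) for value, n in counts.items()}


def biggest_shifts(
    selection: List[Event], baseline: List[Event], fields: List[str]
) -> List[Tuple[str, Any, float]]:
    """Rank (field, value) pairs by how much their share differs
    between the selected window and the baseline."""
    shifts = []
    for field in fields:
        sel = value_shares(selection, field)
        base = value_shares(baseline, field)
        for value in set(sel) | set(base):
            diff = sel.get(value, 0.0) - base.get(value, 0.0)
            shifts.append((field, value, diff))
    # Largest absolute difference first.
    return sorted(shifts, key=lambda s: abs(s[2]), reverse=True)


# Hypothetical data: the bad window is dominated by build "v2".
baseline = [
    {"build": "v1", "region": "us"},
    {"build": "v1", "region": "eu"},
    {"build": "v1", "region": "us"},
    {"build": "v1", "region": "eu"},
]
selection = [
    {"build": "v2", "region": "us"},
    {"build": "v2", "region": "eu"},
    {"build": "v2", "region": "us"},
    {"build": "v1", "region": "eu"},
]

top = biggest_shifts(selection, baseline, ["build", "region"])
for field, value, diff in top[:2]:
    print(f"{field}={value}: {diff:+.2f}")
```

In this toy data the region split is identical in both windows, so only the build field stands out. That's the kind of signal you're asking for when you ask Canvas to compare a bad window against a healthy one.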

We're still the ones driving things. We still decide which thread to pull, what to rule out, and where to look next. But all the energy that used to go into getting the query syntax right now goes into the investigation instead. You can run three or four queries across different datasets and follow where the data takes you, rather than spending all your effort on getting one query to work.

The research stack

Canvas works well when you're already inside Honeycomb. But tickets don't just need telemetry. There's a whole research layer around every investigation.

This is where the Honeycomb MCP comes in (MCP stands for Model Context Protocol). It lets an AI assistant pull context from Honeycomb and the other tools we use during an investigation. The interesting part happens when you wire up several MCP servers together.

Say a customer writes in about something weird with their triggers. We need to figure out if anyone internally has seen this before, whether engineering already knows about it, and how the feature works under the hood. That used to mean a lot of tabs: Slack, Linear, the codebase, our docs, maybe Snowflake too if the data lives outside telemetry. You'd copy things between them and try to hold the whole thread in your head while jumping around.

Now, the entire loop happens in one place. You start with "What is this customer describing?" and end up at "OK, here's the code path, engineering already has a Linear ticket, and the telemetry confirms it" without ever losing your train of thought.
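As a rough mental model of that loop, picture one question fanned out to several sources, with the findings collected in a single place. The sketch below shows that shape only. The source names mirror the tools mentioned above, but the stub functions and their placeholder findings are made up, and nothing here calls a real MCP server or reflects an actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "source" is any callable that takes a question and returns short findings.
# In practice each would wrap a tool call; here they are stand-in stubs.
Source = Callable[[str], List[str]]


@dataclass
class Investigation:
    question: str
    findings: Dict[str, List[str]] = field(default_factory=dict)

    def summary(self) -> str:
        """Render the question and every finding, tagged by source."""
        lines = [f"Question: {self.question}"]
        for source, notes in self.findings.items():
            for note in notes:
                lines.append(f"[{source}] {note}")
        return "\n".join(lines)


def investigate(question: str, sources: Dict[str, Source]) -> Investigation:
    """Ask every source the same question and keep the non-empty answers."""
    inv = Investigation(question)
    for name, lookup in sources.items():
        notes = lookup(question)
        if notes:
            inv.findings[name] = notes
    return inv


# Stub sources with placeholder findings, purely for illustration.
sources: Dict[str, Source] = {
    "slack": lambda q: ["(placeholder) earlier thread mentioning the same symptom"],
    "linear": lambda q: ["(placeholder) open engineering ticket"],
    "codebase": lambda q: [],
    "telemetry": lambda q: ["(placeholder) query result matching the report"],
}

inv = investigate("Why are this customer's triggers behaving oddly?", sources)
print(inv.summary())
```

The point of the shape is that the question and everything learned about it stay together, instead of being spread across tabs.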

What escalations look like now

Here's one example. A customer reported that something wasn't working right. No error message, nothing obvious. It was the kind of ticket that used to get passed to engineering with "Can you take a look?" and not much else. Engineering would then have to reproduce it, figure out where the problem was, and work backwards from there, with the customer waiting the whole way through.

With the MCP and Canvas, we can do most of that investigation before engineering even gets involved. On this particular ticket, we searched the codebase, found the root cause, confirmed the customer's data was fine, and found a previous fix for the same pattern. By the time we escalated, engineering had everything they needed to go straight to a fix.

That's the part that changed the most: escalations used to be the beginning of an investigation. Now they're closer to the end of one.

This is more than just a support story

If you've ever had to debug a service you didn't build, or investigate an alert for a system you're still learning, you've hit the same wall we did. Canvas and the MCP work for anyone who needs to investigate something in unfamiliar territory. You can start doing useful work before you've memorized the whole system.