
Teaching AI How to Refinery


February 17, 2026

At the beginning of February, we released v3.1 of Refinery, our advanced, tail-based sampling solution. The new version comes with more performance enhancements, bug fixes, and a few new pieces of telemetry.

In tandem with the 3.1 release, we also released a new tool for our MCP server which helps your AIs understand Refinery, and how Honeycomb handles sampling. With this new tool, you can ask Canvas or your favorite AI questions like: “What are my sampling rules doing?”, “Is my Refinery configured correctly?”, or “Am I achieving my goal sample rate?”

In this blog, I’ll walk you through the new feature and show how it can help you run Refinery.


How we found a bug in our own sampling rules

A great example of how useful this feature can be is how it found an issue in our own sampling rules during development. Once we had written the static content about Refinery, I wanted to test it using an AI assistant. What better way to test than to ask questions about our own sampling?

We generate a lot of telemetry from our ingest pipeline, so we use Refinery to sample the data. We use a RulesBasedSampler with seven different rules. One of those rules is meant to ensure that all of our end-to-end test data (the tests run roughly every minute in production) is kept.

- Name: E2E Data
  Conditions:
    - Field: service.name
      Operator: =
      Value: shepherd
    - Field: app.environment.id
      Operator: =
      Value: 49427
  SampleRate: 1
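
For readers less familiar with Refinery, a rule like this lives inside a RulesBasedSampler block in the rules file. Here's a rough sketch of that layout; the environment key and the comments are illustrative, not our actual values file:

Samplers:
  production:               # illustrative environment key, not our real one
    RulesBasedSampler:
      Rules:
        - Name: E2E Data
          # ...the Conditions and SampleRate shown above
        # ...plus six more rules in our real configuration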

Some very astute Refinery readers may already see the issue, but at the time, I didn’t.

While testing, I asked an AI assistant that was hooked up to Honeycomb’s MCP server: “Look at my production_us1_refinery_values.yaml production refinery rules. What do they do? Use the honeycomb mcp to learn about Refinery first.”

It used the new Refinery tool to learn about Refinery, review the rules, and give a detailed answer. For this story, the important part of the response was:

Sampling Rules Breakdown

1. E2E Data Rule (lines 128-137)
  - Keeps: 100% of traces (SampleRate: 1)
  - When: shepherd service AND app.environment.id = "49427" (string comparison)
  - Purpose: Preserve all E2E test traces for the test environment

Again, readers who have hit this issue before may see that the AI already knows something really important about app.environment.id. But again, I didn’t see it.

This was all I needed to test: the goal was for an AI to accurately review and critique Refinery rules, and that goal was achieved. But I’ve been going through my own journey of learning how to use AI, and I was curious (and skeptical) about whether it could apply that knowledge to the data itself.

So, I asked a follow-up question: “Based on your understanding of the rules, is the data in shepherd dataset being sampled correctly? Also look in the refinery-metrics dataset. Use the honeycomb mcp server to review this data.”

The AI used the Honeycomb MCP to query the data and assess whether it was being sampled correctly based on the rules it had reviewed. Eventually, it returned an unexpected answer:

CRITICAL ISSUE: E2E Data Not Sampled Correctly ❌

  Problem: E2E environment traces (app.environment.id = 49427) are matching the wrong rule.

  - Expected: Should match "E2E Data" rule (lines 128-136) with SampleRate: 1 (100% retention)

  - Actual: Matching "DeterministicSampler for pro plus/enterprise handle_batched_event" rule with SampleRate: 10

  - Impact: E2E data is being sampled at 1:10 instead of keeping all traces

Since rules are evaluated in order and the E2E rule comes FIRST, it should match before the Pro/Enterprise rule. The fact that it's not matching suggests:

  1. Possible cause: The app.environment.id field may not exist or not be set to 49427 on all spans in E2E traces
  2. Rule evaluation: Without Scope: span on the E2E rule, it uses Scope: trace (default), meaning conditions can match across different spans - but if app.environment.id doesn't exist anywhere in the trace, the rule fails
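
For reference, Scope is a per-rule setting in Refinery’s rules file; a minimal sketch of a span-scoped rule, purely illustrative and not part of our configuration, looks like this:

- Name: Example span-scoped rule   # hypothetical rule, just to show the syntax
  Scope: span                      # every condition must match on the same span
  Conditions:
    - Field: service.name
      Operator: =
      Value: shepherd
  SampleRate: 1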

I was very skeptical, as this rule has been in place for a long time. Surely, this was just another case of AI getting it wrong, right?

Wrong.

I told it to try again: “This would be pretty serious. Rerun your analysis to confirm your math is correct. Since the e2e rule comes first it should take priority. Test your hypothesis that app.environment.id does not exist.”

While it ran, I went and searched the data myself and found the same result the AI had: the E2E Data rule was never being hit. I couldn’t think of why that would be happening.

At this point, my AI assistant was done thinking and returned the answer for me. I’m including the entire output this time because it did such a good job of supporting its claim.

FOUND THE BUG! ✅

  Looking at the YAML, I can now provide my final analysis. The data confirms:                                                                                 

  Confirmed Issue                                                                                                                                              

  388 E2E traces exist that match BOTH conditions (service.name = shepherd AND app.environment.id = 49427), but 0 traces match the E2E rule.                   

  Root Cause                                                                                                                                                   

  The most likely issue is that the integer value 49427 on line 135 needs to be quoted as a string in Refinery's configuration:                                

  Value: "49427"  # Should be a string, not an integer

  Why This Matters                                                                                                                                             

  In Refinery's rule evaluation, field values may need to be strings regardless of the underlying Honeycomb field type. Even though app.environment.id is stored as an integer in Honeycomb, Refinery may perform string-based comparison during rule evaluation.                                                      

  Evidence                                                                                                                                                     

  1. Working rules: The Pro/Enterprise rule uses Value: handle_batched_event (string) and Value: shepherd (string) - both work

  2. Failing rules: The E2E rule uses Value: 49427 (unquoted integer) - doesn't match                                                                          

  3. Boolean example: The API requests rule uses Value: true with Datatype: bool - also doesn't match (similar type issue)                                     

  Impact Assessment                                                                                                                                            

  - Current State: ~388 E2E traces/hour are being sampled at 1:10 instead of 1:1

  - Data Loss: ~90% of E2E test traces are being dropped

  - Severity: HIGH - E2E monitoring is significantly degraded                                                                                                  

  Recommendation                                                                                                                                               

  Change line 135 from:                                                                                                                                        

  Value: 49427                                                                                                                                                 

  To:                                                                                                                                                          

  Value: "49427"                                                                                                                                              

  This should be tested in a non-production environment first to verify it fixes the rule matching.

I couldn’t believe I had missed this bug. This is a well-known footgun in Refinery, and it’s easy to get wrong. Back in Refinery 1.20, we even added support for Datatype so that users could explicitly set the expected type to use when comparing! It’s still hard to catch, though (hence the defensive Datatype feature), because it requires a deep understanding of both your rules and your data. In this case, we made the mistake when we set these rules up a long time ago and never checked whether we were sampling correctly (whoops).
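
If you want to guard against the same trap, make the comparison type explicit. Here’s a sketch of what our E2E rule looks like with that change; the right Datatype for you depends on how the field actually arrives on your spans, so treat this as illustrative rather than the exact change we shipped:

- Name: E2E Data
  Conditions:
    - Field: service.name
      Operator: =
      Value: shepherd
    - Field: app.environment.id
      Operator: =
      Value: "49427"
      Datatype: string   # compare as strings explicitly instead of relying on YAML's type inference
  SampleRate: 1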

Conclusion

This experience really opened my eyes to how good AI can be when it pattern-matches correctly. It also highlights how flexible these assistants can be when given the proper tools. We even found a second issue in our Refinery configuration using the same “teach, then ask” flow.

With the power of Honeycomb’s MCP, all of your high-cardinality, high-dimensionality data is readily available for your favorite AI assistant to act as your pattern-matcher. Be curious and ask questions; you might be surprised by the answers you receive.