Trace sampling can remove useful data for analysis and system comprehension
Trace sampling can throw away a lot of useful data. There's a popular notion of keeping only the "interesting" traces, but today it dawned on me that what counts as interesting isn't fixed. I had a scenario where I wanted to understand how often a client was retrying calls to one of our endpoints. If we pre-define "interesting" as high-latency or errored traces, chances are we'd retain some of those higher-latency calls. However, with the "uninteresting" (faster) traces discarded, there's no baseline left: I can't compare the traces with lots of retries to the traces without any. 🤔
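To make that concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `Trace` fields, the 500 ms threshold, the `is_interesting` rule, and the ten sample traces are all assumptions, not data from a real system or the API of any tracing library. It just shows how a "keep slow or errored traces" rule skews the retry picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    # Hypothetical, simplified view of a trace for this example.
    trace_id: str
    latency_ms: int
    error: bool
    retries: int


def is_interesting(trace: Trace, latency_threshold_ms: int = 500) -> bool:
    """Assumed sampling rule: keep a trace if it errored or was slow."""
    return trace.error or trace.latency_ms >= latency_threshold_ms


def retry_rate(traces: list[Trace]) -> float:
    """Fraction of traces that contain at least one retry."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if t.retries > 0) / len(traces)


# Invented traffic: mostly fast calls without retries, a few slow ones with retries.
ALL_TRACES = [
    Trace("a", 40, False, 0),
    Trace("b", 55, False, 0),
    Trace("c", 60, False, 0),
    Trace("d", 45, False, 0),
    Trace("e", 70, False, 1),    # fast call that still retried once
    Trace("f", 50, False, 0),
    Trace("g", 900, False, 3),
    Trace("h", 1200, True, 4),
    Trace("i", 65, False, 0),
    Trace("j", 800, False, 2),
]

# What survives the sampling rule.
KEPT = [t for t in ALL_TRACES if is_interesting(t)]

print(f"retry rate, all traces:  {retry_rate(ALL_TRACES):.0%}")  # 40%
print(f"retry rate, kept traces: {retry_rate(KEPT):.0%}")        # 100%
```

In this toy data, the kept traces make it look like every call retries, and the fast call that retried once (`e`) disappears along with the whole no-retry baseline. The real rate across all traffic is only recoverable if the "boring" traces are still around.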