Debugging across server and client with Correlation Vectors
Imagine you’re trying to track down a bug that involves a client and several microservices. Each has its own logging and may be in a different time zone. You’re interested in a single session, but it involves interactions with each of them.
This is where you can use a Correlation Vector. Microsoft created the spec and uses them in a bunch of its services. I first learned about them at a secret conference (XFest), and unfortunately none of those training materials are available publicly. But the spec is public, and there are some implementations out there.
It’s not just an ID, but a lightweight vector clock. From the documentation, its key design goals are to:
- Track causality (partial order) of the flow
- Provide a simple sort independent of system clock time
- Provide the sort/causality tracking capabilities for any arbitrary subset of events in the trace
- Minimize wire cost for upstream components (that tend to use metered connections)
- Simple to understand and implement
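To make the "simple sort independent of system clock time" idea concrete, here is a minimal sketch in Python. It assumes a simplified format of a random base followed by dot-separated counters (for example `abc.1.0`). The class name, the method names `increment` and `extend`, and the `sort_key` helper are my own illustrative choices, not the official API, and the sketch leaves out details the published spec covers, such as length limits.

```python
import base64
import os


class CorrelationVector:
    """Simplified sketch of '<base>.<n>.<n>...'; not a spec-complete implementation."""

    def __init__(self, base=None, counters=None):
        # Assumption: the base is 16 random bytes, base64-encoded without padding.
        self.base = base or base64.b64encode(os.urandom(16)).decode("ascii").rstrip("=")
        self.counters = list(counters) if counters else [0]

    @classmethod
    def parse(cls, value):
        # Split an incoming value such as 'abc.1.0' into base and counters.
        base, *parts = value.split(".")
        return cls(base, [int(p) for p in parts])

    def increment(self):
        # Call before logging each new event in the same component.
        self.counters[-1] += 1
        return str(self)

    def extend(self):
        # Call when a downstream component receives the vector: add a new level.
        return CorrelationVector(self.base, self.counters + [0])

    def sort_key(self):
        # Numeric comparison of counters gives the order without any timestamps.
        return (self.base, tuple(self.counters))

    def __str__(self):
        return ".".join([self.base, *map(str, self.counters)])


# The client logs an event, then calls a service that extends the vector.
client = CorrelationVector(base="abc")
client.increment()                      # "abc.1"
service = CorrelationVector.parse(str(client)).extend()
service.increment()                     # "abc.1.1"

# Log lines collected from different machines, in arbitrary order.
seen = ["abc.2", "abc.1.1", "abc.1", "abc.1.0"]
ordered = sorted(seen, key=lambda s: CorrelationVector.parse(s).sort_key())
print(ordered)
```

Because the ordering comes from the counters alone, it doesn’t matter which time zone each machine logs in or whether their clocks agree.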
If you browse around azure.microsoft.com you’ll see them in use as “MS-CV” HTTP header entries.
Have you used or implemented Correlation Vectors before?