Design (HLD) Zipkin: a request tracing system for distributed applications. Example: a request flows through multiple microservices, and the system should be able to trace the time spent by the request…
Medium
Imagine you're building a simplified request tracing system inspired by Zipkin. In distributed systems, understanding how requests flow across multiple services is crucial for debugging and performance analysis. This system will let us trace individual requests as they propagate through different services, recording timing information for each hop so that performance bottlenecks can be pinpointed.
The core idea is to assign each request a unique ID and then record "spans" at each service that participates in handling the request. A span represents a single unit of work within a service and includes timing information (start and end timestamps), the service name, and other relevant metadata.
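A minimal sketch of the span record, and of how recorded timestamps point at a bottleneck, might look like this. The field names, the microsecond units, the parent_id link (Zipkin spans do carry a parent ID, but the problem statement does not require one) and the helper self_time_us are all assumptions for illustration; the example timestamps are made up. A span's "own" time is taken as its duration minus the durations of its direct children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    trace_id: str                # same for every span of one request
    span_id: str                 # unique within the trace
    parent_id: Optional[str]     # None marks the root span (assumed field)
    service_name: str
    start_us: int                # start timestamp, microseconds (assumed unit)
    end_us: int                  # end timestamp, microseconds
    annotations: List[Tuple[int, str]] = field(default_factory=list)  # (timestamp_us, event)

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us


def self_time_us(spans: List[Span]) -> Dict[str, int]:
    """Hypothetical helper: time each span spent in its own service.

    Subtracts direct children's durations, so it assumes a span's children
    run sequentially; parallel children would need interval merging instead.
    """
    own = {s.span_id: s.duration_us for s in spans}
    for s in spans:
        if s.parent_id in own:
            own[s.parent_id] -= s.duration_us
    return own


# Example trace: gateway -> orders -> payments (made-up timestamps).
trace = [
    Span("t1", "a", None, "gateway", 0, 120_000),
    Span("t1", "b", "a", "orders", 10_000, 110_000),
    Span("t1", "c", "b", "payments", 20_000, 100_000, [(30_000, "card authorized")]),
]
own = self_time_us(trace)
bottleneck = max(trace, key=lambda s: own[s.span_id])
print(bottleneck.service_name, own[bottleneck.span_id])  # payments 80000
```

Here the gateway and orders services each spend 20 ms of their own time, while payments spends 80 ms, so it is the hop to investigate first.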
We need to design a system that allows:
- Services to create new spans for incoming requests or to participate in existing traces.
- Services to record the start and end times of their spans.
- Services to add annotations to spans to record custom events.
- A central collector to receive and store span data.
- A way to retrieve and visualize traces by request ID.
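One way to map these use cases onto classes is a per-service Tracer that opens spans and a central Collector that stores finished spans by trace ID (playing the role of the request ID). This is a sketch under stated assumptions, not Zipkin's API: Span, Tracer, Collector and render are hypothetical names; passing the parent Span object stands in for the IDs a real service would receive over the wire (Zipkin's B3 propagation carries them in headers such as X-B3-TraceId and X-B3-SpanId); and render() is a minimal text substitute for a trace viewer.

```python
import secrets
import threading
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


def new_id() -> str:
    return secrets.token_hex(8)  # 16 hex chars = 64 random bits


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service_name: str
    start_ns: int = 0
    end_ns: int = 0
    annotations: List[Tuple[int, str]] = field(default_factory=list)

    def annotate(self, event: str) -> None:
        """Record a custom, timestamped event inside this span."""
        self.annotations.append((time.monotonic_ns(), event))


class Collector:
    """Central in-memory store of finished spans, keyed by trace (request) ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._spans: Dict[str, List[Span]] = defaultdict(list)

    def report(self, span: Span) -> None:
        with self._lock:
            self._spans[span.trace_id].append(span)

    def get_trace(self, trace_id: str) -> List[Span]:
        with self._lock:
            return sorted(self._spans.get(trace_id, []), key=lambda s: s.start_ns)


class Tracer:
    """Per-service entry point that opens spans and reports them when they end."""

    def __init__(self, service_name: str, collector: Collector) -> None:
        self.service_name = service_name
        self.collector = collector

    @contextmanager
    def span(self, parent: Optional[Span] = None) -> Iterator[Span]:
        # No parent: start a new trace. With a parent: join the parent's trace.
        s = Span(
            trace_id=parent.trace_id if parent else new_id(),
            span_id=new_id(),
            parent_id=parent.span_id if parent else None,
            service_name=self.service_name,
        )
        s.start_ns = time.monotonic_ns()
        try:
            yield s
        finally:
            s.end_ns = time.monotonic_ns()
            self.collector.report(s)  # reported even if the wrapped work raised


def render(spans: List[Span]) -> str:
    """Minimal text view: one line per span, indented by call depth."""
    by_id = {s.span_id: s for s in spans}

    def depth(s: Span) -> int:
        d = 0
        while s.parent_id in by_id:
            s = by_id[s.parent_id]
            d += 1
        return d

    return "\n".join(
        f"{'  ' * depth(s)}{s.service_name} {(s.end_ns - s.start_ns) / 1e6:.3f} ms "
        f"{[event for _, event in s.annotations]}"
        for s in spans
    )


collector = Collector()
gateway, orders = Tracer("gateway", collector), Tracer("orders", collector)
with gateway.span() as root:            # incoming request: start a new trace
    root.annotate("auth ok")
    with orders.span(parent=root):      # simulated downstream call joins the trace
        pass
print(render(collector.get_trace(root.trace_id)))
```

Using a context manager means the end timestamp is recorded and the span is reported even when the wrapped work raises, which is usually the case you most want to see in a trace.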
For simplicity, we will focus on the core span and trace recording functionality. We will simulate inter-service communication, bypassing the complexities of actual network calls and RPC mechanisms. Our solution must demonstrate solid object-oriented design principles, with a focus on extensibility and concurrency.
Requirements
Interview Simulation
Experience a realistic interview conversation. The interviewer will ask clarifying questions, and you'll demonstrate your understanding of the requirements.
Let's start by understanding the scope. What are the core functionalities this system needs to provide?
💡 Interview Tip
Identify the Actors (Who uses the system?) and their Use Cases (What are they trying to achieve?). Start with the 'Happy Path' scenarios.