Problem
Our stack comprises many components/servers that interact with each other in order to fulfil clients' requests (or to prepare data to be served to clients at a later point). Handling any given request may spawn further requests to other components, whose responses are assembled before the result is returned to the client. This creates the need for a sort of distributed stack trace that allows us to pinpoint problematic links in the request chain.
A certain degree of request identification currently exists in our infrastructure, but only at the sub-system level:
- MediaWiki's WebRequest relies on the UNIQUE_ID env variable provided by Apache's mod_unique_id
- RESTBase and the services behind it use and propagate the X-Request-Id header
- EventBus relies on the same x-request-id header when creating events for both asynchronous updates and JobQueue messages
- Thumbor uses a custom Thumbor-Request-Id header
There are probably more such examples.
In order to trace the requests triggered by an (initial/external) request, all of the systems in our infrastructure should identify requests in the same way, use this identifier for logging, and propagate it to the other links in the request chain.
Proposed Solution
Use a UUID (v1 or v4) as the request identifier, carried in the x-request-id header. The Varnish front end (soon ATS) is the main point of entry for external requests. Therefore, it can generate the request IDs and attach them to requests in the form of the x-request-id header, which can then be used and propagated by all entities behind it. Furthermore, entities responding to requests must log the received/generated request ID.
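As a rough illustration of the behaviour expected from an entity behind the entry point, the following Python sketch reuses an incoming x-request-id, falls back to generating a UUID v4 when none is present, logs the ID, and attaches it to outgoing requests. The function names (get_or_create_request_id, outgoing_headers, handle_request) and the "service" logger name are hypothetical and not part of any existing component; headers are modelled as a plain dict rather than a real HTTP library's type.

```python
import logging
import uuid

# Header name taken from the proposal; HTTP header names are case-insensitive.
REQUEST_ID_HEADER = "x-request-id"


def get_or_create_request_id(headers):
    """Return the incoming request ID, or a fresh UUID v4 if none was sent."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER and value:
            return value
    return str(uuid.uuid4())


def outgoing_headers(request_id, extra=None):
    """Build headers for a sub-request, propagating the request ID."""
    # Drop any differently-cased copy of the header before setting ours.
    headers = {
        name: value
        for name, value in (extra or {}).items()
        if name.lower() != REQUEST_ID_HEADER
    }
    headers[REQUEST_ID_HEADER] = request_id
    return headers


def handle_request(headers, logger=logging.getLogger("service")):
    """Hypothetical handler: log the request ID and return sub-request headers."""
    request_id = get_or_create_request_id(headers)
    # The ID is attached to the log record so a formatter can include it.
    logger.info("handling request", extra={"request_id": request_id})
    return outgoing_headers(request_id)
```

In this sketch, a service that receives a request without the header generates an ID itself, which corresponds to the "received/generated" wording above; whether internal services should be allowed to generate IDs, or only the entry point, is a design choice left open here.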
See Also
- T89562: RESTBase should set Request-ID and perhaps X-Forwarded-For headers for external requests
- T97226: Include the request ID in API request logs
- T97207: Forward X-Request-ID header in outgoing requests
- T117021: Request ID for debug log
- T113817: Add request_id to webrequest logs as well as other event records ingested into Hadoop
- T200594: Add client identifier to requests sent from Kartotherian to WDQS
- T193050: Include request id (if present) in a comment in DB queries
- T147101: Uniform performance insight for different services (tracking)