Reliability Engineering provides GDS teams with a shared platform of tools for setting up and maintaining a service. We do this by:
- acquiring tools, such as Logit, and administering them where appropriate
- running off-the-shelf services, such as Prometheus, as internal SaaS
- providing patterns and guidance, such as the PaaS incident process
The Reliability Engineering documentation on this site is intended to help the rest of GDS find out what Reliability Engineering is and what we’re doing. If you’re a member of Reliability Engineering, or just curious about our team processes and ongoing work, please take a look at our Team Manual.