
This documentation is intended for internal use by the GDS community.

Reliability Engineering

Reliability Engineering provide a shared platform for GDS teams, comprising the tools needed to set up and maintain a service. We do this by:

  • acquiring tools and, where appropriate, administering them (for example, Logit)
  • running off-the-shelf services, such as Prometheus, as internal SaaS
  • providing patterns and guidance, such as the PaaS incident process

To understand the context for our decisions and guidance, refer to:

The Reliability Engineering documentation on this site is intended to help the rest of GDS find out what Reliability Engineering is and what we're doing. If you're a member of Reliability Engineering, or are just curious about our team processes and ongoing work, please take a look at our Team Manual.