This documentation is intended for internal use by the GDS community.

Reliability Engineering

Reliability Engineering provides GDS teams with a shared platform of tools to set up and maintain a service. It does this by:

  • acquiring tools, such as Logit, and administering them where appropriate
  • running off-the-shelf services, such as Prometheus, as internal SaaS
  • providing patterns and guidance, such as the PaaS incident process

To understand the context for our decisions and guidance, refer to the Service Manual and the GDS Way.