This documentation is intended for internal use by the GDS community.

Continuous Deployment

The Reliability Engineering team uses Concourse for testing and deploying code to production.

Concourse is a versatile, open-source “continuous thing-doer” which can be used for task automation, running tests and status checks against pull requests, and deploying code.

The Reliability Engineering team help you get started with Concourse, and ensure that the infrastructure is available.

Tenancy model

The Reliability Engineering team’s Concourse is multi-tenanted:

  • Concourse is configured with many teams

  • Each team’s workloads run on separate infrastructure

  • Users in one team cannot interact with the pipelines of another team

    • (unless the user is a member of both teams)

By default, all Concourse workers share the same egress IP addresses. If your team needs separate egress IP addresses, ask Reliability Engineering to make the configuration changes.

The Reliability Engineering team’s Concourse is managed as follows:

  • Concourse is configured to delegate secrets management to AWS Systems Manager Parameter Store

  • Concourse worker instances are recreated every morning to ensure they have the latest security patches

  • Users authenticate using GitHub SSO

    • (if you need Google SSO instead of GitHub, ask the Reliability Engineering team)
  • Concourse is accessible from the GDS network (Whitechapel and the VPN)

Getting started with Concourse

Concourse has comprehensive documentation and interactive examples.

Teams

The Reliability Engineering team has configured Concourse so that the workloads of different teams run on separate infrastructure. This design decision ensures that:

  • the time taken to schedule and run workloads stays predictable
  • the workloads of each team are isolated to separate pools of resources

Teams do not have to represent actual organisational units, and can instead be used for fine-grained control over permissions.

For example, a service team could have two Concourse teams:

  • service-team-dev
  • service-team-deploy

Where:

  • service-team-dev is for pull requests and the management of development environments, and includes all service team members
  • service-team-deploy is for deploying trusted code to production, and includes only team members who are trusted to deploy code

Roles

Concourse has 5 roles, which are explained in depth in the documentation.

When the Reliability Engineering team configure your Concourse team, you should decide which Concourse role to use. The role you choose is allocated to GitHub teams, so that anyone in the GitHub team is given that role.

The roles you should choose between are usually:

  • member
  • pipeline-operator

Members can manipulate pipelines and update the configuration of pipelines from the command line. This role should be used in most cases.

Pipeline-operators can interact with pipelines (for example, triggering builds and pausing jobs), but cannot update their configuration. This role should be used where a strict two-pairs-of-eyes code review policy is enforced.

Services

For convenience, the Reliability Engineering team provision the following resources for each Concourse team:

  • A public AWS S3 bucket (publicly readable from the internet by anyone)
  • A private AWS S3 bucket (readable only by the team’s Concourse workers)
  • A private AWS Elastic Container Registry (ECR) (readable only by the team’s Concourse workers)

Write access to these resources is limited to the individual Concourse team’s worker AWS IAM roles.

Each Concourse team’s worker instances have a specific AWS IAM role. This allows Concourse to assume roles in other AWS accounts, provided those accounts have a role which Concourse is permitted to assume.

Secrets

The Reliability Engineering Concourse delegates secrets management to AWS Systems Manager Parameter Store and Key Management Service.

Secrets can be managed by users with the member role using the GDS CLI:

gds cd secrets add $PIPELINE/SECRETNAME $SECRETVALUE

and

gds cd secrets rm $PIPELINE/SECRETNAME

Concourse injects secrets into pipelines at runtime using double parentheses syntax:

resources:
  - name: my-repository
    type: git
    source:
      uri: git@github.com:alphagov/my-repository.git
      branch: master

      # This is a secret
      private_key: |
        ((github-ssh-key))
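
Secrets can also be passed to tasks as environment variables using params. A sketch (the secret name api-token and the job are hypothetical):

```yaml
jobs:
  - name: use-a-secret
    plan:
      - task: call-api
        # params become environment variables inside the task container;
        # the secret is injected at runtime and is not stored in the pipeline config
        params:
          API_TOKEN: ((api-token))
        config: # omitted for brevity
```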

For your convenience when writing Concourse pipelines, there are also some variables provided for you which are generated automatically and are immutable (hence the readonly_ prefix).

  • readonly_access_key_id, readonly_secret_access_key, readonly_session_token - AWS credentials corresponding to your team. Can be used with things like the semver resource to access your S3 bucket, or with registry-image for authenticating to ECR.

  • readonly_private_bucket_name - the name of the private AWS S3 bucket

  • readonly_private_bucket_arn - the ARN of the private AWS S3 bucket

  • readonly_private_bucket_domain_name - the domain name of the private AWS S3 bucket

  • readonly_public_bucket_name - the name of the public AWS S3 bucket

  • readonly_public_bucket_arn - the ARN of the public AWS S3 bucket

  • readonly_public_bucket_domain_name - the domain name of the public AWS S3 bucket

  • readonly_private_ecr_repo_name - the name of the private AWS ECR

  • readonly_private_ecr_repo_arn - the ARN of the private AWS ECR

  • readonly_private_ecr_repo_url - the URL of the private AWS ECR

  • readonly_private_ecr_repo_registry_id - the ID of the private AWS ECR

  • readonly_codecommit_pool_uri - the URI of the pool resource git repository

  • readonly_codecommit_private_key - the private key for the pool resource git repository

  • readonly_team_name - the name of the Concourse team

  • readonly_local_user_password - the password for a local user of the Concourse team, which is used for updating pipelines

  • readonly_secrets_path_prefix - the secrets path prefix, which is used for managing secrets

  • readonly_secrets_kms_key_id - the AWS KMS Key ID, which is used for encrypting secrets at rest
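
For example, a sketch of using these variables with the semver and registry-image resources (parameter names should be checked against each resource’s documentation; the resource names and region here are assumptions):

```yaml
resources:
  # a version file kept in the team's private S3 bucket
  - name: my-version
    type: semver
    source:
      driver: s3
      bucket: ((readonly_private_bucket_name))
      key: version
      access_key_id: ((readonly_access_key_id))
      secret_access_key: ((readonly_secret_access_key))
      session_token: ((readonly_session_token))

  # images pushed to and pulled from the team's private ECR
  - name: release-image
    type: registry-image
    source:
      repository: ((readonly_private_ecr_repo_url))
      aws_access_key_id: ((readonly_access_key_id))
      aws_secret_access_key: ((readonly_secret_access_key))
      aws_session_token: ((readonly_session_token))
      aws_region: eu-west-2  # assumed region
```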

Monitoring

The Reliability Engineering team’s Concourse is monitored using:

  • Prometheus - collection of metrics
  • Grafana - alerting and dashboards
  • Splunk - audit logs

Concourse emits metrics which can be queried to get information about pipelines.

For example, the following query shows how many builds of the internal-apps pipeline finished over a 12 hour period.

sum without (instance, status) (
  increase(
    concourse_builds_finished{team="autom8", pipeline="internal-apps"}[12h]
  )
)

Example pipeline

Deploy an application to GOV.UK PaaS

This pipeline clones a public open source git repository and deploys it to GOV.UK PaaS.

Requirements

The following variables must be set:

  • paas-username
  • paas-password

Pipeline

---
resources:
  - name: my-git-repo
    type: git
    source:
      branch: master
      uri: https://github.com/alphagov/my-repo.git
  
  - name: my-paas-app
    type: cf
    source:
      api: https://api.london.cloud.service.gov.uk
      organization: my-service-team
      space: my-space
      username: ((paas-username))
      password: ((paas-password))

jobs:
  - name: test-and-build
    public: false
    plan:
      - get: my-git-repo
        trigger: true
  
      - task: build
        config:
          platform: linux
  
          image_resource:
            type: docker-image
            source:
              repository: ruby
              tag: 2.7

          inputs:
            - name: my-git-repo
          outputs:
            - name: my-git-repo

          run:
            path: sh
            dir: my-git-repo
            args:
              - -exc
              - |
                bundle install
                bundle exec middleman build
  
      - put: my-paas-app
        params:
          manifest: my-git-repo/manifest.yml
          path: my-git-repo
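
The manifest param above points at a standard Cloud Foundry manifest in the repository. A minimal sketch of such a manifest.yml (the app name, memory and buildpack are illustrative assumptions):

```yaml
---
applications:
  - name: my-paas-app
    memory: 256M
    buildpacks:
      - staticfile_buildpack
    path: build  # the directory produced by the build task
```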

Pool resource

The pool-resource can be used for locking resources or environments between separate pipelines.

Examples

Locking an environment for testing

If you had two apps deployed to a test environment by separate pipelines, and you wanted to run functional tests in both pipelines, you could use a lock to ensure that only one set of functional tests runs at a time.

Preventing concurrent deploys

If you had several microservices deployed by separate pipelines, you could use a lock to ensure that only a single microservice is deployed at any one time.

Limiting concurrency

If you have several pipelines which use the same environment, you could use a pool of locks to ensure that at most 3 deployments are active at the same time.
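
In the concurrency-limiting case, the pool holds as many unclaimed lock files as the desired concurrency (three here), and each pipeline claims any free lock with the acquire param. A sketch (deploy-slots is a hypothetical pool resource):

```yaml
plan:
  - put: deploy-slots
    params:
      acquire: true  # claims any one unclaimed lock, waiting if all three are claimed

  # ... deploy ...

  - put: deploy-slots
    params:
      release: deploy-slots
```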

How it works

The pool-resource is backed by a git repository, where each pool is a directory with two subdirectories:

  • claimed
  • unclaimed

Each lock is a file: unclaimed locks are files inside the unclaimed directory, and claimed locks are files inside the claimed directory.

.
├── pool-1
│   ├── claimed
│   └── unclaimed
├── pool-2
│   ├── claimed
│   └── unclaimed
└── pool-3
    ├── claimed
    │   └── f3cb3823-a45a-49e8-ab41-e43268494205
    └── unclaimed

Setting up the pool-resource

You should create a pipeline to set up and manage your team’s pool-resource.

The following pipeline sets up three pools:

  • pool-1
  • pool-2
  • pool-3

⚠️ Re-running the init-pool job will reset the state of any locks.

---
resources:
  - name: pool-repo
    type: git
    icon: github
    source:
      branch: pool
      uri: ((readonly_codecommit_pool_uri))
      private_key: ((readonly_codecommit_private_key))

jobs:
  - name: init-pool
    serial: true
    plan:
      - task: init-pool
        config:
          platform: linux
          image_resource:
            type: registry-image
            source:
              repository: governmentpaas/git-ssh
              tag: latest
          outputs:
          - name: repo
          run:
            dir: repo
            path: sh
            args:
            - -euo
            - pipefail
            - -c
            - |
              git init .
              # git commit requires a user identity to be configured
              git config user.email "concourse@localhost"
              git config user.name "concourse"
              for pool in pool-1 pool-2 pool-3; do
                mkdir -p "$pool/claimed"
                mkdir -p "$pool/unclaimed"
                touch "$pool/claimed/.keep"
                touch "$pool/unclaimed/.keep"
                git add "$pool"
                git commit -m "setup $pool"
              done

      - put: pool-repo
        params:
          repository: repo
          force: true

Using the pool-resource

Once you have set up a pool-resource, you can use it to claim and release locks.

---
resource_types:
  - name: pool
    type: registry-image
    source:
      repository: concourse/pool-resource
      tag: 1.1.3

resources:
  - name: pool
    type: pool
    icon: pool
    source:
      branch: pool
      uri: ((readonly_codecommit_pool_uri))
      private_key: ((readonly_codecommit_private_key))
      pool: pool-1

  - name: my-lock-config
    type: mock
    source:
      create_files:
        name: my-lock
        metadata: ''

jobs:
  - name: add-my-lock
    serial: true
    plan:
      - get: my-lock-config

      - put: pool
        params:
          add: my-lock-config

  - name: use-my-lock
    serial: true
    plan:
      - put: pool
        params:
          claim: my-lock

      - task: do-a-thing-with-my-lock
        config: # omitted for brevity

      - put: pool
        params:
          release: pool
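
Note that a lock claimed in a job is not released automatically if a later step fails, so it is often worth releasing it in an ensure hook instead. A sketch based on the use-my-lock job above:

```yaml
  - name: use-my-lock-safely
    serial: true
    plan:
      - put: pool
        params:
          claim: my-lock

      - task: do-a-thing-with-my-lock
        config: # omitted for brevity
        # the ensure hook runs whether the task succeeds or fails,
        # so the lock is always released
        ensure:
          put: pool
          params:
            release: pool
```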

Workers

Each Concourse team has 1 or more worker nodes assigned to it, of potentially different instance types.

Sometimes a worker node will begin to exhibit strange behaviour and cause some pipelines to cease functioning properly. Without needing to involve the Automate team, you may find you are able to get rid of the worker by going to your team’s info pipeline and running start-worker-refresh. This should launch a job in the background for AWS to replace the worker node, though it does rely on some basic level of functionality from the existing workers and may not do the trick in all cases. Note that there may be disruption to your team’s running tasks while this is ongoing. Future work may allow us to provide a more reliable way to replace your team’s worker nodes.