% System Operations using Cosmos & Puppet
% Leif Johansson / SUNET / 2017 / v0.0.5

Introduction

This document describes how to set up and run systems and service operations for a small to mid-sized systems collection while maintaining scalability, security and auditability for changes. The process described below is based on open source components and assumes a Linux-based hosting infrastructure. These limitations could easily be removed though. This document describes the multiverse template for combining cosmos and puppet.

Design Requirements

The cosmos system has been used to operate security-critical infrastructure for a few years before it was combined with puppet into the multiverse template.

Several of the design requirements below are fulfilled by cosmos alone, while some (eg consistency) are easier to achieve using puppet than with cosmos alone.

Consistency

Changes should be applied atomically (locally on each host) across multiple system components on multiple physical and logical hosts (aka system state). The change mechanism should permit verification of state consistency and all modifications should be idempotent, i.e. the same operation performed twice on the same system state should not in itself cause a problem.

Auditability

It must be possible to review changes in advance of applying them to system state. It must also be possible to trace changes that have already been applied to privileged system operators.

Authenticity

All changes must be authenticated by private keys in the personal possession of privileged system operators before being applied to system state, and that authentication must remain verifiable at any point in the future.

Simplicity

The system must be simple and must not rely on external services to be online to maintain state except when new state is being requested and applied. When new state is being requested external dependencies must be kept to a minimum.

Architecture

The basic architecture of cosmos is to use a VCS (git) to manage and distribute changes to a staging area on each managed host. In the staging area the changes are authenticated (using tag signatures) and, if valid, distributed to the host using local rsync. Before- and after-hooks (run using run-parts) provide programmatic extension points.

Administrative Scope

The repository constitutes the administrative domain of a multiverse setup: each host is connected to (i.e. runs cosmos off of) a single GIT repository and derives trust from signed tags on that repository. A host cannot belong to more than one administrative domain, but each administrative domain can host multiple DNS domains - the hosts in a single repository do not all need to be in the same zone.

The role of Puppet

In the multiverse template, the cosmos system is used to authenticate and distribute changes and prepare the system state for running puppet. Puppet is used to apply idempotent changes to the system state using "puppet apply".

+-------------+     +------+
| cosmos repo |---->| host |-----+
+-------------+     +------+     |
                       ^         |
                       |         |
                     (change) (manifests)
                       |         |
                   +--------+    |
                   | puppet |<---+
                   +--------+

Note that there is no puppet master in this setup so collective resources cannot be used in multiverse. Instead 'fabric' is used to provide a simple way to loop over subsets of the hosts in a managed domain.

Private data (eg system credentials, application passwords, or private keys) are encrypted to a master host-specific PGP key before being stored in the cosmos repo.

System state can be tied to classes used to classify systems into roles (eg "database server" or "webserver"). System classes can be assigned by regular expressions on the fqdn (eg all hosts named db-* are assigned to the "database server" class) using a custom puppet ENC.

The system classes are also made available to 'fabric' in a custom fabfile. Fabric (or fab) is a simple frontend to ssh that allows an operator to run commands on multiple remote hosts at once.

Trust

All data in the system is maintained in a cosmos GIT repository. A change is requested by signing a tag in the repository with a system-wide well-known name-prefix. The tag name typically includes the date and a counter to make it unique.
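As an illustration of such a naming scheme, the following Python sketch computes the next tag name from the tags that already exist. The prefix 'cosmos-ops', the ISO date and the two-digit 'vNN' counter are assumptions made for this example, and next_tag is a hypothetical helper; the names that bump-tag actually produces may differ.

```python
import re
from datetime import date


def next_tag(existing_tags, prefix="cosmos-ops", today=None):
    """Return the next unused tag name for today (hypothetical naming scheme)."""
    day = (today or date.today()).isoformat()  # e.g. 2023-01-26
    pattern = re.compile(r"^%s-%s-v(\d+)$" % (re.escape(prefix), re.escape(day)))
    # Collect the counters of all tags that carry the same prefix and date.
    counters = [int(m.group(1)) for m in map(pattern.match, existing_tags) if m]
    return "%s-%s-v%02d" % (prefix, day, max(counters, default=0) + 1)
```

Tags from other days or with other prefixes are ignored, so the counter restarts at 1 each day.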

The signature on the tag is authenticated against a set of trusted keys maintained in the repository itself - so that one trusted system operator must be present to authenticate addition or removal of another trusted system operator. This authentication of tags is done in addition to authenticating access to the GIT repository when the changes are pushed. Trust is typically bootstrapped when a repository is first established. This model also serves to provide auditability of all changes for as long as repository history is retained.

Access to hosts is done through ssh with ssh-key access. The ssh keys are typically maintained using either puppet or cosmos natively.

Consistency

As a master-less architecture, multiverse relies on eventual consistency: changes will eventually be applied to all hosts. In such a model it becomes very important that changes are idempotent, so that applying a change multiple times (in an effort to get dependent changes through) won't cause an issue. Using native cosmos, such changes are achieved using timestamp-files that control entry into code-blocks:

stamp="${COSMOS_BASE}/stamps/foo-v04.stamp"
if ! test -f "$stamp"; then
    # do something here
    touch "$stamp"
fi

This pattern is mostly replaced in multiverse by using puppet manifests and modules that are inherently idempotent but it can nevertheless be a useful addition to the toolchain.

Implementation

Implementation is based on two major components: cosmos and puppet. The cosmos system was created by Simon Josefsson and Fredrik Thulin as a simple and secure way to distribute files and run pre- and post-processors (using run-parts). This allows for a simple, yet complete mechanism for updating system state.

The second component is puppet which is run in masterless (aka puppet apply) mode on files distributed and authenticated using cosmos. Puppet is a widely deployed way to describe system state using a set of idempotent operations. In theory, anything that can be done using puppet can be done using cosmos post-processors but puppet allows for greater abstraction which greatly increases readability.

The combination of puppet and cosmos is maintained on github in the 'SUNET/multiverse' project.

The Cosmos Puppet Module

Although not necessary, a few nice-to-have utilities in the form of puppet modules have been collected as the cosmos puppet module (for want of a better name). The source for this module is at https://github.com/SUNET/puppet-cosmos and it is included (but commented out) in the cosmos-modules.conf file (cf below) for easy inclusion.

Operations

Setting up a new administrative domain

The simplest way is to clone the multiverse repository. First install 'git'. On ubuntu/debian this is in the 'git-core' package:

# apt-get install git-core

Also install 'fabric' - a very useful tool for multiple-host-ssh that is integrated into multiverse. Fabric provides the 'fab' command which will be introduced later on.

# apt-get install fabric

These two tools (git & fabric) are only needed on machines where system operators work.

Next clone git@github.com:SUNET/multiverse.git - this will form the basis of your cosmos+puppet repository:

# git clone git@github.com:SUNET/multiverse.git myproj-cosmos
# cd myproj-cosmos

Next rename the upstream from github - you will want to keep this around to get new features as the multiverse codebase evolves.

# git remote rename origin multiverse

Now add a new remote pointing to the git repo where you are going to be pushing changes for your administrative domain. Also add a read-only version of this remote as 'ro'. The read-only remote is used by multiverse scripts during host bootstrap.

# git remote add origin git@yourhost:myproj-cosmos.git
# git remote add ro https://yourhost/myproj-cosmos.git

Now edit .git/config and point the 'master' branch at the new 'origin' remote, or you'll try to push to the multiverse remote!

[branch "master"]
    remote = origin
    merge = refs/heads/master

Finally create a branch for the 'multiverse' upstream so you can merge changes to multiverse:

# git checkout -b multiverse --track multiverse/master

Note that you can maintain your repo on just about any git hosting platform, including github, gitorious or your own local setup as long as it supports read-only access to your repository. It is important that the remotes called 'origin' and 'ro' refer to your repository and not to anything else (like a private version of multiverse).

Now add at least one key to 'global/overlay/etc/cosmos/keys/' in a file with a .pub extension (eg 'operator.pub') - the name of the file doesn't matter other than the extension.

# cp mykey.pub global/overlay/etc/cosmos/keys/
# git add global/overlay/etc/cosmos/keys/mykey.pub
# git commit -m "initial trust" global/overlay/etc/cosmos/keys/mykey.pub

At this point you should create and sign your first tag:

# ./bump-tag

If Git complains during the first run of bump-tag that "Your configuration specifies to merge with the ref 'master' from the remote, but no such ref was fetched." then you have to run 'git push' to initialize the connection with the remote repository.

Make sure that you are using the key whose public key you just added to the repository! You can now start adding hosts.

Adding a host

Bootstrapping a host is done using the 'addhost' command:

# ./addhost -b $fqdn

The -b flag causes addhost to attempt to bootstrap cosmos on the remote host using ssh as root. This requires that root key trust be established in advance. The addhost command creates and commits the necessary changes to the repository to add a host named $fqdn. Only fully qualified hostnames should ever be used in cosmos+puppet.

The bootstrap process will create a cron-job on $fqdn that runs

# cosmos update && cosmos apply

every 15 minutes. This should be a good starting point for your domain. Now you may want to add some 'naming rules'.

To bootstrap a machine that is not yet configured in DNS, use the following options:

# ./addhost -b -n $fqdn-to-add-later-in-dns -- IP-address

Defining naming rules

A naming rule is a mapping from a name to a set of puppet classes. These are defined in the file 'global/overlay/etc/puppet/cosmos-rules.yaml' (linked to the top level directory in multiverse). This is a YAML file whose keys are regular expressions and whose values are hashes mapping puppet class names to their parameters. Here is an example that assigns all hosts with names of the form ns<number>.example.com to the 'nameserver' class.

'ns[0-9]+\.example\.com$':
   nameserver:

Note that the value is a hash with an empty value ('nameserver:') and not just a string value.

Since regular expressions can also match whole strings, the following is also valid:

smtp.example.com:
   mailserver:
      relay: smtp.upstream.example.com

In this example the mailserver puppet class is given the relay argument (cf puppet documentation).
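To make the matching concrete, here is a minimal Python sketch of how rules of this shape could be turned into a set of classes for a host. It is not the ENC shipped with multiverse: the function classes_for, the RULES table (a plain dict instead of the parsed YAML file) and the use of re.match are all assumptions made for illustration.

```python
import re

# Rules modeled on the examples above: regular expression -> {class: parameters}.
RULES = {
    r"ns[0-9]+\.example\.com$": {"nameserver": None},
    r"smtp\.example\.com$": {"mailserver": {"relay": "smtp.upstream.example.com"}},
}


def classes_for(fqdn, rules=RULES):
    """Merge the classes of every rule whose pattern matches fqdn."""
    classes = {}
    for pattern, rule_classes in rules.items():
        # re.match anchors at the start of the name (an assumption of this sketch).
        if re.match(pattern, fqdn):
            classes.update(rule_classes)
    return classes
```

A host that matches several rules receives the union of their classes, and a host that matches none receives an empty set.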

Fabric integration

Given the above example the following command would reload all nameservers:

# fab --roles=nameserver -- rndc reload
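One way to picture the link between naming rules and fabric roles is as an inverted rule table: each class becomes a role whose members are the hosts matching that class's rules. The Python sketch below assumes that class names double as role names and that the host list is known in advance; roles_from_rules is a hypothetical helper, not the multiverse fabfile.

```python
import re
from collections import defaultdict


def roles_from_rules(rules, hosts):
    """Map each class name to the sorted hosts whose names match one of its rules."""
    roles = defaultdict(set)
    for host in hosts:
        for pattern, classes in rules.items():
            if re.match(pattern, host):
                for name in classes:
                    roles[name].add(host)
    return {name: sorted(members) for name, members in roles.items()}


# Example rules and hosts (made up for this sketch).
rules = {
    r"ns[0-9]+\.example\.com$": {"nameserver": None},
    r"smtp\.example\.com$": {"mailserver": None},
}
hosts = ["ns1.example.com", "ns2.example.com", "smtp.example.com", "www.example.com"]
roles = roles_from_rules(rules, hosts)
```

Hosts that match no rule, such as www.example.com here, do not appear in any role.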

Creating a change-request

After performing whatever changes you want to the repository, commit the changes as usual and then sign an appropriately formatted tag. This last operation is wrapped in the 'bump-tag' command:

# git commit -m "some changes" global/overlay/something or/other/files
# ./bump-tag

The bump-tag command will ask for confirmation before signing and will rely on the git and gpg commands to create, sign and push the correct tag.

Puppet modules

Puppet modules can be maintained using a designated cosmos pre-task that reads the file /etc/puppet/cosmos-modules.conf. This file is a simple text-format file with either three (for puppetlabs modules) or four columns:

#
# name source (puppetlabs fq name or git url) upgrade (yes/no) tag_pattern
#
apt puppetlabs/apt no
concat puppetlabs/concat no
cosmos https://github.com/SUNET/puppet-cosmos.git yes sunet-2*
#golang elithrar/golang yes
python https://github.com/SUNET/puppet-python.git yes sunet-2*
stdlib puppetlabs/stdlib no
ufw https://github.com/SUNET/puppet-module-ufw.git yes sunet-2*
vcsrepo puppetlabs/vcsrepo no
xinetd puppetlabs/xinetd no

This is an example file - the first field is the name of the module, the second is the source: either a puppetlabs path or a git URL. The third field is 'yes' if the module should be automatically updated or 'no' if it should only be installed. The fourth field is a tag pattern to use (same style as the cosmos tag pattern). As usual lines beginning with '#' are silently ignored.
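The column layout can be sketched as a small Python parser. This is only an illustration of the format described above, not the code used by the cosmos pre-task; the Module type and parse_modules_conf are hypothetical names, and skipping blank lines is an assumption.

```python
from typing import NamedTuple, Optional


class Module(NamedTuple):
    name: str
    source: str                  # puppetlabs path or git URL
    upgrade: bool                # True when the third column is 'yes'
    tag_pattern: Optional[str]   # only present in four-column lines


def parse_modules_conf(text):
    """Parse cosmos-modules.conf style text into a list of Module entries."""
    modules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments (and, by assumption, blank lines) are ignored
        fields = line.split()
        if len(fields) not in (3, 4):
            raise ValueError("expected 3 or 4 columns: %r" % line)
        name, source, upgrade = fields[:3]
        tag_pattern = fields[3] if len(fields) == 4 else None
        modules.append(Module(name, source, upgrade == "yes", tag_pattern))
    return modules


sample = """\
# name source upgrade tag_pattern
apt puppetlabs/apt no
cosmos https://github.com/SUNET/puppet-cosmos.git yes sunet-2*
#golang elithrar/golang yes
"""
modules = parse_modules_conf(sample)
```

With this sample, the commented-out golang line is skipped and only the apt and cosmos entries are returned.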

This file is processed in a cosmos pre-hook so the modules should be available for use in the puppet post-hook. By default the file contains several lines that are commented out so review this file as you start a new multiverse setup.

In order to add a new module, the best way is to commit a change to this file and tag this change, allowing time for the module to get installed everywhere before adding a change that relies on this module.

As there might be a need to use different sets of modules (or different tag patterns) on different hosts in an ops-repo, the contents of this file can be controlled in different ways:

  1. If the file is present in the model, it is used as such.
  2. If there is a script called /etc/puppet/setup_cosmos_modules, that script is executed. If the file /etc/puppet/cosmos-modules.conf does not exist after this script runs, proceed to step 3, otherwise use this dynamically generated list of modules.
  3. Use a (very small) default set of modules from the pre-hook global/post-tasks.d/010cosmos-modules.

There is an example implementation of the script to help you get started with writing your own, available in global/etc/puppet/setup_cosmos_modules.example.
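The three-step selection order above can be sketched in Python as follows. The function select_module_source, its return labels and the fake_script stand-in are hypothetical and exist only to show the fallback logic; the real pre-task and the example script may be structured differently.

```python
import os
import tempfile


def select_module_source(conf_path, script_path, run_script):
    """Return which of the three steps supplies the module list."""
    if os.path.isfile(conf_path):
        return "model"  # step 1: the file is shipped in the model
    if os.path.isfile(script_path):
        run_script(script_path)  # step 2: let the script generate the file
        if os.path.isfile(conf_path):
            return "generated"
    return "default"  # step 3: fall back to the small default set


def fake_script(conf_path):
    """Stand-in for executing setup_cosmos_modules: it just writes the file."""
    def run(_script_path):
        with open(conf_path, "w") as handle:
            handle.write("stdlib puppetlabs/stdlib no\n")
    return run


with tempfile.TemporaryDirectory() as etc:
    conf = os.path.join(etc, "cosmos-modules.conf")
    script = os.path.join(etc, "setup_cosmos_modules")
    no_files = select_module_source(conf, script, fake_script(conf))
    open(script, "w").close()  # the script now exists
    with_script = select_module_source(conf, script, fake_script(conf))
    with_conf = select_module_source(conf, script, fake_script(conf))
```

The three calls exercise each branch in turn: no files present, a script that generates the file, and finally a file that already exists.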

HOWTO and Common Tasks

Adding a new operator

Add the ASCII-armoured key to a file with a .pub extension in global/overlay/etc/cosmos/keys, then commit and tag the change:

# git add global/overlay/etc/cosmos/keys/thenewoperator.pub
# git commit -m "the new operator" \
   global/overlay/etc/cosmos/keys/thenewoperator.pub
# ./bump-tag

Removing an operator

Identify the operator's public key file in global/overlay/etc/cosmos/keys, then remove it and tag the change:

# git rm global/overlay/etc/cosmos/keys/X.pub
# git commit -m "remove operator X" \
   global/overlay/etc/cosmos/keys/X.pub
# ./bump-tag

Merging new features from multiverse

The multiverse template will continue to evolve and sometimes it may be desirable to fetch a new feature from the upstream multiverse repository. If you followed the setup guide and kept the 'multiverse' remote, this is how you go about synchronizing with that version:

# git checkout multiverse
# git pull
# git checkout master
# git merge multiverse

Now resolve any conflicts (hopefully few and far between) and you should end up with a combination of the features in your domain and those in multiverse. Note that you can optionally add more remotes referencing other development branches of multiverse and merge changes from more than one upstream. The sky is the limit.

Changing administrative domain for a host

Below, $old and $new refer to local copies (git clone) of the old and new repository.

In the $new repository add the host and use fabric to change the repository of the host to the git URL of the new repository.

# ./addhost -b $hostname
# fab -H $hostname chrepo repository:git://other/repo.git

In the $old repository:

# rsync -avz $hostname/ $new/$hostname/

In the $new repository:

# git add $hostname/
# git commit -m "transfer from $old" $hostname
# ./bump-tag

In the $old repository:

# git rm -rf $hostname
# git commit -m "remove $hostname"
# ./bump-tag

Running a command on multiple hosts

On a single host:

# fab -H $hostname -- command -a one -b another -c

On multiple hosts based on category:

# fab --roles=webserver -- ls /tmp

On all hosts:

# fab -- reboot # Danger, Will Robinson!