Commit graph

108 commits

Author SHA1 Message Date
Micke Nordin 20ddc3c257
Start on cosmos-rules 2024-10-11 17:39:52 +02:00
Patrik Lundin aa88795ee0
sunet-fleetlock: also handle ReadTimeout
Turns out this was not caught by ConnectionError.
2024-07-03 14:13:22 +02:00
Patrik Lundin 01768129f0
fleetlock: configurable lock/unlock timeout
While we already support setting a healthcheck timeout it probably
makes sense to be able to control how long we wait for a
fleetlock_lock() or fleetlock_unlock() call. This becomes important if
only running cosmos once a night or something like that. In that case we
you probably want to give a physical machine more than than 1 minute to
complete a reboot etc.

This can now be controlled by setting fleetlock_lock_timeout and
fleetlock_unlock_timeout in /etc/run-cosmos-fleetlock-conf. Keep in mind
that while it can make sense to increase the time for taking a lock,
releasing a lock should always be fast (either you have it and release
it, or you dont have it and it is a no-op) so setting a long unlock
timeout should probably never be done.

Since we also potentially wait the unlock timeout at boot (if the
fleetlock server is broken etc) that is another reason to keep it
short. The default 1m is probably OK for most uses.
2024-07-03 13:27:52 +02:00
Patrik Lundin 443611dd3f
Merge pull request #49 from SUNET/john-permissions-fix
Enforce more strict permissions for files in Cosmos
2024-07-03 11:36:21 +02:00
Patrik Holmqvist 4231b4ac1d
Migrate from legacy fact
This did not work on modern puppet in ubuntu24:
Warning: Interpolation failed with '::lsbdistcodename', but compilation continuing;
New syntax inspiration from:
https://www.puppet.com/docs/puppet/8/hiera_config_yaml_5#configuring_hiera
2024-06-19 14:07:13 +02:00
Patrik Holmqvist bc9d1dc960
Use upstream puppet modules for ubuntu24+.
This is how we do it in modern debian so it
makes sense to do it on modern ubuntu as well.
2024-06-19 14:02:24 +02:00
Patrik Lundin e315282bc5
Use more strict exception checking
This is probably wide enough and we do not need weird extra handling of
our own execption etc.

Thanks to @mickenordin for keeping me honest :).
2024-06-17 12:40:12 +02:00
Patrik Lundin 4b8b8887f6
sunet-fleetlock: handle connection errors
In order to handle upgrades of the fleetlock server when running only
one server we need to handle connection errors like connection refused
or timed out errors gracefully.

Because there are several different ways the connection can fail and it
is hard to keep track of them all, just catch everything. We then also
need special handling of our own timeout execption so we are not
accidentally stuck retrying forever.

Also fix so we actually use the request_timeout arg for individual HTTP
requests instead of the global timeout.

While here run isort to keep imports tidy.
2024-06-17 12:07:22 +02:00
Johan Wassberg c72f5ccd86
Allow for hosts without class(s) 2024-04-12 15:32:40 +02:00
Patrik Lundin df5558befb
Fix another indentation mismatch 2024-01-24 15:36:52 +01:00
Patrik Lundin 4b93d9c426
run-cosmos: support fleetlock unlocking at boot
This extends run-cosmos with a new argument that calls the unlock
function already included in the script as well as using the already
existing lock() function to make sure there is no race between the
bootup process and cron starting a normal run-cosmos process at the same
time.

The oexit() function is added to support exiting with a OK exit value
the same way eexit() is used to signal something is wrong.

This change also adds the systemd unit file that runs run-cosmos with the
new fleetlock-unlock argument at boot if fleetlock is configured.

While here fix indentation that was mixed between 3 and 4 spaces: it is
now 4 spaces everywhere.
2024-01-24 15:36:34 +01:00
John Van de Meulebrouck Brendgard 8d4ce2d1b7
Make sure that COSMOS_BASE is only readable
by root since it's possible that the directory
can contain files that after applying the
overlay to / only should be read or writable
by root.
2023-11-17 15:03:47 +01:00
John Van de Meulebrouck Brendgard 75e566ab61
Make sure that /root in overlay is owned by root
as well as that /root/.ssh and its content is
only owned and readable by root. This is redundant
if the previous permissions were properly applied
and no other changes have been made by the user
or something else, but is added for good measure
as a layered defense.
2023-11-17 14:58:51 +01:00
John Van de Meulebrouck Brendgard ca353ed406
Set same permissions for /root/.ssh/authorized_keys
in post-tasks.d/010fix-ssh-perms as is done by
Puppet with sunet::ssh_keys.
2023-11-17 13:50:02 +01:00
Johan Wassberg a6a67d355f
Diffable 2023-11-14 15:28:46 +01:00
Johan Wassberg 120c4a5a93
A few more depends for Bookworm 2023-11-14 15:27:45 +01:00
Johan Wassberg 58a9ca7aa9
No need of x11 on our servers 2023-10-02 12:39:44 +02:00
Micke Nordin 3aac1f97d8
Add additional packages for use with debian 12
This patch will install three packages that is needed for normal operations of puppet using puppet-sunet with multiverse on Debian 12:

cron puppet-module-puppetlabs-cron-core puppet-module-camptocamp-augeas
2023-07-10 16:32:20 +02:00
Patrik Lundin 7baf9affb1
Add fleetlock support to run-cosmos
Makes run-cosmos request a fleetlock lock before running cosmos "update"
and "apply" steps. This is helpful for making sure only one (or several)
machine out of some set of machines runs cosmos changes at a time. This
way if cosmos (or puppet) decides that a service needs to be restarted
this will only happen on a subset of machines at a time. When the cosmos
"apply" is done a fleetlock unlock request will be performed so the
other machines can progress.

The unlock code in run-cosmos will also run the new tool
sunet-machine-healthy to decide things are good before unlocking. This
way if a restarted service breaks this will stop the unlock attempt
and in turn make it so the others should not break their service as
well, giving an operator time to figure out what is wrong.
2023-06-17 08:10:00 +02:00
Johan Wassberg cf2e6b6518
File provided by Sunet::Dockerhost 2023-04-04 15:21:15 +02:00
Johan Wassberg 5af8093338
Add support for eyaml in Hiera
And at the same time remove support for gpg.

The modern version of the configuration (v5) has been tested with 20.04 but
might work with older dists.
2023-02-16 07:44:37 +01:00
Fredrik Thulin c400bba97d
remove 'make db'
The db-file, essentially providing reverse lookup of classes to host
names, is only used by some Nagios configuration instances and causes
continuing operational headaches in those ops-repos.

It should be kept/refactored to only apply to the monitoring hosts in
the cases where it is used, but we don't want any new ops-repos to use
it hence it should be removed from upstream multiverse.
2023-02-07 14:21:29 +01:00
Fredrik Thulin 12b2412ea7
run cron at boot too, to e.g. get new firewall rules installed 2023-02-06 17:12:01 +01:00
Fredrik Thulin 79606f2a6d
check for /etc/no-automatic-cosmos in the wrapper, and allow arguments to be passed 2023-02-06 16:47:41 +01:00
Fredrik Thulin 3988f5beb0
shellcheck fixes 2023-02-06 16:47:30 +01:00
Patrik Lundin 906edf3caf
Merge pull request #32 from SUNET/feature-ft-install_eyaml
Install eyaml on newer hosts
2023-02-06 12:31:31 +01:00
Fredrik Thulin 708c6c1b64
add set -e, and do some shellcheck cleanup 2023-02-03 16:05:09 +01:00
Fredrik Thulin 25463e6013
respect COSMOS_VERBOSE 2023-02-03 16:04:51 +01:00
Fredrik Thulin f9a286fc05
install eyaml on Ubuntu from 18.04 and Debian from version 10 2023-02-03 15:40:15 +01:00
Fredrik Thulin e08346aa30
cleanup, use stamp-file, only run on old OS versions 2023-02-03 15:39:49 +01:00
Micke Nordin ba1e40ffd3
Merge pull request #31 from theseal/wrap-cosmos
Wrap cosmos
2023-02-02 13:01:02 +01:00
Johan Wassberg 84b29e4eaa
Executable 2023-02-02 11:49:10 +01:00
Johan Wassberg 49ba964897
Wrap cosmos in scriptherder if available
nunoc-ops and others has been doing this for ages by just modifing the cron
file.
2023-02-02 11:45:54 +01:00
Patrik Lundin e212b6f56f
Support master branch being renamed to main
Fixes:
```
70run-post-tasks: invoking /var/cache/cosmos/model/post-tasks.d/018packages
Your configuration specifies to merge with the ref 'refs/heads/master'
from the remote, but no such ref was fetched.
```
2023-01-31 08:52:28 +01:00
Fredrik Thulin 4601e0bf08
make sure we get clean checkouts 2023-01-30 14:56:15 +01:00
Leif Johansson d604d2fab5
set no-protection on the private key 2023-01-30 12:07:33 +01:00
Johan Wassberg bc17ee1354
Don't confuse containers to connect to them self
When the hostname pointed to loopback the containers tried to connect to them
self instead of the host.
2023-01-24 10:01:59 +01:00
Fredrik Thulin 715105aadb
add documentation for dynamically generated cosmos-modules.conf 2023-01-19 17:56:51 +01:00
Fredrik Thulin c3c6171f96
modules, not models 2023-01-19 17:30:18 +01:00
Fredrik Thulin e2e394a9af
generate /etc/puppet/cosmos-modules.conf dynamically 2023-01-19 17:19:42 +01:00
Johan Wassberg fb4849a0df
Use puppet that comes with OS
nunoc-ops does like this since 2018 so I think it will fly.

Also the package `puppet` seems to been around since at-least Ubuntu 14.04.
2023-01-17 13:53:13 +01:00
Patrik Lundin 68d0083557
Make overlay permission script global
This will make sure /root has proper permissions on our machines.
2022-12-05 15:02:37 +01:00
Patrik Lundin 3ef4e47ff6
Handle multiple versions of cosmos .deb
Before this change there was a need to keep addhost and
bootstrap-cosmos.sh in sync regarding what version of the cosmos deb to
scp over and later run.

Now we find the latest version as decided by `sort -V` in both addhost
and bootstrap-cosmos.sh.

Solution discussed with @fredrikt.
2022-11-15 18:26:36 +01:00
Patrik Lundin 020b8fe34c
Enable "set -e" again
Good idea to fail when unexpected things go wrong. Additional fixes
added to the script to not stop where we can expect a non-zero return
code.

Requested by @fredrikt who also reviewed the patch before going in,
thanks!
2022-10-12 16:47:20 +02:00
Patrik Lundin c55e5535a2
Add gpg to cosmos bootstrap script
Without this Debian 11 fails to bootstrap:
```
/etc/cosmos/gpg.d/50gpg: 36: gpg: not found
```
2022-10-10 17:27:15 +02:00
Linus Nordberg 0692cabba3
Remove that '.novalocal' line in /etc/hosts, added by cloud-init
It messes up `hostname -f` on Debian, even if there's a correct line
further down in /etc/hosts.
2022-10-10 17:26:56 +02:00
John Van de Meulebrouck Brendgard 3b80ba32c7
Set manage_etc_hosts to false for cloudimage based hosts
this is needed so that our changes in /etc/hosts
are not overwritten.
2022-10-10 17:26:45 +02:00
Fredrik Thulin b2272d409f
free-hand updates from eduid-ops 2022-10-10 17:26:18 +02:00
Leif Johansson 19304f2d79
short hostname i /etc/hosts 2022-10-10 17:23:39 +02:00
Leif Johansson 378dfe04fa
try very hard to find git 2022-10-10 17:23:23 +02:00