93 commits

Author SHA1 Message Date
Kristofer Hallin c714f7e2d1
nunoc::dockerhost -> sunet::dockerhost 2024-09-30 14:23:33 +02:00
Kristofer Hallin fa953f2bfc
Added karchive related manifest etc. 2024-09-30 14:22:45 +02:00
Patrik Holmqvist 0da54deef3
Fix typo 2024-09-27 13:14:10 +02:00
Patrik Holmqvist b4fb2a069b
Add missing import in setup_cosmos_modules 2024-09-27 13:12:46 +02:00
Patrik Holmqvist 2762d5ba3b
Run custom tag on metrics-cd-test-1 2024-09-27 13:11:15 +02:00
Patrik Holmqvist 9ff74eb056
Migrate metrics-cd-test-1 from nunoc-ops 2024-09-27 12:44:22 +02:00
Patrik Holmqvist d653651639
Add jonas ssh-key 2024-09-27 10:48:48 +02:00
Patrik Holmqvist 9e9a684514
Add nrpe class to cosmos-rules 2024-09-26 15:23:15 +02:00
Patrik Holmqvist b3215b1001
Remove key-id with no matching ssh-key 2024-09-26 14:28:43 +02:00
Patrik Holmqvist 27ae765fa8
Try adding nrpe class 2024-09-26 14:25:45 +02:00
Patrik Holmqvist ac248a7c4e
Remove host without class from cosmos-rules 2024-09-26 12:42:56 +02:00
Patrik Holmqvist 26417fcec3
Remove example code from cosmos-rules 2024-09-26 12:36:09 +02:00
Patrik Holmqvist 8a5dbe846e
ntpd class does not work on ubuntu24 2024-09-26 12:17:18 +02:00
Patrik Holmqvist bfe476f046
Add sunet::server conf to cosmos-rules 2024-09-26 12:13:33 +02:00
Dennis Wallberg 1a7d465cf4
renamed 2024-09-26 11:36:19 +02:00
Dennis Wallberg 03d8db76e8
renamed init class net -> netops and added generic rule for all sunet.se hosts 2024-09-26 11:27:57 +02:00
Patrik Holmqvist 3c0cc66394
Add karchive to cosmos-rules 2024-09-26 11:24:47 +02:00
Dennis Wallberg 41987ebf3b
renamed ssh key 2024-09-26 11:20:36 +02:00
Patrik Holmqvist ff1caf2c86
Add pahol-test4 to test migr from nunoc-ops 2024-09-25 15:17:33 +02:00
Dennis Wallberg 3e718a06e9
added some dubious users... 2024-09-25 14:59:18 +02:00
Dennis Wallberg 9bc5f77b73
added Kristofer 2024-09-25 14:54:22 +02:00
Patrik Holmqvist a1166db047
Add pahol ssh-key 2024-09-25 13:23:32 +02:00
Patrik Holmqvist 407e1fc280
Remove ubuntu22 exceptions, try manual module install 2024-09-25 12:49:39 +02:00
Patrik Holmqvist bb391413a8
Fix syntax in setup_cosmos_modules 2024-09-25 12:17:12 +02:00
Joao Paulo Oliveira de Araujo Rangel Pamplona b9fcddb344
added gpg keys for mifr, masv and kano 2024-09-25 11:52:04 +02:00
Joao Paulo Oliveira de Araujo Rangel Pamplona 75e48a4af9
added dennis gpg keys 2024-09-25 11:34:28 +02:00
Patrik Holmqvist ca31a0c6c0
Only add legacy sunet forked modules on ubuntu22 2024-09-24 16:34:02 +02:00
Joao Paulo Oliveira de Araujo Rangel Pamplona 9cc98cea71
added pahols gpg key 2024-09-24 15:02:52 +02:00
Joao Paulo Oliveira de Araujo Rangel Pamplona 661730027c
added jonas gpg key 2024-09-24 14:15:27 +02:00
Benedith Mulongo 454611b2b3
Add jokar GPG key 2024-09-23 22:13:08 +02:00
Benedith Mulongo 193ebc664f
Change file mod 2024-09-23 22:10:19 +02:00
Benedith Mulongo a31a71b33d
Change to common class from sunet 2024-09-23 16:01:56 +02:00
Benedith Mulongo 29bc308915
Test ops-repo on test-das-federator host 2024-09-23 14:31:47 +02:00
Benedith Mulongo e12eb05891
Add GPG keys & remove wrong ssh-key from common.yaml 2024-09-20 15:13:33 +02:00
Benedith Mulongo a2f2d2ee70
Add init.pp & common.yaml for ssh 2024-09-20 15:00:35 +02:00
Benedith Mulongo de5efe8aa7
Add setup_cosmos_modules to the repo 2024-09-20 14:06:44 +02:00
Patrik Lundin aa88795ee0
sunet-fleetlock: also handle ReadTimeout
Turns out this was not caught by ConnectionError.
2024-07-03 14:13:22 +02:00
Patrik Lundin 01768129f0
fleetlock: configurable lock/unlock timeout
While we already support setting a healthcheck timeout it probably
makes sense to be able to control how long we wait for a
fleetlock_lock() or fleetlock_unlock() call. This becomes important if
only running cosmos once a night or something like that. In that case
you probably want to give a physical machine more than 1 minute to
complete a reboot etc.

This can now be controlled by setting fleetlock_lock_timeout and
fleetlock_unlock_timeout in /etc/run-cosmos-fleetlock-conf. Keep in mind
that while it can make sense to increase the time for taking a lock,
releasing a lock should always be fast (either you have it and release
it, or you don't have it and it is a no-op) so setting a long unlock
timeout should probably never be done.

Since we also potentially wait the unlock timeout at boot (if the
fleetlock server is broken etc) that is another reason to keep it
short. The default 1m is probably OK for most uses.
2024-07-03 13:27:52 +02:00
Patrik Holmqvist 4231b4ac1d
Migrate from legacy fact
This did not work on modern puppet in ubuntu24:
Warning: Interpolation failed with '::lsbdistcodename', but compilation continuing;
New syntax inspiration from:
https://www.puppet.com/docs/puppet/8/hiera_config_yaml_5#configuring_hiera
2024-06-19 14:07:13 +02:00
Patrik Lundin e315282bc5
Use more strict exception checking
This is probably wide enough and we do not need weird extra handling of
our own exception etc.

Thanks to @mickenordin for keeping me honest :).
2024-06-17 12:40:12 +02:00
Patrik Lundin 4b8b8887f6
sunet-fleetlock: handle connection errors
In order to handle upgrades of the fleetlock server when running only
one server we need to handle connection errors like connection refused
or timed out errors gracefully.

Because there are several different ways the connection can fail and it
is hard to keep track of them all, just catch everything. We then also
need special handling of our own timeout exception so we are not
accidentally stuck retrying forever.

Also fix so we actually use the request_timeout arg for individual HTTP
requests instead of the global timeout.

While here run isort to keep imports tidy.
2024-06-17 12:07:22 +02:00
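
Editorial sketch, not taken from sunet-fleetlock: the retry behaviour described in the commit above (and narrowed in the later commits) could look roughly like the following. The names `call_with_retry` and `RetryTimeout`, and all parameters, are hypothetical. The default `retry_on` tuple uses Python's built-in exceptions so the sketch runs without third-party packages; with the `requests` library one would presumably pass `requests.exceptions.ConnectionError` and `requests.exceptions.ReadTimeout` instead.

```python
import time


class RetryTimeout(Exception):
    """Hypothetical: raised when the overall deadline passes."""


def call_with_retry(
    request,
    timeout_s,
    request_timeout_s,
    retry_on=(ConnectionError, TimeoutError),
    sleep_s=1.0,
):
    """Call request(request_timeout_s) until it succeeds or timeout_s elapses.

    request_timeout_s bounds a single HTTP request, timeout_s bounds the
    whole retry loop. Only exceptions listed in retry_on are retried;
    anything else propagates to the caller immediately.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return request(request_timeout_s)
        except retry_on as exc:
            # RetryTimeout is raised outside the try block, so it can
            # never be swallowed by this handler and retried forever.
            if time.monotonic() + sleep_s > deadline:
                raise RetryTimeout("gave up waiting for the server") from exc
            time.sleep(sleep_s)
```

Keeping the retried exception types in an explicit tuple makes it visible which failures count as "the server is temporarily away", which is the trade-off the later "more strict exception checking" commit settles on.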
Johan Wassberg c72f5ccd86
Allow for hosts without class(es) 2024-04-12 15:32:40 +02:00
Patrik Lundin df5558befb
Fix another indentation mismatch 2024-01-24 15:36:52 +01:00
Patrik Lundin 4b93d9c426
run-cosmos: support fleetlock unlocking at boot
This extends run-cosmos with a new argument that calls the unlock
function already included in the script as well as using the already
existing lock() function to make sure there is no race between the
bootup process and cron starting a normal run-cosmos process at the same
time.

The oexit() function is added to support exiting with an OK exit value
the same way eexit() is used to signal something is wrong.

This change also adds the systemd unit file that runs run-cosmos with the
new fleetlock-unlock argument at boot if fleetlock is configured.

While here fix indentation that was mixed between 3 and 4 spaces: it is
now 4 spaces everywhere.
2024-01-24 15:36:34 +01:00
Patrik Lundin 7baf9affb1
Add fleetlock support to run-cosmos
Makes run-cosmos request a fleetlock lock before running cosmos "update"
and "apply" steps. This is helpful for making sure only one (or several)
machine out of some set of machines runs cosmos changes at a time. This
way if cosmos (or puppet) decides that a service needs to be restarted
this will only happen on a subset of machines at a time. When the cosmos
"apply" is done a fleetlock unlock request will be performed so the
other machines can progress.

The unlock code in run-cosmos will also run the new tool
sunet-machine-healthy to decide things are good before unlocking. This
way if a restarted service breaks this will stop the unlock attempt
and in turn make it so the others should not break their service as
well, giving an operator time to figure out what is wrong.
2023-06-17 08:10:00 +02:00
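
Editorial sketch, not taken from run-cosmos: the ordering described in the commit above (take the lock, run the changes, check health, release the lock only if healthy) can be illustrated as below. The function name `run_with_fleetlock`, its callable parameters, and the returned status strings are all hypothetical; how the real script reports each outcome is not stated in the commit message.

```python
def run_with_fleetlock(lock, apply_changes, is_healthy, unlock):
    """Run apply_changes() under a fleet-wide lock.

    lock, apply_changes, is_healthy and unlock are caller-supplied
    callables (assumptions for this sketch). lock() and is_healthy()
    return True on success.
    """
    if not lock():
        # Another machine holds the lock, so make no changes this run.
        return "lock-not-acquired"

    apply_changes()

    if not is_healthy():
        # Keep the lock on purpose: the other machines then cannot
        # proceed and break their own service the same way.
        return "unhealthy-lock-kept"

    unlock()
    return "unlocked"
```

If `apply_changes()` raises in this sketch, the exception propagates and the lock is never released, which matches the intent of giving an operator time to investigate before other machines continue.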
Johan Wassberg cf2e6b6518
File provided by Sunet::Dockerhost 2023-04-04 15:21:15 +02:00
Johan Wassberg 5af8093338
Add support for eyaml in Hiera
And at the same time remove support for gpg.

The modern version of the configuration (v5) has been tested with 20.04 but
might work with older dists.
2023-02-16 07:44:37 +01:00
Fredrik Thulin c400bba97d
remove 'make db'
The db-file, essentially providing reverse lookup of classes to host
names, is only used by some Nagios configuration instances and causes
continuing operational headaches in those ops-repos.

It should be kept/refactored to only apply to the monitoring hosts in
the cases where it is used, but we don't want any new ops-repos to use
it hence it should be removed from upstream multiverse.
2023-02-07 14:21:29 +01:00
Fredrik Thulin 12b2412ea7
run cron at boot too, to e.g. get new firewall rules installed 2023-02-06 17:12:01 +01:00
Fredrik Thulin 79606f2a6d
check for /etc/no-automatic-cosmos in the wrapper, and allow arguments to be passed 2023-02-06 16:47:41 +01:00