Commit graph

230 commits

Author SHA1 Message Date
Micke Nordin 763822385a
internal-dco-test-k8sw-1.streams.sunet.se added 2024-10-11 17:39:55 +02:00
Micke Nordin 0d70187c38
Further bootstrap work 2024-10-11 17:39:54 +02:00
Micke Nordin 39bff16c74
Initial trust 2024-10-11 17:39:53 +02:00
Micke Nordin 20ddc3c257
Start on cosmos-rules 2024-10-11 17:39:52 +02:00
Patrik Holmqvist 028ba3d608
Merge pull request #56 from SUNET/pahol-fix-noble-eyaml
patch for broken eyaml in ubuntu24.04.
2024-09-10 13:16:19 +02:00
Patrik Holmqvist 7941e3f970
Merge the two patch functions into one. 2024-09-09 17:29:31 +02:00
Patrik Holmqvist fac9a556ba
Patch for broken eyaml in ubuntu24.04. 2024-09-09 16:52:38 +02:00
Patrik Lundin 770a5ca3cc
Merge pull request #55 from SUNET/patlu-fleetlock-lock-timeouts
fleetlock: configurable lock/unlock timeout
2024-07-04 13:07:34 +02:00
Patrik Lundin aa88795ee0
sunet-fleetlock: also handle ReadTimeout
Turns out this was not caught by ConnectionError.
2024-07-03 14:13:22 +02:00
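The commit does not name the HTTP library. If sunet-fleetlock uses `requests` (an assumption), its `ReadTimeout` derives from `Timeout` rather than from `ConnectionError`, which would explain why `except ConnectionError` missed it. Python's built-in exceptions have the same shape: `TimeoutError` and `ConnectionError` are siblings under `OSError`. The minimal stdlib sketch below (all function names are hypothetical) shows a narrow handler letting a read timeout escape while a wider one catches it.

```python
def simulated_read_timeout() -> None:
    """Stand-in for a server that accepts the connection but never answers."""
    raise TimeoutError("read timed out")


def narrow_handler(call) -> str:
    """Catches only ConnectionError (refused, reset, aborted connections)."""
    try:
        call()
    except ConnectionError:
        return "handled"
    return "ok"


def wide_handler(call) -> str:
    """Also catches TimeoutError, so read timeouts are handled as well."""
    try:
        call()
    except (ConnectionError, TimeoutError):
        return "handled"
    return "ok"
```

Calling `narrow_handler(simulated_read_timeout)` lets the `TimeoutError` propagate, while `wide_handler` returns `"handled"`.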
Patrik Lundin 01768129f0
fleetlock: configurable lock/unlock timeout
While we already support setting a healthcheck timeout it probably
makes sense to be able to control how long we wait for a
fleetlock_lock() or fleetlock_unlock() call. This becomes important if
only running cosmos once a night or something like that. In that case
you probably want to give a physical machine more than 1 minute to
complete a reboot etc.

This can now be controlled by setting fleetlock_lock_timeout and
fleetlock_unlock_timeout in /etc/run-cosmos-fleetlock-conf. Keep in mind
that while it can make sense to increase the time for taking a lock,
releasing a lock should always be fast (either you have it and release
it, or you don't have it and it is a no-op) so setting a long unlock
timeout should probably never be done.

Since we also potentially wait the unlock timeout at boot (if the
fleetlock server is broken etc) that is another reason to keep it
short. The default 1m is probably OK for most uses.
2024-07-03 13:27:52 +02:00
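The commit names the settings `fleetlock_lock_timeout` and `fleetlock_unlock_timeout` in /etc/run-cosmos-fleetlock-conf and a default of 1 minute, but not the file syntax or the unit. The minimal sketch below assumes shell-style `key=value` lines with values in seconds; `load_timeouts` is a hypothetical helper. It also warns when the unlock timeout is raised above the default, following the advice above.

```python
import warnings

DEFAULT_TIMEOUT = 60  # seconds; the commit states the default is 1 minute


def load_timeouts(text: str) -> dict[str, int]:
    """Parse assumed key=value config text into lock/unlock timeouts (seconds)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")  # drop shell quotes
    timeouts = {
        "lock": int(settings.get("fleetlock_lock_timeout", DEFAULT_TIMEOUT)),
        "unlock": int(settings.get("fleetlock_unlock_timeout", DEFAULT_TIMEOUT)),
    }
    if timeouts["unlock"] > DEFAULT_TIMEOUT:
        # Releasing a lock should be fast, and this timeout can also delay boot.
        warnings.warn("long fleetlock_unlock_timeout may delay boot if the server is down")
    return timeouts
```

For a machine that runs cosmos nightly, `fleetlock_lock_timeout=3600` alone would yield a one-hour lock timeout while keeping the 60 second unlock default.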
Patrik Lundin 443611dd3f
Merge pull request #49 from SUNET/john-permissions-fix
Enforce more strict permissions for files in Cosmos
2024-07-03 11:36:21 +02:00
Johan Wassberg 5518048d79
Merge pull request #54 from SUNET/pahol-ubuntu24
Ubuntu-24 fixes
2024-06-19 15:07:17 +02:00
Patrik Holmqvist 4231b4ac1d
Migrate from legacy fact
This did not work on modern puppet in ubuntu24:
Warning: Interpolation failed with '::lsbdistcodename', but compilation continuing;
New syntax inspiration from:
https://www.puppet.com/docs/puppet/8/hiera_config_yaml_5#configuring_hiera
2024-06-19 14:07:13 +02:00
Patrik Holmqvist bc9d1dc960
Use upstream puppet modules for ubuntu24+.
This is how we do it in modern debian so it
makes sense to do it on modern ubuntu as well.
2024-06-19 14:02:24 +02:00
Patrik Lundin 5d88e66379
Merge pull request #53 from SUNET/patlu-fleetlock-error-handling
sunet-fleetlock: handle connection errors
2024-06-17 13:27:11 +02:00
Patrik Lundin e315282bc5
Use more strict exception checking
This is probably wide enough and we do not need weird extra handling of
our own exception etc.

Thanks to @mickenordin for keeping me honest :).
2024-06-17 12:40:12 +02:00
Patrik Lundin 4b8b8887f6
sunet-fleetlock: handle connection errors
In order to handle upgrades of the fleetlock server when running only
one server, we need to gracefully handle connection errors such as
refused or timed-out connections.

Because there are several different ways the connection can fail and it
is hard to keep track of them all, just catch everything. We then also
need special handling of our own timeout exception so we are not
accidentally stuck retrying forever.

Also fix so we actually use the request_timeout arg for individual HTTP
requests instead of the global timeout.

While here run isort to keep imports tidy.
2024-06-17 12:07:22 +02:00
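The two commits above describe a retry loop with an overall timeout, a separate per-request `request_timeout`, and, after e315282bc5, a narrower set of caught exceptions. A minimal stdlib sketch of that pattern follows; `retry_until_deadline`, `FleetlockTimeout` and `retry_sleep` are hypothetical names, and the request is an injected callable rather than a real HTTP call. Because only connection and timeout errors are caught, the loop's own `FleetlockTimeout` can never be swallowed and retried forever.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FleetlockTimeout(Exception):
    """Hypothetical: raised when the overall deadline passes without success."""


def retry_until_deadline(
    request: Callable[[float], T],
    timeout: float,
    request_timeout: float,
    retry_sleep: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(per_call_timeout) until it succeeds or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise FleetlockTimeout(f"gave up after {timeout} seconds")
        try:
            # Each HTTP call gets request_timeout, capped by what is left overall.
            return request(min(request_timeout, remaining))
        except (ConnectionError, TimeoutError):
            pass  # e.g. the fleetlock server is restarting during an upgrade
        sleep(retry_sleep)
```

Any other exception (say, a malformed response) propagates immediately instead of being retried.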
Johan Wassberg 646c40daf1
Merge pull request #52 from SUNET/jocar-allow-empty-hosts
Allow empty hosts
2024-04-15 11:43:21 +02:00
Johan Wassberg c72f5ccd86
Allow for hosts without class(es) 2024-04-12 15:32:40 +02:00
Micke Nordin b39960484f
Merge pull request #51 from SUNET/patlu-run-cosmos-fleetlock
run-cosmos: support fleetlock unlocking at reboot
2024-01-25 13:23:27 +01:00
Patrik Lundin df5558befb
Fix another indentation mismatch 2024-01-24 15:36:52 +01:00
Patrik Lundin 4b93d9c426
run-cosmos: support fleetlock unlocking at boot
This extends run-cosmos with a new argument that calls the unlock
function already included in the script as well as using the already
existing lock() function to make sure there is no race between the
bootup process and cron starting a normal run-cosmos process at the same
time.

The oexit() function is added to support exiting with an OK exit value
the same way eexit() is used to signal something is wrong.

This change also adds the systemd unit file that runs run-cosmos with the
new fleetlock-unlock argument at boot if fleetlock is configured.

While here fix indentation that was mixed between 3 and 4 spaces: it is
now 4 spaces everywhere.
2024-01-24 15:36:34 +01:00
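The commit reuses run-cosmos's existing lock() so the boot-time unlock cannot race a cron-started run, and adds oexit() next to eexit(). The log does not show how lock() works. The Python sketch below only illustrates the idea under stated assumptions: an flock(2)-style exclusive, non-blocking advisory lock on a hypothetical lock file, exit status 0 for oexit and 1 for eexit, and an injected `unlock` callable standing in for the fleetlock unlock.

```python
import fcntl
import sys
from typing import IO, Callable, Optional


def try_lock(path: str) -> Optional[IO[str]]:
    """Take an exclusive advisory lock without blocking; keep the file open to hold it."""
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # another run already holds the lock
    return handle


def oexit(msg: str) -> None:
    """Exit successfully (assumed status 0), like the oexit() idea."""
    print(msg)
    sys.exit(0)


def eexit(msg: str) -> None:
    """Exit with an error (assumed status 1), like the eexit() idea."""
    print(msg, file=sys.stderr)
    sys.exit(1)


def fleetlock_unlock_at_boot(lock_path: str, unlock: Callable[[], bool]) -> None:
    handle = try_lock(lock_path)
    if handle is None:
        eexit("another run-cosmos holds the lock")
    try:
        if not unlock():
            eexit("fleetlock unlock failed")
    finally:
        handle.close()  # releases the flock, also when eexit raises SystemExit
    oexit("fleetlock unlocked")
```

With this layout a cron-started run and the boot-time unlock cannot both hold the lock at once; whichever comes second fails fast instead of waiting.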
Micke Nordin cacb97a22c
Allow running bumptag without signed commits or tags
By setting ALLOW_UNSIGNED_COMMITS_WITHOUT_TAGS you can bootstrap bumptag
on first startup of a new repo
2023-12-04 14:24:34 +01:00
Johan Wassberg ecedda68e3
Merge pull request #50 from SUNET/kano-sshproxyjump
PREPARE/ADDHOST: allow the use of ProxyJump with IP addresses
2023-11-29 12:53:05 +01:00
Micke Nordin 71e112e009
PREPARE/ADDHOST: allow the use of ProxyJump with IP addresses
With this patch you can specify a ProxyJump for prepare-iaas-ubuntu,
prepare-iaas-debian and addhost. Example:

./prepare-iaas-debian 89.47.191.7 hj
./addhost -b -n node1.extern.drive.test.sunet.se -p hj -- 89.47.191.7

where hj is a host defined in my .ssh/config suitable for a proxyjump
to the host in question.

This makes it easier to use IP addresses with these scripts, which might
be necessary if DNS takes a while to propagate.
2023-11-29 12:10:34 +01:00
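The commit shows `-p hj` being passed to addhost but not how the scripts forward it to ssh. With OpenSSH, a jump host (including a Host alias from ~/.ssh/config) can be given with `-J`, the command-line form of ProxyJump. The sketch below is a hypothetical helper, not the prepare-iaas or addhost code, and assumes login as root on the target.

```python
from typing import Optional


def ssh_command(host: str, proxyjump: Optional[str] = None, user: str = "root") -> list[str]:
    """Build an ssh argv, optionally going through a jump host (hypothetical helper)."""
    argv = ["ssh"]
    if proxyjump:
        # ssh first connects to `proxyjump`, then tunnels through it to the target.
        argv += ["-J", proxyjump]
    argv.append(f"{user}@{host}")
    return argv
```

For example, `ssh_command("89.47.191.7", "hj")` returns `["ssh", "-J", "hj", "root@89.47.191.7"]`, which works before any DNS record for the new node exists.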
John Van de Meulebrouck Brendgard 8d4ce2d1b7
Make sure that COSMOS_BASE is only readable
by root since it's possible that the directory
can contain files that after applying the
overlay to / should only be readable or writable
by root.
2023-11-17 15:03:47 +01:00
John Van de Meulebrouck Brendgard 75e566ab61
Make sure that /root in overlay is owned by root
as well as that /root/.ssh and its content is
only owned and readable by root. This is redundant
if the previous permissions were properly applied
and no other changes have been made by the user
or something else, but is added for good measure
as a layered defense.
2023-11-17 14:58:51 +01:00
John Van de Meulebrouck Brendgard ca353ed406
Set same permissions for /root/.ssh/authorized_keys
in post-tasks.d/010fix-ssh-perms as is done by
Puppet with sunet::ssh_keys.
2023-11-17 13:50:02 +01:00
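The three commits above tighten ownership and permissions on /root, /root/.ssh and authorized_keys, but the log does not state the exact modes. The sketch below assumes the common OpenSSH expectations of 0700 for directories and 0600 for files, and works on any given home directory so it can be tried outside a real /root. `tighten_ssh_perms` and `mode_of` are hypothetical helpers; chown to root only works when run as root.

```python
import os
import stat


def tighten_ssh_perms(root_home: str, chown_to_root: bool = False) -> None:
    """Restrict <root_home>/.ssh to its owner: 0700 dirs, 0600 files (assumed modes)."""
    ssh_dir = os.path.join(root_home, ".ssh")
    for dirpath, _dirnames, filenames in os.walk(ssh_dir):
        targets = [(dirpath, 0o700)]
        targets += [(os.path.join(dirpath, name), 0o600) for name in filenames]
        for path, mode in targets:
            if chown_to_root:
                os.chown(path, 0, 0)  # uid/gid 0 is root; requires root privileges
            os.chmod(path, mode)


def mode_of(path: str) -> int:
    """Return only the permission bits of path, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Applying this after the overlay copy is what makes it a layered defense: it repairs modes even if the files in the repository were committed with looser ones.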
Micke Nordin 1bd6524ad3
Merge pull request #48 from SUNET/john-bump-tag-from-nunoc-ops
Merge of improved bump-tag from nunoc-ops
2023-11-16 13:55:49 +01:00
John Van de Meulebrouck Brendgard 21c0cad8a0
Consistently use [[ for if statements. 2023-11-16 12:12:36 +01:00
John Van de Meulebrouck Brendgard dc1df6671c
Shellcheck needs to have the PAGER quoted
in order to correctly interpret the meaning
according to its wiki.
2023-11-16 12:11:09 +01:00
John Van de Meulebrouck Brendgard fd4523308f
Replaced 'egrep' that is now deprecated. 2023-11-16 12:09:02 +01:00
John Van de Meulebrouck Brendgard cb9e1f8670
Added shellcheck exceptions for misplaced warning. 2023-11-16 12:07:10 +01:00
John Van de Meulebrouck Brendgard 5a47b1a3f7
Readded this_branch=$(git rev-parse --abbrev-ref HEAD)
since it wasn't included in change to check
against the current branch instead of master
2023-11-16 12:04:30 +01:00
John Van de Meulebrouck Brendgard 826b8edf82
Changed from [[ ! -z ... to [[ -n ... 2023-11-16 11:59:33 +01:00
John Van de Meulebrouck Brendgard 53c58b413e
Changed from if [[ ${?} ]] to if cmd 2023-11-16 11:56:49 +01:00
John Van de Meulebrouck Brendgard 8a7c85dcf0
Added bump-tag from nunoc-ops that has
multiple improvements and checks for
signed commits, makes sure that important
scripts are not tampered with and much more.
2023-11-15 14:02:49 +01:00
Micke Nordin 083d6eda83
bump-tag: Compare against current branch
Mariah pointed out that this was lost in:

6ac9294dea

And should be reinstated
2023-11-15 12:15:46 +01:00
Micke Nordin 8a947ffa28
Merge pull request #46 from SUNET/jocar-bookworm-depends
Bookworm depends
2023-11-14 15:34:44 +01:00
Johan Wassberg a6a67d355f
Diffable 2023-11-14 15:28:46 +01:00
Johan Wassberg 120c4a5a93
A few more depends for Bookworm 2023-11-14 15:27:45 +01:00
Micke Nordin 9b6322c2ec
Merge pull request #45 from SUNET/jocar-bookwork-fix
Bookworm image runs netplan
2023-10-16 11:03:54 +02:00
Johan Wassberg 69377631a8
Bookworm image runs netplan 2023-10-16 09:25:57 +02:00
Micke Nordin b057f0aeda
Merge pull request #44 from SUNET/jocar-dont-install-x11
Don't install x11
2023-10-02 12:58:30 +02:00
Johan Wassberg 58a9ca7aa9
No need for X11 on our servers 2023-10-02 12:39:44 +02:00
Micke Nordin 6ac9294dea
A newer bump-tag 2023-07-13 08:53:29 +02:00
Micke Nordin 3aac1f97d8
Add additional packages for use with debian 12
This patch will install three packages that are needed for normal operation of Puppet using puppet-sunet with multiverse on Debian 12:

cron puppet-module-puppetlabs-cron-core puppet-module-camptocamp-augeas
2023-07-10 16:32:20 +02:00
Micke Nordin aebaccd5ba
Multiverse master has been renamed to main, so updating documentation to reflect that 2023-07-03 15:14:52 +02:00
Micke Nordin 9bfac2520b
Merge pull request #40 from SUNET/patlu-fleetlock
Add fleetlock support to run-cosmos
2023-06-21 08:52:24 +02:00
Patrik Lundin 7baf9affb1
Add fleetlock support to run-cosmos
Makes run-cosmos request a fleetlock lock before running cosmos "update"
and "apply" steps. This helps make sure that only one (or a few)
machines out of a set of machines run cosmos changes at a time. This
way if cosmos (or puppet) decides that a service needs to be restarted
this will only happen on a subset of machines at a time. When the cosmos
"apply" is done a fleetlock unlock request will be performed so the
other machines can progress.

The unlock code in run-cosmos will also run the new tool
sunet-machine-healthy to check that things are good before unlocking.
This way, if a restarted service breaks, the unlock attempt is stopped,
which in turn keeps the other machines from breaking their services as
well, giving an operator time to figure out what is wrong.
2023-06-17 08:10:00 +02:00
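The ordering described above (lock, cosmos "update" and "apply", health check, then unlock only if healthy) can be sketched as plain control flow. This is a minimal illustration, not run-cosmos itself: every step is a hypothetical injected callable, and what the real script does when it cannot get the lock is an assumption here (it simply skips the run).

```python
from typing import Callable


def run_with_fleetlock(
    lock: Callable[[], bool],
    update: Callable[[], None],
    apply: Callable[[], None],
    healthy: Callable[[], bool],
    unlock: Callable[[], bool],
) -> str:
    """Run update/apply under a fleetlock lock; release it only if the machine is healthy."""
    if not lock():
        return "no lock: skipped"  # assumed behaviour when another machine holds it
    update()
    apply()
    if not healthy():
        # Keep the lock so the rest of the fleet stays blocked until an
        # operator has looked at this machine.
        return "unhealthy: lock kept"
    if not unlock():
        return "unlock failed"
    return "done"
```

Holding the lock on failure is the key design choice: a broken restart on one machine stops the rollout instead of spreading to the rest of the set.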