Commit graph

51 commits

Author SHA1 Message Date
3b571215ad
removed 'echo..' 2025-01-29 20:15:26 +01:00
70008219d3
fix for SC-2587 & comments 2025-01-29 20:12:35 +01:00
79b3fba04a
Merge remote-tracking branch 'multiverse/main' into multiverse-2025-01-29 2025-01-29 10:18:59 +01:00
7b4de81cbd
Remove some trailing whitespace 2024-11-21 14:33:28 +01:00
aa88795ee0
sunet-fleetlock: also handle ReadTimeout
Turns out this was not caught by ConnectionError.
2024-07-03 14:13:22 +02:00
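The point made in this commit can be sketched as follows. The names suggest the `requests` library, where `ReadTimeout` derives from `Timeout` and not from `ConnectionError`; that reading is an assumption. To stay self-contained, the sketch uses Python's built-in exceptions, which have the same shape: `TimeoutError` is not a subclass of `ConnectionError`, so a retry loop has to list both. This is not the actual sunet-fleetlock code; `retry_until` and `LockTimeout` are hypothetical names.

```python
import time


class LockTimeout(Exception):
    """Raised when the overall deadline has passed (hypothetical name)."""


def retry_until(request, timeout=60.0, interval=1.0,
                sleep=time.sleep, clock=time.monotonic):
    """Call request() until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            return request()
        except (ConnectionError, TimeoutError):
            # Both are listed on purpose: TimeoutError is not a
            # subclass of ConnectionError, so catching only the
            # latter would let a read timeout escape the loop.
            if clock() >= deadline:
                raise LockTimeout(f"gave up after {timeout} seconds")
            sleep(interval)
```

Raising a dedicated exception once the deadline passes keeps the loop from retrying forever, which is the concern the earlier "handle connection errors" commit describes.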
01768129f0
fleetlock: configurable lock/unlock timeout
While we already support setting a healthcheck timeout it probably
makes sense to be able to control how long we wait for a
fleetlock_lock() or fleetlock_unlock() call. This becomes important if
only running cosmos once a night or something like that. In that case
you probably want to give a physical machine more than 1 minute to
complete a reboot etc.

This can now be controlled by setting fleetlock_lock_timeout and
fleetlock_unlock_timeout in /etc/run-cosmos-fleetlock-conf. Keep in mind
that while it can make sense to increase the time for taking a lock,
releasing a lock should always be fast (either you have it and release
it, or you don't have it and it is a no-op) so setting a long unlock
timeout should probably never be done.

Since we also potentially wait the unlock timeout at boot (if the
fleetlock server is broken etc) that is another reason to keep it
short. The default 1m is probably OK for most uses.
2024-07-03 13:27:52 +02:00
e315282bc5
Use more strict exception checking
This is probably wide enough and we do not need weird extra handling of
our own exception etc.

Thanks to @mickenordin for keeping me honest :).
2024-06-17 12:40:12 +02:00
4b8b8887f6
sunet-fleetlock: handle connection errors
In order to handle upgrades of the fleetlock server when running only
one server we need to handle connection errors like connection refused
or timed out errors gracefully.

Because there are several different ways the connection can fail and it
is hard to keep track of them all, just catch everything. We then also
need special handling of our own timeout exception so we are not
accidentally stuck retrying forever.

Also fix so we actually use the request_timeout arg for individual HTTP
requests instead of the global timeout.

While here run isort to keep imports tidy.
2024-06-17 12:07:22 +02:00
df5558befb
Fix another indentation mismatch 2024-01-24 15:36:52 +01:00
4b93d9c426
run-cosmos: support fleetlock unlocking at boot
This extends run-cosmos with a new argument that calls the unlock
function already included in the script as well as using the already
existing lock() function to make sure there is no race between the
bootup process and cron starting a normal run-cosmos process at the same
time.

The oexit() function is added to support exiting with an OK exit value
the same way eexit() is used to signal something is wrong.

This change also adds the systemd unit file that runs run-cosmos with the
new fleetlock-unlock argument at boot if fleetlock is configured.

While here fix indentation that was mixed between 3 and 4 spaces: it is
now 4 spaces everywhere.
2024-01-24 15:36:34 +01:00
2e3de6887c
Merge remote-tracking branch 'multiverse/main' into multiverse 2024-01-10 09:52:26 +01:00
88412797d0
fixed a mistake 2024-01-08 18:56:09 +01:00
0e7da40083
did similar for the script for QA 2024-01-08 18:54:26 +01:00
69f1acada9
added logic to check if execution of tests for more than two countries failed 2024-01-08 17:17:37 +01:00
39d6f5dc21
updated scriptherder and python3 fixes 2023-07-05 11:43:59 +02:00
7baf9affb1
Add fleetlock support to run-cosmos
Makes run-cosmos request a fleetlock lock before running cosmos "update"
and "apply" steps. This is helpful for making sure only one (or several)
machine out of some set of machines runs cosmos changes at a time. This
way if cosmos (or puppet) decides that a service needs to be restarted
this will only happen on a subset of machines at a time. When the cosmos
"apply" is done a fleetlock unlock request will be performed so the
other machines can progress.

The unlock code in run-cosmos will also run the new tool
sunet-machine-healthy to decide things are good before unlocking. This
way if a restarted service breaks this will stop the unlock attempt
and in turn make it so the others should not break their service as
well, giving an operator time to figure out what is wrong.
2023-06-17 08:10:00 +02:00
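The flow described in this commit (take the lock, apply, check health, then unlock) can be sketched as below. This is an illustration and not the run-cosmos implementation, which is a shell script. `run_with_lock` and its four callables are hypothetical stand-ins for the fleetlock lock request, the cosmos "update" and "apply" steps, sunet-machine-healthy, and the fleetlock unlock request. Skipping the apply step when the lock cannot be taken is an assumption.

```python
def run_with_lock(lock, apply, healthy, unlock):
    """Return True if the lock was released after a healthy apply.

    Returns False if the lock was never taken, or if it is still held
    because the machine did not look healthy afterwards.
    """
    if not lock():
        # Assumption: without the lock, no changes are applied.
        return False
    apply()
    if not healthy():
        # Keep the lock on purpose: the other machines cannot take it,
        # so they will not restart (and possibly break) the same service.
        return False
    unlock()
    return True
```

Holding the lock after a failed health check is what gives an operator time to investigate before the change reaches the rest of the machines.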
79606f2a6d
check for /etc/no-automatic-cosmos in the wrapper, and allow arguments to be passed 2023-02-06 16:47:41 +01:00
3988f5beb0
shellcheck fixes 2023-02-06 16:47:30 +01:00
84b29e4eaa
Executable 2023-02-02 11:49:10 +01:00
49ba964897
Wrap cosmos in scriptherder if available
nunoc-ops and others have been doing this for ages by just modifying the cron
file.
2023-02-02 11:45:54 +01:00
cfeed88b6e
improved nagios check scripts 2022-11-21 17:37:25 +01:00
a567d569b7
Changing the script 2022-11-11 19:37:23 +01:00
b0031a7f4d
changes in the script 2022-11-11 19:28:39 +01:00
e266fb5389
fixed scripts 2022-11-10 13:43:18 +01:00
2af8478ae1
forgot to add the command, now did 2022-11-10 12:17:43 +01:00
6cbfd488af
add new eidastest check
Updated QA check
2022-11-10 11:59:43 +01:00
714b1f7718
fixed name of the file 2022-10-04 15:17:20 +02:00
19cb5996b1
nagios check for eidastest
Ref: SC-628
2022-10-04 14:53:09 +02:00
Maria Haider
47f345a729
added the file in wrong repo 2020-09-09 12:39:30 +02:00
Maria Haider
142a1f1d73
made the nagios alarm script executable 2020-09-09 12:38:43 +02:00
Maria Haider
3b3091dd64
new nagios alarm script 2020-09-09 12:31:53 +02:00
Maria Haider
c1c646c927
added new check in nagios 2020-07-07 18:07:18 +02:00
Maria Haider
3abc518ec9
added script for a new nagios check 2020-07-07 17:18:41 +02:00
Maria Haider
1beafaef78
new nagios check 2020-06-02 15:57:09 +02:00
Maria Haider
a49391b915
Changed script nagios check for checking countries in eIDAS 2020-06-02 14:08:23 +02:00
5e5d8f5be2
country monitor script 2020-03-30 08:54:42 +02:00
dc65ae72e2 daily security report 2018-11-01 10:19:13 +01:00
824155e7c9 fixes 2018-11-01 09:49:23 +01:00
2bb36b6d81 added security reporting tool from Jonas Lejon 2018-11-01 09:28:15 +01:00
ff7b648324 typo 2018-02-13 09:59:31 +01:00
7f069d6d34 exec perm 2018-02-13 09:57:34 +01:00
252ecb4a15 eidas health check binary 2018-02-13 09:33:20 +01:00
04a5ca91c4 eidas health check binary 2018-02-13 09:32:57 +01:00
1aef1b6e43 scriptherder 2017-11-13 03:41:21 +01:00
4972eefdcc new run-cosmos 2017-02-02 15:37:32 +01:00
8dc875a185 new run-cosmos 2017-02-02 11:43:36 +01:00
Fredrik Thulin
1f8733559b Merge pull request #1 from Gijutsu/master
Updated documentation and conf ...
2016-09-01 14:16:19 +02:00
John Van de Meulebrouck Brendgard
f6fe928590
new upstream release of cosmos that includes ln5 fixes for https remotes
along with a verified version of puppetlabs-release-trusty.deb
2016-08-28 21:22:48 +02:00
John Van de Meulebrouck Brendgard
f939c526e6
Changed tag from eduid-cosmos to the more generic cosmos-ops 2016-08-27 17:05:11 +02:00
e69b0f84f8 init 2015-02-23 16:02:43 +01:00