support per-drive crit/warning thresholds by adding metrics when needed #374

Open
inqrphl wants to merge 2 commits into main from check-drivesize-separate-perfdata
Conversation

Contributor

@inqrphl inqrphl commented May 8, 2026

By default, snclient omits metrics that do not occur in any condition. It decides this by checking the operands of the conditions with the check.HasThreshold() function.

Using used_pct in a condition adds two metrics per drive, '[drive] used' and '[drive] used %'. If it is not present, these metrics are missing.

./snclient -vvv --logfile stdout run check_drivesize "drive=/" "warn=used_pct gt 50" show-all
OK - / 428.610 GiB/935.929 GiB (45.8%) |'/ used'=460216111104B;502473211904;904451781427;0;1004946423808 '/ used %'=45.8%;50;90;0;100

However, a bare warn='used_pct gt 90' affects all drives. To check multiple drives with a different threshold for each drive, the percentage usage metrics have to be added per drive. Metrics are also checked when the check is finalized, and can influence the final state.

./snclient -vvv --logfile stdout run check_drivesize "drive=/" "drive=/tmp" "warn='/ used %' gt 30" "crit='/tmp used %' gt 66" show-all
WARNING - / 428.867 GiB/935.929 GiB (45.8%), /tmp 961.945 MiB/31.127 GiB (3.0%) |'/ used'=460492066816B;;;0;1004946423808 '/ used %'=45.8%;30;;0;100 '/tmp used'=1008672768B;;;0;33422544896 '/tmp used %'=3%;;66;0;100
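
To illustrate how the per-metric thresholds in the output above can lead to the WARNING result, here is a minimal sketch. It is not the snclient implementation: the `Metric` type, `metricState`, `finalState` and the plain "greater than" comparison are all assumptions made for this example.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the usual plugin states.
type State int

const (
	OK State = iota
	Warning
	Critical
)

// Metric is a simplified perfdata entry (assumed shape, not snclient's type).
// A nil threshold means "not set", like the empty fields in ';30;;0;100'.
type Metric struct {
	Name  string
	Value float64
	Warn  *float64
	Crit  *float64
}

// metricState returns the state a single metric contributes,
// assuming every threshold is a "gt" comparison.
func metricState(m Metric) State {
	if m.Crit != nil && m.Value > *m.Crit {
		return Critical
	}
	if m.Warn != nil && m.Value > *m.Warn {
		return Warning
	}
	return OK
}

// finalState returns the worst state found across all metrics.
func finalState(metrics []Metric) State {
	worst := OK
	for _, m := range metrics {
		if s := metricState(m); s > worst {
			worst = s
		}
	}
	return worst
}

// ptr is a small helper to build optional thresholds.
func ptr(v float64) *float64 { return &v }

func main() {
	metrics := []Metric{
		{Name: "/ used %", Value: 45.8, Warn: ptr(30)},
		{Name: "/tmp used %", Value: 3.0, Crit: ptr(66)},
	}
	fmt.Println(finalState(metrics) == Warning) // prints true
}
```

With the values from the run above, only '/ used %' exceeds its threshold, so the worst state is WARNING.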

This PR detects conditions whose operand is named '[drive] used %'. If a condition uses that as its operand, usage metrics are added for that drive as well. This only applies to that drive, and since the operand '[drive] used %' is different for each drive, it won't affect other drives' perfdata.
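
The detection step can be sketched roughly as follows. This is only an illustration under assumptions: `drivesNeedingMetrics` and its inputs are hypothetical names, and the real code goes through check.HasThreshold() rather than a helper like this.

```go
package main

import (
	"fmt"
	"strings"
)

// usedPctSuffix is the suffix of the per-drive percentage metric name.
const usedPctSuffix = " used %"

// drivesNeedingMetrics is a hypothetical helper: given the operands
// (keywords) of all conditions and the list of checked drives, it returns
// the drives for which usage metrics should be emitted.
func drivesNeedingMetrics(keywords, drives []string) map[string]bool {
	needed := make(map[string]bool, len(drives))
	for _, kw := range keywords {
		// A bare used_pct applies to every drive.
		if kw == "used_pct" {
			for _, d := range drives {
				needed[d] = true
			}
			continue
		}
		// '<drive> used %' only applies to the named drive.
		if !strings.HasSuffix(kw, usedPctSuffix) {
			continue
		}
		drive := strings.TrimSuffix(kw, usedPctSuffix)
		for _, d := range drives {
			if d == drive {
				needed[d] = true
			}
		}
	}
	return needed
}

func main() {
	drives := []string{"/", "/tmp"}
	// Only "/" is named in a condition, so only "/" gets usage metrics.
	fmt.Println(drivesNeedingMetrics([]string{"/ used %"}, drives))
}
```

Because the drive name is part of the operand, a condition for one drive never matches another drive in this sketch.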

Ahmet Oeztuerk added 2 commits May 8, 2026 16:34
By default, snclient omits metrics that do not occur in any condition. It decides this by checking the operands of the conditions with the check.HasThreshold() function.

Using used_pct in a condition adds two metrics per drive, '<drive> used' and '<drive> used %'. If it is not present, these metrics are missing.
```
./snclient -vvv --logfile stdout run check_drivesize "drive=/" "warn=used_pct gt 50" show-all
OK - / 428.610 GiB/935.929 GiB (45.8%) |'/ used'=460216111104B;502473211904;904451781427;0;1004946423808 '/ used %'=45.8%;50;90;0;100
```

However, a bare warn='used_pct gt 90' affects all drives. To check multiple drives with a different threshold for each drive, the percentage usage metrics have to be added per drive. Metrics are also checked when the check is finalized, and can influence the final state.

```
./snclient -vvv --logfile stdout run check_drivesize "drive=/" "drive=/tmp" "warn='/ used %' gt 30" "crit='/tmp used %' gt 66" show-all
WARNING - / 428.867 GiB/935.929 GiB (45.8%), /tmp 961.945 MiB/31.127 GiB (3.0%) |'/ used'=460492066816B;;;0;1004946423808 '/ used %'=45.8%;30;;0;100 '/tmp used'=1008672768B;;;0;33422544896 '/tmp used %'=3%;;66;0;100
```

Detect conditions whose operand is named '<drive> used %'. If a condition uses that as its operand, add usage metrics for that drive as well. This only applies to that drive, and since the operand '<drive> used %' is different for each drive, it won't affect other drives' perfdata.
These conditions have their keyword transformed to '<drive> used %' so that the metric name matches the condition name.

In condition.String(), check whether the keyword is in the original string. If it isn't, it has likely been changed, so print it out separately.
Contributor Author

inqrphl commented May 11, 2026

This also supports '<drive> used_pct' metrics. Their keywords are transformed to '<drive> used %' to match the metric name used for usage percentages.

Because their keywords are converted and then checked just like '<drive> used %' in conditions, they trigger adding metrics for a drive. During the metrics check, they take effect and can raise warning/critical.
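
A minimal sketch of the keyword transformation, assuming a hypothetical `normalizeKeyword` helper (the actual function name and location in snclient may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKeyword is a hypothetical helper that rewrites a per-drive
// '<drive> used_pct' keyword to '<drive> used %', so that it matches the
// name of the percentage metric. A bare 'used_pct' has no drive prefix
// and is returned unchanged.
func normalizeKeyword(keyword string) string {
	const from = " used_pct"
	const to = " used %"
	if strings.HasSuffix(keyword, from) {
		return strings.TrimSuffix(keyword, from) + to
	}
	return keyword
}

func main() {
	for _, kw := range []string{"/ used_pct", "/tmp used_pct", "used_pct"} {
		fmt.Printf("%q -> %q\n", kw, normalizeKeyword(kw))
	}
}
```

After this step, both spellings of the operand refer to the same metric name.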

In addition, when Condition.String() builds its output, it checks whether the keyword is contained in the original string. If the keyword is not there, likely because it was transformed at some earlier point, the new keyword is appended to the output.

./snclient -vvv --logfile stdout run check_drivesize "drive=/" "drive=/tmp" "warn='/ used_pct' gt 70" "crit='/tmp used_pct' gt 1" show-all
...
[10:53:12.563][D][checkdata:154] condition  warning: (original: '/ used_pct' gt 70 | keyword: / used %)
[10:53:12.563][D][checkdata:155] condition critical: (original: '/tmp used_pct' gt 1 | keyword: /tmp used %)
...
[10:53:12.563][D][checkdata:500] metric.Name: '/tmp used %', metric.ThresholdName: '', metric.Value: '3.1', gave non-ok state: CRITICAL
...
CRITICAL - / 421.166 GiB/935.929 GiB (45.0%), /tmp 993.805 MiB/31.127 GiB (3.1%) |'/ used'=452223725568B;;;0;1004946423808 '/ used %'=45%;70;;0;100 '/tmp used'=1042079744B;;;0;33422544896 '/tmp used %'=3.1%;;1;0;100
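
The Condition.String() behaviour seen in the debug lines above could look roughly like this. It is a sketch only: the `Condition` struct here has just two assumed fields, `Original` and `Keyword`, and the real type in snclient carries more state.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a reduced, hypothetical version of the real type.
type Condition struct {
	Original string // the condition as the user wrote it
	Keyword  string // the operand, possibly transformed after parsing
}

// String returns the original text. If the keyword no longer occurs in
// the original (it was likely transformed), the keyword is appended in
// the same shape as the debug output shown above.
func (c Condition) String() string {
	if strings.Contains(c.Original, c.Keyword) {
		return c.Original
	}
	return fmt.Sprintf("(original: %s | keyword: %s)", c.Original, c.Keyword)
}

func main() {
	unchanged := Condition{Original: "used_pct gt 50", Keyword: "used_pct"}
	changed := Condition{Original: "'/ used_pct' gt 70", Keyword: "/ used %"}
	fmt.Println(unchanged)
	fmt.Println(changed)
}
```

A substring check is cheap and avoids tracking whether a transformation happened, at the cost of a possible false match when the keyword happens to appear elsewhere in the original string.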
