LibreNMS

From Wikitech

LibreNMS is an autodiscovering PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, Juniper, Foundry, and many more.

LibreNMS is a community-based fork of the last GPL-licensed version of Observium.

Service

Currently hosted on netmon1003 and netmon2002.

It replaces Observium, which ran on Streber.

  • Software is not installed via Debian package
  • Software installed in: /srv/deployment/librenms/
  • RRD data stored in: /srv/librenms/
  • User creds are stored in MySQL: # grep auth_mechanism /srv/deployment/librenms/librenms/config.php
  • Authentication is done via LDAP

How to

Add a device to LibreNMS

Configure the read-only SNMP v2c community on the device.

Via webUI:

https://librenms.wikimedia.org/addhost/

Use the device FQDN and keep all the other fields as is (do not force add the device). Note: because of a bug, set the port to "161".

The device should be discovered and polled within the next 10 minutes.

Via CLI:

$ ssh librenms.wikimedia.org
$ cd /srv/deployment/librenms/librenms
$ sudo -u librenms ./lnms device:add --v2c -c <snmp_community> <device_fqdn>
Added device <fqdn> (XXX)
$ sudo -u librenms php discovery.php -h <fqdn> && sudo -u librenms php poller.php -h <fqdn>
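
When several devices need to be added at once, the CLI steps above can be wrapped in a loop. The following is only a sketch, not an established procedure: the helper names run and add_devices, the one-FQDN-per-line input file, and the DRY_RUN switch are assumptions made for illustration. By default it only prints the commands it would run.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add, discover and poll every device listed in a file (one FQDN per line).
# Blank lines and lines starting with "#" are skipped.
add_devices() {
  community=$1
  list_file=$2
  while IFS= read -r fqdn || [ -n "$fqdn" ]; do
    case $fqdn in ''|'#'*) continue ;; esac
    # </dev/null keeps the commands from consuming the rest of the list.
    run sudo -u librenms ./lnms device:add --v2c -c "$community" "$fqdn" </dev/null &&
      run sudo -u librenms php discovery.php -h "$fqdn" </dev/null &&
      run sudo -u librenms php poller.php -h "$fqdn" </dev/null
  done < "$list_file"
}
```

To use it for real, source the functions from /srv/deployment/librenms/librenms, review the dry-run output of add_devices <snmp_community> <list_file>, then repeat the call with DRY_RUN=0.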

Upgrade LibreNMS

Updating LibreNMS in our repositories

Assume your remotes are configured as follows, and that each new version is tracked in its own branch.

origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (fetch)
origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (push)
upstream	https://github.com/librenms/librenms.git (fetch)
upstream	https://github.com/librenms/librenms.git (push)
new=<new version>
old=<old version>

git fetch origin
git checkout -b upstream-$old origin/upstream-$old
git fetch upstream
git checkout -b upstream-$new $new

# If you are missing composer: apt install -y composer php-gd
composer install --no-dev --ignore-platform-reqs # (you will be prompted for any missing PHP requirements)
git add -f vendor
git commit -m "Add composer requirements for LibreNMS $new"

mkdir scap
git checkout upstream-$old -- scap/scap.cfg
git add scap
git commit -m "Add Scap config"

git push origin upstream-$new
WARNING: At this point you should make sure we are not leaving behind "our" changes to the old version. Check if any patches were applied on top of upstream-$old and cherry-pick them on upstream-$new. See for example an occurrence where a LibreNMS upgrade left behind patches: https://phabricator.wikimedia.org/T273716#7430992

Cherry picking commits from upstream-$old into upstream-$new

  1. Check Out the New Branch
     First, make sure you have the latest version of the LibreNMS repository and check out the upstream-$new branch where you want to cherry-pick commits.

         git fetch origin
         git checkout upstream-$new

  2. List Commits Exclusive to the Old Branch
     You can use the git log command to list commits that are exclusive to the upstream-$old branch but not in the upstream-$new branch. This helps identify the patches that need to be cherry-picked.

         git log upstream-$new..upstream-$old

     This command shows commits from upstream-$old that are not in upstream-$new. Review these commits to decide which ones should be cherry-picked.

  3. Cherry-pick Commits
     For each commit you want to cherry-pick, use the git cherry-pick command. Suppose you have the commit hashes abc123 and def456 from upstream-$old that you want to apply to upstream-$new.

         git cherry-pick abc123 def456

     If there are conflicts during the cherry-pick, Git will prompt you to resolve them. Open the conflicting files and make the necessary changes. After resolving conflicts, continue the cherry-picking process with:

         git cherry-pick --continue

  4. Push Changes
     After successfully cherry-picking the necessary commits, push your changes to the remote repository.

         git push origin upstream-$new
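
Once some commits have been cherry-picked, git log upstream-$new..upstream-$old keeps listing them, because a cherry-picked commit gets a new hash. As a possible complement, git cherry compares patches rather than hashes and marks commits that have no equivalent on the new branch with "+". The sketch below filters that output; the function names unapplied_only and missing_patches are made up for this example.

```shell
# Keep only the "+" lines of `git cherry -v` output, i.e. commits whose patch
# has no equivalent on the new branch yet, and print "<sha> <subject>".
unapplied_only() {
  awk '$1 == "+" { sub(/^\+ /, ""); print }'
}

# Commits on upstream-$old that may still need cherry-picking onto upstream-$new.
# Arguments: old version, new version (as used in the branch names).
missing_patches() {
  git cherry -v "upstream-$2" "upstream-$1" | unapplied_only
}
```

Run missing_patches "$old" "$new" from the repository. Expect the composer and Scap commits of the old branch to show up as well, since their content differs between versions, and note that a patch that needed conflict resolution may still be listed after it has been cherry-picked.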

Updating LibreNMS in production

Backing up the LibreNMS database

On dbprov1002:

cd /etc/wmfbackups
cp backups.cnf librenms-backup.cnf
sed -i '/sections:/,$c\
  librenms:\
    regex: librenms[.]\
    host: '\''db1217.eqiad.wmnet'\''\
    port: 3321
' librenms-backup.cnf
chown dump:dump librenms-backup.cnf
sudo -u dump backup-mariadb --config-file librenms-backup.cnf

On deploy1002:

cd /srv/deployment/librenms/librenms/
git fetch origin
git branch # note the current branch
git checkout upstream-<version>
scap deploy Upgrade LibreNMS to <version> - <task>

Run puppet on the netmon* hosts from a cumin host (cumin1002.eqiad.wmnet or cumin2002.codfw.wmnet):

cumin O:netmon run-puppet-agent

On the netmon_server (to find it, run git grep -h netmon_server: hieradata/):

cd /srv/deployment/librenms/librenms
sudo -u librenms ./daily.sh

Rollback

On deploy1002:

cd /srv/deployment/librenms/librenms/
git fetch origin
git checkout <previous branch>
scap deploy Rollback LibreNMS to <version> - <task>

Then run puppet again from cumin host:

cumin O:netmon run-puppet-agent

Check the logs

LibreNMS writes logs to four different locations:

  • /srv/deployment/librenms/librenms/logs/librenms.log
  • /var/log/librenms.log
  • /var/log/librenms/daily.log
  • /var/log/apache2/librenms.wikimedia.org.error.log

It would be great to have the first 3 in a single location.
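
The four locations can still be followed together from one terminal. This is a minimal sketch; the function names are invented here, and it assumes a tail that supports -F (GNU or BSD) and a user allowed to read the files.

```shell
# The four log locations listed above, one per line.
librenms_logs() {
  cat <<'EOF'
/srv/deployment/librenms/librenms/logs/librenms.log
/var/log/librenms.log
/var/log/librenms/daily.log
/var/log/apache2/librenms.wikimedia.org.error.log
EOF
}

# Print only the paths that exist and are readable by the current user.
readable_logs() {
  for f in "$@"; do
    [ -r "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Follow every readable log at once; tail prints a "==> file <==" header
# whenever the output switches to a different file.
follow_librenms_logs() {
  # The paths contain no whitespace, so word splitting is safe here.
  set -- $(readable_logs $(librenms_logs))
  [ "$#" -gt 0 ] || { echo "no readable LibreNMS logs found" >&2; return 1; }
  tail -n 20 -F "$@"
}
```

Calling follow_librenms_logs on the netmon host then shows the last 20 lines of each readable file and keeps following them until interrupted.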

Mass update PDU alerting thresholds

PDUs have automatically generated thresholds. The queries below set sane defaults for the eqiad/codfw PDUs and need to be run whenever new PDUs are provisioned.
https://phabricator.wikimedia.org/T247358
https://phabricator.wikimedia.org/T245655

-- Power sensors on phases
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'power'
AND sensor_descr like "Phase%"
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 1400;

-- Current sensors on phases and lines
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'current'
AND (sensor_descr like "%Phase%" or sensor_descr like "%Line%" )
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 12;

-- Power sensors on cords
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_descr like "Cord%"
AND sensor_class = 'power'
AND (hostname like "%eqiad%" or hostname like "%codfw%" )
SET sensors.sensor_custom = 'Yes', sensor_limit = 3440;
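
Before running an UPDATE, it can help to look at the rows it would touch. The helper below is a sketch rather than part of the documented procedure: preview_query is a made-up name, it only prints a read-only SELECT built from the same columns and filters as the statements above, and its two arguments are pasted into the SQL unescaped, so it is meant for hand-typed values only.

```shell
# Print a read-only SELECT listing the sensors matched by a given
# sensor class and sensor_descr LIKE pattern on eqiad/codfw devices.
# Arguments: sensor class, LIKE pattern for sensor_descr.
preview_query() {
  cat <<EOF
SELECT devices.hostname, sensors.sensor_descr, sensors.sensor_limit, sensors.sensor_custom
FROM librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
WHERE sensor_class = '$1'
AND sensor_descr LIKE '$2'
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
ORDER BY devices.hostname, sensors.sensor_descr;
EOF
}
```

For example, preview_query power 'Phase%' prints the query for the first statement; pipe it into whichever MySQL client invocation you normally use for the LibreNMS database. For the statement that matches two descriptions, call it once per pattern.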

Reduce polling time

See more details in https://phabricator.wikimedia.org/T346759

In some cases it's possible to reduce the polling time by increasing the "Max Repeaters" (for items like "bgp-peers") or the "max OIDs" (for items like sensors). This should only be done on a case-by-case basis; from experience, it mostly concerns routers with a high latency (far from the LibreNMS hosts).
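
To see why latency matters here, consider a rough model: walking a table of N rows with SNMP GETBULK takes about ceil(N / max repeaters) request/response round trips, and each one costs at least one round-trip time. The sketch below only does that arithmetic. The function names and all numbers are illustrative, not measurements, and the model ignores device processing time and response size limits.

```shell
# Approximate number of GETBULK round trips needed to walk a table.
# Arguments: number of rows, max repeaters.
walk_round_trips() {
  rows=$1
  max_repeaters=$2
  echo $(( ($rows + $max_repeaters - 1) / $max_repeaters ))
}

# Approximate time spent waiting on the network, in milliseconds.
# Arguments: number of rows, max repeaters, round-trip time in ms.
walk_latency_ms() {
  trips=$(walk_round_trips "$1" "$2")
  echo $(( $trips * $3 ))
}
```

With a hypothetical 1000-row table and a 150 ms round-trip time, walk_latency_ms 1000 10 150 gives 15000 (100 round trips), while walk_latency_ms 1000 50 150 gives 3000 (20 round trips). On a device 1 ms away the same change saves well under a second, which is consistent with only tuning this for distant routers.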

Features

Interface grouping

LibreNMS can group interfaces based on the prefix of their description, for example "Transit:" or "Peering:". These groups are shown under the "ports" dropdown.

Prefixes not shown in the dropdown are still reachable by editing the URL, for example:

https://librenms.wikimedia.org/iftype/type=transport-tun/

https://librenms.wikimedia.org/iftype/type=transport/
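
The URL for any other prefix can be derived the same way. The helper below is a small sketch: iftype_url is a made-up name, and it assumes that the type in the URL is the lowercased text before the first colon of the interface description, which matches the two examples above but has not been checked against every prefix.

```shell
# Build the iftype URL from an interface description or a bare type name.
# Assumption: the type is the lowercased text before the first ":".
iftype_url() {
  prefix=$(printf '%s' "$1" | cut -d: -f1 | tr '[:upper:]' '[:lower:]')
  printf 'https://librenms.wikimedia.org/iftype/type=%s/\n' "$prefix"
}
```

For instance, iftype_url "Transport: some link description" and iftype_url transport both print the second URL above.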

Prometheus push-gateway

Alertmanager integration

Known limitations

  • When failed over to the codfw (backup) instance (see https://phabricator.wikimedia.org/T247967):
    • Polling time for eqiad devices increased significantly due to the added latency. For the most populated rows (eqiad B and D), this means that poll times occasionally exceed 5 minutes, resulting in alerts and potentially missed data.
    • The LibreNMS web UI got significantly slower (from Europe at least), partly because of the added latency to reach codfw and partly because the database is still in eqiad.