Episode 407
Old School Outages
July 10th, 2019
42 mins 31 secs
Tags
About this Episode
Jim shares his Nagios tips and Wes chimes in with some modern tools as we chat monitoring in the wake of some high-profile outages.
Plus we turn our eye to hardware and get excited about the latest Ryzen line from AMD.
Episode Links
- Third parties confirm AMD’s outstanding Ryzen 3000 numbers | Ars Technica — AMD debuted its new Ryzen 3000 desktop CPU line a few weeks ago at E3, and it looked fantastic. For the first time in 20 years, it looked like AMD could go head to head with Intel's desktop CPU line-up across the board. The question: would independent, third-party testing back up AMD's assertions?
- The Internet broke today: Facebook, Verizon, and more see major outages | Ars Technica — Last week, Verizon caused a major BGP misroute that took large chunks of the Internet, including CDN company Cloudflare, partially down for a day. This week, the rest of the Internet has apparently asked Verizon to hold its beer.
- It was a really bad month for the internet | TechCrunch — In the past month there were several major internet outages affecting millions of users across the world. Sites buckled, services broke, images wouldn’t load, direct messages ground to a halt and calendars and email were unavailable for hours at a time.
- Cloudflare outage caused by bad software deploy (updated) — For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back.
- How Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Today — Today at 10:30UTC, the Internet had a small heart attack. A small company in Northern Pennsylvania became a preferred path of many Internet routes through Verizon (AS701), a major Internet transit provider.
- Getting started | Prometheus — This guide is a "Hello World"-style tutorial which shows how to install, configure, and use Prometheus in a simple example setup.
- prometheus/node_exporter — Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
- Using netdata with Prometheus — Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently netdata added support for Prometheus.
- prometheus/nagios_plugins — Nagios plugin for alerting on prometheus query results.
- RobustPerception/nrpe_exporter — The NRPE exporter exposes metrics on commands sent to a running NRPE daemon.
- m-lab/prometheus-nagios-exporter — The Prometheus Nagios exporter reads status and performance data from nagios plugins via the MK Livestatus Nagios plugin and publishes this in a form that can be scrapped by Prometheus.
- Comparison to alternatives | Prometheus — Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data.
- Quality server monitoring solution using NetData/Prometheus/Grafana — I’m going to quickly show you how to install both netdata and Prometheus on the client and server. We can then use grafana pointed at Prometheus to obtain long-term metrics netdata offers.
- Monitoring stack by using Grafana + Prometheus + Netdata — This monitoring stack you can monitoring in real-time by Netdata and see the history by using Grafana.
- Monitoring Agent · NCPA — New to NCPA? See some of the awesome features present in the Web GUI and API, available on any operating system.
- Nagios 101: Understanding the Fundamentals - Nagios
- Nagios Documentation