You have a server running. You SSH in to check on it. But you don’t want to SSH in every time you wonder “is the disk full?” or “did Docker crash?”.
Monitoring collects that data automatically and puts it on a dashboard. You open a browser, see everything at once, and move on.
What to Monitor
Start with these four things:
- CPU - Is it pegged? That’s a problem.
- Memory - Running out means the kernel starts killing processes.
- Disk - Full disk means services stop writing. Logs, databases, Docker images all grow over time.
- Network - Is the server reachable? How much traffic is passing through?
Add service-specific metrics later (Docker container status, database connections, Nginx request rates). Start with the basics.
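To get a feel for what these numbers look like before any tooling is installed, here is a minimal Python sketch that reads three of the four basics straight from a Linux host. It assumes Linux (it reads /proc/meminfo) and only the standard library. The function names are made up for illustration. Network is left out because a useful reading needs two samples over time.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into {field: number}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(meminfo):
    """Share of RAM that is not available for new allocations."""
    return (1 - meminfo["MemAvailable"] / meminfo["MemTotal"]) * 100


def disk_used_percent(path="/"):
    """Share of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def snapshot():
    """Take one reading of load, memory, and disk (Linux only)."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    load1, _, _ = os.getloadavg()
    return {
        # A 1-minute load per core near or above 1.0 suggests a busy CPU.
        "load_per_core": load1 / (os.cpu_count() or 1),
        "memory_used_percent": memory_used_percent(meminfo),
        "disk_used_percent": disk_used_percent("/"),
    }
```

Calling snapshot() returns a small dictionary you can print. Monitoring tools take this kind of reading for you every few seconds and keep the history.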
Method 1: Prometheus and Grafana
This is the standard setup in the homelab world. Prometheus collects and stores metrics. Grafana turns them into dashboards. They run as Docker containers.
What You’re Setting Up
Three containers:
- Prometheus - scrapes metrics from your server and stores them
- Node Exporter - shares the host’s network and process namespaces and exposes system metrics (CPU, memory, disk, network)
- Grafana - reads from Prometheus and shows dashboards
Steps
1. Create a directory
mkdir ~/monitoring && cd ~/monitoring
2. Create docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    restart: unless-stopped
  node-exporter:
    image: prom/node-exporter:latest
    # Mount the host root read-only so disk metrics describe the host, not the container
    command:
      - "--path.rootfs=/host"
    volumes:
      - /:/host:ro,rslave
    network_mode: host
    pid: host
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    restart: unless-stopped
volumes:
  prometheus-data:
  grafana-data:
3. Create prometheus.yml
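global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["your-server-ip:9100"]
Node Exporter exposes metrics on port 9100 of the host. Prometheus scrapes them every 15 seconds. Replace your-server-ip with the server’s LAN address. localhost does not work here: inside the Prometheus container it points at the container itself, not at the host where Node Exporter listens.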
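A scrape is just an HTTP GET that returns plain text, one sample per line. The sketch below shows roughly what Prometheus reads on each pass. It is a simplified parser for illustration, not how Prometheus itself is implemented. It assumes no timestamps after the values, which holds for typical Node Exporter output. The function names are hypothetical.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition into {series: value}.

    Simplification: assumes each sample line ends with the value
    (no trailing timestamp) and ignores HELP/TYPE comment lines.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download a /metrics page and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Once the containers are running, fetch_metrics("http://your-server-ip:9100/metrics") should return several hundred series, including node_memory_MemTotal_bytes.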
4. Start everything
docker compose up -d
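Before opening a browser, you can confirm that all three services answer. This is a small sketch that assumes the ports from the compose file above. It uses Prometheus’s /-/healthy endpoint, Grafana’s /api/health endpoint, and Node Exporter’s /metrics page. The helper names are made up.

```python
import urllib.error
import urllib.request


def endpoints(host):
    """URLs that should return HTTP 200 once the stack is up."""
    return {
        "prometheus": f"http://{host}:9090/-/healthy",
        "node-exporter": f"http://{host}:9100/metrics",
        "grafana": f"http://{host}:3000/api/health",
    }


def is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


def report(host):
    """Print one line per service."""
    for name, url in endpoints(host).items():
        state = "up" if is_up(url) else "DOWN"
        print(f"{name:14} {state:5} {url}")
```

Run report("your-server-ip") from any machine on the network. Grafana can take a few seconds to start, so try again before assuming something is broken.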
5. Open Grafana
Go to http://your-server-ip:3000. Log in with admin / admin. Grafana asks you to change the password on first login.
6. Add Prometheus as a data source
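In Grafana: Connections > Data sources > Add data source (older versions list this under Configuration > Data Sources). Select Prometheus. Set the URL to http://prometheus:9090. The hostname is the Compose service name, which Grafana can resolve because both containers share the Compose network. Click Save & test.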
7. Import a dashboard
Go to Dashboards > Import. Enter ID 1860 (that’s the standard Node Exporter dashboard by rfrail3). Click Load, select your Prometheus data source, and Import.
You now see CPU, memory, disk, and network graphs updating in real time.
What You Can See Now
- CPU usage per core
- Memory used, available, cached
- Disk space and I/O
- Network traffic in and out
- System load and uptime
Keep this tab open. Check it when something feels slow. You’ll start noticing patterns.
Useful PromQL Queries
Prometheus has its own query language. A few to start with:
# CPU usage percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk space remaining (percentage free on /)
node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100
Paste these into Grafana’s Explore tab to try them.
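The same queries also work outside Grafana, through Prometheus’s HTTP API at /api/v1/query. A minimal sketch follows, assuming Prometheus is reachable on port 9090 as configured above. The helper names are hypothetical, and only the first series of the result is read.

```python
import json
import urllib.parse
import urllib.request


def query_url(base, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def first_value(response):
    """Return the first sample of an instant-vector result, or None if empty."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    result = response["data"]["result"]
    if not result:
        return None
    # Each sample is a [unix_timestamp, "value as string"] pair.
    _timestamp, value = result[0]["value"]
    return float(value)


def instant_query(base, promql, timeout=5.0):
    """Run one PromQL query and return its first value as a float."""
    with urllib.request.urlopen(query_url(base, promql), timeout=timeout) as resp:
        return first_value(json.load(resp))
```

For example, instant_query("http://your-server-ip:9090", "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100") returns the current memory usage as a single number.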
Method 2: Netdata
Netdata is the opposite of Prometheus. One command, zero config, instant dashboard. Good for a single server. Great for getting started without learning PromQL.
Steps
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
That’s it. Wait a minute, then open http://your-server-ip:19999.
What You Get
- Real-time graphs for everything: CPU, memory, disk, network, processes
- Per-process monitoring (see which process is using the most CPU right now)
- Alerts pre-configured (disk filling up, high CPU, OOM events)
- A clean web UI that updates every second
Netdata is lightweight. It’s written in C and uses about 1% CPU and 100 MB of RAM on a typical server.
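Netdata also serves its data as JSON, which is handy for quick scripts. The sketch below assumes the v1 data API (/api/v1/data) with format=json, and assumes the response contains a "labels" list and a "data" list of rows. Check this against your Netdata version before relying on it. The function names are made up.

```python
import json
import urllib.parse
import urllib.request


def cpu_url(host):
    """URL for the average of the last 60 seconds of the system.cpu chart."""
    params = {
        "chart": "system.cpu",
        "after": -60,
        "points": 1,
        "group": "average",
        "format": "json",
    }
    return f"http://{host}:19999/api/v1/data?" + urllib.parse.urlencode(params)


def latest_row(payload):
    """Pair each label with the matching value from the first data row."""
    rows = payload["data"]
    if not rows:
        return {}
    return dict(zip(payload["labels"], rows[0]))


def busy_percent(row):
    """Sum every CPU dimension except the timestamp and idle time."""
    return sum(
        value
        for name, value in row.items()
        if name not in ("time", "idle") and value is not None
    )


def cpu_busy(host, timeout=5.0):
    """Fetch and summarise CPU usage from a running Netdata agent."""
    with urllib.request.urlopen(cpu_url(host), timeout=timeout) as resp:
        return busy_percent(latest_row(json.load(resp)))
```

cpu_busy("your-server-ip") should then return one number: the average CPU busy percentage over the last minute.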
When to Use Each
Netdata is for quick checks and real-time debugging. Something feels slow? Open Netdata, look at the spike.
Prometheus + Grafana is for trend data and historical views. Want to know if your disk filled up last week? Prometheus has that answer.
Run both if you want. They don’t conflict.
Your First Alert
Monitoring without alerts means you still have to check the dashboard. Set up one alert: disk space.
Grafana can send alerts to email, Discord, Slack, or any webhook. The simplest: add a Discord webhook and get notified when disk space drops below 20%.
In Grafana: Alerting > Contact points > Add contact point. Select Discord and paste your webhook URL. Then go to Alerting > Alert rules and create a new rule. Use the disk space query from the PromQL section (it returns the percentage of free space on /), reduce it with Last, set the threshold to “is below 20”, and route the rule to your Discord contact point. The exact labels vary between Grafana versions.
Now you don’t check the dashboard. The dashboard checks you.
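The rule itself is simple: measure free space, compare it to a threshold, send a message. Here is the same logic as a standalone Python sketch. This is not how Grafana evaluates alerts internally, just an illustration of the idea. It assumes a Discord webhook URL that you supply, and the function names are hypothetical.

```python
import json
import shutil
import urllib.request


def free_percent(path="/"):
    """Percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def should_alert(free_pct, threshold=20.0):
    """Fire when free space drops below the threshold (in percent)."""
    return free_pct < threshold


def build_payload(path, free_pct):
    """Discord webhooks accept a JSON body with a `content` field."""
    return {"content": f"Disk space low on {path}: {free_pct:.1f}% free"}


def send_to_discord(webhook_url, payload, timeout=5.0):
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Set explicitly; a default client User-Agent may be rejected.
            "User-Agent": "disk-alert-sketch",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


def check_disk(webhook_url, path="/", threshold=20.0):
    """Send one alert if `path` is below the threshold. Returns True if sent."""
    pct = free_percent(path)
    if should_alert(pct, threshold):
        send_to_discord(webhook_url, build_payload(path, pct))
        return True
    return False
```

Grafana’s advantage over a script like this is that it keeps track of alert state, so you get one notification when the disk crosses the line instead of one on every check.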
Next Steps
You can see what your server is doing. Next: Backing Up Your Homelab so you don’t lose anything when something breaks.