Python script that SSHs into servers and checks if anything's broken. Sends email alerts when thresholds are exceeded. Basic monitoring without enterprise overhead.
What It Monitors
Connects to servers via SSH and checks:
- Disk usage
- CPU load
- Memory usage
Any metric at or above its threshold triggers an email alert.
Implementation
The script uses Paramiko for SSH connections, runs standard Unix commands on each server, and parses their output. The disk check reads the Use% column for the root filesystem:
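```python
from typing import Dict

def check_disk(self, host: str, username: str, key_path: str) -> Dict:
    """Check usage of the root filesystem (/)."""
    # -P (POSIX output) guarantees one line per filesystem, so row 2,
    # field 5 is always Use% even when the device name is long.
    cmd = "df -P / | awk 'NR==2 {print $5}'"
    result = self.ssh_execute(host, username, key_path, cmd)
    usage = int(result.strip().rstrip('%'))
    return {
        'metric': 'disk',
        'value': usage,
        'unit': '%',
        'status': 'CRITICAL' if usage >= self.thresholds['disk'] else 'OK'
    }
```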
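Each check calls self.ssh_execute(), which isn't shown here. Below is a minimal sketch of what such a helper could look like with Paramiko, written as standalone functions rather than methods. The run_command/ssh_execute split, the 10-second timeout, and rejecting unknown host keys are assumptions, not the author's code. Because unknown hosts are rejected, each server's key must already be in ~/.ssh/known_hosts (connecting once with ssh or ssh-copy-id puts it there).

```python
import os
from typing import Any


def run_command(client: Any, cmd: str, timeout: float = 10.0) -> str:
    """Run cmd on a connected client and return its stripped stdout.

    Accepts any object with Paramiko's exec_command() interface, which also
    makes it easy to exercise with a fake client. Raises RuntimeError when
    the remote command exits non-zero.
    """
    _stdin, stdout, stderr = client.exec_command(cmd, timeout=timeout)
    # Drain output before asking for the exit status; waiting on the status
    # first can stall if the remote side fills its output buffer.
    out = stdout.read().decode().strip()
    err = stderr.read().decode().strip()
    status = stdout.channel.recv_exit_status()
    if status != 0:
        raise RuntimeError(f"{cmd!r} exited with status {status}: {err}")
    return out


def ssh_execute(host: str, username: str, key_path: str, cmd: str,
                timeout: float = 10.0) -> str:
    """Connect with key authentication, run one command, and disconnect."""
    import paramiko  # imported here so run_command() works without Paramiko installed

    client = paramiko.SSHClient()
    # Accept only hosts already listed in ~/.ssh/known_hosts.
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.RejectPolicy())
    try:
        client.connect(
            hostname=host,
            username=username,
            key_filename=os.path.expanduser(key_path),  # turn "~/.ssh/..." into a full path
            timeout=timeout,
        )
        return run_command(client, cmd, timeout)
    finally:
        client.close()
```

Opening a fresh connection per command keeps the sketch simple; with three checks per server that means three handshakes, which is usually fine at the 2-10 server scale this script targets.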
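The CPU check approximates utilization by dividing the 1-minute load average by the number of cores. On a single-core system, a load of 1.0 reads as 100%; on a four-core system, the same load reads as 25% of total capacity. Load average counts runnable tasks (on Linux, also tasks in uninterruptible I/O wait) rather than measuring CPU time, so treat the figure as a saturation indicator; it can exceed 100%: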
```python
from typing import Dict

def check_cpu(self, host: str, username: str, key_path: str) -> Dict:
    """Approximate CPU usage as 1-minute load average divided by core count."""
    # Remote output looks like "2.00 4": the first /proc/loadavg field, then nproc.
    cmd = "echo $(cat /proc/loadavg | awk '{print $1}') $(nproc)"
    result = self.ssh_execute(host, username, key_path, cmd)
    load, cores = result.split()
    cpu_percent = (float(load) / int(cores)) * 100
    return {
        'metric': 'cpu',
        'value': round(cpu_percent, 2),
        'unit': '%',
        'status': 'CRITICAL' if cpu_percent >= self.thresholds['cpu'] else 'OK'
    }
```
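To make the arithmetic concrete, here is the same conversion pulled out into two small hypothetical helpers (the names are illustrative), using the same at-or-above comparison as the check.

```python
def load_to_cpu_percent(loadavg_output: str) -> float:
    """Convert 'LOAD CORES' output (e.g. '2.00 4') to percent of total core capacity."""
    load, cores = loadavg_output.split()
    return round(float(load) / int(cores) * 100, 2)


def cpu_status(percent: float, threshold: float) -> str:
    # At or above the threshold counts as critical, matching check_cpu.
    return "CRITICAL" if percent >= threshold else "OK"
```

With the sample 80% threshold, load_to_cpu_percent("1.00 4") returns 25.0 (OK), "1.00 1" returns 100.0 (CRITICAL), and "6.00 4" returns 150.0, a value above 100% that signals more runnable work than the cores can serve.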
The memory check takes the Mem: row of free output and reports used memory as a percentage of total:
```python
from typing import Dict

def check_memory(self, host: str, username: str, key_path: str) -> Dict:
    """Report used memory as a whole-number percentage of total."""
    # Row 2 of `free` is the Mem: line; $2 is total and $3 is used.
    cmd = "free | awk 'NR==2 {printf \"%.0f\", ($3/$2)*100}'"
    result = self.ssh_execute(host, username, key_path, cmd)
    usage = int(result)
    return {
        'metric': 'memory',
        'value': usage,
        'unit': '%',
        'status': 'CRITICAL' if usage >= self.thresholds['memory'] else 'OK'
    }
```
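The awk field numbers are easy to misread, so here is the same calculation in Python against a sample `free` output. The numbers in the sample are invented for illustration, and memory_percent_from_free is a hypothetical helper. Like the awk version, it picks the row by position (NR==2) rather than by the "Mem:" label, which some locales translate.

```python
SAMPLE_FREE_OUTPUT = """\
               total        used        free      shared  buff/cache   available
Mem:        16273612     4123456     8123456      123456     4026700    11800000
Swap:        2097148           0     2097148
"""


def memory_percent_from_free(free_output: str) -> int:
    """Reproduce the awk one-liner: row 2 of `free`, used / total * 100."""
    fields = free_output.splitlines()[1].split()  # awk's NR==2
    total, used = int(fields[1]), int(fields[2])  # awk's $2 and $3
    # Round to a whole percent, as printf "%.0f" does in the awk version.
    return round(used / total * 100)
```

For this sample the result is 25: 4,123,456 used out of 16,273,612 total is about 25.3%.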
Configuration
A JSON config file defines the servers, thresholds, and email settings:
```json
{
  "servers": [
    {
      "name": "Web Server",
      "host": "192.168.1.100",
      "username": "admin",
      "key_path": "~/.ssh/id_ed25519"
    }
  ],
  "thresholds": {
    "disk": 85,
    "cpu": 80,
    "memory": 85
  },
  "email": {
    "smtp_server": "smtp.gmail.com",
    "smtp_port": 587,
    "from": "alerts@domain.com",
    "to": "you@domain.com",
    "username": "alerts@domain.com",
    "password": "your-app-password"
  }
}
```
Thresholds are percentages, and a metric is flagged once it reaches its threshold: disk usage of 85% or more, CPU load of 80% or more of total core capacity, or memory usage of 85% or more triggers an alert.
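How the script reads this file isn't shown. A minimal sketch of a loader that fails fast on a malformed config follows; load_config and its validation rules are assumptions, with the required keys mirroring the sample config. It also expands "~" in key_path explicitly rather than relying on the SSH layer to do it.

```python
import json
import os
from typing import Any, Dict

REQUIRED_SECTIONS = ("servers", "thresholds", "email")
REQUIRED_SERVER_KEYS = ("name", "host", "username", "key_path")
REQUIRED_METRICS = ("disk", "cpu", "memory")


def load_config(path: str) -> Dict[str, Any]:
    """Read the JSON config and fail fast on missing or malformed entries."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)

    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    if missing:
        raise ValueError(f"config is missing sections: {missing}")

    for metric in REQUIRED_METRICS:
        value = config["thresholds"].get(metric)
        # Thresholds are percentages; reject absent, non-numeric, or non-positive values.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"threshold {metric!r} must be a positive percentage")

    for server in config["servers"]:
        absent = [k for k in REQUIRED_SERVER_KEYS if k not in server]
        if absent:
            raise ValueError(f"server {server.get('name', '?')!r} is missing {absent}")
        # Expand "~" once here so later code always gets a full path.
        server["key_path"] = os.path.expanduser(server["key_path"])
    return config
```

Validating at startup means a typo in the config produces one clear error instead of a KeyError halfway through a cron run.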
Email Alerts
When any threshold is reached, the script sends an email over SMTP listing each failing metric:
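```python
import smtplib
from datetime import datetime
from email.mime.text import MIMEText
from typing import Dict, List

def send_alert(self, server_name: str, issues: List[Dict]):
    """Email a summary of every CRITICAL metric for one server."""
    email_config = self.config['email']  # assumes the parsed JSON config is kept on self.config
    subject = f"[ALERT] System Health Issues on {server_name}"
    body = f"Critical issues detected on {server_name} at {datetime.now()}:\n\n"
    for issue in issues:
        body += f"- {issue['metric'].upper()}: {issue['value']}{issue.get('unit', '')}\n"
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = email_config['from']
    msg['To'] = email_config['to']
    # Port 587: connect in plain text, then upgrade to TLS before logging in.
    with smtplib.SMTP(email_config['smtp_server'], email_config['smtp_port']) as server:
        server.starttls()
        server.login(email_config['username'], email_config['password'])
        server.send_message(msg)
```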
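Gmail requires an app password for SMTP password logins, which becomes available once 2-Step Verification is enabled on the account. Don't use your main account password; generate an app password in your Google Account security settings.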
Setup Process
Install Paramiko:
```shell
pip install paramiko
```
Set up SSH key authentication to avoid password prompts:
```shell
ssh-keygen -t ed25519
ssh-copy-id user@server
```
Test the script manually:
```shell
python3 health_checker.py config.json
```
Output shows check results:
```
=== System Health Check - 2025-10-07 ===
Checking Web Server (192.168.1.100)...
✓ disk: 45%
✓ cpu: 23.5%
✗ memory: 87%
Alert sent to you@domain.com
```
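The per-server loop that produces this output isn't shown. Here is a minimal sketch under stated assumptions: run_server_checks is a hypothetical name, each check is a zero-argument callable returning the dict format above, and a check that raises (for example, an SSH timeout) is reported as CRITICAL instead of crashing the run. That last behavior is a design choice, not something the original describes.

```python
from typing import Callable, Dict, List

Check = Callable[[], Dict]


def run_server_checks(name: str, host: str, checks: List[Check],
                      report: Callable[[str], None] = print) -> List[Dict]:
    """Run each check, report a ✓/✗ line per metric, and return the failures."""
    report(f"Checking {name} ({host})...")
    issues = []
    for check in checks:
        try:
            result = check()
        except Exception as exc:  # SSH or parsing failure: treat as critical
            result = {
                "metric": getattr(check, "__name__", "check"),
                "value": f"error ({exc})",
                "unit": "",
                "status": "CRITICAL",
            }
        mark = "✓" if result["status"] == "OK" else "✗"
        report(f"{mark} {result['metric']}: {result['value']}{result['unit']}")
        if result["status"] != "OK":
            issues.append(result)
    return issues
```

In the script, each entry could be functools.partial(checker.check_disk, host, username, key_path), where checker is a hypothetical instance of the checker class; a non-empty return value would then be passed to send_alert(name, issues).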
Automation
Add a cron entry for hourly checks. Open the crontab with `crontab -e` and add this line:
```
0 * * * * /usr/bin/python3 /path/to/health_checker.py /path/to/config.json
```
This runs at the start of every hour. Adjust the schedule as needed: `*/15 * * * *` runs every 15 minutes, and `0 0,12 * * *` runs twice daily, at midnight and noon.
What This Is Good For
Small deployments with 2-10 servers. Situations where enterprise monitoring tools are overkill or too expensive. Quick visibility into system health without dashboard complexity.
What This Isn't
Not suitable for large-scale infrastructure. No metrics history. No dashboards. No complex alerting logic. No anomaly detection.
For production environments with dozens of servers, use Prometheus, Grafana, or similar. This script is for situations where you need basic alerting and don't need the overhead.
Security Considerations
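The SSH key needs no passphrase so cron can use it unattended, which makes protecting the key file essential: keep it readable only by its owner (chmod 600). On each server, you can also limit what the key is allowed to do with authorized_keys options such as from= (restrict source addresses) and restrict (disable forwarding and PTY allocation).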
Storing the email password in the config file is a risk. Consider reading it from an environment variable or a secrets manager instead, and keep the config file readable only by the user that runs the script.
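A minimal sketch of the environment-variable approach, assuming a variable named HEALTHCHECK_SMTP_PASSWORD (a hypothetical name) that takes precedence over the password field in the config file:

```python
import os
from typing import Any, Dict

# Hypothetical variable name; pick whatever fits your environment.
PASSWORD_ENV_VAR = "HEALTHCHECK_SMTP_PASSWORD"


def smtp_password(email_config: Dict[str, Any]) -> str:
    """Prefer the environment variable; fall back to the config file value."""
    password = os.environ.get(PASSWORD_ENV_VAR) or email_config.get("password")
    if not password:
        raise RuntimeError(f"set {PASSWORD_ENV_VAR} or email.password in the config")
    return password
```

Cron jobs don't inherit your login shell's environment, so the variable has to be set in the crontab entry or in a wrapper script that cron calls.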
Script runs commands with the privileges of the SSH user. Don't use root. Create a dedicated monitoring user with limited permissions.
Extensions
Add more checks: failed login attempts, running processes, service status, network connectivity.
Log results to file for trend analysis. Current implementation only alerts, doesn't store history.
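One lightweight way to keep history is to append every check result to a JSON Lines file, one object per line. The sketch below is an assumption, not part of the script: append_results is a hypothetical helper, and the UTC ISO 8601 timestamp format is a choice made for easy sorting.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def append_results(log_path: str, server_name: str, results: List[Dict]) -> None:
    """Append one JSON object per metric to a JSON Lines file."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as fh:
        for result in results:
            # Merge the run's time and server name into each metric record.
            record = {"time": timestamp, "server": server_name, **result}
            fh.write(json.dumps(record) + "\n")
```

Appending keeps each run cheap, and the file can later be loaded line by line for trend analysis.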
Integrate with Slack or Discord webhooks instead of email for faster notification.
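Both services accept a JSON POST to an incoming-webhook URL: Slack reads the message from a "text" field, Discord from a "content" field. A minimal standard-library sketch follows; the function names are hypothetical, and the webhook URL is whatever the service generates for you.

```python
import json
import urllib.request
from typing import Dict, List

# Field that carries the message text for each service.
PAYLOAD_KEY = {"slack": "text", "discord": "content"}


def build_webhook_request(url: str, server_name: str, issues: List[Dict],
                          service: str = "slack") -> urllib.request.Request:
    """Build a JSON POST request for a Slack or Discord incoming webhook."""
    lines = [f"Critical issues on {server_name}:"]
    lines += [f"- {i['metric'].upper()}: {i['value']}{i.get('unit', '')}" for i in issues]
    payload = {PAYLOAD_KEY[service]: "\n".join(lines)}  # KeyError for unknown services
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_webhook(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status; HTTP errors raise HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the message format easy to check without network access.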
Add retry logic for transient connection failures. Current implementation fails immediately on SSH timeout.
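A minimal retry sketch with exponential backoff (with_retries is a hypothetical helper; three attempts and a 2-second base delay are assumed defaults). Socket timeouts and refused connections surface as OSError subclasses in Python 3, so that is the default retry trigger. Paramiko's own SSHException is not an OSError, so authentication failures are not retried unless you add it to retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), retrying listed exceptions with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts;
    re-raises the last error once attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Usage inside a check: `result = with_retries(lambda: self.ssh_execute(host, username, key_path, cmd))`. The sleep parameter exists so tests can skip the real delay.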
Full code available on request. Modify for your infrastructure.