DevOps • SRE • Reliability Engineering

I build reliable systems with automation, observability, and clean operations.

Hi, I’m Ifeanyi (Ify) Ojuh. I enjoy turning infrastructure problems into stable, measurable systems—using Linux, CI/CD, IaC, monitoring, and incident-ready runbooks.

View Projects • Hire / Collaborate

GitHub • Resume (PDF) • Email

Focus

Uptime & MTTR

Strength

Automation

Tooling

Linux • Git • Cloud

Reliability Snapshot

Service Level Objective

Availability: 99.9%

Latency (p95): < 250ms

Error budget: Controlled
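
To make the 99.9% target concrete, here is a minimal sketch of the error-budget arithmetic. The 30-day window and the function names are assumptions for illustration, not values taken from a real service.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given availability SLO."""
    total_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # 99.9% over an assumed 30-day window allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(99.9), 1))
    # After 10.8 minutes of downtime, about 75% of the budget is left.
    print(round(budget_remaining(99.9, 10.8), 2))
```

A tighter target shrinks the budget quickly: the same function gives about 4.3 minutes for 99.99%.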

What I ship

Dashboards + alerts that catch issues early
IaC that’s repeatable and reviewable
Runbooks that reduce panic during incidents

Observability • IaC • Incident Response

Featured Projects

Proof of work. Each project includes architecture, decisions, and what I learned.

DevOps / SRE Homelab (Multi-Node)

VirtualBox • Ubuntu Server • Docker • Prometheus • Grafana

Built a 3-node homelab to simulate production-style SRE workflows: Linux administration, containerized services, and monitoring dashboards.

infra-node, docker-node, monitor-node with NAT port-forwarded SSH
Nginx container deployed and validated with port mapping
Prometheus + Node Exporter targets visualized in Grafana

Linux • Docker • Observability

View screenshots → • GitHub →

Monitoring Stack Lab

Prometheus + Grafana + Alertmanager (VM / Docker)

Built an end-to-end observability stack with dashboards and alerts to detect CPU, memory, disk, and service-health issues before users notice.

Dashboards: node exporter, app metrics, latency
Alerts: disk > 80%, service down, high error rate
Runbooks: step-by-step response playbooks
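
The three alert conditions above can be sketched as plain threshold logic. In the lab itself these would live in Prometheus alerting rules; the Python below is only an illustration, and the `Snapshot` type, the alert names, and the 5% error-rate threshold are assumptions (only the 80% disk threshold comes from the list above).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One point-in-time reading for a host (hypothetical shape)."""
    disk_used_percent: float
    service_up: bool
    requests: int
    errors: int


DISK_THRESHOLD = 80.0        # from the "disk > 80%" alert above
ERROR_RATE_THRESHOLD = 0.05  # assumed value; the real threshold is not stated


def firing_alerts(s: Snapshot) -> list[str]:
    """Return the names of the alerts that would fire for this snapshot."""
    alerts = []
    if s.disk_used_percent > DISK_THRESHOLD:
        alerts.append("DiskUsageHigh")
    if not s.service_up:
        alerts.append("ServiceDown")
    # Guard against division by zero when there was no traffic.
    error_rate = s.errors / s.requests if s.requests else 0.0
    if error_rate > ERROR_RATE_THRESHOLD:
        alerts.append("HighErrorRate")
    return alerts


if __name__ == "__main__":
    print(firing_alerts(Snapshot(85.0, True, 1000, 10)))
```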

Prometheus • Grafana • Alerting

GitHub →

Log Pipeline (ELK)

Filebeat → Logstash → Elasticsearch → Kibana

Centralized system and application logs, added parsing rules, and built Kibana dashboards for faster debugging and incident triage.

Structured fields for HTTP status, response time, host
Kibana views for error spikes and top failing endpoints
Retention + index lifecycle strategy (basic)
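
As a rough picture of what "structured fields" means here, the sketch below pulls host, HTTP status, and response time out of one log line. In the pipeline this step belongs to Logstash; the Python stand-in and the log line format are assumptions made up for the example.

```python
import re
from typing import Optional

# Assumed line shape: <host> "<METHOD> <path> <protocol>" <status> <seconds>
LINE_RE = re.compile(
    r'(?P<host>\S+) "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<response_time>\d+(?:\.\d+)?)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into typed fields, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["response_time"] = float(fields["response_time"])
    return fields


if __name__ == "__main__":
    print(parse_line('web-01 "GET /api/orders HTTP/1.1" 502 0.245'))
```

Once status and response time are typed fields rather than free text, views such as "error spikes" and "top failing endpoints" become simple filters and aggregations.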

ELK • Log Parsing • Dashboards

GitHub →

CI/CD Pipeline

GitHub Actions → Docker build → Deploy

Automated build/test/deploy for a small web service. Reduced manual steps, ensured repeatable releases, and added basic checks.

Build + test on PRs
Docker image publish on merge
Deploy to VM with rollback notes

GitHub Actions • Docker • Release

GitHub →

Terraform AWS VPC

VPC + subnets + routing + security groups

Infrastructure-as-code example showing a clean, modular Terraform setup with variables, outputs, remote state notes, and reusable modules.

Public/private subnets + route tables
Security group patterns (least privilege)
README with “how to apply safely”
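
The public/private subnet split can be planned before any Terraform is written. This is a minimal sketch using Python's standard `ipaddress` module; the `10.0.0.0/16` range, the /24 subnet size, and the two-plus-two layout are assumptions, not the values used in the repo.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, new_prefix: int = 24,
                 public: int = 2, private: int = 2) -> dict:
    """Carve consecutive, non-overlapping subnets out of a VPC CIDR block."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = network.subnets(new_prefix=new_prefix)  # lazy generator of subnets
    return {
        "public": [str(next(blocks)) for _ in range(public)],
        "private": [str(next(blocks)) for _ in range(private)],
    }


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16")
    print(plan["public"])   # first two /24 blocks
    print(plan["private"])  # next two /24 blocks
```

The resulting CIDR strings could then be passed to Terraform as input variables, which keeps the address plan reviewable in one place.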

Terraform • AWS • IaC

GitHub →

DevOps / SRE Homelab — Details

Screenshots + proof (kept out of the main project list to keep the homepage clean).

Architecture & Setup

3 Ubuntu Server VMs: infra-node, docker-node, monitor-node
NAT port forwarding for SSH + services
Prometheus scrapes Node Exporter and its own metrics endpoint
Grafana dashboards for CPU, RAM, Disk, Network
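
One way to check scrape health outside the UI is to read the JSON that Prometheus returns from its `/api/v1/targets` endpoint. The sketch below only parses such a payload; the sample data, hostnames, and ports are made up for illustration and trimmed to the fields the function reads.

```python
import json


def down_targets(targets_response: dict) -> list[str]:
    """List the scrape URLs of active targets whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "unknown") for t in active if t.get("health") != "up"]


# Hypothetical, trimmed response body from GET /api/v1/targets.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://monitor-node:9090/metrics", "health": "up"},
      {"scrapeUrl": "http://infra-node:9100/metrics", "health": "up"},
      {"scrapeUrl": "http://docker-node:9100/metrics", "health": "down"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(down_targets(sample))
```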

VirtualBox • Ubuntu • Monitoring

Full repo →

Grafana Dashboard (Preview)

This dashboard is powered by Prometheus scraping Node Exporter.

Screenshot: Prometheus targets (UP)

Screenshot: Docker containers running


Skills (with receipts)

I prefer showing evidence in projects, runbooks, dashboards, and repos.

Linux & Troubleshooting

Processes, networking, logs, performance basics.

systemd • journalctl • top/htop • ss/netstat • curl • bash

Observability

Metrics, dashboards, alerting, logging pipelines.

Prometheus • Grafana • Alertmanager • ELK/Kibana • SLOs

Automation & CI/CD

Repeatable pipelines and safer releases.

Git • GitHub Actions • Docker • Python • Bash

Cloud & IaC

Deployments that can be reviewed and reproduced.

Terraform • AWS basics • Networking • IAM basics

Labs & Incident Simulations

Small experiments that build real operational confidence.

Incident Drill: “Disk Full”

Simulated a disk saturation event, detected it via alerting, traced the culprit logs, applied cleanup, and documented prevention steps.
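
The "trace the culprit" step usually comes down to finding the largest files under a suspect directory. Below is a minimal sketch of that search, assuming Python is available on the host; it is not the exact procedure from the drill, and `/var/log` is only an example path.

```python
import heapq
import os


def largest_files(root: str, count: int = 10) -> list[tuple[int, str]]:
    """Return the `count` largest regular files under `root` as (bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so targets are not counted twice
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; ignore it
    return heapq.nlargest(count, sizes)


if __name__ == "__main__":
    for size, path in largest_files("/var/log", count=5):
        print(f"{size / 1_048_576:8.1f} MiB  {path}")
```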

Runbook • Alerting

GitHub →

Incident Drill: “502 Bad Gateway”

Reproduced a 502 with an upstream service crash, validated Nginx behavior, added health checks, and improved logs for faster triage.
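
A health check for this kind of drill can be as small as one HTTP probe that separates "answered with an error status" from "did not answer at all". The sketch below uses only the standard library; the URL and the `/healthz` path are hypothetical, and this is an illustration rather than the check added in the drill.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Return (healthy, detail) for a single HTTP GET against `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with a 4xx/5xx, e.g. a 502 from the proxy.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer at all: refused connection, DNS failure, timeout.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the service under test.
    print(probe("http://127.0.0.1:8080/healthz"))
```

Probing the proxy and the upstream separately helps during triage: a 502 from the proxy combined with "unreachable" from the upstream points at the upstream process rather than at Nginx.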

Nginx • Debugging

GitHub →

Vagrant: Local SRE Sandbox

Built repeatable local environments for Linux practice: provisioning, services, networking, and “break/fix” exercises.

Vagrant • Linux

GitHub →

Writing

Short posts that show how I think during outages and improvements.

Let’s build something reliable.

If you’re hiring for DevOps/SRE or want to collaborate, reach out. I respond fast.

Email me • View Resume

Location: Fort Worth, TX

Open to: Remote / Hybrid

GitHub: github.com/ifyojuh