Job Description

Join KMC Solutions as a Site Reliability Engineer (SRE) and play a pivotal role in ensuring the reliability, availability, and performance of our production environments. In this remote position, you’ll work a US shift schedule, collaborating with global teams to support product launches on AWS.
You’ll be responsible for designing and maintaining scalable infrastructure, implementing automation, and providing rapid incident response. By leveraging AWS Organizations and best‑in‑class monitoring tools, you’ll drive continuous improvement and help us deliver a seamless experience to our customers.
We value a proactive mindset, a love for solving complex problems, and the ability to communicate across time zones. If you thrive in a fast‑paced, collaborative environment and are passionate about reliability engineering, this is the perfect opportunity for you.
Working with us means you’ll have the chance to expand your skill set on cutting‑edge cloud technologies, contribute to high‑visibility projects, and grow your career in a supportive, innovation‑driven culture.
We offer competitive compensation, benefits, and flexible remote work options, ensuring you can balance professional growth with personal well‑being.
As part of our SRE team, you will define and enforce reliability standards, create runbooks, and conduct regular game days to test resilience. Your insights will directly influence our product roadmap and help us achieve industry‑leading uptime.
In addition to technical challenges, you’ll enjoy a collaborative culture that encourages continuous learning, regular knowledge‑sharing sessions, and access to certifications and training programs to keep your expertise at the forefront of cloud technology.

Responsibilities

Design, implement, and maintain CI/CD pipelines and automation scripts to streamline deployments and reduce manual toil.
Monitor system health, performance, and availability using tools such as CloudWatch, Prometheus, Grafana, and the ELK stack.
Respond to and resolve production incidents, performing root‑cause analysis and documenting lessons learned.
Collaborate with development teams to define SLIs, SLOs, and error budgets that align with business objectives.
Manage and optimize AWS resources across multiple accounts and regions using Terraform, CloudFormation, and AWS Organizations.
Conduct capacity planning, cost optimization, and security reviews to ensure scalable, cost‑effective infrastructure.
Participate in the on‑call rotation and contribute to the continuous improvement of runbooks and operational procedures.

Qualifications

Proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with hands‑on AWS experience.
Strong proficiency in scripting languages such as Python, Bash, or PowerShell.
Experience with container orchestration platforms (Docker, Kubernetes, ECS) and infrastructure as code (Terraform, CloudFormation).
Solid understanding of networking, DNS, VPN, and security best practices in cloud environments.
Excellent problem‑solving skills and a data‑driven approach to monitoring and incident response.
Ability to work a US‑based shift schedule (e.g., 9 am – 6 pm PST) while collaborating with teams in the Philippines.
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).

Site Reliability Engineer – Remote (US Shift)

