Job Description
Are you a passionate engineer dedicated to building robust, scalable, and highly available systems? Mindteck is seeking a talented Site Reliability Engineer (SRE) to join our growing team in Cyberjaya. In this pivotal role, you will bridge the gap between development and operations, ensuring our mission-critical services perform optimally under pressure.
You will be instrumental in defining how our applications are deployed, managed, and monitored. We look for individuals who treat operations as a software engineering problem. You will not only maintain system health but proactively automate manual processes, minimize toil, and champion a culture of reliability across our engineering organization.
If you thrive in a fast-paced environment and are obsessed with system performance, latency optimization, and automated infrastructure, we want to hear from you.
Responsibilities
- Design, implement, and maintain scalable infrastructure to support high-traffic production environments.
- Monitor system availability, latency, and overall health using industry-standard observability tools.
- Automate manual operational tasks to increase efficiency and reduce toil.
- Drive incident response processes, conduct blameless post-mortems, and implement long-term fixes to prevent recurrence.
- Collaborate with development teams to ensure software is designed with performance, scalability, and reliability in mind.
- Participate in an on-call rotation to ensure 24/7 service availability.
- Manage cloud infrastructure using Infrastructure as Code (IaC) principles.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 3+ years of experience in SRE, DevOps, or Systems Engineering roles.
- Strong proficiency in scripting and programming languages such as Python, Bash, or Go.
- Solid experience with cloud platforms (AWS, Azure, or GCP).
- Hands-on experience with containerization and orchestration tools like Docker and Kubernetes.
- Expertise in monitoring and logging stacks (e.g., Prometheus, Grafana, ELK, or Datadog).
- Strong understanding of CI/CD pipelines and version control systems (Git).
- Excellent problem-solving skills and the ability to thrive in high-pressure environments.