Job Description
Are you a passionate engineer looking to make a significant impact at the intersection of energy technology and software innovation? Schlumberger is seeking a highly skilled Site Reliability Engineer (SRE) / Infrastructure Engineer to join our team in Kuching. In this role, you will be responsible for our digital infrastructure, keeping our high-availability systems performing reliably in support of global energy operations.
You will bridge the gap between development and operations by applying a software engineering mindset to system administration. Your primary focus will be to eliminate manual toil, enhance system reliability, and scale our infrastructure to meet the evolving demands of the energy sector. We are looking for a proactive problem solver who thrives in a collaborative environment and is eager to optimize complex distributed systems.
Responsibilities
- Design, build, and maintain scalable infrastructure to support mission-critical applications.
- Implement automated solutions to reduce manual toil and improve system efficiency.
- Proactively monitor system health, performance, and capacity, and take corrective action as needed.
- Collaborate with development teams to ensure seamless deployment cycles and CI/CD pipeline integrity.
- Manage cloud-based services and on-premises infrastructure to ensure 99.9% availability.
- Lead incident response and conduct post-mortem analyses to prevent outages from recurring.
- Develop and maintain infrastructure-as-code (IaC) using tools like Terraform or CloudFormation.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related engineering field.
- 3+ years of experience in SRE, DevOps, or large-scale systems administration.
- Proficiency in at least one scripting or programming language (Python, Go, or Bash).
- Strong hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Deep understanding of Linux/Unix internals, networking protocols, and distributed systems.
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Strong troubleshooting skills and the ability to work under pressure during critical system incidents.
- Excellent communication skills with the ability to collaborate across global teams.