Job Description
As the NOC Manager at Curran Daly & Associates, you will lead a high-performing team of performance monitoring professionals responsible for ensuring optimal application and network performance, availability, and a superior user experience across the enterprise. You will own the strategy and execution of 24/7 monitoring operations, balancing reliability with rapid incident response in a fast-paced environment.
Reporting to the Head of IT Operations, you will steward the end-to-end lifecycle of incidents, from detection to resolution, while partnering with infrastructure, security, and application teams as well as vendors. Your leadership will drive SLA adherence, continuous improvement, and robust disaster recovery planning. You will design runbooks, escalation pathways, and proactive monitoring initiatives that reduce mean time to detection and mean time to recovery, and you will cultivate a culture of accountability, collaboration, and data-driven decision-making.
The ideal candidate brings deep experience in NOC environments, strong technical acumen with monitoring and observability tools, and a proven track record of delivering reliable services in enterprise settings. You will mentor junior staff, manage workload distribution, and recruit top talent to sustain a resilient, scalable NOC capable of supporting business growth.
Key competencies include incident management, change management, capacity planning, performance tuning, and cross-functional communication with executives and stakeholders. This role requires on-call leadership and the ability to translate complex technical issues into clear business impact, ensuring both transparency and trust across the organization.
Responsibilities
- Lead and manage a 24/7 NOC team of performance monitoring professionals responsible for enterprise-wide uptime, performance, and user experience.
- Oversee monitoring platforms (e.g., Nagios, Zabbix, SolarWinds, Datadog, New Relic) to detect incidents, alert the right teams, and drive timely resolutions.
- Own the incident management lifecycle: detection, triage, escalation, root cause analysis, communication, and post-incident reviews to prevent recurrence.
- Ensure adherence to SLAs/OLAs, drive continuous improvement, and coordinate with IT, application teams, and vendors to restore services quickly.
- Develop and maintain runbooks, playbooks, escalation matrices, change and release management processes, and disaster recovery plans.
- Lead capacity planning and forecasting, along with performance tuning, to support current and future business demand.
- Foster automation and process improvements to reduce manual toil and shorten mean time to recovery (MTTR).
- Mentor, coach, and develop NOC staff; lead talent acquisition and foster a culture of reliability, accountability, and teamwork.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field; advanced degree preferred.
- 7+ years in NOC/IT operations with progressive leadership responsibilities; experience in enterprise environments.
- 3+ years in people management or team leadership; proven ability to build high-performing teams.
- Strong experience with monitoring/observability tools: Nagios, Zabbix, SolarWinds, Datadog, New Relic, or similar.
- Solid ITIL-based incident, problem, and change management experience; strong process discipline.
- Knowledge of network infrastructure, data centers, cloud platforms (AWS/Azure), and security best practices.
- Excellent communication and stakeholder management skills; ability to translate technical issues into business impact.
- Proven track record of meeting SLAs and driving resolution under pressure; data-driven mindset with strong analytical skills.