Manager, Site Reliability Operations - Walmart Energy
Walmart
Operations
USD 80k-155k / year + Equity
Posted on May 20, 2026
Position Summary...
The Manager, Site Reliability Operations leads the monitoring, incident management, and performance optimization of critical applications and infrastructure. This role ensures compliance with service level objectives by overseeing alerting systems, triaging incidents, and driving root cause analyses to enhance operational stability. Collaborating with cross-functional teams and stakeholders, the manager implements best practices in automation, change management, and continuous improvement. This position fosters a culture of accountability and resilience while supporting strategic initiatives that advance system reliability and business outcomes across varied technology environments.

The Site Reliability Operations team at Walmart ensures the stability and performance of critical systems supporting retail operations. This team collaborates across functions to monitor application health, manage incidents, and implement automation for operational efficiency. Members apply expertise in incident management, DevOps, and stakeholder engagement to maintain service reliability and drive continuous improvement. Focused on proactive monitoring and rapid response, the team supports Walmart’s commitment to delivering seamless experiences for customers and associates while aligning with strategic business objectives through effective operational performance management.
What you'll do...
- Monitor and analyze system performance metrics, including availability, latency, and error rates, to ensure compliance with defined service level objectives.
- Lead incident management processes by coordinating timely responses, conducting root cause analyses, and implementing corrective actions to prevent recurrence.
- Oversee application and infrastructure health checks across multiple operating systems and environments, recommending improvements to alerting logic and instrumentation.
- Collaborate with cross-functional teams and stakeholders to drive continuous operational performance enhancements and resolve technical challenges.
- Manage change requests and workflow applications to support system stability and scalability.
- Mentor and develop team members, fostering a culture of accountability and continuous learning.
What you'll bring:
- Proven expertise in incident management, including incident response, reporting, and process improvement to meet SLA requirements.
- Strong knowledge of monitoring and alerting tools, with the ability to analyze key performance indicators such as availability, MTBF, MTTR, and error rates.
- Experience in root cause analysis and troubleshooting to identify and resolve performance and availability issues independently.
- Familiarity with DevOps practices, application monitoring, and automation integration to enhance operational performance.
- Effective stakeholder engagement and management skills to collaborate across cross-functional teams and drive strategic initiatives.
- Ability to evaluate change requests and implement corrective, adaptive, and perfective maintenance activities.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific plan or program terms.
For information about benefits and eligibility, see One.Walmart.
The annual salary range for this position is $80,000.00 - $155,000.00. Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications...
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years’ experience in site reliability operations, site and system administration, infrastructure management, or related area.
Option 2: 7 years’ experience in site reliability operations, site and system administration, infrastructure management, or related area.
2 years' supervisory experience.
Preferred Qualifications...
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Experience in site reliability operations, site and system administration, infrastructure management, or related area.
Master's degree in site reliability operations, site and system administration, infrastructure management, or related area and 3 years’ experience in site reliability operations, site and system administration, infrastructure management, or related area.
SRE certification (for example, IBM Cloud Site Reliability Engineer).
We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart’s accessibility standards and guidelines for supporting an inclusive culture.