Job Responsibilities
· Provision and manage physical & virtual Linux machines using tools such as Puppet, Ansible, and Terraform
· Engage closely with sister teams to assume ownership of various system lifecycle tasks
· Automate away toil and/or create processes that empower the NOC's rapid-response team to take over high-urgency work
· Build automated monitoring & observability using tools such as Prometheus/Alertmanager, Icinga, Grafana, etc.
· Participate in all Agile/Scrum ceremonies, including daily stand-ups, sprint planning, backlog grooming, etc.
· Participate in the team's on-call rotation (expected to begin in late 2024 or early 2025)
· Work closely with internal teams to integrate new monitoring & alerts into the NOC, using Perl scripting to author custom parsing & mapping rules
· Develop metrics and observability dashboards that measure and track success metrics for the team & the business
Typical Qualifications
· 5+ years of professional experience delivering SaaS solutions, preferably in a hybrid cloud environment
· Bachelor's or Master's degree in Computer Science or Engineering
· Proven experience using query languages to deliver observability solutions
· Proficiency working with one or more configuration management tools (Puppet, Chef, Ansible, etc.)
· Admin-level expertise with a Unix-based operating system
· Proven ops background using cloud-native best practices
· Proven proficiency with one or more scripting or programming languages (Python, Ruby, Perl, Java, etc.)
· Proficiency working with Git and the Atlassian suite, or similar tools
· Proficiency working with containerized environments is a plus
· Experience creating technical documentation & standard operating procedures (SOPs)
Senior Site Reliability Engineer
Job Function
Software Development
Industry
Technology
Experience Required
5-8 years
Qualification
Bachelor's degree
Openings
2 positions
Healthcare
Apply now to start your journey with Athena Health Care