Job Responsibilities
· Provision and manage physical & virtual Linux machines using tools such as Puppet, Ansible, and Terraform
· Engage closely with sister teams to assume ownership of various system lifecycle tasks
· Automate away toil and/or create empowerment processes for transitioning high-urgency work to the NOC’s rapid response team
· Build automated monitoring & observability using tools such as Prometheus/Alertmanager, Icinga, Grafana, etc.
· Participate in all Agile/Scrum ceremonies, including daily stand-ups, sprint planning, backlog grooming, etc.
· Participate in the team’s on-call rotation (expected to begin late 2024 or early 2025)
· Work closely with internal teams to integrate new monitoring & alerts into the NOC, using Perl scripting to author custom parsing & mapping rules
· Develop metrics and observability dashboards that measure and track success measures for the team & the business

Typical Qualifications
· 5+ years of professional experience delivering SaaS solutions, preferably in a hybrid cloud environment
· Bachelor’s or Master’s degree in a Computer Science / Engineering program
· Proven experience using query languages to deliver observability solutions
· Proficiency working with one or more configuration management tools (Puppet, Chef, Ansible, etc.)
· Admin-level expertise with a Unix-based operating system
· Proven ops background using cloud-native best practices
· Proven proficiency with one or more scripting languages (Python, Ruby, Perl, Java, etc.)
· Proficiency working with Git & Atlassian suite or similar
· Proficiency working with containerized environments is a plus
· Experience creating technical documentation & standard operating procedures (SOPs)
Senior Site Reliability Engineer
Job Function
Software Development
Industry
Technology
Experience Required
5 - 8 years
Qualification
Bachelor’s degree
Openings
2 positions
Healthcare
Apply now to start your journey with Athena Health Care