
6+ years of experience defining and implementing monitoring solutions (alerts, telemetry, and instrumentation) for on-premises and cloud platforms for large enterprises.

The Site Reliability Engineer (SRE) will play a key role in building observability and resilience capabilities on the cloud platform (Azure). Responsibilities of the SRE will be:
• Build and configure the alerts, tracing, telemetry, and instrumentation required for infrastructure monitoring and application performance management.
• Implement dashboards to monitor and share observability at various levels (engineering teams, portfolio, senior management).
• Support resilience engineering (application and infrastructure resilience) to meet availability requirements.
• Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements and to implement and evolve observability and resilience solutions.
• Lead Kubernetes rightsizing and ongoing monitoring management.
• Navigate and manage the migration from Splunk to Dynatrace.

• Good knowledge of observability and application performance monitoring best practices and of KPIs/metrics on cloud platforms.
• Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, and New Relic, as well as open-source tools.
• Experience building monitoring solutions for a variety of workloads such as microservices (Java / Spring Boot desirable), databases, Kafka, and Kubernetes.
• Experience in resilience engineering and in implementing high-availability solutions.
• Experience creating monitoring dashboards using tools such as Grafana (preferred), Splunk, Kibana, and Power BI.
• Ability to work in a fast-paced, agile environment.
Site Reliability Engineer
Job Function
Management
Industry
Technology
Experience Required
6 - 12 years
Qualification
Bachelor's degree
Openings
1 position
Management & Consulting
Apply now to start your journey with Deloitte India