Site Reliability Engineer
Location: Hyderabad
Reports to: Head – DevOps & TechOps
Type of Position: Full Time
About us:
Arrise Solutions India Pvt. Ltd. (powering Pragmatic Play) is a leading content provider to the iGaming and betting industry, offering a multi-product portfolio that is innovative, regulated, and mobile-focused. Pragmatic Play strives to create the most engaging and evocative experience for customers globally across a range of products, including slots, live casino, sports betting, virtual sports, and bingo.
Job Description:
About The Role
The team at Arrise powering Pragmatic Play handles infrastructure provisioning automation, configuration management, tool administration, production system support, application environment maintenance, observability, release management, and the development of internal tools that make these activities more efficient. You will be actively involved in all aspects of the team's activities.
What You’ll Do Here
- Analyze, troubleshoot, debug, and assist in problem solving in test and production environments within the framework of incident and change management processes.
- Lead triage for any outage that impacts the infrastructure or applications, collaborate closely with support and development teams, play an active role in blameless postmortems, and refine playbooks to reduce MTTR.
- Ensure the operational health, high availability, reliability, and security of the applications and infrastructure.
- Maintain production services through measuring and monitoring availability, latency, and overall system health.
- Handle on-call and emergency support.
What We're Looking For
- 7-9 years of experience in SRE/DevOps/System Engineer/Production Support roles
- Infrastructure/app performance engineering experience is a nice-to-have
- Experience with Linux administration and troubleshooting of Java applications in production
- Experience maintaining and developing monitoring, logging, tracing, and alerting solutions (Grafana, ELK, PagerDuty, or equivalents)
- Hands-on experience with tools and techniques for diagnosing container and overall system performance issues
- Ability to work in a fast-paced environment, handling multiple projects and incident response simultaneously
- Hands-on experience managing infrastructure on AWS, GCP, data centers, or similar cloud/on-premises platforms
- Expertise in scripting languages: Python / Groovy / Ruby or equivalent
- Experience with DevOps, CI/CD, configuration management, and release management, including the relevant tools