SRE - Observability Specialist - Reigate, United Kingdom - Willis Towers Watson

Posted by:

Tom O'Connor

beBee Recruiter


Description
SRE - Observability Specialist

  • Reigate, GB
May 12, 2023


You will be joining Insurance Consulting and Technology (ICT) at an exciting time of transformation as we work on improving the delivery of value for customers and the business.

You will be working in the Site Reliability and Response team, which is responsible for delivering and managing business-critical services that are used 24×7 by our clients and colleagues around the world.

This role is open to flexible and hybrid working arrangements, with presence in the Reigate office on average two days per week.


  • Develop and implement Observability strategies for SaaS products across Technology Delivery, ensuring metrics, logs and traces are effectively captured, analysed and actioned upon
  • Manage the Datadog implementation, showcasing its abilities and ensuring the tool is configured correctly and used efficiently with our services, whilst maintaining cost effectiveness
  • Advocate of Observability: a big part of the role is being able to clearly communicate the benefits of Observability to technical and non-technical stakeholders, emphasising the critical role it plays in delivering successful SaaS offerings
  • Collaboration with cross-functional teams: working alongside engineering managers, product owners and operational teams, understanding their requirements and ensuring our Observability strategies are aligned with the business objectives
  • Define and implement appropriate monitoring and alerting standards to proactively identify and address issues, whilst minimising alert fatigue
  • Provide training and support to a wide variety of teams, ensuring they are familiar with the disciplines of Observability and how to take advantage of Datadog to maximise the availability, performance and reliability of their service
  • Regularly review and assess our standards and practices to ensure the effectiveness of Datadog in our SaaS platform

  • Solid experience in an Observability, Site Reliability Engineering or similar role, such as DevOps
  • Previous involvement in defining, planning and implementing Observability strategies, using Datadog or similar tools
  • Understanding of cloud infrastructure and services
  • Experience with Vendor Management
  • Strong interpersonal skills, with the ability to work effectively with many stakeholders
  • Excellent communication and presentation skills, with the ability to effectively convey complex concepts to both technical and non-technical audiences
  • Experience with conducting Postmortems or Post Incident Reviews
  • Confidence in making decisions and taking ownership of projects
  • You're collaborative and enjoy problem solving and mentoring others

Other highly desirable, but not essential skills are:

  • Familiarity with Infrastructure as Code (IaC) tools like Pulumi, Terraform, ARM Templates, or Azure Bicep
  • Understanding of programming languages such as C# would be welcome
  • Experience with other popular monitoring tools (e.g. Prometheus, Grafana, Elastic Stack) is a plus
  • Awareness of ITIL
  • Understanding of Azure DevOps and CI/CD Pipelines and how to better integrate Observability into the development and deployment process
(ICT_TECH ED_2023_88R)
