Site Reliability Engineer - London, United Kingdom - Explore Group

Explore Group
London, United Kingdom

3 weeks ago

Posted by:

Tom O'Connor

beBee Recruiter


Description

Site Reliability Engineer - 2 days on-site - Kensington, London

  • The SRE is required to work 2 days on-site at their offices close to Knightsbridge, London.

Main Responsibilities of the SRE:


The SRE will have a proven record in supporting production environments in a very fast-paced company with tech spread across geographic locations and cloud providers.


  • Confident in managing virtual platforms and the underpinning services such as Enterprise Storage and SD-Networking.
  • Confident in managing peripheral infrastructure services such as Backup/Restore, non-native monitoring, and hardware monitoring.
  • Provide technology insight and support to key management staff and peers.
  • Be able to manage tasks to tight deadlines, give regular updates upwards, and manage expectations accordingly.
  • Have a good understanding of VMware, EMC and/or AWS cloud and other IaaS solutions.
  • Be keen to develop scripting capability and API integration in one or more popular languages (Puppet, Python, Shell).
  • Understand and work towards a strategy set out by senior management ensuring we adhere to direction and execute tasks based on priority to meet strategy deadlines.
  • Have the drive to constantly improve and try out new technology offerings to improve Operational efficiency and execution.
  • Constantly improve functional and non-functional monitoring of the infrastructure, to head off any issue that might occur.
  • Be flexible when it comes to out-of-hours support. You will be required to be on call on evenings and weekends as part of an on-call system for escalations outside core working hours, and you will be asked to carry out change controls in agreed business maintenance windows.

Further Responsibilities of the SRE:

  • Investigating and resolving incidents assigned to the infrastructure services team and ensuring incidents are closed in a timely fashion to meet defined IT Service Levels.
  • Give guidance to developers in automating routine tasks.
  • Automate server provisioning in order to support the rapid pace of development and to scale services efficiently.
  • Ensuring in-house knowledge documentation used by the System team is up to date.
  • Providing clear, accurate and timely updates to the customer with the progress of any incident resolution.
  • Providing the IT Helpdesk with information about incidents and problems to assist with the identification and tracking of common and recurring issues.
  • Ensure adherence to all IT policies, architectural standards and guidelines, operational processes and procedures including relevant build and security standards.
  • Identify and manage problems assigned to you by managers to conclusion.
  • Identifying and managing key risks within Infrastructure Services.
  • Provide first-class monitoring of the entire estate in order to facilitate auto-recovery.
  • Build and deploy the tooling that enables us to deliver all of the above.
  • Being a part of a 24/7 On Call ro
