SRE Engineer - London, United Kingdom - eFinancialCareers

Posted by:

Tom O'Connor

beBee Recruiter


Description
TEKsystems is currently engaged with a financial services company to recruit a Site Reliability Engineer who will be responsible for delivering continuous improvement, automation and self-service offerings to operational teams across the company.


Primary:


  • Develop software to make infrastructure services self-managing and self-service
  • Deliver continuous service improvement by developing Infrastructure as Code
  • Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value
  • Improve system performance, make effective use of resources, distribute load and reduce latency
  • Identify SLOs (Service Level Objectives) to meet availability and latency objectives
  • Develop proactive monitoring solutions that alert on symptoms and not just on outages
  • Perform detailed root cause analyses (RCAs) on incidents and outages to prevent future recurrence
  • Partner with development teams to improve services via rigorous testing and release procedures
  • Develop standard operational procedures and produce effective documentation
  • Analyse workloads and devise suitable cloud migration strategies where appropriate
  • Ensure all project / investment workloads are delivered according to the defined plans and budget
  • Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests
  • Deputise for the team lead when required and act up accordingly
  • Identify cost saving and optimisation opportunities across the group
  • Build strong working relationships across the organisation
  • Adhere to the core values of the bank

Secondary:


  • Perform daily health and compliance checks for all systems as required
  • Ensure all systems are backed up successfully and any issues are promptly resolved
  • Validate that monitoring alerts and batch job failures are detected promptly and resolved satisfactorily
  • Ensure sufficient capacity is available to accommodate drive growth
  • Handle incidents and requests with efficiency and a "customer first" mindset
  • Maintain infrastructure in a highly available, reliable, secure and performant manner

Essential:


  • AWX / Ansible Tower
  • Git, Ansible, Terraform and TeamCity
  • Serena Deployment Automation (SDA) and Jenkins
  • Kubernetes and Docker
  • Infrastructure as Code principles and practices
  • Continuous Integration (CI) and Continuous Delivery (CD) principles and practices
  • Agile, Site Reliability Engineering (SRE) and DevOps principles and practices
  • Scripting and programming languages such as PowerShell, Python, Bash and C#
  • Fluent in Backup and Recovery processes and procedures
  • Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques
  • Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability
  • Excellent Performance Tuning skills and in-depth knowledge of system internals
  • Ability to interpret and implement CIS security hardening recommendations in a controlled manner
  • Acute awareness of Security and Auditing requirements in a regulated environment


Employee Value Proposition:

Hybrid


Job Title:
SRE Engineer


Location:
London, UK


Rate/Salary:
GBP Daily


Job Type:
Contract
