SRE Engineer - London, United Kingdom - eFinancialCareers

eFinancialCareers

London, United Kingdom

3 weeks ago

Posted by:

Tom O'Connor

beBee Recruiter

Description

TEKsystems is currently engaged with a financial services company to recruit a Site Reliability Engineer who will be responsible for delivering continuous improvement, automation and self-service offerings to operational teams across the company.

Primary:

Develop software to make infrastructure services self-managing and self-service
Deliver continuous service improvement by developing Infrastructure as Code
Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value
Improve system performance, make effective use of resources, distribute load and reduce latency
Identify SLOs (Service Level Objectives) to meet availability and latency objectives
Develop proactive monitoring solutions that alert on symptoms and not just on outages
Perform detailed root cause analyses (RCAs) on incidents and outages to prevent recurrence
Partner with development teams to improve services via rigorous testing and release procedures
Develop standard operational procedures and produce effective documentation
Analyse workloads and devise suitable cloud migration strategies where appropriate
Ensure all project / investment workloads are delivered according to the defined plans and budget
Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests
Deputise for the team lead when required and act up accordingly
Identify cost saving and optimisation opportunities across the group
Build strong working relationships across the organisation
Adhere to the core values of the bank

Secondary:

Perform daily health and compliance checks for all systems as required
Ensure all systems are backed up successfully and any issues are promptly resolved
Validate that monitoring alerts and batch job failures are detected promptly and resolved satisfactorily
Ensure sufficient capacity is available to accommodate drive growth
Handle incidents and requests with efficiency and a "customer first" mindset
Maintain infrastructure in a highly available, reliable, secure and performant manner

Essential:

AWX / Ansible Tower
Git, Ansible, Terraform and TeamCity
Serena Deployment Automation (SDA) and Jenkins
Kubernetes and Docker

"Infrastructure as Code" Principles and practices
"Continuous Integration (CI) and Continuous Delivery (CD)" Principles and practices

Agile, Site Reliability Engineering (SRE) and DevOps Principles and practices
Scripting and programming languages such as PowerShell, Python, Bash and C#
Fluent in Backup and Recovery processes and procedures
Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques
Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability
Excellent Performance Tuning skills, in-depth knowledge of system internals
Ability to interpret and implement CIS security hardening recommendations in a controlled manner

Acute awareness of Security and Auditing requirements in a regulated environment

Employee Value Proposition:

Hybrid

Job Title:
SRE Engineer

Location:
London, UK

Rate/Salary:
GBP Daily

Job Type:
Contract