DevOps Engineer/SRE - London, United Kingdom - Alexander Ash Consulting

    Description

    Site Reliability Engineer - Global Quantitative Investment Management

    Permanent/Contract - London, UK - Competitive

    Any additional information you require for this job can be found in the text below. Make sure to read it thoroughly, then apply.

    We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join a leading quantitative research and technology firm that leverages innovative data science and cutting-edge technology to deliver unparalleled insights and solutions.

    You will be working at the intersection of technology and finance, ensuring the reliability, availability, performance, and cost-efficiency of the firm's critical systems and infrastructure. You will work closely with development, operations, and research teams to build and maintain robust, scalable systems using AWS, Terraform, Ansible, and Kubernetes.

    Key focuses:

    System Reliability and Performance:

    • Monitor and manage the performance and reliability of QRT's infrastructure and applications.
    • Implement and refine monitoring, logging, and alerting systems to detect and address issues proactively.
    • Conduct root cause analysis for incidents and implement solutions to prevent recurrence.

    Automation and Efficiency:

    • Develop and maintain automation scripts and tools using Ansible and Terraform to streamline operations and reduce manual intervention.
    • Optimize deployment processes and CI/CD pipelines for efficiency and reliability.
    • Implement infrastructure as code (IaC) practices to ensure scalable and reproducible infrastructure management.

    Scalability and Performance Optimization:

    • Design, deploy, and manage scalable and secure cloud infrastructure on AWS.
    • Utilize AWS services effectively to enhance system performance and reliability.
    • Implement and manage containerized applications using Docker and Kubernetes to ensure high availability and scalability.
    • Analyze system usage patterns and plan for future capacity needs.

    Cost Management:

    • Monitor and optimize cloud resource usage to ensure cost-efficiency.
    • Implement cost-saving measures and provide regular reports on cloud expenditure.
    • Evaluate and recommend new technologies and tools that offer cost-effective solutions without compromising performance.

    Qualifications:

    Education:

    • Bachelor's degree in Computer Science, Engineering, or a related field from a top-tier university.

    Experience:

    • 4+ years of experience in a Site Reliability Engineer, DevOps, or similar role.

    Technical Skills:

    • Proficiency in programming languages such as Python, Go, or similar.
    • Strong knowledge of AWS services and cloud architecture.
    • Experience with infrastructure as code (IaC) tools such as Terraform.
    • Expertise in configuration management tools such as Ansible.
    • Proficiency with containerization technologies like Docker and orchestration tools such as Kubernetes.
    • Strong understanding of networking, Linux/Unix systems, and database management.

    Soft Skills:

    • Excellent problem-solving and analytical skills.
    • Strong communication and collaboration abilities.
    • Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

    If interested, please apply.