Senior Site Reliability Engineer, Incident Response - London, United Kingdom - Box

Box London, United Kingdom

2 weeks ago

Full time

Description

WHAT IS BOX?

Box is the market leader for Cloud Content Management. Our mission is to power how the world works together. Box is partnering with enterprise organizations to accelerate their digital transformation by creating a single platform for secure content management, collaboration and workflow. We have an amazing opportunity to further establish ourselves as leaders in the space, and we need strong advocates to help us achieve that goal.

By joining Box, you will have the unique opportunity to help capture a majority of this developing market and define what content management looks like for the digital enterprise. Today, Box powers over 97,000 businesses, including 70% of the Fortune 500 who trust Box to manage their content in the cloud.

WHY BOX NEEDS YOU

Box is looking for a dynamic Global Senior Site Reliability Engineer to help lead our Global Technical Operations and oversee the continuous health, availability, and reliability of industry-leading platforms and SaaS offerings. It is the responsibility of the TDO team to lead 24x7 GTOC teams in preventing, monitoring, identifying, troubleshooting, mitigating, and resolving issues that affect the availability and quality of Box's platforms and services.

This role is an integral shift-based leader and the single point of technical escalation within the GTOC organization, assuming accountability for overall production site health and the performance of core customer-facing journeys. The role helps maintain total site awareness by detecting metric and service deviations, providing the final level of change approval, and proactively identifying potential issues and resolving them before they escalate to customer-impacting incidents.

We are building a world-class Operations Center and need the best talent possible to get us there. That's where you come in.

WHAT YOU'LL DO

Own and direct live-site Major Incident Management from detection through identification, escalation, mitigation, and recovery.
Triage, refine, and verify the Problem Statement, notify and coordinate the efforts of all appropriate SME resources, and lead cross-functional Incident Bridges to quickly identify and mitigate the problem and restore service. You'll be evaluated on how well you reduce incident metrics from MTTD through MTTR.
Ensure accurate, valid and timely communication to key stakeholders and business entities.
Lead daily Incident and Change ticket reviews, coordinate and monitor change windows, and coordinate with Problem Management on TopOps Issues and action items.
Operate across organizational boundaries (Business, Dev, Ops, CS) to protect our customers, their data, and the availability of all Box services, from internal and external security threats, unanticipated volume surges, and significant performance issues.
Troubleshoot and identify critical problems in a SOA/API-based, global hybrid cloud, distributed edge architecture across multiple enterprise and public cloud regions.
Provide day-to-day technical expertise and experience to the organization to address issues in globally diverse, high-velocity 24x7 environments - from policy and procedural decisions to key architectural and tooling insights to improve Box's Incident, Change, and Problem Management engineering capabilities.
Lead daily reviews of planned changes (CAB) in Jira; accountable for reviewing and minimizing change risk, ensuring adequate and appropriate change timing and duration, and complete rollout, validation, and rollback plans that are optimized to prevent site or service impact.
Ensure all customer-impacting Incident tickets are completely and correctly documented and augmented with appropriate metrics, timelines, actions taken, and actions still pending.
Contribute to and review Incident postmortems to ensure adequate documentation and appropriate prioritization of action items related to reducing MTTI, MTTM and MTTR.
Participate in Problem Management scrums and Postmortems to identify leading organizational and company-wide technical issues, threats, and trends that block the ability of the organization or teams to perform their roles and provide services optimally and reliably.
Lead projects to improve tools and processes related to overall site and service manageability, observability, and resiliency.
Coordinate regularly with Infosec, Customer Success, Platform and Dev leaders to continuously assess new security and customer on-boarding threats and known issues.
Continuously mentor and train Global NOC and system engineers.

WHO YOU ARE

You have 5+ years of large-scale production/platform operations experience in large SaaS provider environments, preferably as a Major Incident Manager, SRE team leader or Infrastructure (IaaS) or Platform (PaaS) Architecture SME in a Managed Service Provider environment.
Experience in bare metal, OpenStack, and Kubernetes (K8s) architectures supporting a large number of SOA/API-based services.
Exposure to Open Source Service Meshes, Proxies, Caching, Message Buses (Kafka, MQS), NoSQL (HBase, Hadoop), MySQL clusters, and Search environments (Solr, ES).
You should be competent in debugging global, distributed Web/API sites based on Linux systems (Ubuntu, RHEL, CentOS), BGP, iBGP, and IP Anycast networking in multi-vendor virtualized, Edge and hybrid public cloud architectures.
You are not expected to be an expert in all areas, but you should be familiar with common terminologies, processes, and architectures in Linux Open Source environments, and have a thorough understanding of Virtualization, Containers, and Kubernetes.
You are confident and comfortable communicating and interacting with everyone from individual contributors to C-level executives, across multiple countries, ethnicities, and backgrounds.
You have a rock-solid command presence and are calm and collected in highly stressful situations, such as a major service outage.
You're driven to continuously learn new skills and technologies.
Bachelor's degree in Computer Science or Information Systems or equivalent technical field, or similar work experience in a large-scale 24/7 production environment supporting critical, real-time applications.
Flexibility to work different shifts and provide weekend coverage depending on need.

Required Skills

Solid understanding of ITILv4 Service Lifecycle Management, Service Delivery KPIs, SLIs, SLOs, and Incident, Change, and Problem Management framework, terminology, tools (ServiceNow, Remedy, Jira Service Desk), and processes
Solid knowledge and understanding of security standards and best practices, such as: OWASP, W3C, ISO 27001, SOC1-2, PCI, and SOX
Ability to troubleshoot secured protocols such as: SSH, SSO, TLS, FTPS, WebDAV, HTTPS
Solid understanding and debugging skills in TCP/IP, BGP, IP Anycast, and distributed internal and external DNS
Two years of working experience with, and knowledge of, multi-regional public cloud providers
Experience with observability tools and distributed tracing in large-scale environments (Splunk, Datadog, Wavefront, Catchpoint, ThousandEyes, Sensu, SignalFx RUM, OpenTelemetry, SNMP)
Good understanding and experience with configuration management tools and CI/CD pipelines - Puppet, Ansible, Terraform, Artifactory
Excellent interpersonal and communication skills

Desired Skills

Understanding of Agile methods and tools (Jira).
Experience with WAF, Bot Managers, and Content Delivery Networks (Cloudflare, Akamai)
Experience working in and transitioning into multi-regional hybrid cloud architectures (GCP preferred, AWS)
Understanding of Apache Zookeeper and Hadoop.
Experience with large production Scala, Java, Node, PHP environments helpful.
Experience working with various message bus technologies (Kafka, RabbitMQ, MQS)
Experience working with relational and non-relational databases and search engines (MySQL, Postgres, HBase, Elasticsearch, Solr)
Experience with caching apps (Squid, Redis, Memcached)
Experience with service mesh technologies in a hybrid-cloud environment (Zookeeper, SmartStack)

BENEFITS

The Box Benefits package includes pension, medical and dental coverage. We have a robust wellness program including 25 days of vacation (plus your birthday off) and subsidized gym membership. There is such a thing as a free lunch: our in-house chef prepares this daily along with lots of snacks and drinks. Our EMEA HQ office is located in the impressive White Collar Factory on Old Street, with European offices in Paris and Munich.

EQUAL OPPORTUNITY

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability, or any other protected ground of discrimination under applicable human rights legislation. Box strives to respect the dignity and independence of people with disabilities and is committed to giving them the same opportunity to succeed as all other employees. Accommodations are available throughout the application process and an employee's employment at Box.

For details on how we protect your information when you apply, please see our Personnel Privacy Notice.
