- Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets.
- Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
- Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency.
- Research and implement innovative data curation methods, leveraging Cohere's infrastructure to drive advancements in natural language processing.
- Collaborate with cross‑functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting‑edge language models.
- Strong software engineering skills, with proficiency in Python and experience building data pipelines.
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
- Experience working with large‑scale web datasets like CommonCrawl.
- A passion for bridging research and engineering to solve complex data‑related challenges in AI model training.
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in‑office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top‑up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well‑being, quality time, and workspace improvement
- Remote‑flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co‑working stipend
- 6 weeks of vacation (30 working days)
Member of Technical Staff, Data Engineering - Greater London - Cohere Inc.
Description
Who we are
Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.
We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what's best for our customers.
Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.
Join us on our mission and shape the future!
Why this role
As a Data Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere's advanced language models. Your responsibilities will encompass the end‑to‑end management of training data, including ingestion, cleaning, filtering, and optimization, as well as data modeling to ensure datasets are structured and formatted for optimal model performance. You will work with diverse data sources, such as web data, code data, and multilingual corpora, to ensure their quality, diversity, and reliability. By combining research and engineering, you will bridge the gap between raw data and cutting‑edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization.
Your work will be essential to Cohere's mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.
We have offices in London, Paris, Toronto, San Francisco and New York, but we also embrace being remote‑friendly. There are no restrictions on where you can be located for this role between EST and EU.
Responsibilities
Qualifications
Bonus: papers at top‑tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.
Perks
