Microsoft is seeking Data Research Engineers to join its Multimodal team, which is focused on building next-generation foundation models across various modalities. The role involves collaborating with scientists and engineers to curate and analyze multimodal data, developing data collection strategies, and ensuring dataset quality aligns with ethical standards.
Responsibilities
• Create high-quality datasets for training and evaluation; run experiments on new datasets (data ablations) to assess their impact and determine which data is most effective
• Develop and maintain scalable data pipelines for multimodal ingestion, preprocessing, filtering, and annotation
• Analyze real-world multimodal datasets to assess quality, diversity, and relevance, and to identify areas for improvement
• Build lightweight tools and workflows for dataset auditing, visualization, and versioning
• Collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for quality, privacy, and responsible AI practices
• Embody our culture and values
Skills
• Bachelor's Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.) OR equivalent experience.
• Master's Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or related technical discipline AND 8+ years technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.) OR Bachelor's Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or related technical discipline AND 12+ years technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.) OR equivalent experience.
• 2+ years of experience in data analysis or data engineering, including work with large-scale datasets that are unstructured or semi-structured.
• Proficiency in statistics and exploratory data analysis methods.
• Familiarity with data processing frameworks such as Spark, Ray, or Apache Beam.
• Ability to communicate technical findings clearly to research and product teams.
Benefits
• Certain roles may be eligible for benefits and other compensation.
Company Overview
• Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services. It was founded in 1975 and is headquartered in Redmond, Washington, USA, with a workforce of 10,001+ employees. Its website is https://www.microsoft.com.
Company H1B Sponsorship
• Microsoft has a track record of offering H1B sponsorships, with 7425 in 2025, 9343 in 2024, 7677 in 2023, 11403 in 2022, 7210 in 2021, and 7852 in 2020. Please note that this does not guarantee sponsorship for this specific role.