Senior Machine Learning Engineer, Computer Vision - Robotics
Scale's Robotics business unit is dedicated to solving the data bottleneck in Physical AI. This position is a key contributor to applied research in robotics and to the development of ML pipelines for training and fine-tuning on data collected by Scale. In this role, you will have the opportunity to advance robotics research, shape Scale's robotics offerings, and expand the frontier of robotics data and model evaluation.
We are seeking an exceptionally motivated and experienced Senior Machine Learning Engineer, Computer Vision to drive cutting-edge research and development in real-time and offline 2D and 3D algorithms. The successful candidate will be a hands-on technical leader responsible for translating complex computer vision algorithms from research papers into robust, production-ready systems that power our next-generation products.
This role requires a deep theoretical background combined with substantial practical experience working with spatial and temporal data.
You will:
- Pioneer Core CV Algorithms: Lead the research, design, and implementation of novel computer vision and deep learning algorithms, with a specialized focus on 2D and 3D data (e.g., point clouds).
- Focus Area Expertise: Drive innovation in key perception areas, including:
  - 3D Reconstruction and SLAM: Advanced techniques for real-time 3D mapping, pose estimation, and environmental modeling from multi-modal sensor inputs (e.g., RGB-D, LiDAR).
  - Hand/Body Tracking: Developing robust and precise models for hand pose estimation, gesture recognition, and full-body tracking under various lighting and occlusion conditions.
  - Object Detection and Tracking (MOT/SOT): Designing high-performance deep learning models for accurate detection and persistent tracking of objects and people in video streams.
  - Video Processing: Creating algorithms for temporal feature extraction, video-based action recognition, and motion analysis.
- Model Optimization: Optimize computationally intensive models for deployment on edge devices (low power, low latency) and/or large-scale cloud infrastructure.
- Technical Leadership: Serve as the subject matter expert in Computer Vision, providing technical direction and mentorship to junior engineers and cross-functional teams.
- Publication & IP: Maintain state-of-the-art knowledge, evaluate recent academic publications (e.g., CVPR, ICCV, ECCV), and drive the filing of patents and publication of novel research.
- Cross-Functional Partnering: Collaborate closely with Software Engineering, Product, and Hardware teams to define requirements, integrate vision systems, and ensure solutions meet performance targets.
You have:
- Ph.D. in Computer Science, Computer Engineering, or a related quantitative field (Mathematics, Electrical Engineering, etc.) OR a Master's degree with 4+ years of equivalent professional experience in an applied research setting.
- 5+ years of hands-on experience in algorithm development for 2D/3D computer vision and deep learning.
- Deep Learning Frameworks: Expert proficiency in at least one major deep learning framework (PyTorch, TensorFlow, or JAX).
- Programming: Mastery of Python for machine learning and strong proficiency in C++ for performance-critical algorithm implementation.
- 2D/3D Fundamentals: In-depth knowledge of classical and modern computer vision fundamentals, including multi-view geometry, projective geometry, camera calibration, and 3D graphics/rendering principles.
- Experience building real-time and batch ML systems that analyze structured and unstructured signals.
- Hands-on experience rapidly prototyping and iterating on ML systems with changing requirements.
The base salary range for this full-time position in the location of San Francisco is:
$218,400 - $273,000 USD