Machine Learning Engineer - Distributed Training

Develop and optimize Ray’s distributed training libraries for large-scale machine learning workloads.
San Francisco, California, United States
Senior
$170,112 - 237,000 USD / year
2 weeks ago
Anyscale
Ray is the most popular open source framework for scaling and productionizing AI workloads.

✨ About The Role

- The role involves developing scalable, fault-tolerant distributed machine learning libraries that support leading ML platforms.
- You will be responsible for creating an exceptional end-to-end experience for training machine learning models.
- The position requires solving complex architectural challenges and transforming them into practical solutions.
- Collaboration with the open-source community, including ML researchers and engineers, is a key aspect of the job.
- The role also includes working directly with end-users to enhance the product based on their feedback.

⚡ Requirements

- The ideal candidate will have over 5 years of experience building, scaling, and maintaining software systems in production environments.
- A strong foundation in algorithms, data structures, and system design is essential for success in this role.
- Proficiency with popular machine learning frameworks and libraries such as PyTorch, TensorFlow, and XGBoost is required.
- Experience designing fault-tolerant distributed systems will be a significant advantage.
- Candidates with a background in managing and maintaining open-source libraries will be highly regarded.
