✨ About The Role
- The role involves pioneering research on synthetic and hybrid data creation and on post-training, with a focus on the science of data.
- Interns will work on innovative research frameworks to enhance post-training data pipelines and evaluation methods for large language models (LLMs).
- The position includes studying model generalization and capabilities to inform data-driven advancements.
- Interns will research synthetic data generation and hybrid, human-in-the-loop data generation to improve data quality.
- The job entails investigating strategies to refine data pipelines for model improvement and developing advanced evaluation methodologies for model performance assessment.
⚡ Requirements
- The ideal candidate is currently enrolled in a BS, MS, or PhD program with a focus on Machine Learning, Deep Learning, Natural Language Processing, or Computer Vision.
- A successful applicant will have a graduation date in Fall 2025 or Spring 2026, indicating they are in the later stages of their academic program.
- Prior research experience or a track record of publications in relevant fields such as LLMs, NLP, or multimodal agents is essential.
- Proficiency in one or more general-purpose programming languages, particularly Python or JavaScript, is required.
- The candidate should possess strong communication skills, with the ability to speak and write fluently in English.