
Data Engineer – Spark Specialist

Assist clients in optimizing Spark-based distributed data processing architectures
Paris, Berlin, Amsterdam, Madrid, London
Senior
Dataiku

Provides an end-to-end collaborative data science and machine learning platform for building, deploying, and managing AI and analytics projects.

Data Engineer – Spark Specialist

Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, Dataiku meets teams where they are today, allowing them to begin building with AI using their existing skills and knowledge.

About the Role

Dataiku is looking for a Data Engineer specialized in Spark (PySpark) to join our Field Engineering team. In this role, you will work closely with our clients to troubleshoot and optimize complex data pipelines within the Dataiku platform. This includes both reactive support (advanced issues reported via the support portal) and proactive services (performance reviews and architecture advisory missions we propose to clients).

You will serve as a technical expert in data processing, leveraging SQL and Python frameworks, with a specialization in Spark-based distributed data processing and lakehouse architecture. You will help our clients succeed whether they are working with SQL-based workflows or processing data on Kubernetes, Databricks, or other modern data platforms.

What You'll Do

  • Help customers design, build, and optimize Flows in Dataiku, improving overall project performance and maintainability.
  • Debug and enhance complex Spark code and data pipelines for better performance and reliability.
  • Guide clients in tuning and scaling Spark environments on platforms such as Kubernetes and Databricks, including architectural guidance and best practices that improve performance and reliability.
  • Optimize SQL-based data pipelines to ensure efficient and robust data workflows within Dataiku.
  • Advise clients on integrating different data pipelines (Spark, SQL, Python) into optimized solutions.
  • Collaborate with internal teams to resolve technical issues and contribute to the knowledge base.

Who You Are

You have deep hands-on experience building, debugging, and tuning Spark pipelines in production environments. Specifically, you have:

Spark & PySpark Expertise

  • Proficiency in writing and debugging PySpark code for large-scale data processing.
  • Experience with Parquet, Delta Lake, and columnar file formats.
  • Understanding of Spark's interaction with metastores (e.g., Hive, Unity Catalog).
  • Deep understanding of resource management: Spark executors, cores, memory, and relevant configurations (e.g., spark.executor.memory, spark.sql.shuffle.partitions).
  • Expertise in tuning Spark jobs: partitioning, caching, broadcast joins, and avoiding unnecessary shuffles.
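
To give a concrete flavor of this kind of tuning work, here is a minimal, hypothetical PySpark sketch touching the levers listed above (executor sizing, shuffle partitions, broadcast joins, selective caching). The paths, column names, and configuration values are illustrative placeholders rather than recommended settings.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical session: executor sizing and shuffle partitioning are the kinds
    # of knobs referenced above; the values here are placeholders, not recommendations.
    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        .config("spark.executor.memory", "8g")
        .config("spark.executor.cores", "4")
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

    # Hypothetical inputs: a large fact table and a small dimension table, both Parquet.
    events = spark.read.parquet("/data/events")
    countries = spark.read.parquet("/data/countries")

    # Broadcasting the small side avoids shuffling the large table for the join.
    enriched = events.join(F.broadcast(countries), on="country_code", how="left")

    # Cache only when the result is reused by several downstream actions.
    enriched.cache()

    daily = enriched.groupBy("event_date").agg(F.count("*").alias("events"))
    daily.write.mode("overwrite").parquet("/data/out/daily_counts")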

Lakehouse & Orchestration

  • Familiarity with lakehouse architectures and ACID-compliant data layers (Delta Lake, Iceberg, Hudi).
  • Experience working with Databricks, including Databricks Connect and Databricks Workflows.
  • Experience automating and scheduling Spark jobs using tools like Apache Airflow or native orchestration tools.
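
As a small illustration of the ACID-compliant lakehouse work described above, the sketch below performs a Delta Lake upsert with PySpark. It assumes a cluster where the delta-spark package is configured; the table paths and join key are hypothetical.

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable  # assumes delta-spark is installed and configured

    spark = SparkSession.builder.appName("delta-upsert-sketch").getOrCreate()

    # Hypothetical incremental batch to merge into an existing Delta table.
    updates = spark.read.parquet("/data/staging/customer_updates")
    target = DeltaTable.forPath(spark, "/lake/silver/customers")

    # ACID upsert: matched rows are updated in place, new rows are inserted.
    (
        target.alias("t")
        .merge(updates.alias("u"), "t.customer_id = u.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )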

Core Data Engineering Skills

  • Proven experience developing, optimizing, and troubleshooting SQL-based data pipelines for efficient ETL and data transformation processes.
  • Proficiency in building and managing data transformation workflows in Python, leveraging frameworks such as pandas.
  • Familiarity with data modeling concepts and data quality best practices.
  • Experience integrating data from a variety of sources, including databases, APIs, and cloud storage.
  • Ability to communicate technical concepts effectively to both technical and non-technical stakeholders.
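
As a brief, hypothetical example of the Python-side transformation work mentioned above, the pandas snippet below pulls a small reference dataset from an API, normalizes it, and writes it out as Parquet. The endpoint, column names, and output path are invented for illustration.

    import pandas as pd

    # Hypothetical endpoint returning a JSON array of exchange-rate records.
    raw = pd.read_json("https://example.com/api/exchange_rates.json")

    # Standardize column names, parse timestamps, and drop bad or duplicate rows.
    rates = (
        raw.rename(columns={"ccy": "currency", "rate_usd": "usd_rate"})
           .assign(as_of=lambda df: pd.to_datetime(df["as_of"]))
           .dropna(subset=["currency", "usd_rate"])
           .drop_duplicates(subset=["currency", "as_of"])
    )

    # Land the cleaned reference data next to other warehouse inputs (requires pyarrow).
    rates.to_parquet("/data/reference/exchange_rates.parquet", index=False)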

What does the hiring process look like?

  • Initial call with a member of our Technical Recruiting team
  • Video call with the Field Engineer Hiring Manager
  • Technical assessment to show your skills (take-home test)
  • Debrief of your Tech Assessment with FE Team members
  • Final Interview with the VP Field Engineering