Position Details
[We are looking for the best]
At 42dot, our AD ML Platform Engineers build the core data platform and the ML training and evaluation platform for cutting-edge autonomous driving algorithms. We develop the distributed system behind a scalable data platform for large-scale datasets (millions of scenes), as well as high-performance data serving SDKs for ML model training and evaluation. The platforms we deliver can significantly improve the efficiency of the ML model development lifecycle, including training, evaluation, deployment, and monitoring in the cloud environment.
Key Responsibilities
• Set technical strategy and oversee development of a high-scale, reliable data platform to manage, visualize, and serve large-scale datasets for ML model training and validation.
• Build up the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data.
• Drive the Autonomous Driving Data SDK development, including scene data search, dataset preparation, dataset loading, etc.
• Dig into performance bottlenecks all along the data processing pipelines, from data processing latency and data search latency to Test Procedure (TP) coverage.
• Bootstrap and maintain infrastructure for Data Platform components: Data Processing Pipeline, Database, Data Lakehouse, and Data Serving.
• Collaborate with cross-functional teams, including ML algorithm, ML application, and Cloud Infra, to align ML Platforms with the overall Autonomous Driving System Architecture.
Qualifications
• Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a similar technical field
• Minimum of 7 years of experience in Data Engineering or ML Platform roles
• Expert-level proficiency in Python and solid experience in Python SDK development
• Solid working experience with databases (e.g., MongoDB, PostgreSQL)
• Strong understanding of modern AI frameworks (e.g., PyTorch, TensorFlow), especially the principles of distributed data loading for model training
• Hands-on experience orchestrating data pipeline jobs with Databricks Workflows or Apache Airflow, as well as integrating data pipelines with machine learning models
• Extensive experience with data technologies and architectures such as Data Warehouse (e.g., Hive) or Lakehouse (e.g., Delta Lake)
• Experience with Apache Spark or other big data computing engines
• Excellent leadership and communication skills, with a demonstrated ability to lead technical projects



