Position Details
[We are looking for the best]
At 42dot, our AD ML Platform Engineers build the core data platform and the ML training/evaluation platform behind our cutting-edge autonomous driving algorithms. We develop a scalable, distributed data platform for large-scale datasets (millions of scenes), as well as high-performance data-serving SDKs for ML model training and evaluation. The platforms we deliver can significantly improve the efficiency of the ML model development lifecycle, including training, evaluation, deployment, and monitoring in the cloud.
Responsibilities
• Develop a highly scalable, reliable data platform to manage, visualize, search, and serve large-scale datasets for ML model training, fine-tuning, and validation.
• Develop an advanced autonomous driving data SDK covering scene data search, dataset preparation, dataset loading, and more.
• Build the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data.
• Identify and resolve performance bottlenecks throughout the data processing pipelines, from data processing latency and data search latency to Test Procedure (TP) coverage.
• Bootstrap and maintain infrastructure for the data platform components: the data processing pipeline, database, data lakehouse, and data serving.
• Collaborate with cross-functional teams, including ML Algorithm, ML Application, and Cloud Infra, to align the ML platforms with the overall autonomous driving system architecture.
Qualifications
• Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a similar technical field
• Minimum of 5 years of experience in Data Engineering or ML Platform roles
• Proficient in Python, with solid experience in Python SDK development
• Solid working experience with databases (e.g., MongoDB, PostgreSQL)
• Hands-on experience orchestrating data pipeline jobs with Databricks Workflows or Apache Airflow, and integrating data pipelines with machine learning models
• Extensive experience with data technologies and architectures such as data warehouses (e.g., Hive) or lakehouses (e.g., Delta Lake)
• Experience with Apache Spark or other big data computing engines



