Our team has an immediate 12-month contract opening for a researcher.
About the team:
The Cloud Native Data Engine team within the Distributed Scheduling and Data Engine Lab is led by esteemed technical experts with extensive industry and academic experience, and merges software development with cutting-edge industrial research in the cloud database area. Our research currently focuses on cloud-native database architecture (TaurusDB) and high-performance query and transaction processing (SQL Engine) in next-generation cloud infrastructure. The team publishes innovative research at leading conferences such as SIGMOD, VLDB, and ICDE, and is recognized as a key technology contributor in the industry.
About the job:
This unique role combines software development with cutting-edge industrial research in databases, encompassing cloud-native database architecture (TaurusDB) and high-performance query and transaction processing (GaussDB SQL Engine) within next-generation cloud infrastructure.
Design, implement, and maintain database architectures for machine learning workloads, ensuring efficient data management and optimized performance.
Research and stay updated on emerging trends in database technology and machine learning to propose innovative solutions that improve system efficiency and capability.
Investigate and summarize state-of-the-art database technologies by reviewing the latest conference papers, attending workshops, and tracking industry trends.
Assist in the implementation of AI-driven analytics and advanced features like vector search, similarity matching, and recommendation systems.
Actively pursue opportunities to file patents and to publish papers at leading academic and industrial conferences.
Requirements
About the ideal candidate:
Strong C/C++ programming skills backed by 1-3 years of experience, with expertise in systems-level programming and debugging.
Deep understanding of cloud computing technologies, including cloud storage, distributed systems, parallel computing, and consistency protocols.
Experience working with machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn) and an understanding of how they can be applied within database contexts.
Familiarity with MySQL, PostgreSQL, or other open-source databases, including knowledge of their internal mechanisms such as transaction management, storage engines, MVCC, SQL optimization, query execution, and vector execution, is considered an asset.
Familiarity with AI agents and practical experience deploying them, or experience integrating ML models into production databases or data pipelines, is considered an asset.
Experience with database extensions or ML-related plugins (e.g., pgvector for PostgreSQL), preferably involving modern AI accelerators such as GPUs, NPUs, or TPUs.
Proven ability to conduct research and quickly learn new technologies and products.
A master’s or Ph.D. in Computer Science, Computer Engineering, Mathematics, or a related field is an asset.