About the position
Quantix is seeking a Data Engineer to architect and maintain the data infrastructure that powers our machine learning models, analytics engines, and intelligence systems. You will design real-time pipelines, optimize data flows, and ensure the availability, quality, and performance of large market and on-chain datasets. This role is essential to building a reliable foundation for AI-driven insights across institutional finance.
Job responsibilities
- Build and maintain scalable data pipelines for ingestion, transformation, and storage of large multi-source datasets.
- Design workflows for real-time and batch data processing that give models fast, reliable access to clean data.
- Develop ETL/ELT systems that handle market data, on-chain data, alternative datasets, and metadata efficiently.
- Optimize database schemas, query performance, and storage architectures for analytical workloads.
- Work closely with ML and research teams to ensure datasets are structured and enriched for modelling.
- Implement data validation, quality checks, and monitoring systems to maintain consistency and reliability.
- Manage data pipelines across cloud environments, ensuring security, scalability, and high availability.
- Build internal tools and APIs that improve data accessibility and streamline research workflows.
- Troubleshoot pipeline issues, performance bottlenecks, and latency challenges in production environments.
Required skills
- Strong proficiency in Python, SQL, and data processing frameworks.
- Experience with ETL/ELT pipelines, workflow orchestration, and data transformation tools.
- Familiarity with distributed data systems such as Spark, Flink, Dask, or similar frameworks.
- Solid understanding of databases (PostgreSQL, BigQuery, Snowflake, or similar).
- Experience handling large-scale datasets, especially time-series or structured financial data.
- Knowledge of cloud platforms (AWS, GCP, Azure) and scalable storage systems (S3, GCS, etc.).
- Ability to work with real-time streaming tools such as Kafka, Pub/Sub, or Kinesis (preferred).
- Strong understanding of data integrity, quality checks, and pipeline observability.
- Bonus: experience with on-chain data, blockchain analytics, or API and data-source integrations.