Context
FinTech Inc., a rapidly growing financial services platform, processes over 1 million transactions daily. Its current ETL pipeline relies on direct database queries, which has led to performance bottlenecks and data quality issues. To improve efficiency and reliability, the data engineering team has been tasked with redesigning the ETL process around Entity Framework (EF) so that it handles this transaction volume reliably.
Scale Requirements
- Throughput: Handle 1 million transactions per day, which averages out to roughly 12 transactions per second (1,000,000 / 86,400 s ≈ 11.6).
- Latency: Ensure that data is available for reporting within 10 minutes of transaction completion.
- Storage: Store transactional data in SQL Server, requiring efficient indexing for fast retrieval.
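The scale figures above can be checked with a few lines of arithmetic. The sketch below derives the average rate and the number of transactions that arrive within one 10-minute reporting window. The peak factor is a hypothetical value chosen for illustration only; the requirements do not state a peak load.

```python
# Figures taken from the scale requirements.
TRANSACTIONS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds
REPORTING_SLA_SECONDS = 10 * 60       # 10-minute reporting window

# Average sustained rate if traffic were spread evenly over the day.
avg_tps = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

# ASSUMPTION (not in the requirements): traffic peaks at a multiple of the average.
PEAK_FACTOR = 5  # hypothetical value
peak_tps = avg_tps * PEAK_FACTOR

# Transactions that accumulate during one reporting window at the average rate.
# A batch job must clear at least this many per window to stay within the SLA.
backlog_per_window = avg_tps * REPORTING_SLA_SECONDS

print(f"average rate:        {avg_tps:.2f} tx/s")
print(f"assumed peak rate:   {peak_tps:.2f} tx/s")
print(f"arrivals per window: {backlog_per_window:.0f} tx")
```

Real traffic is rarely uniform, so the average is a lower bound for capacity planning. A measured peak factor should replace the assumed one.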
Requirements
- Implement an ETL pipeline that extracts data from various sources (APIs, flat files) and uses Entity Framework for database access.
- Transform data according to business rules, including validation and enrichment processes.
- Load data into SQL Server with upsert functionality to handle duplicates efficiently.
- Run data quality checks that validate data integrity before loading.
- Include automated monitoring and alerting for data pipeline performance and quality metrics.
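The validate-then-upsert flow in the requirements above can be sketched as follows. This is a minimal illustration, not the target implementation: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server and Entity Framework, and the table layout, field names, and validation rules are all hypothetical. On SQL Server the same idea would typically be expressed with a `MERGE` statement or an update-then-insert pattern.

```python
import sqlite3
from decimal import Decimal, InvalidOperation

SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}  # hypothetical business rule

def validate(record):
    """Return a list of problems; an empty list means the record may be loaded."""
    problems = []
    if not record.get("transaction_id"):
        problems.append("missing transaction_id")
    try:
        if Decimal(str(record.get("amount"))) <= 0:
            problems.append("amount must be positive")
    except InvalidOperation:
        problems.append("amount is not a number")
    if record.get("currency") not in SUPPORTED_CURRENCIES:
        problems.append("unsupported currency")
    return problems

# SQLite upsert syntax (requires SQLite 3.24+); a stand-in for a SQL Server MERGE.
UPSERT_SQL = """
INSERT INTO transactions (transaction_id, amount, currency)
VALUES (:transaction_id, :amount, :currency)
ON CONFLICT(transaction_id) DO UPDATE SET
    amount = excluded.amount,
    currency = excluded.currency
"""

def load(conn, records):
    """Upsert valid records in one database transaction; return (loaded, rejected)."""
    accepted = [r for r in records if not validate(r)]
    rejected = len(records) - len(accepted)
    params = [
        {"transaction_id": r["transaction_id"],
         "amount": str(r["amount"]),  # stored as text to keep decimals exact
         "currency": r["currency"]}
        for r in accepted
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, params)
    return len(accepted), rejected

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "transaction_id TEXT PRIMARY KEY, amount TEXT NOT NULL, currency TEXT NOT NULL)"
)

batch = [
    {"transaction_id": "t-1", "amount": "10.00", "currency": "USD"},
    {"transaction_id": "t-1", "amount": "12.50", "currency": "USD"},  # duplicate key: updates t-1
    {"transaction_id": "t-2", "amount": "-3", "currency": "USD"},     # fails validation
]
loaded, rejected = load(conn, batch)
rows = conn.execute("SELECT transaction_id, amount FROM transactions").fetchall()
print(loaded, rejected, rows)
```

Rejected records are only counted here. A production pipeline would likely route them to a quarantine table and feed the counts into the monitoring and alerting metrics.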
Constraints
- Infrastructure: Existing SQL Server setup with limited resources (8 vCPUs, 32 GB RAM).
- Budget: Maintain operational costs under $5,000/month.
- Compliance: Adhere to financial regulations requiring transaction audit trails.
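One way to approach the audit-trail constraint is to write an append-only record for every load action. The sketch below shows the shape such a record might take. The field names, action labels, and source names are hypothetical, and an in-memory list stands in for what would be a dedicated, insert-only audit table in SQL Server.

```python
import json
from datetime import datetime, timezone

def audit_entry(transaction_id, action, source, at=None):
    """Build one audit record as a JSON line. All field names are illustrative."""
    at = at or datetime.now(timezone.utc)
    return json.dumps(
        {
            "transaction_id": transaction_id,
            "action": action,            # e.g. "inserted", "updated", "rejected"
            "source": source,            # which upstream feed produced the record
            "recorded_at": at.isoformat(),
        },
        sort_keys=True,
    )

# Append-only: entries are added, never modified or deleted.
audit_log = []
audit_log.append(
    audit_entry("t-1", "inserted", "payments-api",
                at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
)
audit_log.append(
    audit_entry("t-1", "updated", "settlement-file",
                at=datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc))
)

for line in audit_log:
    print(line)
```

Writing the audit record in the same database transaction as the upsert keeps the trail consistent with the loaded data. Which fields regulators actually require is outside the scope of this sketch.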