Overview
The client needed to bring Google BigQuery data into Azure Data Lake Gen2 so it could be centralized with other enterprise data sources. The goal was to create a recurring pipeline that could extract BigQuery datasets and make them available for reporting and analytics within the Azure ecosystem.
Challenge
- Move BigQuery datasets into Azure Data Lake Gen2 on a recurring schedule.
- Avoid repeatedly scanning and copying unnecessary data from large tables.
- Handle secure cross-cloud authentication.
Solution
MSPowerhouse designed an Azure Data Factory pipeline using the Google BigQuery connector and service account authentication. The solution included an initial full load pattern and a recommended incremental load strategy for recurring schedules.
Where BigQuery tables were small, full reloads could be acceptable. For larger tables, MSPowerhouse recommended partition-aware or watermark-based incremental extraction to avoid repeatedly scanning and moving unnecessary data.
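The choice between a full reload and a watermark-based incremental extract can be sketched roughly as follows. This is a minimal illustration, not the pipeline MSPowerhouse delivered: the size threshold, the function names, and the `updated_at` column used in the usage example are all assumptions made for the example. The query string it produces is the kind of statement that could be supplied as the source query of a Copy Activity.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical cutoff: tables at or below this size are simply reloaded in full.
FULL_RELOAD_MAX_BYTES = 1 * 1024**3  # 1 GiB, an illustrative value only


def choose_strategy(size_bytes: int, has_watermark_column: bool) -> str:
    """Return 'incremental' only for large tables that expose a watermark column.

    Small tables, and large tables without a usable watermark, fall back to 'full'.
    """
    if size_bytes <= FULL_RELOAD_MAX_BYTES or not has_watermark_column:
        return "full"
    return "incremental"


def build_extract_query(
    table: str,
    watermark_column: Optional[str] = None,
    last_watermark: Optional[datetime] = None,
) -> str:
    """Build a BigQuery SQL extract query, filtered by watermark when one is given."""
    base = f"SELECT * FROM `{table}`"
    if watermark_column is None or last_watermark is None:
        # No watermark available: extract the whole table.
        return base
    if last_watermark.tzinfo is None:
        raise ValueError("last_watermark must be timezone-aware")
    # Normalize to UTC so the stored watermark compares consistently across runs.
    ts = last_watermark.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f+00:00")
    return f"{base} WHERE {watermark_column} > TIMESTAMP('{ts}')"


if __name__ == "__main__":
    last_run = datetime(2024, 5, 1, tzinfo=timezone.utc)
    if choose_strategy(5 * 1024**3, has_watermark_column=True) == "incremental":
        print(build_extract_query("proj.ds.orders", "updated_at", last_run))
```

Filtering on a partitioning column, where the table has one, would also let BigQuery prune partitions rather than scan the whole table, which is the cost concern raised above.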
Technical Execution
- Azure Data Factory Google BigQuery connector.
- Google service account authentication.
- Service account key stored in Azure Key Vault where appropriate.
- Copy Activity into Azure Data Lake Gen2.
- Raw landing by dataset, table, and run date.
- Table sizing review through BigQuery metadata.
- Incremental load planning using partition fields or modified timestamps.
- Parquet or structured output options for reporting efficiency.
- Scheduled pipeline triggers.
- Monitoring for pipeline failures and data volume.
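Two of the items above, raw landing by dataset, table, and run date, and the table sizing review, lend themselves to a short sketch. The folder layout (`raw/bigquery/.../run_date=YYYY-MM-DD/`) and the helper names are hypothetical, chosen only for illustration. The sizing query assumes the dataset's `__TABLES__` metadata view is readable by the service account; BigQuery's `INFORMATION_SCHEMA` views would be an alternative.

```python
from datetime import date


def landing_path(dataset: str, table: str, run_date: date, root: str = "raw/bigquery") -> str:
    """Build a raw-zone folder path partitioned by dataset, table, and run date."""
    return f"{root}/{dataset}/{table}/run_date={run_date.isoformat()}/"


def sizing_query(project: str, dataset: str) -> str:
    """Return BigQuery SQL listing each table's row count and size, largest first."""
    return (
        "SELECT table_id, row_count, size_bytes "
        f"FROM `{project}.{dataset}.__TABLES__` "
        "ORDER BY size_bytes DESC"
    )


if __name__ == "__main__":
    print(landing_path("sales", "orders", date(2024, 5, 1)))
    print(sizing_query("proj", "sales"))
```

Keeping the run date in the path means each scheduled run writes to its own folder, so a failed or repeated run can be cleaned up or replayed without touching earlier loads.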
Outcome
The client received a repeatable Azure Data Factory pattern for landing BigQuery data in Azure Data Lake Gen2: an initial full load, plus an incremental strategy for recurring refreshes that limits how much data is scanned and moved on each run.
Impact
This project helped the client connect Google's data platform with Azure's analytics ecosystem and created a cost-aware path for recurring data ingestion.



