Porting Snowplow to Microsoft Azure

By Snowplow Team
March 15, 2024

With the release of open-source Azure support, Snowplow now empowers data teams to run their event data pipelines natively on Microsoft Azure. This post outlines the evolution of Snowplow's Azure implementation, the core Azure services used, and the strategic considerations for adopting Snowplow on Azure.

Q: Why port Snowplow to Microsoft Azure?

Porting Snowplow to Azure aligns with our broader goal of platform independence. Azure is a popular choice for industries like finance, manufacturing, and gaming, and it offers unique services like Data Lake Analytics and Event Hubs. Key benefits of Snowplow on Azure include:

  • Broader Adoption: Enable organizations committed to Azure to deploy Snowplow without AWS dependencies.

  • Pipeline Portability: Allow users to move Snowplow pipelines seamlessly across cloud platforms.

  • Hybrid Pipelines: Combine best-in-class services from multiple clouds (e.g., Azure Event Hubs and AWS Lambda).

Q: What Azure services are used in the Snowplow architecture?

The Snowplow pipeline on Azure leverages the following services:

| Snowplow Component | AWS Service | Azure Service | Purpose |
| --- | --- | --- | --- |
| Unified Log | Amazon Kinesis | Event Hubs | Cloud-scale data ingestion |
| Storage | Amazon S3 | Azure Blob Storage | Object storage for raw events |
| Event Collection | Elastic Beanstalk | VM Scale Sets | Scalable collection infrastructure |
| Event Enrichment | EMR | HDInsight | Batch processing and enrichment |
| Data Warehouse | Redshift | Azure SQL Data Warehouse | Data modeling and analytics |
| Data Lake | S3 | Azure Data Lake Store | Massive-scale data storage |
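The mapping above can also be expressed as data, which is handy when templating deployment configuration for multi-cloud pipelines. This is an illustrative sketch only; the dictionary keys and helper function are our own, not part of Snowplow.

```python
# Illustrative sketch: the AWS-to-Azure service mapping from the table above,
# expressed as a lookup. Keys and function names are our own conventions.
SERVICE_MAP = {
    "unified_log":      {"aws": "Amazon Kinesis",    "azure": "Event Hubs"},
    "storage":          {"aws": "Amazon S3",         "azure": "Azure Blob Storage"},
    "event_collection": {"aws": "Elastic Beanstalk", "azure": "VM Scale Sets"},
    "event_enrichment": {"aws": "EMR",               "azure": "HDInsight"},
    "data_warehouse":   {"aws": "Redshift",          "azure": "Azure SQL Data Warehouse"},
    "data_lake":        {"aws": "S3",                "azure": "Azure Data Lake Store"},
}

def service_for(component: str, cloud: str) -> str:
    """Return the service backing a Snowplow component on a given cloud."""
    return SERVICE_MAP[component][cloud]

print(service_for("unified_log", "azure"))  # Event Hubs
```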

Q: What are the architectural design principles for Snowplow on Azure?

  • Native Integration: Use Azure-specific services where possible, such as Event Hubs for streaming and Data Lake Store for enriched data.

  • Scalability: Implement VM Scale Sets for flexible scaling and Azure SQL Data Warehouse for analytical workloads.

  • Data Separation: Store raw events in Blob Storage and enriched data in Data Lake Store to optimize costs and performance.
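The data-separation principle amounts to a naming convention for storage paths: cheap Blob Storage for raw events, analytics-ready Data Lake Store for enriched output. A minimal sketch follows; the container names and date-partitioned layout are assumptions for illustration, not Snowplow's actual on-disk layout.

```python
from datetime import date

# Sketch of the raw-vs-enriched split described above. Container names and
# the date-partitioned layout are illustrative assumptions.
RAW_CONTAINER = "snowplow-raw"        # Azure Blob Storage (cheap, append-only)
ENRICHED_STORE = "snowplow-enriched"  # Azure Data Lake Store (analytics-ready)

def event_path(stage: str, day: date) -> str:
    """Build a date-partitioned path for a pipeline stage ('raw' or 'enriched')."""
    root = RAW_CONTAINER if stage == "raw" else ENRICHED_STORE
    return f"{root}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"

print(event_path("raw", date(2024, 3, 15)))
# snowplow-raw/year=2024/month=03/day=15/
```

Date partitioning like this keeps enrichment and analytics jobs scoped to only the data they need, which matters for both cost and performance.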

Q: How does Azure Data Lake Analytics enhance Snowplow’s data processing?

Azure Data Lake Analytics (ADLA) allows for powerful, on-demand queries using U-SQL. Benefits include:

  • Massive Scale Processing: Query terabytes of Snowplow data without setting up extensive infrastructure.

  • Data Integration: Integrate C#, Python, or R scripts within U-SQL for custom processing.

  • Event-Level Analysis: Analyze enriched event streams stored in Data Lake Store using advanced analytics.
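To make this concrete, here is the kind of U-SQL job ADLA would run over enriched events in Data Lake Store, shown as a script string. The input path, field names, and schema are assumptions for the sketch, not Snowplow's canonical enriched-event schema.

```python
# Illustrative only: a U-SQL job of the sort ADLA runs over enriched events.
# The path, schema, and field names below are assumptions for this sketch.
U_SQL_JOB = r"""
@events =
    EXTRACT event_id string,
            event_name string,
            collector_tstamp DateTime
    FROM "/snowplow-enriched/year=2024/month=03/{*}.tsv"
    USING Extractors.Tsv();

@daily_counts =
    SELECT event_name, COUNT(*) AS n
    FROM @events
    GROUP BY event_name;

OUTPUT @daily_counts
    TO "/output/daily_counts.csv"
    USING Outputters.Csv();
"""

# The job reads TSV events, aggregates per event name, and writes CSV output.
print("EXTRACT" in U_SQL_JOB and "OUTPUT" in U_SQL_JOB)  # True
```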

Q: What are the deployment phases for Snowplow on Azure?

  1. Phase 1 - Real-Time Pipeline: Set up Event Hubs for streaming data ingestion, Blob Storage for raw events, and Data Lake Analytics for enriched event processing.

  2. Phase 2 - Batch Processing: Implement HDInsight for batch enrichment of Snowplow data.

  3. Phase 3 - Data Warehousing: Load processed data into Azure SQL Data Warehouse for advanced analytics.
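Phase 1's ingestion step is essentially batched sends to Event Hubs, which, like Kinesis, caps the size of each batch. Below is an offline sketch of that batching logic; the function names and JSON event format are our own, and in a real pipeline you would use the azure-eventhub SDK's batch API rather than hand-rolled packing.

```python
import json

# Offline sketch of batching events for an Event Hubs-style send, where each
# batch has a maximum size. Names and event format are illustrative only.
MAX_BATCH_BYTES = 1_000_000  # approximates Event Hubs' ~1 MB batch cap

def make_batches(events: list, max_bytes: int = MAX_BATCH_BYTES) -> list:
    """Greedily pack serialized events into size-limited batches."""
    batches, current, size = [], [], 0
    for event in events:
        payload = json.dumps(event).encode("utf-8")
        if current and size + len(payload) > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)
    return batches

demo = [{"event": "page_view", "n": i} for i in range(3)]
print(len(make_batches(demo)))
```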

Q: What are the key considerations for migrating from AWS to Azure?

  • Data Format Compatibility: Ensure data format consistency, particularly for streaming and data lake storage.

  • Cost Management: Utilize Azure's auto-scaling features to optimize compute costs in SQL Data Warehouse and HDInsight.

  • Service Availability: Confirm the availability of Azure services in the desired region before deployment.
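The first consideration, data format consistency, can be enforced with a simple validation step at the migration boundary: every enriched event row should carry the same number of tab-separated fields on both clouds. A sketch, assuming a fixed field count (Snowplow's enriched TSV is commonly cited as 131 fields, but treat the number here as illustrative):

```python
# Sketch of a format check at a cloud-migration boundary. The expected field
# count is an illustrative assumption, not a guarantee about the schema.
EXPECTED_FIELDS = 131

def is_well_formed(tsv_line: str, expected: int = EXPECTED_FIELDS) -> bool:
    """True if a TSV event row has exactly the expected number of fields."""
    return len(tsv_line.rstrip("\n").split("\t")) == expected

row = "\t".join(str(i) for i in range(131))
print(is_well_formed(row))  # True
```

Running a check like this on samples from both the AWS and Azure sides before cutover catches serialization drift early, when it is still cheap to fix.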

Final Thoughts

With native Azure support, Snowplow users can now build powerful, scalable data pipelines using Azure’s extensive service ecosystem. By leveraging Azure Event Hubs, Data Lake Store, and SQL Data Warehouse, data teams can deploy Snowplow pipelines tailored to Azure while maintaining data consistency across multi-cloud architectures.
