Ensuring Exclusive Sub-Task Execution in Multiple Data Pipelines

In modern data engineering, it’s common to have multiple data pipelines running concurrently. However, when these pipelines share a common sub-task, it becomes crucial to ensure that the sub-task does not run simultaneously across multiple pipelines. This can prevent data corruption, race conditions, and other issues related to concurrent executions. To address this challenge, a… Continue reading Ensuring Exclusive Sub-Task Execution in Multiple Data Pipelines

Lessons on photography from the movie “Civil War”

This is an iconic scene: a young journalist uses an older Nikon FE2 inherited from her father, while another professional uses a Sony A7 series camera. Notice their left hands; manual focus allows them to quickly capture the decisive moment. The director leverages this motif multiple times throughout the movie to illustrate the growth of… Continue reading Lessons on photography from the movie “Civil War”

Creating Read-Only External Table in Unity Catalog by Using Existing Delta Table in Azure Storage Account

In this tutorial, we’ll walk through the steps to create a read-only external table in Azure Databricks using an existing Delta table stored in an Azure Storage Account. This allows you to query the data in the Delta table without needing to copy it into your Databricks cluster. Prerequisites: Steps: 1. Create Access Connector for… Continue reading Creating Read-Only External Table in Unity Catalog by Using Existing Delta Table in Azure Storage Account

Decoding A24’s Rise: A Blueprint for Indie Success

In the vast ocean of cinema, where titans dominate the waves, A24 has emerged as a lighthouse of indie brilliance. This Vox video clip helps us dive into how this underdog studio carved out its niche, turning heads and capturing hearts along the way. A24’s Recipe for Success A Trophy Case That’s Enviable [00:00:16] Right… Continue reading Decoding A24’s Rise: A Blueprint for Indie Success

A simple mistake that occurred when leveraging Spark in Python multiprocessing

Look at this snippet first: It looks fine at first glance. However, validation showed that the output in the Delta table was incomplete. The issue turned out to be df_test, which is not a local variable inside a function. So, when the code ran on multiple cores, df_test was overwritten. The best way to avoid… Continue reading A simple mistake that occurred when leveraging Spark in Python multiprocessing
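
The snippet itself is not shown in this excerpt, so the following is only a minimal sketch of the kind of overwrite described, not the post's original code. It assumes thread-based workers (`multiprocessing.pool.ThreadPool`), uses plain Python lists as stand-ins for Spark DataFrames, and the names `process_shared`, `process_local`, and `run` are hypothetical. A barrier is added purely to make the overwrite reproducible.

```python
import threading
from multiprocessing.pool import ThreadPool

N_WORKERS = 3

# Assumption for illustration: the barrier forces every worker to finish
# writing before any worker reads, so the overwrite happens on every run.
barrier = threading.Barrier(N_WORKERS, timeout=5)

# Module-level name shared by all threads (stand-in for a Spark DataFrame).
df_test = None


def process_shared(table_id):
    """Problematic pattern: every worker assigns to the same shared name."""
    global df_test
    df_test = [table_id] * 2   # overwrites whatever another worker stored
    barrier.wait()             # the other workers assign df_test meanwhile
    return table_id, df_test   # may now be another worker's data


def process_local(table_id):
    """Safer pattern: the intermediate result is local to this call."""
    df_local = [table_id] * 2  # private to this invocation
    barrier.wait()
    return table_id, df_local


def run(worker):
    # One task per worker thread; results come back in input order.
    with ThreadPool(N_WORKERS) as pool:
        return pool.map(worker, range(N_WORKERS))


shared_results = run(process_shared)
local_results = run(process_local)

for table_id, data in shared_results:
    print("shared", table_id, data)
for table_id, data in local_results:
    print("local ", table_id, data)
```

In the shared version every worker ends up returning the same data, whichever assignment happened last, while the local version returns each worker's own data. With real DataFrames the same idea would apply: create and use the DataFrame inside the worker function rather than through a name defined outside it.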