One day we found databricks DBU cost surged, but we didn’t know which ADF job caused this issue. Then I asked Azure support if any way we can track to the pipeline level. Unfortunately, they told me “ Regarding your question, please note that you can only track cost against the resource group or workspace(cluster). Unfortunately, there is no way to track Databricks cost at a job or user level.” Ok, that’s fine.
However, when I opened the cost details which exported from cost management in Azure, I found a tag column in DBU resource usage.
"ClusterId": "1210-XXXX-XXXX","DatabricksInstancePoolCreatorId": "XXXXXX","DatabricksInstancePoolId": "1025-204618-XXXXX-pool-XXXXXX","ClusterName": "job-8810145-run-1","JobId": "8810145","RunName": "ADF_df-eus-prod-01_Standard_Module_run_pipeline_935e1d13-02c5-4ae3-9441-3a414d1ad0eb","Creator": "XXXX@XXXX.com","Vendor": "Databricks","DatabricksInstanceGroupId": "-8346123XXXXX6640367"
Are these our databricks information? Maybe we can do something !? After consulting from my coworker Sai, he told me the JobID is similar to notebook run ID. Then everything could be solved since we already records everything when running notebook including ADF pipeline Id, cust information, notebook name, notebook Runid(jobId) and start-end time.
By simple SQL query to join cost csv table and log table, we can category DBU cost to each ADF pipeline.
select ProductName,t2.notebook_name,sum(t1.PreTaxCost) from cost_df t1 left join (select distinct notebook_name,notebook_runID,cust_name from delta.`/mnt/eus-logging/job-log-details` ) t2 on t2.notebook_runID=get_jobid(Tags) where ProductName='Azure Databricks - Premium - Jobs Compute - DBU' group by ProductName,t2.notebook_name
To go through this process, you have to log the ADF pipeline and notebook runtime information when you execute the notebook. I was doing this by add following function at the start/end of notebook.
# at beginning notebook_name = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get() notebook_RunID = getRunID() # this is function in /Public/JOB_LOG write_log_detail(ADFRunID,cust_id,cust_name,notebook_name,notebook_RunID,startTime=str(datetime.now())) # notebook start time # at the end write_log_detail(ADFRunID,cust_id,cust_name,notebook_name,notebook_RunID,endTime=str(datetime.now())) # notebook end time
Here function write_log_detail is the one writing into delta table.
Unfortunately, I have not found the way to break down the VM cost into pipeline level, I think there should be a way. Just need be more time. will update later.