Since version 2019.3, Tableau has supported Databricks through a native connection driver, so we no longer have to use some […]
Category: Big Data
The Parquet format is a very good choice for big data: it is fast, compact, and well suited to distributed processing. But not all software […]
For a long time, I thought there was no pipeline concept in Databricks. Most engineers will write […]
IoT + AI + Cloud = Edge computing. This is my understanding of edge computing. With the rise of IoT, […]
Before we start to talk about Delta Lake, we need to spend some time on the data lake concept and understand […]
– version 1.0: initial @20190428
– version 1.1: add image processing, broadcast and accumulator
– version 1.2: add ambiguous column handle, maptype […]
Essentially, ADF (Azure Data Factory) is only responsible for the Extract and Load steps of an ELT pipeline. We then have to use […]
ETL is the most common process in building an EDW, and of course the first step in data integration. […]
Purpose: using PySpark to help analyze global warming. The data is from the NCDC (http://www.ncdc.noaa.gov/) and covers 1980 to 1989 […]
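A minimal sketch of the kind of per-year aggregation such an analysis would run, written in plain Python rather than PySpark so it stays self-contained; the sample records and field layout are hypothetical stand-ins, not actual NCDC data:

```python
from collections import defaultdict

# Hypothetical (year, temperature_in_C) pairs standing in for parsed NCDC rows.
records = [
    (1980, 14.5), (1980, 14.0),
    (1985, 15.0), (1985, 14.5),
    (1989, 15.5), (1989, 15.0),
]

def mean_temperature_by_year(rows):
    """Group readings by year and return the mean temperature per year,
    mirroring the groupBy/avg that PySpark would perform at scale."""
    sums = defaultdict(lambda: [0.0, 0])  # year -> [running sum, count]
    for year, temp in rows:
        acc = sums[year]
        acc[0] += temp
        acc[1] += 1
    return {year: total / count for year, (total, count) in sums.items()}

print(mean_temperature_by_year(records))
# {1980: 14.25, 1985: 14.75, 1989: 15.25}
```

In PySpark the same logic would typically be expressed as a `groupBy("year").avg("temperature")` over a DataFrame loaded from the raw files.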