Computer Science • Programming
Recently we were given the task of improving ETL efficiency. The latency mostly occurs in I/O and file translation, since […]
Before we talk about Delta Lake, we have to take some time to look at the data lake and understand […]
As an amateur photographer, I used to believe a DSLR is better than a phone camera since it has a much larger CMOS […]
Computer Science • Machine Learning
YOLO stands for You Only Look Once. Unlike R-CNN, YOLO uses a single CNN to do the object detection […]
Computer Science • Machine Learning
Spark provides MLlib for machine learning in a scalable environment. MLlib includes three major parts: Transformer, Estimator, and Pipeline. […]
version 1.0: initial @20190428; version 1.1: add image processing, broadcast and accumulator; version 1.2: add ambiguous column handling, maptype […]
Essentially, ADF (Azure Data Factory) only takes responsibility for Extraction and Load in the ELT pipeline. Then we have to use […]
Computer Science • ETL&DW • Programming
If you are a data scientist, you may never need to do data preprocessing work, like ETL/ELT, performance tuning […]
Computer Science • Machine Learning
version 1 @20190401; version 2 @20190402: change category to 2. Today I tried a text classification task where the data […]
Computer Science • Visualization
03/26/2019, version 0.1. In the field of BI tools, the three most widely used are Tableau, Power BI, and Qlik. According […]