A simple mistake when using Spark with Python multiprocessing

Look at this snippet first: It looks fine at first glance. However, validation showed that the output in the Delta table was incomplete. In the end, the problem turned out to be df_test, which is not a local variable inside a function. So, when the code ran on multiple cores, df_test was overwritten. The best way to avoid…
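The original snippet is not reproduced in this excerpt, so the following is only a minimal sketch of the kind of bug described. It assumes the workers were threads (here `multiprocessing.pool.ThreadPool`), which share module-level variables. The names `unsafe_job`, `safe_job`, and `run` are hypothetical, a plain string stands in for the Spark DataFrame, and the `Barrier` is there only to force the overlap so the effect is reproducible.

```python
import threading
from multiprocessing.pool import ThreadPool

N_WORKERS = 4
# The barrier makes every worker finish its write before any worker reads,
# which reproduces the overwrite deterministically (illustration only).
barrier = threading.Barrier(N_WORKERS, timeout=5)

df_test = None  # module-level: shared by every thread (stand-in for a DataFrame)


def unsafe_job(part):
    """Stores its intermediate result in the shared module-level variable."""
    global df_test
    df_test = f"rows-for-{part}"   # another thread may overwrite this
    barrier.wait()
    return df_test                 # may no longer be this worker's data


def safe_job(part):
    """Keeps the intermediate result in a local variable."""
    local_df = f"rows-for-{part}"  # private to this call
    barrier.wait()
    return local_df


def run(job):
    with ThreadPool(N_WORKERS) as pool:
        return pool.map(job, range(N_WORKERS))


unsafe_results = run(unsafe_job)
safe_results = run(safe_job)

print(unsafe_results)  # every entry is the same value: three parts were lost
print(safe_results)    # one distinct entry per part
```

Under these assumptions, keeping the variable local to the worker function is enough to stop workers from clobbering each other's data.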

Sharing some useful/special MS SQL tips as a data engineer

If you are a data scientist, you may never need to do data preprocessing work such as ETL/ELT, performance tuning, or OLTP database design. Everything is already prepared in a structured data warehouse or flat file, neat and tidy. As for data quality, all a data scientist needs to do is handle some…

Learn Django with me (part 3)

Handle views and templates
A view consists of a set of functions that handle different URL requests matching specific URL patterns. Each one either returns an HttpResponse or raises Http404. First, let's update webapp/views.py:
    from django.shortcuts import render
    from .models import Question
    def index(request):
        # get the latest 5 questions
        latest_question_list = Question.objects.order_by('-pub_date')[:5]
        # create…

Singleton Pattern in Java, Python and C++

An implementation of the singleton pattern must ensure that only one instance of the singleton class ever exists, have the class create its own singleton instance, and provide global access to that instance. Typically, this is done by declaring all constructors of the class to be private and providing a static method that returns a reference to the…
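The requirements above can be sketched in Python roughly as follows. This is one possible approach, not necessarily the one used in the full post: Python has no private constructors, so the sketch blocks direct construction by raising in `__init__`, and the class name `Config`, its `settings` attribute, and the `instance()` method are hypothetical.

```python
import threading


class Config:
    """Hypothetical singleton: the only way to get the object is Config.instance()."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        # Stand-in for a private constructor: direct construction is refused.
        raise RuntimeError("use Config.instance() instead of Config()")

    @classmethod
    def instance(cls):
        # Double-checked locking so concurrent first calls create one object only.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    obj = cls.__new__(cls)  # allocates without calling __init__
                    obj.settings = {}
                    cls._instance = obj
        return cls._instance


a = Config.instance()
b = Config.instance()
a.settings["debug"] = True
print(a is b)               # True: both names refer to the same object
print(b.settings["debug"])  # True: state is shared through the single instance
```

In everyday Python code a module-level object often serves the same purpose, since a module is imported only once per interpreter.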

Efficiency of different programming languages

Most likely every programmer knows Python is a low-efficiency language, but how slow is it? See this picture: The author compared virtually all languages on three variables: energy consumption, memory consumption, and execution time. Java and C do very well on energy and time. As for Python, the numbers are not ideal as…