Learn Django with me(part 2)

Change Database Setting

Open up mysite/settings.py, find snippet as blew:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
}
}

Here you can change your database to others if needed
* ENGINE – Either ‘django.db.backends.sqlite3’, ‘django.db.backends.postgresql’, ‘django.db.backends.mysql’, or ‘django.db.backends.oracle’.
* NAME – Name of database

We can aslo change the time zone in the setting file

TIME_ZONE = 'America/Chicago'

To create related tables in the database, we need to execute
python manage.py migrate. It will create tables following by INSTALLED_APPS in setting.py.

To read the recetly created tables in sqlite:

python manage.py dbshell
# into sqlite shell
.table

Thre result will like blew:

sqlite> .tables
auth_group                  auth_user_user_permissions
auth_group_permissions      django_admin_log
auth_permission             django_content_type
auth_user                   django_migrations
auth_user_groups            django_session

Create a model

In offical defination, model is the single, definitive source of truth about your data.. In my option, model is only data model in single place, rather than in database as well as in you codes.

Let copy this into webapp/models.py. We create two classes which are also two tables in the database. Each variable is a filename with its data type, such as models.CharField is type char, and models.DataTimeField is datatime. Here we can also figure out a ForeignKey in class Choice which points to Question.

from django.db import models

class Question(models.Model):
question_text = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published')
def __str__(self):
return self.question_text

class Choice(models.Model):
question = models.ForeignKey(Question, on_delete=models.CASCADE)
choice_text = models.CharField(max_length=200)
votes = models.IntegerField(default=0)
def __str__(self):
return self.choice_text

To active model, we need to add config file into INSTALLED_APPS. webapp.apps.WebappConfig means calling WebappConfig in apps file in webapp folder.

INSTALLED_APPS = [
'webapp.apps.WebappConfig',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
]

Then we run makemigrations to create migration files.

python manage.py makemigrations webapp
# then you will see somehitng like the following:
Migrations for 'webapp':
webapp/migrations/0001_initial.py
- Create model Choice
- Create model Question
- Add field question to choice

We can find the migration opertaion in webapp/migrations/, we run python manage.py migrate, then we can find two new table in the database already(.schema {tablename})

Summary 3 steps for making model changes:

  • Change your models (in models.py).
  • Run python manage.py makemigrations to create migrations for those changes
  • Run python manage.py migrate to apply those changes to the database.

Play with Shell to add some records into db

To get into the shell, we need to execute

python manage.py shell

Then add some questions and choice

from webapp.models import Choice, Question
# show all question
Question.objects.all()

# add a new question
from django.utils import timezone
q = Question(question_text="What's new?", pub_date=timezone.now())

# save into database
q.save()
q.id()

# search in database, similar to where in sql
Question.objects.filter(id=1) # id=1
Question.objects.filter(question_text__startswith='What')
Question.objects.get(pk=1) # filter with pk

# add some choices, here Django creates a set to hold the "other side" of ForeignKey relation
q.choice_set.create(choice_text='Not much', votes=0)
q.choice_set.create(choice_text='The sky', votes=0)
q.choice_set.create(choice_text='The moon', votes=0)

# delete records
d = q.choice_set.filter(choice_text__startswith='The moon')
d.delete()

Django Admin

To create a admin with python manage.py createsuperuser, then system will ask you enter the username, email and password. After this, we can access admin website http://localhost:8000/admin/

If admin account want to add new question in the website, we need to add the follow snippet into admin.py

from django.contrib import admin

from .models import Question

admin.site.register(Question)

Some issue

When I tried to save question in the admin webpage, there poped out a issue like no such table: main.auth_user__old. Just marked here waiting to find the reason later.

Learn Django with me(part 1)

Although I touched python for a while, most of time I use it only for data analysis with panda or some machine learning packages. Django as one of most famous webframeworks has been existing for over 12 years. So, I decide to learn it step by step with the official tutorial and share my experience with you.

Prepare Django

# install django
sudo pip install Django
# build project
django-admin startproject {name of site}
# run test
# goto project folder, you will find manage.py, run
python manage.py runserver {port}

Fast explain some of files:

  • mange.py: A command-line utility that lets you interact with this Django project in various way.
  • {name of site}: python package for the project
  • mysite/__init__.py: An empty file that tells Python that this directory should be considered a Python package. If you’re a Python beginner, read more about packages in the official Python docs.
  • mysite/settings.py: Settings/configuration for this Django project. Django settings will tell you all about how settings work.
    mysite/urls.py: The URL declarations for this Django project; a “table of contents” of your Django-powered site. You can read more about URLs in URL dispatcher.
  • mysite/wsgi.py: An entry-point for WSGI-compatible web servers to serve your project. See How to deploy with WSGI for more details.

Create a new app

# add a new app
python manage.py startapp {name of app}
  • In each App folder, there are three important python files
  • urls.py: controls what is served based on url patterns
  • models.py: database structures and metadata
  • views.py: handles what the end-user “views” or interacts with

Then we need add the app into setting.py under the site folder {name of site}

INSTALLED_APPS = [
'{name of app}',
]

And update the urls.py under the same folder

from django.contrib import admin
from django.urls import path,include

urlpatterns = [
path('admin/', admin.site.urls),
path('webapp/', include('webapp.urls')),
]

We have Completed all files modification under the site folder. Then we go to app folder to create urls.py and change view.py.

# create a file name `urls.py` under app folder
touch urls.py

# copy this to the file, which directs the request to views.py
# path(route, view, kwargs,name)
# @route: URL pattern
# @view: function name to be called
# @kwargs: argument to be passed in a dictionary
# @name: refer URL with the name
from django.urls import path
from . import views
urlpatterns = [
path('', views.index, name='index'),
]

# change view.py as
from django.http import HttpResponse
def index(request):
return HttpResponse("Hello, world. You're at the polls index.")

After all these done, we can access webapp by http://localhost:8000/{name of app}/

Simple global warming analysis by pyspark

Purpose

Using pyspark to help analysis the situation of global warming. The data is from NCDC(http://www.ncdc.noaa.gov/) through 1980 to 1989 and 2000 to 2009(except 1984 and 2004).

Two Stretage

  • Get the max/min temperature and max wind speed only filter mistaken data(9999). Steps are as follows:
    • Load files into RDD: sc.textFile("/home/DATA/NOAA_weather/{198[0-3],198[5-9]}/*.gz")
    • Extract fields from files through map function: parse_record, return a key-value(tuple) data.
    • Filter 9999 data: .filter(lambda x: x[1]!=9999)
    • reducebyKey to get max or min data ( the key is year): .reduceByKey(lambda x,y: max(x,y)
  • Get the average temperature and avg wind speed by year, latitude and longitude of station which is a fixed land station.
    • Load files into RDD. Same as mapreduce
    • Load RDD to Dataframe. sqlContext.createDataFrame(all_fields,schema=["date","report_type","lat","lon","wind_speed","wind_qulity","temp"])
    • Filter error data(9999) and station type(FM-12) df.where((df['lat']!='+9999') & (df['lon']!='+9999') & (df['wind_speed']!=9999) & (df['temp']!=9999) & (df['report_type']=='FM-12'))
    • aggregate average by year, latitude and longitude:df.groupBy(['date',"lat","lon"]).agg({"wind_speed":"avg","temp":"avg"})

Result and visualization

  • the max/min temperature and max wind speed(based on stretage 1.)
year, max_temp(10x), min_temp(10x), max_wind_speed(10x)
1980,600,-780,617
1981,580,-850,618
1983,616,-931,618
1984,617,-932,618
1982,617,-930,700
1986,607,-901,607
1987,607,-900,602
1985,611,-932,618
1989,606,-900,900
2000,568,-900,900
1988,607,-900,618
2002,568,-932,900
2001,568,-900,900
2003,565,-900,900
2005,610,-925,900
2006,610,-917,900
2007,610,-900,900
2008,610,-900,900
2009,610,-854,900

These extreme data maybe happen in special area, like Antarctica or Sahara Desert.

  • the average temperature and avg wind speed by year, latitude and longitude(based on stretage 2.)
img
img

Through trend line chat, we can figure out the temperature in 2000 to 2009 is significantly higher(1.5-2.0℃) than in 1980 to 1989. Also the max wind speed is greater than mostly before.

Let’s create another chat, which shows the temperature difference between two decades.

img

The red color means weather becomes warm, otherwise weather becomes cold. In the most of countries, the red dots are more than cold ones. The most serious area is Europe.

Files

  • infor_extra.py: main file to execute in spark, output a single csv file in the folder named output1
  • weather_gap.py: caculate the gap between two decades, export to output.csv
  • weather_overview.twb and weather_gap.twb are two tableau files for visualization.
  • If out of memory or GC error happens, please squeeze the year range in infor_extra.pyrdd = sc.textFile("/home/DATA/NOAA_weather/{198[0-3],198[5-9]}/*.gz")

Code: https://github.com/neoaksa/HPC/tree/master/6.Global_Warming

Inside the Linux kernel

cite from: http://turnoff.us/geek/inside-the-linux-kernel/

enter image description here
1. The foundation is a file system. PID 421 is accessing files, while a watch dog is monitoring the whole file system.
2. On the first floor, we can find a process table, the “mom” is PID 1. PID 1341(apache http) for port 80, PID 52 for ssh(22), and nobody for expired FTP(21). Also there are some others, like wine, pipes, cron and a hacker closing to the watch dog. PID in this floor can go upstairs to IO or down stairs to access file system.
3. Go upstairs, two pids are responsible to input and output(tty)

Singleton Pattern in Java, Python and C++

An implementation of the singleton pattern must:

  1. ensure that only one instance of the singleton class ever exists
  2. class creates its own singleton pattern instance
  3. provide global access to that instance.

Typically, this is done by:

  1. declaring all constructors of the class to be private
  2. providing a static method that returns a reference to the instance.

Java

//Lazy loading(not recommend)
public class Singleton {  
    private static Singleton instance;  
    private Singleton (){}  
    public static synchronized Singleton getInstance() {  
    if (instance == null) {  
        instance = new Singleton();  
    }  
    return instance;  
    }  
}
//Non-lazy loading(recommend)
public final class Singleton {
    // create instance
    private static final Singleton INSTANCE = new Singleton();
    // set constructor as null, so that no instance will be created 
    private Singleton() {}
    // return this singletion instance 
    public static Singleton getInstance() {
        return INSTANCE;
    }
}
//enum(recommend)
public enum Singleton {  
    INSTANCE;  
    public void dummy() {  
    }  
}
// how to use
 SingletonEnum singleton = SingletonEnum.INSTANCE;
 singleton.dummy();

Python

// Lazy loading
class Singleton(object):
    def __new__(cls, *args, **kw):
        if not hasattr(cls, '_instance'):
            orig = super(Singleton, cls)
            cls._instance = orig.__new__(cls, *args, **kw)
        return cls._instance

class MyClass(Singleton):
    a = 1

//meta class
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class MyClass(metaclass=Singleton):
     pass

//decorator
from functools import wraps

def singleton(cls):
    instances = {}
    @wraps(cls)
    def getinstance(*args, **kw):
        if cls not in instances:
            instances[cls] = cls(*args, **kw)
        return instances[cls]
    return getinstance

@singleton
class MyClass(object):
    a = 1

C++

#include <iostream>

class Singleton
{
    private:
        /* Here will be the instance stored. */
        static Singleton* instance;

        /* Private constructor to prevent instancing. */
        Singleton();

    public:
        /* Static access method. */
        static Singleton* getInstance();
};

/* Null, because instance will be initialized on demand. */
Singleton* Singleton::instance = 0;

Singleton* Singleton::getInstance()
{
    if (instance == 0)
    {
        instance = new Singleton();
    }

    return instance;
}

Singleton::Singleton()
{}

int main()
{
    //new Singleton(); // Won't work
    Singleton* s = Singleton::getInstance(); // Ok
    Singleton* r = Singleton::getInstance();

    /* The addresses will be the same. */
    std::cout << s << std::endl;
    std::cout << r << std::endl;
}

OpenCV 4.0 released!!!

OpenCV 4.0 has relased since more than 3 years after 3.0 release. As a most wildly used CV libary which can be used to object detection, image classification, moving object detection and human face detecion. It inculds lots of machine learning and state of art computer vision algorithms. Also, it supports bunch of coding languages, like C++, java, python…

Summary of Upgrade

  1. C++ 11 required. Lots of C API from 1.x have removed. CMake >= 3.5.1
  2. New module G-API has been added. It declares image processing task in form of expressions and then submit it for execution – using a number of available backends.
  3. More DNN module supported.
    3.1 Mask-RCNN model. Here is a guid using Tensorflow ojbect detection.
    3.2 Integrated ONNX(Open Neural Network Exchange ) parser.
  4. Kinect Fusion algorithm has been implemented and optimized for CPU and GPU (OpenCL)
  5. QR code detector and decoder
  6. high-quality DIS dense optical flow algorithm has been moved from opencv_contrib to the video module.
  7. Persistence (storing and loading structured data to/from XML, YAML or JSON) in the core module has been completely reimplemented in C++ and lost the C API as well.

MPI non-blocking receiving with OpenMP

A simple prototype

  1. In the cluster, a Master – Slave structure set up. Master node responses to send the task to slave nodes and regularly check the status of all slave nodes(busy or idle). Salve nodes response to split task from master into sub tasks running with multi-threads.
  2. Master node assigns tasks by iterating each row of the first column in the lattice. If all slave nodes are busy, master will waiting for feedback from slave nodes, otherwise master node will send the new task to the idle slave node.
  3. Master node uses non-blocking method(Irecv) to get the feedback from slave node. So that master node is able to check status of all slave nodes as well as receive feedback.
  4. Slave node splits task into subtask by iterating each row of the second column in the lattice, which is running in a dynamic schedule looping. So that each thread can keep busy all time.

Non-blocking MPI

pool[node] is used to record working slave nodes, wait[node] is used to record if MPI_Irecv is running for the working slave node. Then use MPI_Request_get_status to check the status of request.

            //receive result from slave nodes
            for(int node=0; node < (num_nodes-1); node++){
                if(pool[node]==1){
                    if(wait[node]==0){
                        MPI_Irecv(&node_solution[node], 1, MPI_INT, node+1, TAG, MPI_COMM_WORLD,&rq[node]);
                        wait[node]=1;
                    }

                    MPI_Request_get_status(rq[node],&flag[node],NULL);
                    if(flag[node]){
//                         printf("rec by node %d: %d - %d \n",row,node+1,node_solution[node]);
                        total_solution += node_solution[node];
                        pool[node]=0;
                        wait[node]=0;
                    }
                }
            }

How to complie

compile with openmp: mpic++ -fopenmp NQ-MPI.cpp -o NQ
run in mpi: mpirun -np 5 --host arch06,arch07,arch04,arch05,arch08 ./NQ

Only Programmers Understand

when a new intern debugging

You set a breakpoint successfully, then

Start a unit test

Forget “where” in a SQL

A tiny bug for 10 hours

There is no bug, we pretend

Last mins of the project

Let’s start a multi-thread program

You thought you catched all exceptions

Reduce some unused code

First time you presented a demo to boss

pair programming

Review some code you created one month ago

project went online after a perfect final test

Baisc NLP by spaCy

I had my first contact with NLP was sensitive classification by NLTK, which was refreshed me how NLP working. A couple of days ago, since I needed to extract some keywords from one or more paragraphs, I tried to understand spaCy which I thought is easier for relatively simple subjects. I summarized some key concepts from spaCy.io and put one of my examples for reference.

The pipeline of NLP

No matter we use NLTP or spaCy, there are almost same pipelines:

1. Sentence Segmentation

It makes sense we break down the paragraph into sentence since each sentence expresses its own topics. spaCy uses the dependency parse to determine sentence boundaries in term of stats model.

doc = nlp(u"This is a sentence. This is another sentence.")
for sent in doc.sents:
    print(sent.text)

Unlike structure papers, web articles are usually semi-structured. Hence, we need to some customized processing and put it into the standard pipeline.

def set_custom_boundaries(doc):
    # do something here
    return doc
# you can only set boundaries before a document is parsed
nlp.add_pipe(set_custom_boundaries, before='parser')

2. Word Tokenization

Tokenization is used to break a sentence to separate words call Tokens. Tokenization is based on the language which decides the punctuations, prefix, suffix and inffix.

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
for token in doc:
    print(token.text)

3. PoS(Parts of Speech), Lemmatization and stop words

At this step, spaCy makes a prediction for each token and put on the most likely tags for them. At the same time, spaCy figures out the basic form and stop words.

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop, [child for child in token.children])
# token.lemma_: the basic form of word
# token.pos_: simple pos
# token.tag_: detail pos
# token.dep_: syntactic dependency
# token.shape_: word shape
# token.is_alpha_: is alpha word
# token.is_stop_: is stop word
# token.children: children token

4. Dependency Parsing

After PoS, we already know the relationship between words. Next step is using more friendly visualization to show the relationships in a sentence. Dependency visualizer is a tree sturcute where the root is main verb in the sentence.

display.serve(doc, style='dep')

5. Finding Noun Phrases

Sometimes, we need to simply the sentence. Group noun phrases together makes this more sense.

For simple way:

from spacy.symbols import *
for np in doc.noun_chunks:
    print(np.text)

For flexible way is iterating over the words of the sentence and consider the syntactic context to determine whether the word governs the phrase-type you want.

from spacy.symbols import *
np_labels = set([nsubj, nsubjpass, dobj, iobj, pobj]) # Probably others too
def iter_nps(doc):
    for word in doc:
        if word.dep in np_labels:
            yield word.subtree

6. Named Entity Recognition (NER)

The goal of Named Entity Recognition, or NER, is to detect and label these nouns with the real-world concepts that they represent. The most common NE are:People’s names,Company names,Geographic locations (Both physical and political),Product names,Dates and times, Amounts of money,Names of events

for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")
html = display.render(doc, style='ent')
display(HTML(html))

7. Coreference Resolution

This is the most hard part in NLP. How should computer know the pronominal, like “it”, “he” or “she”? spaCy is not doing it well. But there are deep learning models, like huggingface.

My Example

Context

Speaking at a swearing-in ceremony for Associate Supreme Court Justice Brett Kavanaugh in the East Room of the White House Monday evening, President Trump apologized to Kavanaugh and his family “on behalf of our nation” for what he called a desperate Democrat-led campaign of “lies and deception” intent on derailing his confirmation.

PoS

Speaking speak VERB VBG advcl Xxxxx True False [at]
at at ADP IN prep xx True True [ceremony]
a a DET DT det x True True []
swearing swearing NOUN NN amod xxxx True False [in]
......
on on ADP IN prep xx True True [derailing]
derailing derail VERB VBG pcomp xxxx True False [confirmation]
his -PRON- ADJ PRP$ poss xxx True True []
confirmation confirmation NOUN NN dobj xxxx True False [his]

Dependency Parsing

Noun Phrases

a swearing-in ceremony
Associate Supreme Court Justice Brett Kavanaugh
the East Room
the White House
President Trump
Kavanaugh
his family
behalf
our nation
what
he
"lies
deception
his confirmation

NER

Mandelbrot set

Defination of Mandelbort set

According to wiki, the Mandelbrot set is the set of complex numbers c for which the function f_{c}(z)=z^{2}+c does not diverge when iterated from  z=0 .

What magic does it has

If we try to find the set of c in a complex plane. Here is the magic:

where x axis is the real part of c, y axis is the imaginary of c. A point c is colored black if it belongs to the set.

How to find set

here is the pseducode from wiki:

For each pixel (Px, Py) on the screen, do:
{
  x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2.5, 1))
  y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
  x = 0.0
  y = 0.0
  iteration = 0
  max_iteration = 1000

  //when the sum of the squares of the real and imaginary parts exceed 4, the point has reached escape.
  while (x*x + y*y < 2*2  AND  iteration < max_iteration) {
    xtemp = x*x - y*y + x0
    y = 2*x*y + y0
    x = xtemp
    iteration = iteration + 1
  }
  color = palette[iteration]
  plot(Px, Py, color)
}

for complex numbers:

z=x+iy

 z^{2}=x^{2}+i2xy-y^{2}

 c=x_{0}+iy_{0}

Hence: x is sum of real parts in z and c, y is sum of imaginary parts of z and c. x={Re}(z^{2}+c)=x^{2}-y^{2}+x_{0} and  y={Im}(z^{2}+c)=2xy+y_{0}

How about 3D

I am going to try my 3D later, but please see the detail from skytopia firstly, I cited the formula directly.

3D formula is defined by:

z -> z^n + c where z and c are defined by {x,y,z}


{x,y,z}^n = r^n { sin(theta*n) * cos(phi*n) , sin(theta*n) * sin(phi*n) , cos(theta*n) }

where
r = sqrt(x^2 + y^2 + z^2)
theta = atan2( sqrt(x^2+y^2), z )
phi = atan2(y,x)

// z^n + c is similar to standard complex addition

{x,y,z}+{a,b,c} = {x+a, y+b, z+c}

//The rest of the algorithm is similar to the 2D Mandelbrot!

//Here is some pseudo code of the above:

r = sqrt(x*x + y*y + z*z )
theta = atan2(sqrt(x*x + y*y) , z)
phi = atan2(y,x)

newx = r^n * sin(theta*n) * cos(phi*n)
newy = r^n * sin(theta*n) * sin(phi*n)
newz = r^n * cos(theta*n)