Most of us who have studied CNNs already know that the convolution operation extracts features from spatial relationships. Compared with a fully connected network, it offers weight sharing and translation invariance. There are many different kinds of convolution. Recently, I found a very good article that summarizes this topic, and I translated it into English, combined with my own understanding. If you want to read the original, you can go here.
1. Standard Convolution
1.1 Single channel
It is an element-wise multiplication followed by a sum. The convolutional filter slides over the picture one element at a time. Here we set padding = 0 and stride = 1. This is the case that applies to grayscale pictures.
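To make the multiply-then-sum step concrete, here is a minimal NumPy sketch. The function name `conv2d_single` and the toy 5×5 input are my own illustration, not from the original article. It follows the CNN convention of not flipping the kernel.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Single-channel convolution with padding = 0 and stride = 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the current patch with the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy grayscale "picture": values 0..24 arranged in a 5x5 grid.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
result = conv2d_single(image, kernel)
print(result.shape)  # (3, 3)
```

With no padding, a 3×3 kernel shrinks a 5×5 input to 3×3, because the kernel only fits in three positions along each axis.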
1.2 Multiple channels
Color pictures are made of 3 channels: Red, Green and Blue. We create a 3×3×3 convolution filter, which contains three 3×3 convolutional kernels, one per channel. Each kernel is convolved with its own channel, and then we sum the three results together into a single-channel 2D array.
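The per-channel convolution and the final sum can be sketched as follows. Again, `conv2d_multi` and the all-ones toy data are assumptions made for illustration only.

```python
import numpy as np

def conv2d_multi(image, kernels):
    """Multi-channel convolution, padding = 0, stride = 1.

    image:   (H, W, C) array, e.g. C = 3 for RGB
    kernels: (kh, kw, C) array, one 2D kernel per channel
    returns: (H - kh + 1, W - kw + 1) single-channel array
    """
    kh, kw, channels = kernels.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for ch in range(channels):
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i:i + kh, j:j + kw, ch]
                # Accumulate each channel's result into the same 2D output.
                out[i, j] += np.sum(patch * kernels[:, :, ch])
    return out

rgb = np.ones((5, 5, 3))      # toy 5x5 color picture
kernels = np.ones((3, 3, 3))  # a 3x3x3 filter
out = conv2d_multi(rgb, kernels)
print(out.shape)  # (3, 3)
```

Each output value here is 27, since every 3×3 patch contributes 9 per channel and there are 3 channels.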
This is our first successful application of ConvLSTM to CFD, although the case is simple and many factors are kept under control. The ground truth is generated by OpenFOAM, and the custom model is PredNet from coxlab. We trained three models this time.
1. Training: use the Nth frame to predict the (N+1)th frame
Prediction: use frames 1-10 to predict frames 2-11, then append the predicted 11th frame to the input. With the new input (frames 2-10 are ground truth, frame 11 is predicted), we can go on to predict frames 3-12. In this experiment, we predict up to the 20th frame with a sliding window of 1 frame. (Only the first few frames are good, since we use predicted frames to make further predictions.)
2. Training: use the Nth frame to predict the (N+10)th frame
Prediction: use frames 1-10 to predict frames 11-20, with no sliding window. (This works very well, since all inputs are ground truth.)
3. Training: use the Nth frame to predict the (N+1)th frame
Prediction: use one frame to predict the next frame, like driving prediction. (The animation has a small problem. The right side is the prediction, the left side is the ground truth.)
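The sliding-window prediction used by the first model can be sketched as below. This is only an illustration of the loop, not our actual code: `rollout` is a hypothetical helper, and `toy_predict` is a stand-in for the trained network that simply adds 1 to every frame so the result is easy to check.

```python
import numpy as np

def rollout(predict_fn, frames, n_steps):
    """Predict n_steps future frames with a sliding window of 1 frame.

    predict_fn: maps a window of T frames to T outputs, where output k
                is the prediction for the frame after input k
    frames:     (T, H, W) array of ground-truth frames
    """
    window = np.array(frames, dtype=float)
    predicted = []
    for _ in range(n_steps):
        out = predict_fn(window)
        next_frame = out[-1]  # the only frame that is new
        predicted.append(next_frame)
        # Drop the oldest frame and append the prediction, so later
        # steps rely more and more on predicted frames.
        window = np.concatenate([window[1:], next_frame[None]], axis=0)
    return np.stack(predicted)

def toy_predict(window):
    # Hypothetical stand-in for the trained model.
    return window + 1.0

# Frames 1..10, each filled with its own frame number.
frames = np.stack([np.full((2, 2), float(t)) for t in range(1, 11)])
future = rollout(toy_predict, frames, n_steps=10)  # frames 11..20
print(future.shape)  # (10, 2, 2)
```

After ten steps the window contains only predicted frames, which is consistent with the quality dropping after the first few frames.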
For some reason, there is a request to predict video frames. We know that video is a combination of spatial and temporal dimensions. FCN and LSTM are good for them respectively. But to handle both at once, we need ConvLSTM. Since I have just started learning it, I am writing down some notes for better understanding.
1. First things first, let’s see what an LSTM looks like:
From left to right, we can see the input modulation gate and the output gate. Along the top is the memory pipe. It simulates the way humans remember things. For more information on how LSTM works, please click here.
In Keras, there are already three kinds of RNN layers: SimpleRNN, LSTM and GRU. They are all easy to use.
2. What is ConvLSTM
A plain LSTM is not well suited to spatial input, because it takes only one-dimensional vectors. ConvLSTM was created to accept multidimensional data by using convolutional operations in each gate.
The basic formulas are the same as for LSTM. They just use convolutional operations instead of one-dimensional products for the input, the previous output and the memory. In Keras, this is wrapped in a layer called ConvLSTM2D.
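For reference, the gate equations as I recall them from the precipitation nowcasting paper linked in the list below are sketched here. In this notation, $*$ is convolution, $\circ$ is the element-wise (Hadamard) product, $\sigma$ is the sigmoid, and $X_t$, $H_t$, $C_t$ are the input, hidden state and memory cell, all kept as 3D tensors. Some implementations omit the peephole terms $W_{c\cdot} \circ C$.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing every $*$ with a matrix product and every tensor with a vector gives back the ordinary LSTM equations.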
3. Where do we use it?
As I said at the beginning, it is used for prediction across time and space. Work already done in academia includes precipitation prediction, video frame prediction, and some physical motion tasks. You can find more in my references.
1. The bouncing ball. https://www.youtube.com/watch?v=RjZ1VKYyHhs
2. Weather forecast. https://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf
3. Some video prediction. https://www.youtube.com/watch?v=MjFpgyWH-pk