As an amateur photographer, I was believing DSL is better than phone camera since it has a much larger CMOS so that it can receive more photons, until I installed Google Camera on my Galaxy S8. The results surprised me a lot. I mean, it performed better than my Conan 5DIII in the most of case without editing in the software like photoshop.
Except HDR+ and portrait mode, Google camera provides a magic mode called Night Sight. Unfortunately, this can only work on Pixel series (Pixel 3/3a is the best) phone with hardware support. You can find the A/B Testing here.
How does Night Sight improve the quality of shots in the night?
HDR+. HDR+ is the foundational function for Night Sight. It is a computational photography technology that improves this situation by capturing a burst of frames, aligning the frames in software, and merging them together. Since each frame is short enough to prevent the blur caused by hand shake, the result turns out to be sharper and wilder dynamic range than without HDR+.
Positive-shutter-lag (PSL) . In the regular mode, Google camera uses zero-shutter-lag (ZSL) protocol which limits exposures to at most 66ms no matter how dim the scene is, and allows our viewfinder to keep up a display rate of at least 15 frames per second. Using PSL means you need to hold still for a short time after pressing the shutter, but it allows the use of longer exposures, thereby improving SNR at much lower brightness levels.
Motion metering. Optical image stabilization (OIS) is widely used in many devices to prevent hand shake. But it doesn’t help for long exposure and motion object. Pixel 3 adds motion metering to detect the motion object and adjust the exposure time for each frame. For example, if it detects a dog moving in the frame or we are using the tripod, it will increase exposure time.
Super Res Zoom. HDR essentially uses algorithm to aliment and merge the frames to increase the SNR( signal to noise ratio). Pixel 3 provides a new algorithm called Super Res Zoom for super-resolution and reduce noise, since it averages multiple images together. Super Res Zoom produces better results for some nighttime scenes than HDR+, but it requires the faster processor of the Pixel 3.
Learning-based AWB algorithm. When it is dim, AWB( auto white balance) is not functional well. And it is an ill-posed problem, which means we cannot inverse the problem (find out the real color of object in the dark). In this case, Google camera utilizes machine learning to “guess” what is the true color and shift the white balance.
S-curve into our tone mapping. As we know, if we exposure a picture for a long time, all the objects become brighter so that we can not figure out when this picture takes. Google uses sigmoid function to adjust the object brightness ( dark objects become darker, light objects become brighter).
YOLO also know as You Only Look Once. Not like R-CNN, YOLO uses single CNN to do the object detection as well as localization which makes it super faster than R-CNN with only losing a little accuracy. From 2016 to 2018, YOLO has been imporved from v1 to v3. In this article, I will use a simple way to explain how YOLO works.
What tasks we need to solve in object detection problem?
Yolo use the same method as human to detect the object. There are three major steps: 1. is it an object? 2. what object is it? 3. where is the position and size of this object. BUT! Through CNN, YOLO can do these three things all together.
How YOLO solve this problem?
First, Let’s introduce Grid Cell in YOLO. The whole input image is divided into grid. Each grid cell predicts only one objects with fixed boundary boxes( say #B). For each boundary box has its own box confidence score to reflet the possibility of object. For each grid cell it predicts C conditional class probabilities( one per class). so that we will get predictions. Here 5 means central location(x,y), size( h,w) and confidence score of each boundary box.
Then you will find so many boundary box from output. How we choose of them? Here we need to do Non-max suppression. The step is as blew:
discard all boxes with box confidence less then a threshold. ( say 0.65)
While there are any renaming box(overlapping):
pick the box with the largest confidence that as a prediction
discard any remaining boxes with IoU(intersection over union: you can see it as overlap size between two boundary box) greater than a threshold(say 0.5)
After Non-max suppression, we need to calculate class confidence score , which equals to box confidence score * conditional class probability. Finally, we get the object with probability and its localization. (see Figure 1)
YOLO Network Design
Let’s see how YOLO v1 looks like. Input = 448*448 image, output = . There are 24 convolutional layers followed by 2 full connected layer for localization. It use sum-squared error between the predictions and ground truth to calculate loss which is consist of classification loss, localization loss and confidence loss.
Classification loss:
Localization loss:
Confidence loss:
The final loss add three components together.
YOLO V2
YOLO v2 improves accuracy compared with YOLO v1.
Add batch normalization on all of the convolutional layers. It get more than 2% improvement in accuracy.
High-resolution classifier. First fine tune the classification network at the full resolution for 10 epochs on IMageNet. This gives network time to adjust tis filters to work better on higher resolution input.
Convolutional with Anchor Boxes. YOLO v1 can only predicts 98 boxes per images and it makes arbitrary guesses on the boundary boxes which leads to bad generalization, but with anchor boxes, YOLO v2 predicts more than a thousand. Then it use dimension cluster and direct location prediction to get the boundary box.
Dimension Cluster. use K-mean to get the boundary boxes patterns. Figure 5 might be the most common boundary boxes in spec dataset.
5. Direct location prediction. Since we use anchor boxes, we have to predict on the offsets to these anchors.
6. Multi-Scale Training. Every 10 batches, YOLOv2 randomly selects another image size to train the model. This acts as data augmentation and forces the network to predict well for different input image dimension and scale.
YOLO v3
Detection at three scales. YOLOv3 predicts boxes at 3 different scales. Then features are extracted from each scale by using a method similar to that of feature pyramid networks
Bounding box predictions. YOLO v3 predicts the object score using logistic regression.
Class prediction. Use independent logistic classifiers instead of softmax. This is done to make the classification multi-able classification.
Spark provides spark MLlib for machine learning in a scalable environment. MLlib includes three major parts: Transformer, Estimator and Pipeline. Essentially, transformer takes a dataframe as an input and returns a new data frame with more columns. Most featurization tasks are transformer. Estimator takes a dataframes as an input and returns a model(transformer), as we know the ML algorithms.. Pipeline combines transformer and estimator together. Additionally, data frame becomes the primary API for MLlib. There is not any more new features for RDD based API in Spark MLib.
If you already understood or used high level machine learning or deep learning frameworks, like scikit-learning, keras, tersorflow, you will find everything is so familiar with. But when you use spark MLlib in practice, you still need third library’s help. I will talk about it in the end.
Basic Stats
# Corrlation
from pyspark.ml.stat import Correlation
r1 = Correlation.corr(df, "features").head()
print("Pearson correlation matrix:\n" + str(r1[0]))
# Summarizer
from pyspark.ml.stat import Summarizer
# compute statistics for multiple metrics without weight
df.select(summarizer.summary(df.features)).show(truncate=False)
# ChiSquare
r = ChiSquareTest.test(df, "features", "label").head()
print("pValues: " + str(r.pValues))
print("degreesOfFreedom: " + str(r.degreesOfFreedom))
print("statistics: " + str(r.statistics))
Featurization
# TF-IDF
# stop word remove
remover = StopWordsRemover(inputCol="raw", outputCol="filtered")
remover.transform(sentenceData)
# tokenize
tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
wordsData = tokenizer.transform(sentenceData)
# n grame
ngram = NGram(n=2, inputCol="wordsData", outputCol="ngrams")
ngramDataFrame = ngram.transform(wordDataFrame)
# word frequence
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=20)
featurizedData = hashingTF.transform(wordsData)
# idf
idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)
rescaledData = idfModel.transform(featurizedData)
# word2vec
word2Vec = Word2Vec(vectorSize=N, minCount=0, inputCol="text", outputCol="result")
model = word2Vec.fit(documentDF)
# binarizer
binarizer = Binarizer(threshold=0.5, inputCol="feature", outputCol="binarized_feature")
binarizedDataFrame = binarizer.transform(continuousDataFrame)
# PCA
# reduce dimension to 3
pca = PCA(k=3, inputCol="features", outputCol="pcaFeatures")
model = pca.fit(df)
# StringIndex
# encodes a string column of labels to a column of label of indices order by frequency or alphabet
indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
indexed = indexer.fit(df).transform(df)
# OneHotEstimator
# we need to use StringIndex first if apply to categorical feature
encoder = OneHotEncoderEstimator(inputCols=["categoryIndex1", "categoryIndex2"],
outputCols=["categoryVec1", "categoryVec2"])
model = encoder.fit(df)encoded = model.transform(df)
# Normalize & Scaler
normalizer = Normalizer(inputCol="features", outputCol="normFeatures", p=1.0)
lInfNormData = normalizer.transform(dataFrame, {normalizer.p: float("inf")})
l1NormData = normalizer.transform(dataFrame)
# standard scaler
# withMean=false: standard deviation, withMean=true: mean
scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures",
withStd=True, withMean=False)
# maxmin scaler
scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
# max abs scaler
scaler = MaxAbsScaler(inputCol="features", outputCol="scaledFeatures")
# bin
from pyspark.ml.feature import Bucketizer
splits = [-float("inf"), -0.5, 0.0, 0.5, float("inf")]
bucketizer = Bucketizer(splits=splits, inputCol="features", outputCol="bucketedFeatures")
# QuantileDiscretizer
discretizer = QuantileDiscretizer(numBuckets=3, inputCol="hour", outputCol="result")
# ElementwiseProduct
transformer = ElementwiseProduct(scalingVec=Vectors.dense([0.0, 1.0, 2.0]),
inputCol="vector", outputCol="transformedVector")
# SQL Transformer
sqlTrans = SQLTransformer(
statement="SELECT *, (v1 + v2) AS v3, (v1 * v2) AS v4 FROM __THIS__")
# VectorAssembler
# combine vector together for future model inputs
assembler = VectorAssembler(
inputCols=["hour", "mobile", "userFeatures"], # the columns we need to combine
outputCol="features") # output column
# Imputer
# handle missing value
imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
imputer.setMissingValue(custom_value)
# slice vector
slicer = VectorSlicer(inputCol="userFeatures", outputCol="features", indices=[1])
# ChiSqSelector
# use Chisqare to select the features
selector = ChiSqSelector(numTopFeatures=1, featuresCol="features",
outputCol="selectedFeatures", labelCol="clicked")
Clarification and Regression
# Linear regression
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
# logistic regression
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
mlr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8, family="multinomial") # multinomial
# decision tree
# classification
labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)
featureIndexer =\
VectorIndexer(inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)
(trainingData, testData) = data.randomSplit([0.7, 0.3])
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures")
# regression
dt = DecisionTreeRegressor(featuresCol="indexedFeatures")
# Random forest
# classification
rf = RandomForestClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", numTrees=10)
# regeression
rf = RandomForestRegressor(featuresCol="indexedFeatures")
# gradient-boosted
gbt = GBTRegressor(featuresCol="indexedFeatures", maxIter=10)
# preceptron
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
# SVM
# Linear SVM, there is no kernel SVM like RBF
lsvc = LinearSVC(maxIter=10, regParam=0.1)
# Naive Bayes
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
# KNN
knn = KNNClassifier().setTopTreeSize(training.count().toInt / 500).setK(10)
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating",
coldStartStrategy="drop")
model = als.fit(training)
# Evaluate the model by computing the RMSE on the test datapredictions = model.transform(test)evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
Validation
# split train and test
train, test = data.randomSplit([0.9, 0.1], seed=12345)
# cross validation
crossval = CrossValidator(estimator=pipeline,
estimatorParamMaps=paramGrid,
evaluator=BinaryClassificationEvaluator(),
numFolds=2) # use 3+ folds in practice
You might be already found the problem. The ecosystem of Spark MLlib is not as rich as scikit learning, and it is lack of deep learning (of course, its name is machine learning). According to Databricks documents, We still have the solutions.
Use scikit learning on single node. Very simple solution. but since scikit leanring load the data still in memory. If the note is faster enough(driver), we can get a good performance as well.
To solve deep learning problem. we have two work around methods.
Apply keras, tensorflow on single node with GPU acceleration(Recommend by databricks).
Distribute Training. It might be slower than on the single node because of communication overhead. There are two frameworks used for distribute training. Horovod and Apache SystemML. I’ve never use Horovod, but you can find information here. As to SystemML, it is more like a wrapper for high level API and provide cluster optimizer which parses the code into spark RDD(live variable analysis, propagate stats, rewrite by matrix decomposition and runtime instruction). From the official website, we know it is much faster than MLLib and native R. The problem is it didn’t update anymore since 2017.
# Create and save a single-hidden-layer Keras model for binary classification# NOTE: In a typical workflow, we'd train the model before exporting it to disk,# but we skip that step here for brevity
model = Sequential()
model.add(Dense(units=20, input_shape=[num_features], activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
model_path = "/tmp/simple-binary-classification"
model.save(model_path)
transformer = KerasTransformer(inputCol="features", outputCol="predictions", modelFile=model_path)
It seems no perfect solution for machine learning in spark, right? Don’t forget we have other time costing jobs: hyper parameters configuration and validation. We can run same model with different hyper parameters on different nodes using paramMap which similar to grid search or random search.
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
paramGrid = ParamGridBuilder().addGrid(lr.maxIter, [10, 100, 1000]).addGrid(lr.regParam, [0.1, 0.01]).build()
crossval = CrossValidator(estimator=pipeline,
estimatorParamMaps=paramGrid,
evaluator=RegressionEvaluator(),
numFolds=2) # use 3+ folds in practice
cvModel = crossval.fit(training)
There might be someone saying: why we don’t use MPI? The answer is simple, too complex. Although it can gain the perfect performance, and you can do whatever you want even running distributed GPU + CPU codes, there are too many things we need to manually configuration on low level API without fail tolerance.
In conclusion, we can utilize spark for ELT and training/ validation model to maximize the performance(it did really well for these works). But until now, we still need third frameworks to help us do deep learning or machine learning tasks on single strong node.