This post combines three parts: creating an Ubuntu VM and attaching a data disk, installing and configuring MySQL, and comparing performance with Azure SQL.
Create Ubuntu VM
- Choose a VM size. Here I used D4s_v3, which has 4 vCPUs and 16 GB of memory. You also need to choose a data disk for storage (I recommend Premium SSD) and set the initial admin password.
- Open SSH access. Go to Networking and add SSH port 22 to your inbound port rules. Later we will add the MySQL port 3306 as well.
- Mount the data disk. Remember the data disk you chose in step 1? It won't mount automatically, so you need to run the following steps over SSH.
dmesg | grep SCSI    # identify the newly attached data disk; here it shows up as /dev/sdc
sudo fdisk /dev/sdc  # create a partition on the data disk
---------------------------------------
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-10485759, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-10485759, default 10485759):
Using default value 10485759
Command (m for help): p
Disk /dev/sdc: 5368 MB, 5368709120 bytes
255 heads, 63 sectors/track, 652 cylinders, total 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2a59b123
Device Boot Start End Blocks Id System
/dev/sdc1 2048 10485759 5241856 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
------------------------------------------------
sudo mkfs -t ext4 /dev/sdc1
sudo mkdir /datadrive
sudo mount /dev/sdc1 /datadrive
------------------------------------------------
# if you want to automount, you need to edit the fstab file
# retrieve UUID
ls -al /dev/disk/by-uuid/
# edit fstab
sudo nano /etc/fstab
UUID=<ID> /datadrive auto defaults 0 0
After these steps, you have done all the configuration for Ubuntu. You can check the mount by using the df -H command.
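Before rebooting, it's also worth validating the fstab entry; a quick sanity check (my own habit, not part of the original walkthrough):
# re-mount everything listed in fstab; an error here means a typo in the UUID or options
sudo mount -a
# confirm /datadrive is mounted with the expected size
df -H /datadrive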
Install and configure MySQL
- Install MySQL.
sudo apt-get update
sudo apt-get install mysql-server
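To confirm the installation succeeded, an optional check (not in the original steps):
mysql --version                         # prints the installed MySQL version
sudo systemctl status mysql --no-pager  # the service should show active (running)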
- Allow remote access. Remember to add port 3306 to the VM's inbound rules in Azure as mentioned earlier.
sudo ufw allow OpenSSH   # allow SSH first so enabling the firewall doesn't lock you out
sudo ufw enable
sudo ufw allow mysql
Then edit /etc/mysql/mysql.conf.d/mysqld.cnf to change bind-address to 0.0.0.0, which allows connections to MySQL from any IP.
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
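After the edit, the relevant line should look like this; substitute a specific IP address for 0.0.0.0 if you want to be stricter:
-----------------------
bind-address = 0.0.0.0
-----------------------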
- Start MySQL
sudo systemctl start mysql
- Add a new user for remote connections. Run the statement below in the MySQL shell (sudo mysql); you can replace % below with a specific IP address to restrict where connections may come from.
CREATE USER '<username>'@'%' IDENTIFIED BY '<user password>';
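Note that CREATE USER alone grants no privileges, so this login can't read any data yet. A minimal grant, assuming <dbname> is the database the user needs (run in the same MySQL shell, and adjust the scope to taste):
GRANT ALL PRIVILEGES ON <dbname>.* TO '<username>'@'%';
FLUSH PRIVILEGES;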
- Change the data directory. By default, the OS disk only provides about 30 GB, so you should use the extra data disk to store the database.
# stop service
sudo systemctl stop mysql
# sync to new path
sudo rsync -av /var/lib/mysql /datadrive
# backup
sudo mv /var/lib/mysql /var/lib/mysql.bak
# change the configuration file
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
-----------------------
datadir=/datadrive/mysql
--------------------
# configure AppArmor Access Control
sudo nano /etc/apparmor.d/tunables/alias
-----------------------
alias /var/lib/mysql/ -> /datadrive/mysql/,
----------------------
sudo systemctl restart apparmor
# create a dummy directory structure so MySQL's startup check on /var/lib/mysql passes
sudo mkdir -p /var/lib/mysql/mysql
# restart service
sudo systemctl start mysql
Then you can run
SELECT @@datadir;
in the MySQL shell to check the data directory.
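To verify remote access end to end, you can also connect from another machine; this assumes port 3306 is already open in the VM's inbound rules, as noted earlier:
mysql -h <vm public ip> -u <username> -p -e "SELECT @@datadir, @@version;"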
Performance comparison with Azure SQL
| 40,000,000 rows         | Azure SQL | Ubuntu + MySQL |
| Write (from Databricks) | 25 mins   | 31 mins        |
| Read (to Tableau)       | 44 mins   | 12 mins        |
What a surprise! MySQL on the VM is slightly slower on writes but almost 4x faster on reads than Azure SQL.
Tips: how to write data into Azure SQL and MySQL through Databricks.
To Azure SQL:
%scala
// you have to load com.microsoft.azure:azure-sqldb-spark:1.0.2 into the cluster's libraries first
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

val config = Config(Map(
  "url"          -> "<accountname>.database.windows.net",
  "databaseName" -> "<dbname>",
  "dbTable"      -> "<tablename>",
  "user"         -> "<admin name>",
  "password"     -> "<password>"
))

import org.apache.spark.sql.SaveMode
df.write.mode(SaveMode.Overwrite).sqlDB(config)
To MySQL:
%scala
val jdbcHostname = "<mysql address>"
val jdbcPort = 3306
val jdbcDatabase = "<dbname>"
val jdbcUsername = "<user name>"
val jdbcPassword = "<password>"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:mysql://${jdbcHostname}:${jdbcPort}/${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
import org.apache.spark.sql.SaveMode
df.write
  .mode(SaveMode.Overwrite) // <--- overwrites the table; use SaveMode.Append to append instead
  .jdbc(jdbcUrl, "<table name>", connectionProperties)
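For completeness, here is a sketch of reading the table back with the same connection settings, using the standard spark.read.jdbc API (dfRead is just an illustrative name):
%scala
// reuses jdbcUrl and connectionProperties defined above
val dfRead = spark.read.jdbc(jdbcUrl, "<table name>", connectionProperties)
display(dfRead)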