Supplemental Subquery Exercises

CIT225 Subquery Supplemental Exercises

Written by Tanner Crook using Sakila sample data on MySQL


The book subquery exercises are a little awkward so it might be cleary and more fun to try out your knowledge of subqueries against real world scenarios.

Because sakila data is from 2005 and 2006, pretend that the current date (today) is 2006-01-01.

Scenario 1: Monthly Segments

It’s the end of the year and we want to see how the video store trended. A good way to see how performance is trending over time is to look at monthly averages in segments:

  • L12M Avg Monthly Revenue
  • L6M Avg Monthly Revenue
  • L3M Avg Monthly Revenue
  • L1M Avg Monthly Revenue

We hope to see that our avg revenue goes up in each segment, which means that on avg we are trending upwards.

FYI Tip: Using the Avg Monthly by segment vs Monthly Total with smooth out any outliers.

Step 1: Write Single Query for Segment


Write the query to pull the Monthly Total Revenue for the Last 12 Months (L12M).

You can use the DATE_SUB function to find payments in that period:

WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 12 MONTH)

Your results should look like this:

+----------------+----------+----------+
| period_segment | mnth     | revenue  |
+----------------+----------+----------+
| L12M           | May      |  4824.43 |
| L12M           | June     |  9631.88 |
| L12M           | July     | 28373.89 |
| L12M           | August   | 24072.13 |
| L12M           | February |   514.18 |
+----------------+----------+----------+
5 rows in set (0.04 sec)

Step 2: Write the Report Query to show Avg Monthly for all L12M


Write the outer query using the query from step 1 as a subquery in the FROM clause.

Your results should look like the following:

+----------------+-----------------+
| period_segment | avg_monthly_rev |
+----------------+-----------------+
| L12M           |    13483.302000 |
+----------------+-----------------+
1 row in set (0.06 sec)

We can now see that in the last twelve months what the average monthly revenue is.

Step 3: Repeat Step 1 for All Segments and put in Report Query


Using Step 1 as a template, create a query for each segment listed in the scenario and integrate into the reporting query.

You can use a UNION ALL to link your queries into a single table:

(
SELECT
'L12M' AS period_segment,
...
FROM
...
UNION ALL
SELECT
'L6M' AS period_segment,
...
FROM
...
)

Final Results


Your final results should show the following:

+----------------+-----------------+
| period_segment | avg_monthly_rev |
+----------------+-----------------+
| L12M           |    13483.302000 |
| L6M            |    17653.400000 |
| L3M            |      514.180000 |
| L1M            |      514.180000 |
+----------------+-----------------+
4 rows in set (0.06 sec)

Uh Oh! Looks like maybe we should have invested in Netflix instead of starting a brick-and-mortar video rental.

Scenario 2: Avg Monthly Rentals by Rating and Quarter

Challenge Scenario: This exercise won’t be as detailed or contain detailed step-by-step walkthroughs, but use the previous scenario for hints.

We rearrange our shelves on a quarterly basis. We want to analyze how rentals by ratings are trending by quarters so we can organize accordingly if there are any noticable trends.

Your Mission should you choose to accept it.

Get a dataset containing:

Dimensions: year, quarter, rating 
Metrics: average monthly rentals

The result set should look like the following:

+-------------+----------------+--------+---------------------+
| rental_year | rental_quarter | rating | avg_monthly_rentals |
+-------------+----------------+--------+---------------------+
|        2005 |              2 | G      |            302.5000 |
|        2005 |              2 | PG     |            337.5000 |
|        2005 |              2 | PG-13  |            391.0000 |
|        2005 |              2 | R      |            353.0000 |
|        2005 |              2 | NC-17  |            349.5000 |
|        2005 |              3 | G      |           1067.5000 |
|        2005 |              3 | PG     |           1251.5000 |
|        2005 |              3 | PG-13  |           1380.5000 |
|        2005 |              3 | R      |           1223.0000 |
|        2005 |              3 | NC-17  |           1275.0000 |
|        2006 |              1 | G      |             33.0000 |
|        2006 |              1 | PG     |             34.0000 |
|        2006 |              1 | PG-13  |             42.0000 |
|        2006 |              1 | R      |             29.0000 |
|        2006 |              1 | NC-17  |             44.0000 |
+-------------+----------------+--------+---------------------+
15 rows in set (0.06 sec)

Tips


Start by creating a subquery to get the monthly, rating total rentals.

We need the rental_month_start in date form so we can use the QUARTER() function when we aggregate in the main query. The YEAR() and MONTH() functions can be used with CONCAT() to fabricate this value.

CONCAT(YEAR(rental_date),'-',MONTH(rental_date),'-01') AS rental_month_start,

Subquery Data:

+-------------+--------------+--------------------+--------+-----------------+
| rental_year | rental_month | rental_month_start | rating | monthly_rentals |
+-------------+--------------+--------------------+--------+-----------------+
|        2005 | July         | 2005-7-01          | PG     |            1348 |
|        2005 | August       | 2005-8-01          | PG     |            1155 |
|        2005 | May          | 2005-5-01          | PG     |             216 |
|        2005 | June         | 2005-6-01          | PG     |             459 |
|        2005 | August       | 2005-8-01          | G      |             946 |
...

You will not need to leverage UNION ALL in this scenario.

Solutions

Don’t read further if you you don’t want to see the solutions!

Scenario 1


SELECT 
period_segment,
AVG(revenue) AS avg_monthly_rev
FROM
(
    SELECT 
    'L12M' AS period_segment,
    MONTHNAME(payment_date) AS mnth,
    SUM(amount) AS revenue 
    FROM payment 
    WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 12 MONTH)
    GROUP BY period_segment, mnth
    UNION ALL
    SELECT 
    'L6M' AS period_segment,
    MONTHNAME(payment_date) AS mnth,
    SUM(amount) AS revenue 
    FROM payment 
    WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 6 MONTH)
    GROUP BY period_segment, mnth
    UNION ALL
    SELECT 
    'L3M' AS period_segment,
    MONTH(payment_date) AS mnth,
    SUM(amount) AS revenue 
    FROM payment 
    WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 3 MONTH)
    GROUP BY period_segment, mnth
    UNION ALL
    SELECT 
    'L1M' AS period_segment,
    MONTH(payment_date) AS mnth,
    SUM(amount) AS revenue 
    FROM payment 
    WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 1 MONTH)
    GROUP BY period_segment, mnth
) am
GROUP BY period_segment;

Scenario 2


SELECT 
rental_year,
QUARTER(rental_month_start) AS rental_quarter,
rating,
AVG(monthly_rentals) AS avg_monthly_rentals
FROM
(
    SELECT 
    YEAR(rental_date) AS rental_year,
    MONTHNAME(rental_date) AS rental_month,
    CONCAT(YEAR(rental_date),'-',MONTH(rental_date),'-01') AS rental_month_start,
    f.rating AS rating,
    COUNT(rental_id) AS monthly_rentals
    FROM rental r 
    INNER JOIN inventory i 
    ON r.inventory_id = i.inventory_id
    INNER JOIN film f 
    ON i.film_id = f.film_id
    GROUP BY rental_year, rental_month, rental_month_start, rating
) mr
GROUP BY rental_year, rental_quarter, rating
ORDER BY 1, 2;

Installing the Sakila Demo Database from Scripts

It happens. Sometimes we mess up our demo database. Sometimes the installation option is missed and it doesn’t get installed. You don’t need to re-install all of MySQL to get the demo database. They can be installed using scripts.

You can download the Sakila database scripts here: https://downloads.mysql.com/docs/sakila-db.zip

Once downloaded, you can follow the steps on MySQL’s guide to run them: MySQL :: Sakila Sample Database :: 4 Installation

To execute the file (be sure you are in the same directory as the scripts) you can run:

source the-script-name.sql


Analyzing Data: High School Term Lengths

I always tell people that it is absolutely crazy not to make decisions based on data if you have data that can be used. A good example of this came into play at my work recently when there have been discussions around various term schedules. Currently, the school in question has a traditional two semester school year, with alternating four period days. This means that students have eight classes per term. A lot of the concern comes from the rotating days and missing courses constantly for holidays, activities, and remedial Fridays. Another concern is that eight classes is a lot to juggle at a time and that students might perform better by having less courses at a time, but to maintain the amount of credits achievable in a year, more terms would be needed.

Having done most of my undergraduate program at a trimester based university, I enjoyed the idea because as a student I felt the shorter terms contributed to my success because it was a shorter, more focused approached to learning. So immediately I turned to our datasets to see if I could put together any good data to support a shorter term.

You can find the work of the study in the R Notebook below.

Limiting NVIDIA GPU in Linux

I have a nice laptop with a GPU for machine learning and the occasional video game. A System76 Oryx Pro that has a 6GB GTX 1060 graphics card. This model is before they had integrated intel graphics with the easy switching to get that precious battery life when you aren’t using your GPU for workloads. So I searched high and low for a way to limit my GPU as it seemed like it was constantly spinning up even when just browsing the web.

On Linux, there is a NVIDIA utility called NVIDIA XServer Settings that will allow you to make some changes to your settings and see stats about your GPU. Specifically, there is a tab called PowerMizer where you can watch the card adapt it’s clock speeds to what it thinks you need. Which is great when you want the card to ramp up when it’s needed. However, I wanted to force my card to stay at the lowest performance level, knowing that I may sacrifice some performance for some battery life and fan noise.

I found a nice little solution to force my NVIDIA GPU to limit itself to a specific PowerMizer setting.

Add the following to the file: /etc/modprobe.d/nvidia-graphics-drivers.conf

# Force GPU to lowest level
options nvidia NVreg_RegistryDwords="OverrideMaxPerf=0x1"

To undo this, you can just comment out or remove the line. The only downside is that it requires a restart. But most of the time I open my laptop knowing what my workloads will be and it is a small price to pay to squeeze out that extra battery!

SQL WHERE Clause Operators

The following operators can be used in the WHERE clause to filter data:

Operator Description Example
= Equal WHERE last_name = 'Potter'
> Greater than WHERE price > 5
< Less than WHERE price < 5
>= Greater than or equal WHERE price >= 5
<= Less than or equal WHERE price <= 5
!= or <> Not equal WHERE last_name != 'Potter'
BETWEEN Between a certain range WHERE date_added BETWEEN '01-JAN-19' AND '31-DEC-19'
LIKE Search for a pattern (place a % to represent a wildcard) WHERE last_name LIKE 'Po%'
For example return ‘Potter’, ‘Poe’, and ‘Pope’.
IN In a specified list WHERE last_name IN ('Potter','Weasley','Granger')
ANY or SOME Check operator against a list (IN with conditional) WHERE 10 >= ANY (12,50,24)
Example 1
SELECT 
  first_name 
, last_name 
FROM contact 
WHERE first_name = 'Harry'
AND last_name = 'Potter';
Example 2
SELECT 
  item_title
, release_date
, item_rating
FROM item 
WHERE release_date BETWEEN '01-JAN-83' AND '31-JAN-83'
AND item_rating != 'R';
Also, keep in mind that these can all be inverted with the NOT keyword.

WHERE NOT last_name = 'Potter' would be synonymous with WHERE last_name != 'Potter'.

Fedora Emergency Mode?

Have something go wrong while you are in a virtual machine and you end up having to force quit? Sometimes that doesn’t end so well with the guest operating system. This can often be the case when running the Fedora instance for CIT225 and you may come face to face with the Fedora Emergency Mode.

You can verify that it is malformed data in the virtual disk by checking the system logs.
journalctl -xb

In the output you may see some red lines that say that a service called fsck failed. This means you most likely have malformed data. Luckily, that same service can fix the problem and have us up and going very quickly. Simply run:
fsck /dev/mapper/fedora-root

Then use the VMware menu to restart or reset the virtual machine. It should boot to Fedora.

CIT225 Fedora Instance Torrent

If you are having problems downloading the Fedora image (especially if you have bad internet connection)

I have created a torrent that should allow you to download incrementally (so if connectivity gets bad it doesn’t have to start all over). Use a torrent client like Transmission (Links to an external site.) and open the attached file in the torrent client.

If the class could also please help seed it would be awesome!

Torrent File: http://tannercrook.com/downloads/FedoraV2020.zip.torrent

 

Flask and SQLAlchemy

Instructions

Now that we have a database ready to use, we need to make our database accessible to our web application framework Flask. To do so, we will be using SQLAlchemy which will not only connect to the database, but will also manage the python objects to make manipulating and using the the data easier.

 

Getting the Project Files

Previously we had set up a basic Flask directory structure. We are going to move those files to back them up, and get the completed project files.

First, lets change our directory name so we can put the new files there:
mv /var/www/app/app /var/www/app/app.bak

Now lets go to our app directory and use git to download the files we need:
cd /var/www/app
git clone https://github.com/tannercrook/Flask-LAMP
And then rename the directory we download: mv Flask-LAMP app

You can now enter the directory and find our Flask structure!

 

 

Build the Python Container

As we did with our first Flask structure, we need to build the virtual environment for Flask and install all of the dependencies. We can use the requirements.txt to install all of the dependencies.

First, let’s make sure we are in the correct location. Running pwd should return /var/www/app/app.

Now, we can create our container for Python:
python3 -m venv venv

Next, we will activate the container, and install the dependencies:
source venv/bin/activate
pip3 install -r requirements.txt

If you get an error installing the requirements, try installing wheel first.

pip3 install wheel

 

 

Now you should have your container and the proper packages!

 

 

Connect to Your Database

The final step is to connect our app to our database. The database should be running on the same server, so all we need to do is supply it with the proper username and password. Note that all paths will be relative to you being in /var/www/app/app

The first file we will edit is models/Connection.py:
nano models/Connection.py

In the file, you will see a line (line 6) that looks like the following:
engine = create_engine("postgresql+psycopg2://catalog_app:schoolrocks@localhost/course_catalog", implicit_returning=True)

You will need to edit the username and password to match your database credentials. In the case above, catalog_app is my username and schoolrocks is my password. So edit those two values to match your credentials. If your database name is different, you can edit where it says course_catalog.

The final file we will edit is app.py:
nano app.py

The line we edit here will be similar:
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://catalog_app:schoolrocks@localhost/course_catalog'

You will similar edit the same values to adjust the username and password for the database.

After saving, Flask should now be able to talk to your database.

Think

SQLAlchemy is a package that will deeply connect our database and Python. We use sqlacodegen to generate Python objects (classes) for our various database objects so we can skip writing boilerplate code and custom classes to perform database actions. Take a look at the models/models.py file to see the generated objects. You can see how this is used in the various views/*.py files. You can see how sqlacodegen is used here: https://db.tannercrook.com/python-objects-from-database-for-sqlalchemy/

 

Seeing the Results

Well now everything is set. All we have left to do now is restart apache, and check it out!

sudo service apache2 restart

Now you can navigate to your server IP and view the website.
ex http://10.0.0.60

 

 

Run With It

This is just a very small glimpse into the whole process. I hope that from here, you will begin to explore on your own and discover software of your own.

Your app.bak directory will provide you with a blank slate for Flask that will allow you to repurpose it into something of your own creation. You can then use the example laid out to connect to your own database, and implement it.

Ponder

I hope that you can take it, and run with it.

“Know ye not that they which run in a race run all, but one receiveth the prize? So run, that ye may obtain.” – 1 Corinthians 9:24

 

 

Designing and Building a Database

Instructions

You should have a development environment all set up. Now we will be setting up our database and building the necessary structures for our database.

 

 

Create a PostgreSQL Database

The first thing we need to do is create our database where we will house all of our structures. When we installed postgreSQL, a new user was automatically created that has rights to access the database. So we will switch to that user:

First, switch to root:
sudo su
Then switch to postgres
su postgres

While we could create the database and user with SQL, we will use PostgreSQL utilities to let it do it for us:

Create the database:
createdb course_catalog

We want a user that is specific to the one piece of software for security reasons. So we create a user:
createuser catalog_app
We need to give permissions to the user, so we will log into the psql utility:
psql
And then connect to the course_catalog database:
\connect course_catalog
Set (or change) our users password:
ALTER USER catalog_app WITH PASSWORD 'somesecurepassword';
Finally, grant all privileges on the database to our user:
GRANT ALL PRIVILEGES ON DATABASE course_catalog TO catalog_app;

We can test that everything worked correctly by disconnecting from our postgres user and connecting with our new catalog_app user:

Quit the psql utility:
\q
Exit the postgres user:
exit
Exit the root user:
exit

Now, attempt to connect with our new info:
psql course_catalog catalog_app
You should get an error here. This is because we have not loosened the security to allow logins in this method.

We can change the postgreSQL config to allow this:

Edit the config file:
sudo nano /etc/postgresql/12/main/pg_hba.conf
Use your arrow keys to scroll clear to the bottom and then look for the following line:

local   all             all                                peer

And change it so peer is md5 Like the following:

Screenshot

 

Save and exit the file:
ctrl+o
ctrl+x

If you want to connect to postgres from another computer (like your host with an IDE) you will need to make additional config changes.

sudo nano /etc/postgresql/12/main/pg_hba.conf
and add the following line at the end:

host   all             all              0.0.0.0/0                  peer

Save and exit ctrl+o ctrl+x
sudo nano /etc/postgresql/12/main/postgresql.conf
and uncomment and change the following line:

listen_address = '*'

Save and exit ctrl+o ctrl+x

Restart postgresql to reload with the new configuration:
sudo service postgresql restart

Now, we should be able to login with our credentials:
psql course_catalog catalog_app
Enter the password you used previously and you should be logged in!

You can exit the psql utility at any time by using the \q command.

 

 

Creating the Database

You can get the database creation script from here: https://github.com/tannercrook/Flask-LAMP/blob/master/course_catalog_init.sql

You can download it directly in your server using the terminal:
wget http://tannercrook.com/downloads/course_catalog_init.sql

Then, we can execute the script in psql:
psql course_catalog catalog_app Login using specifying the database and user.
\i course_catalog_init.sql
If everything went well, you can exit psql \q

 

 

Think

I would strongly encourage you to review the database design and structure. While you are doing so, consider how you might change design if courses were managed at the district level, rather than at the school level.

Preparing for Python Integration

We need to make sure we have the proper software to connect to our database with Python.
sudo apt install postgresql-server-dev-12

 

 

 

 

 

 

Python Objects from Database for SQLAlchemy

One of the most tedious tasks of web development is managing the data objects between your database and the web backend language. There is an amazing tool that can really help improve development time for Flask called SQLAlchemy. Many use SQLAlchemy to build the database as well, but for those like me who already have a database, or just like the fine control of SQL, there is a package called sqlacodegen that will create Python objects for SQLAlchemy from an existing database for you.

It is quite simple to generate your models:
sqlacodegen --outfile models.py postgresql://user:password@localhost/database

This will create the models.py folder at the current directory which can then be placed in your Flask application to be imported and used.