CIT225 Subquery Supplemental Exercises
Written by Tanner Crook using Sakila sample data on MySQL
The book subquery exercises are a little awkward so it might be cleary and more fun to try out your knowledge of subqueries against real world scenarios.
Because sakila data is from 2005 and 2006, pretend that the current date (today) is
2006-01-01
.
Scenario 1: Monthly Segments
It’s the end of the year and we want to see how the video store trended. A good way to see how performance is trending over time is to look at monthly averages in segments:
- L12M Avg Monthly Revenue
- L6M Avg Monthly Revenue
- L3M Avg Monthly Revenue
- L1M Avg Monthly Revenue
We hope to see that our avg revenue goes up in each segment, which means that on avg we are trending upwards.
FYI Tip: Using the Avg Monthly by segment vs Monthly Total with smooth out any outliers.
Step 1: Write Single Query for Segment
Write the query to pull the Monthly Total Revenue for the Last 12 Months (L12M).
You can use the DATE_SUB
function to find payments in that period:
WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 12 MONTH)
Your results should look like this:
+----------------+----------+----------+
| period_segment | mnth | revenue |
+----------------+----------+----------+
| L12M | May | 4824.43 |
| L12M | June | 9631.88 |
| L12M | July | 28373.89 |
| L12M | August | 24072.13 |
| L12M | February | 514.18 |
+----------------+----------+----------+
5 rows in set (0.04 sec)
Step 2: Write the Report Query to show Avg Monthly for all L12M
Write the outer query using the query from step 1 as a subquery in the FROM
clause.
Your results should look like the following:
+----------------+-----------------+
| period_segment | avg_monthly_rev |
+----------------+-----------------+
| L12M | 13483.302000 |
+----------------+-----------------+
1 row in set (0.06 sec)
We can now see that in the last twelve months what the average monthly revenue is.
Step 3: Repeat Step 1 for All Segments and put in Report Query
Using Step 1 as a template, create a query for each segment listed in the scenario and integrate into the reporting query.
You can use a UNION ALL
to link your queries into a single table:
(
SELECT
'L12M' AS period_segment,
...
FROM
...
UNION ALL
SELECT
'L6M' AS period_segment,
...
FROM
...
)
Final Results
Your final results should show the following:
+----------------+-----------------+
| period_segment | avg_monthly_rev |
+----------------+-----------------+
| L12M | 13483.302000 |
| L6M | 17653.400000 |
| L3M | 514.180000 |
| L1M | 514.180000 |
+----------------+-----------------+
4 rows in set (0.06 sec)
Uh Oh! Looks like maybe we should have invested in Netflix instead of starting a brick-and-mortar video rental.
Scenario 2: Avg Monthly Rentals by Rating and Quarter
Challenge Scenario: This exercise won’t be as detailed or contain detailed step-by-step walkthroughs, but use the previous scenario for hints.
We rearrange our shelves on a quarterly basis. We want to analyze how rentals by ratings are trending by quarters so we can organize accordingly if there are any noticable trends.
Your Mission should you choose to accept it.
Get a dataset containing:
Dimensions: year, quarter, rating
Metrics: average monthly rentals
The result set should look like the following:
+-------------+----------------+--------+---------------------+
| rental_year | rental_quarter | rating | avg_monthly_rentals |
+-------------+----------------+--------+---------------------+
| 2005 | 2 | G | 302.5000 |
| 2005 | 2 | PG | 337.5000 |
| 2005 | 2 | PG-13 | 391.0000 |
| 2005 | 2 | R | 353.0000 |
| 2005 | 2 | NC-17 | 349.5000 |
| 2005 | 3 | G | 1067.5000 |
| 2005 | 3 | PG | 1251.5000 |
| 2005 | 3 | PG-13 | 1380.5000 |
| 2005 | 3 | R | 1223.0000 |
| 2005 | 3 | NC-17 | 1275.0000 |
| 2006 | 1 | G | 33.0000 |
| 2006 | 1 | PG | 34.0000 |
| 2006 | 1 | PG-13 | 42.0000 |
| 2006 | 1 | R | 29.0000 |
| 2006 | 1 | NC-17 | 44.0000 |
+-------------+----------------+--------+---------------------+
15 rows in set (0.06 sec)
Tips
Start by creating a subquery to get the monthly, rating total rentals.
We need the rental_month_start in date form so we can use the
QUARTER()
function when we aggregate in the main query. TheYEAR()
andMONTH()
functions can be used withCONCAT()
to fabricate this value.
CONCAT(YEAR(rental_date),'-',MONTH(rental_date),'-01') AS rental_month_start,
Subquery Data:
+-------------+--------------+--------------------+--------+-----------------+
| rental_year | rental_month | rental_month_start | rating | monthly_rentals |
+-------------+--------------+--------------------+--------+-----------------+
| 2005 | July | 2005-7-01 | PG | 1348 |
| 2005 | August | 2005-8-01 | PG | 1155 |
| 2005 | May | 2005-5-01 | PG | 216 |
| 2005 | June | 2005-6-01 | PG | 459 |
| 2005 | August | 2005-8-01 | G | 946 |
...
You will not need to leverage
UNION ALL
in this scenario.
Solutions
Don’t read further if you you don’t want to see the solutions!
Scenario 1
SELECT
period_segment,
AVG(revenue) AS avg_monthly_rev
FROM
(
SELECT
'L12M' AS period_segment,
MONTHNAME(payment_date) AS mnth,
SUM(amount) AS revenue
FROM payment
WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 12 MONTH)
GROUP BY period_segment, mnth
UNION ALL
SELECT
'L6M' AS period_segment,
MONTHNAME(payment_date) AS mnth,
SUM(amount) AS revenue
FROM payment
WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 6 MONTH)
GROUP BY period_segment, mnth
UNION ALL
SELECT
'L3M' AS period_segment,
MONTH(payment_date) AS mnth,
SUM(amount) AS revenue
FROM payment
WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 3 MONTH)
GROUP BY period_segment, mnth
UNION ALL
SELECT
'L1M' AS period_segment,
MONTH(payment_date) AS mnth,
SUM(amount) AS revenue
FROM payment
WHERE payment_date >= DATE_SUB('2006-01-01', INTERVAL 1 MONTH)
GROUP BY period_segment, mnth
) am
GROUP BY period_segment;
Scenario 2
SELECT
rental_year,
QUARTER(rental_month_start) AS rental_quarter,
rating,
AVG(monthly_rentals) AS avg_monthly_rentals
FROM
(
SELECT
YEAR(rental_date) AS rental_year,
MONTHNAME(rental_date) AS rental_month,
CONCAT(YEAR(rental_date),'-',MONTH(rental_date),'-01') AS rental_month_start,
f.rating AS rating,
COUNT(rental_id) AS monthly_rentals
FROM rental r
INNER JOIN inventory i
ON r.inventory_id = i.inventory_id
INNER JOIN film f
ON i.film_id = f.film_id
GROUP BY rental_year, rental_month, rental_month_start, rating
) mr
GROUP BY rental_year, rental_quarter, rating
ORDER BY 1, 2;