Country Club Data Warehouse Design in MySQL
Summary
- Goal: Design and create a data warehouse for Blue Hill Country Club’s database
- Data: Membership and transactional data related to one year of operations of Blue Hill Country Club
- Tool: MySQL
- Steps:
- Build two versions of data warehouses both containing relevant membership and revenue data, one in long shape and the other in wide shape
- Create ad-hoc analytics queries using the data warehouse built in the first step to answer relevant business questions the management team may be interested in.
Scenario
In this project, my task was to design and create a data warehouse for Blue Hill Country Club’s database. The goal was to design a data warehouse that would add value to the organization in terms of ease of use (for a technically competent user but not necessarily a programmer) and efficacy in identifying relevant patterns and trends.
Data
The data I have access to is the membership and transactional data related to one year of operations of Blue Hill Country Club. Below is the EER schema of this database as well as the simple description of each table.
Membership data tables:
- memberships – one record per membership
- members - one record per person (multiple family members can be included on the same membership account)
Revenue data tables:
- dining – transaction records from dining room
- golf – transaction records from golf course
- pool – transaction records from pool
- tennis – transaction records from tennis courts
- other - transaction records from miscellaneous activities
Other data tables:
- promoone – list of members who signed up for one of the club’s promotional offers
- promotwo - list of members who signed up for the second of the club’s promotional offers
- special - A series of Boolean variables indicating whether the member attended one of 4 special functions in 2018.
Data Warehouse Design
After exploring the dataset, I decided to design a data warehouse that includes the following information in order for future users to easily extract relevant data and identify patterns about memberships and revenue:
- Membership unique identifier: Member ID
- Membership basic information: Membership Type, Years of Membership, Number of Family, Number of Children
- Revenue data: Revenue type (Dining, Pool, Golf, Tennis, Other), Revenue Amount
- Promotion data: PromoOne, PromoTwo, Other Special Events
For future different use purposes, I designed two versions of data warehouse containing the same information: a long shape and a wide shape format.
##### Long data warehouse #####
CREATE TABLE revenue_long AS
SELECT Member_Number, SUM(Revenue) AS Revenue, Revenue_Type
FROM
(
SELECT Member_Number, Total AS Revenue, "Dining" AS Revenue_Type FROM dining
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Golf" AS Revenue_Type FROM Golf
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Pool" AS Revenue_Type FROM Pool
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Tennis" AS Revenue_Type FROM Tennis
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Other" AS Revenue_Type FROM Other
) AS members
GROUP BY Member_Number, Revenue_Type;
CREATE TABLE dw_long AS
SELECT memberships.member_number, membership_type,
(2019 - Year_Joined) AS Years,
COALESCE(Number_Family, 0) AS Number_Family,
COALESCE(Number_Children,0) AS Number_Children,
Revenue, Revenue_Type,
CASE WHEN promoone.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoOne,
CASE WHEN promotwo.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoTwo,
COALESCE(`Private Function`,0) AS Private_Function,
COALESCE(`4th of July`,0) AS July_Fourth,
COALESCE(Thanksgiving,0) AS Thanksgiving,
COALESCE(`Easter Brunch`,0) AS Easter_Brunch
FROM memberships
LEFT JOIN revenue_long ON memberships.Member_Number = revenue_long.Member_Number
LEFT JOIN promoone ON memberships.Member_Number = promoone.Member_Number
LEFT JOIN promotwo ON memberships.Member_Number = promotwo.Member_Number
LEFT JOIN special ON memberships.Member_Number = special.Member_Number
LEFT JOIN (SELECT Member_Number, COUNT(*) AS Number_Family
FROM members
GROUP BY Member_Number) AS family
ON memberships.Member_Number = family.Member_Number
LEFT JOIN ( SELECT Member_Number, COUNT(*) AS Number_Children
FROM members
WHERE Relationship_to_Member = "Child"
GROUP BY Member_Number) AS children
ON memberships.Member_Number = children.Member_Number
ORDER BY memberships.Member_Number;
Long Shape: one record per person
##### Wide data warehouse #####
CREATE TABLE revenue AS
SELECT Member_Number, SUM(Pool) AS Pool, SUM(Golf) AS Golf, SUM(Tennis) AS Tennis,
SUM(Dining) AS Dining, SUM(Other) AS Other
FROM
(SELECT Member_Number, Total AS Dining, 0 AS Golf, 0 AS Pool, 0 AS Tennis, 0 AS Other FROM dining
UNION ALL
SELECT Member_Number, 0 AS Dining, Amount AS Golf, 0 AS Pool, 0 AS Tennis, 0 AS Other FROM Golf
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, Amount AS Pool, 0 AS Tennis, 0 AS Other FROM Pool
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, 0 AS Pool, Amount AS Tennis, 0 AS Other FROM Tennis
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, 0 AS Pool, 0 AS Tennis, Amount AS Other
FROM Other) AS members
GROUP BY Member_Number;
CREATE TABLE dw AS
SELECT memberships.member_number, membership_type,
(2019 - Year_Joined) AS Years,
COALESCE(Number_Family, 0) AS Number_Family,
COALESCE(Number_Children,0) AS Number_Children,
Pool, Golf, Tennis, Dining, Other,
CASE WHEN promoone.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoOne,
CASE WHEN promotwo.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoTwo,
COALESCE(`Private Function`,0) AS Private_Function,
COALESCE(`4th of July`,0) AS July_Fourth,
COALESCE(Thanksgiving,0) AS Thanksgiving,
COALESCE(`Easter Brunch`,0) AS Easter_Brunch
FROM memberships
LEFT JOIN revenue ON memberships.Member_Number = revenue.Member_Number
LEFT JOIN promoone ON memberships.Member_Number = promoone.Member_Number
LEFT JOIN promotwo ON memberships.Member_Number = promotwo.Member_Number
LEFT JOIN special ON memberships.Member_Number = special.Member_Number
LEFT JOIN (SELECT Member_Number, COUNT(*) AS Number_Family
FROM members
GROUP BY Member_Number) AS family
ON memberships.Member_Number = family.Member_Number
LEFT JOIN ( SELECT Member_Number, COUNT(*) AS Number_Children
FROM members
WHERE Relationship_to_Member = "Child"
GROUP BY Member_Number) AS children
ON memberships.Member_Number = children.Member_Number
ORDER BY memberships.Member_Number;
Wide Shape: one record per membership
Analytics Queries
Using the designed data warehouse, I wrote the following analytics queries to explore some business questions that the management may be interested in:
Query 1
The management team might want to know different revenue patterns for different membership type groups. My approach is to aggregate the membership type to get their average spending on different activities.
# Query1
SELECT
Membership_Type,
COUNT(*),
SUM(Pool+golf+Tennis+Dining+Other) AS total_revenue,
AVG(Pool+golf+Tennis+Dining+Other) AS avg_revenue,
AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
dw
GROUP BY Membership_Type
ORDER BY COUNT(*) DESC;
From the table below we can see that Couples spent more on dining on average, family spent more on pool activities, while retirees spent more on golf and tennis.
Query 2
The management team might also want to explore the revenue difference of different family structures (# of members, w/o children, # of children). The following query is to aggregate the number of family and children info to get the average spending of each family structure type.
# Query2
SELECT
Number_Family,
Number_Children,
COUNT(*),
AVG(Pool+golf+Tennis+Dining+Other) AS avg_amount
FROM
dw
GROUP BY Number_Family, Number_Children
ORDER BY avg_amount DESC;
From the table below we can see that membership with 2 members without children and individual members spent on average the most.
Query 3
The purpose of next query is to analyze different special function patterns for different membership type groups.
# Query3
SELECT
Membership_Type, COUNT(*),
AVG(Private_Function), AVG(July_Fourth),AVG(Thanksgiving), AVG(Easter_Brunch)
FROM
dw
GROUP BY Membership_Type
ORDER BY COUNT(*) DESC;
From the table we can see that Family’s attendance for Private Function, 4th of July and Easter Brunch were the highest among all groups. Retirees attended Thanksgiving the most.
Query 4
Finally, I wanted to explore the marketing effects of Promotions 1&2 on different activities using the following query.
# promo 1, 2
SELECT
membership_type, PromoOne,AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
dw
GROUP BY membership_type , PromoOne
ORDER BY membership_type , PromoOne;
SELECT
membership_type,PromoTwo,AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
dw
GROUP BY membership_type , PromoTwo
ORDER BY membership_type , PromoTwo;
From the table we can see that members who signed up for Promo One had largely increased dining revenue.
The following chart shows more clearly the promotion effects on dining.
As for Promotion2, Members who signed up for Promo Two had greatly increased Other Revenue.
Key Words: SQL, Data Warehouse, ETL, Database Management, Ad-hoc Querying