Country Club Data Warehouse Design in MySQL

Summary

Scenario

In this project, my task was to design and create a data warehouse for Blue Hill Country Club’s database. The goal was to design a data warehouse that would add value to the organization in terms of ease of use (for a technically competent user but not necessarily a programmer) and efficacy in identifying relevant patterns and trends.

Data

The data I have access to is the membership and transactional data related to one year of operations of Blue Hill Country Club. Below is the EER schema of this database as well as the simple description of each table.

Membership data tables:

Revenue data tables:

Other data tables:

Data Warehouse Design

After exploring the dataset, I decided to design a data warehouse that includes the following information in order for future users to easily extract relevant data and identify patterns about memberships and revenue:

For future different use purposes, I designed two versions of data warehouse containing the same information: a long shape and a wide shape format.

##### Long data warehouse #####

CREATE TABLE revenue_long AS
SELECT Member_Number, SUM(Revenue) AS Revenue, Revenue_Type
FROM
(
SELECT Member_Number, Total AS Revenue, "Dining" AS Revenue_Type FROM dining
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Golf" AS Revenue_Type FROM Golf
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Pool" AS Revenue_Type FROM Pool
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Tennis" AS Revenue_Type FROM Tennis
UNION ALL
SELECT Member_Number, Amount AS Revenue, "Other" AS Revenue_Type FROM Other
) AS members
GROUP BY Member_Number, Revenue_Type;

CREATE TABLE dw_long AS
SELECT memberships.member_number, membership_type,
		(2019 - Year_Joined) AS Years,
		COALESCE(Number_Family, 0) AS Number_Family,
		COALESCE(Number_Children,0) AS Number_Children,
		Revenue, Revenue_Type, 
		CASE WHEN promoone.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoOne,
		CASE WHEN promotwo.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoTwo,
        COALESCE(`Private Function`,0) AS Private_Function,
        COALESCE(`4th of July`,0) AS July_Fourth,
        COALESCE(Thanksgiving,0) AS Thanksgiving,
        COALESCE(`Easter Brunch`,0) AS Easter_Brunch
FROM memberships 
	LEFT JOIN revenue_long ON memberships.Member_Number = revenue_long.Member_Number
    LEFT JOIN promoone ON memberships.Member_Number = promoone.Member_Number
    LEFT JOIN promotwo ON memberships.Member_Number = promotwo.Member_Number
    LEFT JOIN special ON memberships.Member_Number = special.Member_Number
    LEFT JOIN (SELECT Member_Number, COUNT(*) AS Number_Family
				FROM members
                GROUP BY Member_Number) AS family
			ON memberships.Member_Number = family.Member_Number
    LEFT JOIN (	SELECT Member_Number, COUNT(*) AS Number_Children 
				FROM members
				WHERE Relationship_to_Member = "Child"
				GROUP BY Member_Number) AS children
		ON memberships.Member_Number = children.Member_Number
ORDER BY memberships.Member_Number; 

Long Shape: one record per person

##### Wide data warehouse #####

CREATE TABLE revenue AS
SELECT Member_Number, SUM(Pool) AS Pool, SUM(Golf) AS Golf, SUM(Tennis) AS Tennis,
		SUM(Dining) AS Dining, SUM(Other) AS Other
FROM
(SELECT Member_Number, Total AS Dining, 0 AS Golf, 0 AS Pool, 0 AS Tennis, 0 AS Other FROM dining
UNION ALL
SELECT Member_Number, 0 AS Dining, Amount AS Golf, 0 AS Pool, 0 AS Tennis, 0 AS Other FROM Golf
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, Amount AS Pool, 0 AS Tennis, 0 AS Other FROM Pool
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, 0 AS Pool, Amount AS Tennis, 0 AS Other FROM Tennis
UNION ALL
SELECT Member_Number, 0 AS Dining, 0 AS Golf, 0 AS Pool, 0 AS Tennis, Amount AS Other
FROM Other) AS members
GROUP BY Member_Number;

CREATE TABLE dw AS
SELECT memberships.member_number, membership_type,
		(2019 - Year_Joined) AS Years,
		COALESCE(Number_Family, 0) AS Number_Family,
		COALESCE(Number_Children,0) AS Number_Children,
		Pool, Golf, Tennis, Dining, Other, 
		CASE WHEN promoone.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoOne,
		CASE WHEN promotwo.Member_Number IS NULL THEN 0 ELSE 1 END AS PromoTwo,
        COALESCE(`Private Function`,0) AS Private_Function,
        COALESCE(`4th of July`,0) AS July_Fourth,
        COALESCE(Thanksgiving,0) AS Thanksgiving,
        COALESCE(`Easter Brunch`,0) AS Easter_Brunch
FROM memberships 
	LEFT JOIN revenue ON memberships.Member_Number = revenue.Member_Number
    LEFT JOIN promoone ON memberships.Member_Number = promoone.Member_Number
    LEFT JOIN promotwo ON memberships.Member_Number = promotwo.Member_Number
    LEFT JOIN special ON memberships.Member_Number = special.Member_Number
    LEFT JOIN (SELECT Member_Number, COUNT(*) AS Number_Family
				FROM members
                GROUP BY Member_Number) AS family
			ON memberships.Member_Number = family.Member_Number
    LEFT JOIN (	SELECT Member_Number, COUNT(*) AS Number_Children 
				FROM members
				WHERE Relationship_to_Member = "Child"
				GROUP BY Member_Number) AS children
		ON memberships.Member_Number = children.Member_Number
ORDER BY memberships.Member_Number; 

Wide Shape: one record per membership

Analytics Queries

Using the designed data warehouse, I wrote the following analytics queries to explore some business questions that the management may be interested in:

Query 1

The management team might want to know different revenue patterns for different membership type groups. My approach is to aggregate the membership type to get their average spending on different activities.

 # Query1                        

SELECT 
    Membership_Type,
    COUNT(*),
    SUM(Pool+golf+Tennis+Dining+Other) AS total_revenue,
    AVG(Pool+golf+Tennis+Dining+Other) AS avg_revenue,
    AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
    dw
GROUP BY Membership_Type
ORDER BY COUNT(*) DESC; 

From the table below we can see that Couples spent more on dining on average, family spent more on pool activities, while retirees spent more on golf and tennis.

Query 2

The management team might also want to explore the revenue difference of different family structures (# of members, w/o children, # of children). The following query is to aggregate the number of family and children info to get the average spending of each family structure type.

 # Query2                       

SELECT 
    Number_Family,
    Number_Children,
    COUNT(*),
    AVG(Pool+golf+Tennis+Dining+Other) AS avg_amount
FROM
    dw
GROUP BY Number_Family, Number_Children
ORDER BY avg_amount DESC;

From the table below we can see that membership with 2 members without children and individual members spent on average the most.

Query 3

The purpose of next query is to analyze different special function patterns for different membership type groups.

 # Query3
SELECT 
    Membership_Type, COUNT(*),
    AVG(Private_Function), AVG(July_Fourth),AVG(Thanksgiving), AVG(Easter_Brunch)
FROM
    dw
GROUP BY Membership_Type
ORDER BY COUNT(*) DESC; 

From the table we can see that Family’s attendance for Private Function, 4th of July and Easter Brunch were the highest among all groups. Retirees attended Thanksgiving the most.

Query 4

Finally, I wanted to explore the marketing effects of Promotions 1&2 on different activities using the following query.

 # promo 1, 2

SELECT 
    membership_type, PromoOne,AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
    dw
GROUP BY membership_type , PromoOne
ORDER BY membership_type , PromoOne;

SELECT 
    membership_type,PromoTwo,AVG(Dining),AVG(Golf),AVG(Tennis),AVG(Pool),AVG(Other)
FROM
    dw
GROUP BY membership_type , PromoTwo
ORDER BY membership_type , PromoTwo;

From the table we can see that members who signed up for Promo One had largely increased dining revenue.

The following chart shows more clearly the promotion effects on dining.

As for Promotion2, Members who signed up for Promo Two had greatly increased Other Revenue.

Key Words: SQL, Data Warehouse, ETL, Database Management, Ad-hoc Querying


Comments