Investigate a bike-share company’s marketing campaigns to create a new one that is more profitable for the company.
🧾Information 🧾
For presentable analysis and script code in R & Python, click Appendix
Welcome to the Cyclistic bike-share analysis case study!
In this case study, I am here as a junior data analyst working in the marketing team at Cyclistic, a fictional bike-share company in Chicago. Lily Moreno, the marketing director, believes that the company’s future success depends on maximizing the number of annual memberships.
To ride a Cylcistic bike, you can unlock it from one station and return it to any other station in the system. There are a few pricing plans to purchase:
Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Our task here is to understand how casual riders and annual members use Cyclistic bikes differently. The marketing team will design a new marketing strategy from these insights to convert casual riders into annual members. But, Cyclistic executives must approve your recommendations with compelling data insights and visualization.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geo-tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign targeting all-new customers, Moreno believes there is a good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Identify how casual and members riders use Cyclistic bikes differently to maximize the number of annual memberships.
Here are a few tools used in this analysis:
tidyverse
, ggplot2
, lubridate
, readxl
Here’s a step by step of processing raw data to be prepared for analysis:
To connect the data to the R environment, the step as follows:
readxl
library, load the data and give the name each month.m1_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202201-divvy-tripdata.xlsx")
m2_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202202-divvy-tripdata.xlsx")
m3_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202203-divvy-tripdata.xlsx")
m4_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202204-divvy-tripdata.xlsx")
m5_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202205-divvy-tripdata.xlsx")
m6_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202206-divvy-tripdata.xlsx")
m7_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202207-divvy-tripdata.xlsx")
m8_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202208-divvy-tripdata.xlsx")
m9_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202209-divvy-tripdata.xlsx")
m10_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202210-divvy-tripdata.xlsx")
m11_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202211-divvy-tripdata.xlsx")
m12_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202212-divvy-tripdata.xlsx")
#("C:/Users/LENOVO/Documents/bike-project/xlsx-version") still used because of web development circumstances
To make columns ready to analysis, here’s the step:
Change the data types if you find a column that differs from the rest of the columns in another file.
Lastly, combine all sheets into one single data frame and named it all_trips
.
all_trips <- bind_rows(m1_2022, m2_2022, m3_2022,
m4_2022, m5_2022, m6_2022,
m7_2022, m8_2022, m9_2022,
m10_2022, m11_2022, m12_2022)
After combine all month trips data, we can clean them to be ready to analysis. Here’s the step:
unique(all_trips$member_casual) #for unique value
[1] "casual" "member" NA
table(all_trips$member_casual) #total unique value
casual member
2322031 3345683
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
str(all_trips)
ride_length
column still not in numeric types.is.numeric(all_trips$ride_length) #to check if the ride length was in number/factor
[1] FALSE
all_trips$ride_length <- as.numeric(all_trips$ride_length)
is.numeric(all_trips$ride_length)
[1] TRUE
ride_length
column. Remove bad data to make it consistent.ride length
column, there’s some data that peculiar (minus ride length). Remove it using subset
function, because its a bad data.all_trips_v2 <- subset(all_trips, ride_length>0) #include only ride length over 0 s
In order to identify trends and relationships in the data, we can conduct descriptive analysis. Here’s the steps:
ride_length
column. For example longest & shortest ride and average ride length.mean(all_trips_v2$ride_length)
[1] 1166.846
median(all_trips_v2$ride_length)
[1] 617
max(all_trips_v2$ride_length)
[1] 2483235
min(all_trips_v2$ride_length)
[1] 1
summary(all_trips_v2$ride_length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 349 617 1167 1108 2483235
member_compare <- all_trips_v2 %>% #call dataset
drop_na(member_casual) %>% #drop empty data
group_by(membership_type = member_casual) %>% #group the data by members type
summarize(average_ride_length = mean(ride_length), number_of_rides = n()) #summarize them with avg and number of rides
View(member_compare) #view new data frame
day_of_week
column.#Reorder the levels of day_of_week column with Sunday as the first
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
#Summarize them based on day of the week
avg_member_per_day <- all_trips_v2 %>%
drop_na(member_casual) %>%
group_by(membership_type = member_casual, day_of_week) %>%
summarize(average_ride_length = mean(ride_length))
View(avg_member_per_day)
ride_id
column.### first check the ride_id duplicate to count number of ride
sum(duplicated(all_trips_v2$ride_id))
[1] 0
For the next phase, we visualize what has been describe in the analysis process. Here’s the step:
ggplot(data = membertype_per_day, mapping = aes(x = day_of_week, y = (number_of_rides/1000), fill = membership_type)) +
geom_col(position = "dodge") +
labs(title = "Number of Rides by Membership Type in 2022",
x = "", y = "Number of Rides<br><span style = 'font-size:8pt'>(in thousand)</span>",
fill = "Membership Type")+
theme(axis.title.y = ggtext::element_markdown())
#save the data viz in .png format
ggsave(filename = "number-of-ride.png", width = 6.86, height = 4.11, dpi=300)
ggplot(data = membertype_per_day, mapping = aes(x = day_of_week, y = average_ride_length, fill = membership_type)) +
geom_col(position = "dodge") +
labs(title = "Average Ride Length by Membership Type in 2022",
x = "", y = "Average Ride Length<br><span style = 'font-size:8pt'>(in second)</span>",
fill = "Membership Type")+
theme(axis.title.y = ggtext::element_markdown())
#save the data viz in .png format
ggsave(filename = "avg-length.png", width = 6.86, height = 4.11, dpi=300)
#filter the dataframe by only casual riders
highest_day_nor <- membertype_per_day %>%
filter(membership_type == "casual") %>%
arrange(desc(number_of_rides))
ggplot(data = highest_day_nor, mapping = aes(x = (number_of_rides/1000), y = (day_of_week = reorder(day_of_week, number_of_rides)))) +
geom_col() +
labs(title = "Number of Rides Per Day by Casual Riders",
x = "Number of Rides<br><span style = 'font-size:8pt'>(in thousand)</span>", y = "")+
theme(axis.title.x = ggtext::element_markdown())
#save the graph
ggsave(filename = "high-nor-day.png", width = 6.86, height = 4.11, dpi=300)
To export a summary data frame into a csv format, here’s the code
To understand each rider behavior of using bikes in 2022, we need to utilize number of rides and ride duration data. This will helps describe how they differ from each other as we can see from Fig. 2.
In Fig. 2, we see a comparison between two types of rider that using Cyclistic bikes. Then compare them based on number of rides and average ride length. We discovered that members riders have a higher number of rides than casual riders. However, casual riders tend to have a longer ride duration (1,749 second) even with less number of rides (2.321 mil).
This section will explain about how frequently they use Cyclistic bikes during the week. This will assist Cyclistic marketing to determining which day they are most likely to ride. We can see more detail in this Fig. 3.
This graph Fig. 3. above illustrates the number of rides done from Sunday to Saturday throughout 2022. Casual riders are represented by the red column, while Cyclistic members are represented by the blue column. From this graph Fig. 3. , we discovered that casual riders maintained a lower number of rides than members from Monday to Friday. On weekends, however, casual riders have a slightly higher number of rides than members. This suggest that casual rider less likely to ride on weekday.
This section will explain about how longer they ride using Cyclistic bikes during the week. This will assist Cyclistic marketing to determining which day they are most likely to ride longer. We can see more detail in this Fig. 4.
This graph Fig. 4. above illustrates the average ride length done during the week throughout 2022. From this graph Fig. 4. , we observed that casual riders shown longer ride lengths than members every day. In facts, casual riders have a longer ride on weekday, even with less number of rides than members. This suggest that casual rider more likely to ride longer.
In this section, we observed the number of rides taken by casual riders. This data will help us to determine potential days to promote the membership program to casual riders who are already familiar with Cyclistic. This will also help the company generate more annual members, which will be beneficial. Here is Fig. 5. about number of rides per day by casual rider.
This graph Fig. 5. above illustrates the number of rides taken by casual riders during the week in 2022. From this graph Fig. 5. , we observed that casual riders show a higher number of ride on weekend. On Saturday, 473 thousand ride are completed. On the other hand, weekdays appear to have fewer rides, ranging from 264 to 334 thousand rides. This suggest that on weekends, there are more casual riders.
In this section, we create a hypothesis after identifying how casual and members use Cyclistic bikes differently. Here are a few potential causes of why casual riders do not purchase membership programs:
Our first hypothesis is that casual riders do not join the membership program due to a marketing mismatch. So, we need to analyze the Cyclistic marketing strategies in 2022. To realize this, we need more data required such as:
After gathering data, we can determine whether there are significantly different methods of discovering Cyclistic for casual riders and members. Then we look at how digital media influences casual riders to become members.
Our second hypothesis is that casual riders are not interested by current membership benefits as they can obtain them as a casual. As a result, we need to investigate the membership plan. To accomplish this, we need to take steps such as:
After breakdown the data we have and conducting a survey, we can understand how casual and member riders use the bikes differently. As a result, we understand why the use of Cyclistic bikes varies by rider type.
After conducting a descriptive analysis to identify how different riders use Cyclistic, our strategies to increase annual membership as follow:
As we learn before, casual riders have a longer ride duration. We can then design pricing plans based on limiting the ride duration. So, more people have the urge to be Cyclistic members with unlimited duration benefits.
Based on casual riders longer ride length, we can create new pricing plans using The Decoy Effect strategies. For casual, make the pricing plans based on limited ride duration. This will make the membership plan more profitable.
Following the establishment of pricing plans, we can plan the marketing strategy on a day with higher number of rides by casual riders (Saturday and Sunday). Highlight the main benefits of members in a targeted manners.
Run the program on weekly basis to evaluate which pricing plans are working and which aren’t. Create a KPI’s to define a successful program (e.g. new members growth, number of rides by casual, day of the week, etc.)
Presentable data analysis → PPT
Data Analysis Process in:
R scripts → R Code
Python scripts → Python Code
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Hadi (2023, Jan. 29). DataHadi: Bike Share Marketing Campaigns Analysis using R or Python. Retrieved from https://miftahulhadii.github.io/index.html/posts/2023-01-29-bike-share-analysis-using-python-r/
BibTeX citation
@misc{hadi2023bike, author = {Hadi, Miftahul}, title = {DataHadi: Bike Share Marketing Campaigns Analysis using R or Python}, url = {https://miftahulhadii.github.io/index.html/posts/2023-01-29-bike-share-analysis-using-python-r/}, year = {2023} }