Bike Share Marketing Campaigns Analysis using R or Python

ETL Marketing Analysis Data Visualization R Python

Investigate a bike-share company’s marketing campaigns to create a new one that is more profitable for the company.

Miftahul Hadi
Jan 29, 2023

🧾Information 🧾

For presentable analysis and script code in R & Python, click Appendix

Introduction

Scenario

Welcome to the Cyclistic bike-share analysis case study!

Fig. 1. Bike-share station

In this case study, I am here as a junior data analyst working in the marketing team at Cyclistic, a fictional bike-share company in Chicago. Lily Moreno, the marketing director, believes that the company’s future success depends on maximizing the number of annual memberships.

To ride a Cylcistic bike, you can unlock it from one station and return it to any other station in the system. There are a few pricing plans to purchase:

Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Our task here is to understand how casual riders and annual members use Cyclistic bikes differently. The marketing team will design a new marketing strategy from these insights to convert casual riders into annual members. But, Cyclistic executives must approve your recommendations with compelling data insights and visualization.

More about company’s details

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geo-tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign targeting all-new customers, Moreno believes there is a good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Objectives

Identify how casual and members riders use Cyclistic bikes differently to maximize the number of annual memberships.

Preparation

Here are a few tools used in this analysis:

Processing the Data

Here’s a step by step of processing raw data to be prepared for analysis:

Import your data

To connect the data to the R environment, the step as follows:

library(tidyverse)
library(ggplot2)
library(lubridate)
library(readxl)
#setwd("C:/Users/LENOVO/Documents/bike-project/xlsx-version")
#setwd is not used because of web development circumstances
m1_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202201-divvy-tripdata.xlsx")
m2_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202202-divvy-tripdata.xlsx")
m3_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202203-divvy-tripdata.xlsx")
m4_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202204-divvy-tripdata.xlsx")
m5_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202205-divvy-tripdata.xlsx")
m6_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202206-divvy-tripdata.xlsx")
m7_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202207-divvy-tripdata.xlsx")
m8_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202208-divvy-tripdata.xlsx")
m9_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202209-divvy-tripdata.xlsx")
m10_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202210-divvy-tripdata.xlsx")
m11_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202211-divvy-tripdata.xlsx")
m12_2022 <- read_xlsx("C:/Users/LENOVO/Documents/bike-project/xlsx-version/202212-divvy-tripdata.xlsx")

#("C:/Users/LENOVO/Documents/bike-project/xlsx-version") still used because of web development circumstances
Make columns consistent and merge into a single data frame

To make columns ready to analysis, here’s the step:

colnames(m1_2022)
colnames(m2_2022)
colnames(m3_2022)
colnames(m4_2022)
colnames(m5_2022)
colnames(m6_2022)
colnames(m7_2022)
colnames(m8_2022)
colnames(m9_2022)
colnames(m10_2022)
colnames(m11_2022)
colnames(m12_2022)
str(m1_2022)
str(m2_2022)
str(m3_2022)
str(m4_2022)
str(m5_2022)
str(m6_2022)
str(m7_2022)
str(m8_2022)
str(m9_2022)
str(m10_2022)
str(m11_2022)
str(m12_2022)
all_trips <- bind_rows(m1_2022, m2_2022, m3_2022, 
                       m4_2022, m5_2022, m6_2022, 
                       m7_2022, m8_2022, m9_2022, 
                       m10_2022, m11_2022, m12_2022)
Clean the data

After combine all month trips data, we can clean them to be ready to analysis. Here’s the step:

all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng))
colnames(all_trips)   #list of column names
nrow(all_trips)       #total row
dim(all_trips)        #total dimension
head(all_trips)       #view first 6 rows
str(all_trips)        #see list of data types
summary(all_trips)    #statistical summary
unique(all_trips$member_casual) #for unique value
[1] "casual" "member" NA      
table(all_trips$member_casual)  #total unique value

 casual  member 
2322031 3345683 
all_trips <- all_trips %>% 
mutate(member_casual = recode(member_casual, "Subscriber" = "member", "Customer" = "casual"))
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
str(all_trips)
is.numeric(all_trips$ride_length) #to check if the ride length was in number/factor
[1] FALSE
all_trips$ride_length <- as.numeric(all_trips$ride_length)
is.numeric(all_trips$ride_length)
[1] TRUE
all_trips <- select(all_trips, -c(ride_of_length))
all_trips_v2 <- subset(all_trips, ride_length>0)     #include only ride length over 0 s
Conduct descriptive analysis

In order to identify trends and relationships in the data, we can conduct descriptive analysis. Here’s the steps:

mean(all_trips_v2$ride_length)
[1] 1166.846
median(all_trips_v2$ride_length)
[1] 617
max(all_trips_v2$ride_length)
[1] 2483235
min(all_trips_v2$ride_length)
[1] 1
summary(all_trips_v2$ride_length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1     349     617    1167    1108 2483235 
member_compare <- all_trips_v2 %>%                #call dataset
  drop_na(member_casual) %>%                      #drop empty data
  group_by(membership_type = member_casual) %>%   #group the data by members type
  summarize(average_ride_length = mean(ride_length), number_of_rides = n())  #summarize them with avg and number of rides

View(member_compare)      #view new data frame
#Reorder the levels of day_of_week column with Sunday as the first

all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

#Summarize them based on day of the week

avg_member_per_day <- all_trips_v2 %>%
  drop_na(member_casual) %>% 
  group_by(membership_type = member_casual, day_of_week) %>% 
  summarize(average_ride_length = mean(ride_length))

View(avg_member_per_day)
### first check the ride_id duplicate to count number of ride
sum(duplicated(all_trips_v2$ride_id))
[1] 0
### if zero, lets goo
membertype_per_day <- all_trips_v2 %>%
  drop_na(member_casual) %>% 
  group_by(membership_type = member_casual, day_of_week) %>% 
  summarise(number_of_rides = n(), average_ride_length = mean(ride_length)) %>% #n() for counting 
  arrange(day_of_week)

View(membertype_per_day)
Design a data visualization

For the next phase, we visualize what has been describe in the analysis process. Here’s the step:

ggplot(data = membertype_per_day, mapping = aes(x = day_of_week, y = (number_of_rides/1000), fill = membership_type)) + 
  geom_col(position = "dodge") +
  labs(title = "Number of Rides by Membership Type in 2022", 
       x = "", y = "Number of Rides<br><span style = 'font-size:8pt'>(in thousand)</span>", 
       fill = "Membership Type")+
  theme(axis.title.y = ggtext::element_markdown())
#save the data viz in .png format
ggsave(filename = "number-of-ride.png", width = 6.86, height = 4.11, dpi=300)
ggplot(data = membertype_per_day, mapping = aes(x = day_of_week, y = average_ride_length, fill = membership_type)) + 
  geom_col(position = "dodge") +
  labs(title = "Average Ride Length by Membership Type in 2022", 
       x = "", y = "Average Ride Length<br><span style = 'font-size:8pt'>(in second)</span>", 
       fill = "Membership Type")+
  theme(axis.title.y = ggtext::element_markdown())
#save the data viz in .png format
ggsave(filename = "avg-length.png", width = 6.86, height = 4.11, dpi=300)
#filter the dataframe by only casual riders
highest_day_nor <- membertype_per_day %>% 
  filter(membership_type == "casual") %>% 
  arrange(desc(number_of_rides))

  ggplot(data = highest_day_nor, mapping = aes(x = (number_of_rides/1000), y = (day_of_week = reorder(day_of_week, number_of_rides)))) + 
    geom_col() +
    labs(title = "Number of Rides Per Day by Casual Riders",
        x = "Number of Rides<br><span style = 'font-size:8pt'>(in thousand)</span>", y = "")+
    theme(axis.title.x = ggtext::element_markdown())
#save the graph  
ggsave(filename = "high-nor-day.png", width = 6.86, height = 4.11, dpi=300)
Export a summary file for further analysis

To export a summary data frame into a csv format, here’s the code

write.csv(member_compare, file = "all_member_compare.csv")
write.csv(membertype_per_day, file = "by_day_member_compare.csv")

Analysis

The Cyclistic bikes usage difference by rider type

To understand each rider behavior of using bikes in 2022, we need to utilize number of rides and ride duration data. This will helps describe how they differ from each other as we can see from Fig. 2.

Fig. 2. Casual and members riders difference based on number of rides and ride duration.

In Fig. 2, we see a comparison between two types of rider that using Cyclistic bikes. Then compare them based on number of rides and average ride length. We discovered that members riders have a higher number of rides than casual riders. However, casual riders tend to have a longer ride duration (1,749 second) even with less number of rides (2.321 mil).

Number of rides by day

This section will explain about how frequently they use Cyclistic bikes during the week. This will assist Cyclistic marketing to determining which day they are most likely to ride. We can see more detail in this Fig. 3.

Fig. 3. The number of rides done by each rider during the week.

This graph Fig. 3. above illustrates the number of rides done from Sunday to Saturday throughout 2022. Casual riders are represented by the red column, while Cyclistic members are represented by the blue column. From this graph Fig. 3. , we discovered that casual riders maintained a lower number of rides than members from Monday to Friday. On weekends, however, casual riders have a slightly higher number of rides than members. This suggest that casual rider less likely to ride on weekday.

Average ride length by day

This section will explain about how longer they ride using Cyclistic bikes during the week. This will assist Cyclistic marketing to determining which day they are most likely to ride longer. We can see more detail in this Fig. 4.

Fig. 4. The average ride lengths done by each rider during the week.

This graph Fig. 4. above illustrates the average ride length done during the week throughout 2022. From this graph Fig. 4. , we observed that casual riders shown longer ride lengths than members every day. In facts, casual riders have a longer ride on weekday, even with less number of rides than members. This suggest that casual rider more likely to ride longer.

Potential days to promote membership program

In this section, we observed the number of rides taken by casual riders. This data will help us to determine potential days to promote the membership program to casual riders who are already familiar with Cyclistic. This will also help the company generate more annual members, which will be beneficial. Here is Fig. 5. about number of rides per day by casual rider.

Fig. 5. The number of rides per day by casual riders.

This graph Fig. 5. above illustrates the number of rides taken by casual riders during the week in 2022. From this graph Fig. 5. , we observed that casual riders show a higher number of ride on weekend. On Saturday, 473 thousand ride are completed. On the other hand, weekdays appear to have fewer rides, ranging from 264 to 334 thousand rides. This suggest that on weekends, there are more casual riders.

Why casual riders do not convert to members?

In this section, we create a hypothesis after identifying how casual and members use Cyclistic bikes differently. Here are a few potential causes of why casual riders do not purchase membership programs:

Marketing mismatch

Our first hypothesis is that casual riders do not join the membership program due to a marketing mismatch. So, we need to analyze the Cyclistic marketing strategies in 2022. To realize this, we need more data required such as:

After gathering data, we can determine whether there are significantly different methods of discovering Cyclistic for casual riders and members. Then we look at how digital media influences casual riders to become members.

Incentive mismatch

Our second hypothesis is that casual riders are not interested by current membership benefits as they can obtain them as a casual. As a result, we need to investigate the membership plan. To accomplish this, we need to take steps such as:

After breakdown the data we have and conducting a survey, we can understand how casual and member riders use the bikes differently. As a result, we understand why the use of Cyclistic bikes varies by rider type.

Recommendation

After conducting a descriptive analysis to identify how different riders use Cyclistic, our strategies to increase annual membership as follow:

Define new membership program main benefits

As we learn before, casual riders have a longer ride duration. We can then design pricing plans based on limiting the ride duration. So, more people have the urge to be Cyclistic members with unlimited duration benefits.

Create new pricing plans based on ride length for casual riders

Based on casual riders longer ride length, we can create new pricing plans using The Decoy Effect strategies. For casual, make the pricing plans based on limited ride duration. This will make the membership plan more profitable.

Design a marketing strategies that highlight new membership benefits

Following the establishment of pricing plans, we can plan the marketing strategy on a day with higher number of rides by casual riders (Saturday and Sunday). Highlight the main benefits of members in a targeted manners.

Test and learn

Run the program on weekly basis to evaluate which pricing plans are working and which aren’t. Create a KPI’s to define a successful program (e.g. new members growth, number of rides by casual, day of the week, etc.)

Appendix

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Hadi (2023, Jan. 29). DataHadi: Bike Share Marketing Campaigns Analysis using R or Python. Retrieved from https://miftahulhadii.github.io/index.html/posts/2023-01-29-bike-share-analysis-using-python-r/

BibTeX citation

@misc{hadi2023bike,
  author = {Hadi, Miftahul},
  title = {DataHadi: Bike Share Marketing Campaigns Analysis using R or Python},
  url = {https://miftahulhadii.github.io/index.html/posts/2023-01-29-bike-share-analysis-using-python-r/},
  year = {2023}
}