Neil McDonald PhD
Cyclistic Capstone project
Below is a project I completed as part of the Google Data Analytics Professional Certificate. The certificate offers three options for the capstone project (two defined projects set by Google and the chance to choose your own project), and this is the first of them. I will also be completing the second project offered as an extra portfolio project, before using the skills gained to begin exploring other data projects outside of the course.
​
Project Outline - How do Cyclistic members and casual users differ in their use?
A company named Cyclistic is looking to maximize membership by converting casual users into members. The big question is how casual users and members differ in their usage of the bikeshare program. For this project I used R to perform the data cleaning and analysis, as well as the majority of the data visualization. Other visualization and data collection was performed in Excel. The code used can be found at https://www.kaggle.com/code/neilmcdonald/cyclistic-capstone-project
​
The data itself comes from the DIVVY dataset and is organized by month. This posed the first problem, as the project specifies that the analysis must cover the previous 12 months. The data therefore had to be collated into one single data frame. Before doing this, however, I explored the data and ensured that all columns were correctly named so that the monthly data frames could be combined. With this complete, the separate data frames were bound together into a single data frame named all_trips. I could then remove the columns that were not required and convert the start/end time columns to the datetime format. This allowed me to calculate the ride length for each trip, in addition to the date and weekday on which it took place. Later in the project I used the same steps to collate different months together, allowing me to analyze the data by season as well as one whole set.
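The collation and date handling described above can be sketched in base R as follows. This is a minimal illustration, not the code from the Kaggle notebook: the two small monthly tables, their values, and the column names started_at, ended_at, usertype and day_of_week are assumptions, and base R's rbind stands in for whichever binding function the original used.

```r
# Two tiny stand-in monthly tables (hypothetical values, not real DIVVY data).
jan <- data.frame(
  ride_id    = c("a1", "a2"),
  started_at = c("2022-01-03 08:00:00", "2022-01-08 14:30:00"),
  ended_at   = c("2022-01-03 08:12:00", "2022-01-08 15:05:00"),
  usertype   = c("member", "casual"),
  stringsAsFactors = FALSE
)
feb <- data.frame(
  ride_id    = "b1",
  started_at = "2022-02-01 17:45:00",
  ended_at   = "2022-02-01 17:58:00",
  usertype   = "member",
  stringsAsFactors = FALSE
)

# Check that the column names match before binding.
stopifnot(identical(names(jan), names(feb)))

# Stack the monthly tables into one data frame.
all_trips <- do.call(rbind, list(jan, feb))

# Parse the timestamps. UTC is an assumption that keeps the example deterministic.
all_trips$started_at <- as.POSIXct(all_trips$started_at, tz = "UTC",
                                   format = "%Y-%m-%d %H:%M:%S")
all_trips$ended_at   <- as.POSIXct(all_trips$ended_at, tz = "UTC",
                                   format = "%Y-%m-%d %H:%M:%S")

# Ride length in seconds, plus the date and weekday of each trip.
all_trips$ride_length <- as.numeric(
  difftime(all_trips$ended_at, all_trips$started_at, units = "secs")
)
all_trips$date <- as.Date(all_trips$started_at, tz = "UTC")

# Index a fixed vector of names so the result does not depend on the locale.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
all_trips$day_of_week <- day_names[as.POSIXlt(all_trips$started_at)$wday + 1]
```

Checking the column names before binding is the important step: rbind stops with an error if the names differ between months.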
​
Following this, it was important to ensure that I was only analyzing data from active cycles and from trips with a ride length of at least 0 seconds. A quick sort found that no stations existed representing a depot or workshop, as had appeared in previous editions of the data. With this checked, I proceeded to create a new dataset using the following code:
all_trips_v2 <- all_trips[!(all_trips$ride_length<0),]
​
This allowed me to remove all trips with a ride length of less than 0 seconds. The final organizational step was to aggregate the data in order to compute the statistics relating usertype and ride length by weekday. This requires aggregating ride length by usertype and weekday, as well as ordering the data by weekday. With this complete, the analysis of the data could begin.
The first analysis I completed was to look at the breakdown of usertypes. This is shown below in Figure 1. As you can see, the majority of rides completed (57%) are by members of the program, suggesting a large uptake in membership by users.
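As a rough sketch of the filtering and aggregation step, the snippet below applies the same filter to a toy data frame and then averages ride length by usertype and weekday. The data values and the column names usertype and day_of_week are assumptions made for illustration. Base R's aggregate is used here, which may differ from the exact call in the notebook.

```r
# Toy data (hypothetical values); one trip has a negative ride length.
all_trips <- data.frame(
  usertype    = c("member", "casual", "member", "casual", "member"),
  day_of_week = c("Monday", "Saturday", "Monday", "Saturday", "Sunday"),
  ride_length = c(600, 1800, 900, -30, 700)
)

# Same filter as in the text: drop trips with a negative ride length.
all_trips_v2 <- all_trips[!(all_trips$ride_length < 0), ]

# Make weekday an ordered set of levels so results sort Sunday to Saturday.
all_trips_v2$day_of_week <- factor(
  all_trips_v2$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)

# Mean ride length for every usertype / weekday combination present in the data.
mean_by_day <- aggregate(ride_length ~ usertype + day_of_week,
                         data = all_trips_v2, FUN = mean)
mean_by_day <- mean_by_day[order(mean_by_day$day_of_week, mean_by_day$usertype), ]
```

Swapping FUN = mean for median, max or min gives the other summary statistics from the same call.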
​
​
​
​
​
​
​
Figure 1: Total number of rides for members (blue) and casual users (orange).
​
Next, I explored the number of rides by usertype on each weekday to see if this differed between casual users and members. The results are shown in Figure 2. The number of rides by members is lowest at weekends and highest on Tuesday, Wednesday and Thursday, whilst rides by casual users peak on Saturday and Sunday, with a large reduction seen during the week. This suggests that casual users are using the service more for recreational purposes than members.
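A count of rides per usertype and weekday, like the one behind Figure 2, could be produced along these lines. The trips below are invented for illustration and the column names are assumptions; only base R's table function is used.

```r
# Toy trips (hypothetical values).
trips <- data.frame(
  usertype    = c("member", "member", "member", "member",
                  "casual", "casual", "casual"),
  day_of_week = c("Tuesday", "Tuesday", "Wednesday", "Saturday",
                  "Saturday", "Sunday", "Saturday")
)

# Fix the weekday order so the columns run Sunday to Saturday.
trips$day_of_week <- factor(
  trips$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)

# Rows are usertypes, columns are weekdays, cells are ride counts.
rides_by_day <- table(trips$usertype, trips$day_of_week)

# The busiest weekday for each usertype (first maximum if there is a tie).
busiest_idx <- apply(rides_by_day, 1, which.max)
busiest_day <- setNames(colnames(rides_by_day)[busiest_idx],
                        rownames(rides_by_day))
```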
​
​
​
​
​
Figure 2: Total number of rides for members (blue) and casual users (pink) per weekday.
​
To investigate this, I plotted the average ride length for each usertype on each weekday. As can be seen in Figure 3, the average ride length for members is fairly constant, with a mean of 780 seconds (13 minutes) across the week. Casual users have much longer average ride times, with a mean of 1711 seconds (29 minutes) across the week. Both averages are higher when only Saturday and Sunday are analyzed, with members having a mean of 873 seconds (15 minutes) and casual users a mean of 1974 seconds (33 minutes).
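The conversion from seconds to minutes, and the weekend-only comparison, can be checked with a short sketch. The four reported averages are taken from the text above. The toy trips, their values, and the column names are assumptions used only to show how a weekend subset might be averaged.

```r
# Averages reported in the text, in seconds, converted to whole minutes.
reported_secs <- c(member_week = 780, casual_week = 1711,
                   member_weekend = 873, casual_weekend = 1974)
reported_mins <- round(reported_secs / 60)

# Toy trips (hypothetical values) to illustrate the weekend-only comparison.
trips <- data.frame(
  usertype    = c("member", "member", "member", "casual", "casual", "casual"),
  day_of_week = c("Tuesday", "Saturday", "Sunday",
                  "Wednesday", "Saturday", "Sunday"),
  ride_length = c(700, 800, 900, 1500, 1900, 2100)
)

# Flag weekend trips, then average ride length per usertype.
is_weekend    <- trips$day_of_week %in% c("Saturday", "Sunday")
mean_all_week <- tapply(trips$ride_length, trips$usertype, mean)
mean_weekend  <- tapply(trips$ride_length[is_weekend],
                        trips$usertype[is_weekend], mean)
```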
​
​
​
​
​
Figure 3: Average ride duration for casual users (pink) and members (blue) for each day of the week.
​
From this it is possible to conclude that, whilst the majority of rides are made by members, there is a large number of non-members who are likely to use the service for longer rides at weekends. Members are also more likely to ride during the week, suggesting that this may be part of their daily commute.
To explore this further, the data was split into different seasons. This was done by grouping months together as follows:
Autumn – September, October, and November
Winter – December, January, and February
Spring – March, April, and May
Summer – June, July, and August
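The month grouping above can be expressed as a small lookup. The helper name season_of is hypothetical and is not taken from the original notebook, which collated the monthly files by season instead. This sketch simply shows an equivalent way to label each trip date.

```r
# Hypothetical helper: map a vector of dates to the seasons listed above.
season_of <- function(dates) {
  month_number <- as.integer(format(dates, "%m"))
  # One entry per calendar month, January to December.
  lookup <- c("Winter", "Winter",
              "Spring", "Spring", "Spring",
              "Summer", "Summer", "Summer",
              "Autumn", "Autumn", "Autumn",
              "Winter")
  lookup[month_number]
}

example_dates <- as.Date(c("2022-01-15", "2022-04-01", "2022-07-04",
                           "2022-10-31", "2022-12-25"))
example_seasons <- season_of(example_dates)
```

A season column built this way lets the full data frame be split or grouped by season without re-reading the monthly files.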
Below is a representation of how members and casual users use the Cyclistic service through the different seasons. As Figure 4 shows, the total number of rides drops significantly in the winter and spring months, with the highest usage between June and November. It is in these months that casual usage is at its highest, and therefore when campaigns aimed at converting casual users into members would be most effective. Figure 4 also shows that in the summer and autumn months the gap between the number of rides by casual users and by members is at its smallest, with 53% of rides made by members and 47% by casual users, compared to the winter months, when 73% of rides are made by members and only 27% by casual users.
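Percentage splits of this kind can be computed from a season-by-usertype count table. The sketch below uses invented trips and assumed column names, so its numbers do not correspond to the figures quoted above.

```r
# Toy trips (hypothetical values).
trips <- data.frame(
  season   = c("Winter", "Winter", "Winter", "Winter",
               "Summer", "Summer", "Summer", "Summer"),
  usertype = c("member", "member", "member", "casual",
               "member", "member", "casual", "casual")
)

# Ride counts per season (rows) and usertype (columns).
counts <- table(trips$season, trips$usertype)

# margin = 1 turns each row into shares that sum to 1 within a season.
share_pct <- round(100 * prop.table(counts, margin = 1))
```

Using margin = 1 matters here: without it, prop.table divides by the grand total and the shares would no longer sum to 100% within each season.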
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
Figure 4: Breakdown of usage per usertype for each season. (A) Autumn (Sep-Nov), (B) Winter (Dec-Feb), (C) Spring (Mar-May) and (D) Summer (Jun-Aug). Casual users are shown in orange, members are shown in blue. (E) Total number of rides per season for casual users (pink) and members (blue).
​
Having compared the number of rides per season, I carried out further analysis to see whether the ride-length findings shown in Figure 3 stay the same across the different seasons. As can be seen in Figure 5 below, in all cases casual users ride for longer on average than members, with the highest averages seen at weekends for both casual users and members.
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
Figure 5: The average duration of rides throughout autumn (top left), winter (top right), spring (bottom left) and summer (bottom right) for casual users (pink) and members (blue).
From the data shown above it is possible to conclude that:
- Members use the Cyclistic bikeshare service more during the week, and casual users are more likely to use the service at weekends.
- Members ride on average for shorter durations than casual users.
- The total number of rides drops during the winter months before rising through spring and peaking in summer and autumn.
The objective of this project was to identify how casual users and members use the Cyclistic bikeshare program differently, with the aim of identifying how Cyclistic can increase its membership.
​
The recommendations that I would suggest are as follows:
- Introduce a monthly/half-yearly membership option. Casual riders use the service more during the summer and autumn months, meaning that they are more likely to buy a pass covering these months than to opt for a full-year pass.
- Media and digital marketing should focus on leisure activities, as the majority of casual users use the service at the weekend.
- Casual users ride for longer durations than members. Digital marketing should therefore focus on the health benefits of cycling and the different types of bikes on offer to members and casual users.
​
​
​
​











