Bellabeat is a wellness company headquartered in San Francisco that develops wearable computers for women. It is a successful small company with the potential to become a larger player in the global smart device market.
Analyze data of non-Bellabeat consumers’ use of their health tracking devices to identify potential growth opportunities and give recommendations for the next steps of the marketing strategy.
The data for this analysis comes from the FitBit Fitness Tracker Data set on Kaggle (CC0: Public Domain, made available through Mobius). It contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
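Setting up my R environment by installing and loading the ‘tidyverse’ package, along with ‘readr’, ‘dplyr’, and ‘ggplot2’ (all three are already part of the tidyverse, so loading them separately is optional).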
install.packages("tidyverse")
install.packages("readr")
install.packages("dplyr")
install.packages("ggplot2")
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
The data was imported and turned into data frames with simplified names for a more straightforward analysis
library(readr)
steps <- read_csv("Zip Data/dailySteps_merged.csv")
activity <- read_csv("Zip Data/dailyActivity_merged.csv")
calories <- read_csv("Zip Data/dailyCalories_merged.csv")
intensities <- read_csv("Zip Data/dailyIntensities_merged.csv")
sleep <- read_csv("Zip Data/sleepDay_merged.csv")
weight <- read_csv("Zip Data/weightLogInfo_merged.csv")
I already viewed and explored the data in Google Sheets, so here I only need to confirm that everything imported correctly, using the head() and view() functions.
head(activity)
## # A tibble: 6 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## 4 1.50e9 4/15/2016 9762 6.28 6.28 0
## 5 1.50e9 4/16/2016 12669 8.16 8.16 0
## 6 1.50e9 4/17/2016 9705 6.48 6.48 0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
view(intensities)
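The head() output above shows ActivityDate imported as a character column (<chr>). As a minimal sketch of one possible follow-up check, the snippet below converts such strings to Date values. The small data frame `activity_sample` is a hypothetical stand-in built from the first three rows printed above, not the full data set.

```r
# Hypothetical stand-in for the first rows of `activity`
# (dates and step counts copied from the head() output above).
activity_sample <- data.frame(
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps   = c(13162, 10735, 10460),
  stringsAsFactors = FALSE
)

# The strings are in month/day/year order, so state the format explicitly.
activity_sample$ActivityDate <- as.Date(activity_sample$ActivityDate,
                                        format = "%m/%d/%Y")

# Confirm the column type changed from character to Date.
str(activity_sample)
```

With the column stored as a Date, sorting by day or grouping by week becomes straightforward.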
Using the n_distinct() function to count the distinct users in each table, which shows which Fitbit features were used more than others.
n_distinct(activity$Id)
## [1] 33
n_distinct(calories$Id)
## [1] 33
n_distinct(intensities$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(steps$Id)
## [1] 33
n_distinct(weight$Id)
## [1] 8
These counts show that all 33 users (100%) used the ‘Activity’, ‘Calories’, ‘Intensities’, and ‘Steps’ features. About 73% of users (24) used the ‘Sleep’ feature and only 24% of users (8) used the ‘Weight Log’ feature. Note that the data contain 33 distinct Ids, although the data set description mentions thirty users.
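The percentages quoted here follow directly from the n_distinct() results. A small sketch of that arithmetic, with the counts typed in by hand from the output above (the vector name `users_per_table` is only illustrative):

```r
# Distinct user counts per table, copied from the n_distinct() output above.
users_per_table <- c(Activity = 33, Calories = 33, Intensities = 33,
                     Steps = 33, Sleep = 24, Weight = 8)

# Share of the 33 users that appear in each table, in percent.
usage_pct <- round(users_per_table / max(users_per_table) * 100, 1)
print(usage_pct)
```

This gives 72.7% for ‘Sleep’ and 24.2% for ‘Weight’, which round to the 73% and 24% mentioned in the text.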
activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes, Calories) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes Calories
## Min. : 0 Min. : 0.000 Min. : 0.0 Min. : 0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8 1st Qu.:1828
## Median : 7406 Median : 5.245 Median :1057.5 Median :2134
## Mean : 7638 Mean : 5.490 Mean : 991.2 Mean :2304
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5 3rd Qu.:2793
## Max. :36019 Max. :28.030 Max. :1440.0 Max. :4900
sleep %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
Some Interesting Notes From These Summaries:
The average total steps per day is 7,638, which is slightly lower than the CDC’s recommendation. Compared with taking 4,000 steps per day, taking 8,000 steps per day was found to be associated with a 51% lower risk of all-cause mortality, and taking 12,000 steps per day with a 65% lower risk.
On average, participants sleep once per day for about 7 hours (419.5 minutes asleep). This is right at the CDC’s recommended minimum of 7 hours of sleep for most adults.
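To make the sleep figures easier to read, the means from the summary() output can be converted by hand. This is only a back-of-the-envelope sketch using the two printed means; the variable names are illustrative.

```r
# Means copied from the summary() output of the sleep table above.
mean_minutes_asleep <- 419.5
mean_minutes_in_bed <- 458.6

# Average sleep duration in hours.
hours_asleep <- round(mean_minutes_asleep / 60, 2)

# Average time spent in bed but not asleep, in minutes.
minutes_awake_in_bed <- mean_minutes_in_bed - mean_minutes_asleep

print(hours_asleep)          # 6.99
print(minutes_awake_in_bed)  # 39.1
```

So the average participant sleeps just under 7 hours and spends roughly 39 additional minutes in bed awake.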
We can classify the users into ‘Sedentary’, ‘Lightly Active’, ‘Fairly Active’ and ‘Very Active’ categories by considering their daily steps. This helps determine what types of people generally use health tracking devices.
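# Assign each daily record to an activity category based on its step count.
# case_when() checks the conditions in order, so each threshold only needs an upper bound.
steps_new <- mutate(steps, Category = case_when(
  StepTotal < 5000 ~ "Sedentary",
  StepTotal < 7500 ~ "Lightly Active",
  StepTotal < 10000 ~ "Fairly Active",
  StepTotal >= 10000 ~ "Very Active"))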
view(steps_new)
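The same thresholds can also be expressed with base R's cut() function, which may be easier to maintain if the boundaries change. The sketch below uses a hypothetical vector of step counts chosen to sit on each boundary, rather than the real `steps` data.

```r
# Hypothetical step counts, picked to land on each category boundary.
step_totals <- c(0, 4999, 5000, 7499, 7500, 9999, 10000, 36019)

# right = FALSE makes each interval closed on the left: [5000, 7500), etc.
category <- cut(
  step_totals,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  right  = FALSE
)

print(data.frame(step_totals, category))
```

Because the breaks are numeric boundaries instead of integer ranges, this version also handles non-integer values without falling through to a default case.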
Calculating the percentages of each user category to determine what activity level is the most common in this Fitbit user sample.
categories <- c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")
percentages <- c(round((sum(steps_new$Category == 'Sedentary')/
nrow(steps_new))*100, 2),
round((sum(steps_new$Category == 'Lightly Active')/
nrow(steps_new))*100, 2),
round((sum(steps_new$Category == 'Fairly Active')/
nrow(steps_new))*100, 2),
round((sum(steps_new$Category == 'Very Active')/
nrow(steps_new))*100, 2))
category_percentages <- data.frame(categories, percentages)
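This shows that the two outer groups (‘Sedentary’ and ‘Very Active’) each account for about 32% of the records, whereas the two inner groups (‘Lightly Active’ and ‘Fairly Active’) are smaller, at 17-18% each. Because each row of steps_new is one day for one user, these shares describe daily records rather than individual users.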
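The percentage table can be built more compactly with table() and prop.table(). This is a sketch on a small hypothetical vector of categories, so the numbers it prints are not the ones from the Fitbit data.

```r
# Hypothetical category labels standing in for steps_new$Category.
category <- factor(
  c("Sedentary", "Sedentary", "Very Active", "Lightly Active",
    "Fairly Active", "Very Active", "Sedentary", "Very Active"),
  levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")
)

# Count each category, convert counts to shares, and express them in percent.
percentages <- round(prop.table(table(category)) * 100, 2)

# Turn the table into a data frame with one row per category.
category_percentages <- as.data.frame(percentages, responseName = "percentages")
print(category_percentages)
```

Fixing the factor levels up front keeps the rows in the intended order and keeps a category in the table even when it has zero records.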
Using the merge() function to join two data frames to determine if there is a direct correlation between daily steps / user category and daily calories burned.
steps_calories <- merge(x = steps_new, y = calories, all = TRUE)
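When `by` is not given, merge() joins on every column name the two data frames share, and `all = TRUE` keeps rows that have no match on either side, filling the gaps with NA. The sketch below makes both points visible on toy data. The key column name `ActivityDay` and all calorie values are assumptions for illustration, not taken from the files.

```r
# Toy data frames; ActivityDay and the Calories values are hypothetical.
steps_toy <- data.frame(
  Id          = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal   = c(13162, 10735, 4000)
)
calories_toy <- data.frame(
  Id          = c(1, 1, 2, 3),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016"),
  Calories    = c(2000, 1800, 1500, 2100)
)

# Name the join keys explicitly so the join does not depend on
# whichever column names happen to overlap.
merged_toy <- merge(steps_toy, calories_toy,
                    by = c("Id", "ActivityDay"), all = TRUE)
print(merged_toy)

# Rows without a partner show up as NA; counting them is a quick sanity check.
print(sum(is.na(merged_toy$StepTotal)))
```

Here the user with Id 3 has calories but no steps, so the merged result has four rows and one missing StepTotal.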
Using the cor() function to determine if there is a positive correlation between steps taken and calories burned.
cor(x = steps_calories$StepTotal, y = steps_calories$Calories)
## [1] 0.5915681
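One thing worth knowing about cor(): with its default settings it returns NA as soon as either vector contains a missing value, which can happen after a full outer merge. A minimal sketch with hypothetical numbers shows the `use = "complete.obs"` option, which drops incomplete pairs before computing the coefficient.

```r
# Hypothetical values; the last pair has a missing step count.
step_total <- c(4000, 8000, 12000, 10000, NA)
calories   <- c(1800, 2200, 2900, 2500, 2000)

# Default behaviour: any NA makes the result NA.
r_default <- cor(step_total, calories)

# Only pairs where both values are present are used.
r_complete <- cor(step_total, calories, use = "complete.obs")

print(r_default)
print(r_complete)
```

The result above (0.59) was not NA, which suggests every step record in the merged data had a matching calorie record.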
ggsave("steps_calories_plot.png")
## Saving 7 x 5 in image
ggsave("steps_average_plot.png")
## Saving 7 x 5 in image
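The plotting code behind the two saved images is not included here, and ggsave() without a `plot` argument saves whichever plot was displayed last. The following is only a sketch of how a steps-versus-calories scatter plot could be built and saved explicitly. The toy data, titles, and output file name are assumptions and do not reproduce the original figures.

```r
library(ggplot2)

# Hypothetical data standing in for steps_calories.
toy <- data.frame(
  StepTotal = c(2000, 4000, 6000, 8000, 10000, 12000),
  Calories  = c(1600, 1800, 2100, 2200, 2500, 2900)
)

# Scatter plot with a fitted straight line to show the trend.
steps_calories_sketch <- ggplot(toy, aes(x = StepTotal, y = Calories)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Daily steps vs. calories burned (toy data)",
       x = "Steps per day", y = "Calories per day")

# Passing the plot object explicitly avoids saving the wrong plot by accident.
out_file <- file.path(tempdir(), "steps_calories_sketch.png")
ggsave(out_file, plot = steps_calories_sketch, width = 7, height = 5)
```

Storing each plot in its own variable and passing it to ggsave() makes it clear which figure ends up in which file.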