---
title: "index.Rmd"
author: "Daniella Tsing, Brandon Nguyen, Eugene Kim, Roberto Raftery"
date: '2022-05-12'
output: html_document
---
### **Introduction**
The data set contains indicators of diabetes for 253,680 survey respondents, including factors such as cholesterol levels, BMI, and income. The data was collected in 2015 by the Centers for Disease Control and Prevention (CDC) through a telephone survey called the Behavioral Risk Factor Surveillance System (BRFSS). It has 22 columns and one row per respondent. From this data set, we can explore factors associated with an individual's likelihood of being diagnosed with diabetes. These include social and behavioral factors as well as health factors, so we can also analyze the intersection between an individual's social environment and physical well-being. The social factors in the data set include age, income, and sex. From these categories, we may discern whether men or women are more likely to be diabetic, how income factors into health, and which age group has the highest number of diabetics. Analyzing social factors also lets us ask further questions about the relationships behind these patterns. Moreover, the data set contains information about an individual's behaviors, such as smoking and eating habits, and how they may predispose a person to diabetes. Together with health factors such as cholesterol levels, these social and behavioral patterns give us a more well-rounded understanding of the contributors to diabetes.

### **Summary**
```{r include = FALSE}
source("summary.R", local = knitr::knit_global())
```
`r diabetes_summary`

The summary list reports several values computed from the diabetes data set, such as maximums and averages. The first value (`r diabetes_summary$avg_bmi_diabetes`) is the average BMI of respondents who are diabetic, which lets us look at how BMI relates to diabetes. The second (`r diabetes_summary$low_income_trend`) and third (`r diabetes_summary$high_income_trend`) values are the shares of low-income and high-income respondents who have diabetes, which let us see whether diabetes is correlated with income; these two values indicate that it is. The fourth value (`r diabetes_summary$age_trend`) is the age group with the highest count of diabetic cases, which shows the age at which diabetes is most common in the data. The last value (`r diabetes_summary$heart_disease_with_diabetes`) is the likelihood that someone with diabetes also has heart disease. It is calculated by filtering to diabetic respondents and comparing those who have both diabetes and heart disease with those who have diabetes alone.
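
The values above come from `summary.R`, which is not shown in this report. A minimal sketch of how such a list could be built, assuming a column named `HeartDiseaseorAttack` for heart disease, hypothetical cutoffs of income levels 1-3 for "low" and 6-8 for "high", and a small made-up data frame `toy_df` in place of the real data:

```r
# Made-up stand-in for the real data; values are illustrative only.
toy_df <- data.frame(
  Diabetes_012 = c(0, 0, 2, 2, 1, 2, 0, 2),
  BMI = c(24, 27, 33, 31, 29, 35, 22, 30),
  Income = c(8, 7, 2, 3, 5, 1, 6, 8),
  Age = c(4, 6, 9, 10, 8, 9, 3, 11),
  HeartDiseaseorAttack = c(0, 0, 1, 0, 0, 1, 0, 0)  # assumed column name
)

diabetic <- toy_df[toy_df$Diabetes_012 == 2, ]
low_income <- toy_df[toy_df$Income <= 3, ]   # assumed cutoff for "low"
high_income <- toy_df[toy_df$Income >= 6, ]  # assumed cutoff for "high"

sketch_summary <- list(
  # Average BMI among diabetic respondents
  avg_bmi_diabetes = mean(diabetic$BMI),
  # Share of each income band that is diabetic
  low_income_trend = mean(low_income$Diabetes_012 == 2),
  high_income_trend = mean(high_income$Diabetes_012 == 2),
  # Age group with the most diabetic respondents
  age_trend = as.integer(names(which.max(table(diabetic$Age)))),
  # Share of diabetic respondents who also report heart disease
  heart_disease_with_diabetes = mean(diabetic$HeartDiseaseorAttack == 1)
)
```

The actual script may define the income bands and the heart disease measure differently; the sketch only illustrates the filter-then-summarise pattern the paragraph describes.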

### **Aggregated Table**
```{r include = FALSE}
source("table_summary.R", local = knitr::knit_global())
``` 

```{r echo = FALSE}
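# Mean age group, BMI, and income level per diabetes class, rendered as a table
summary_table <- group_by(diabetes_df, Diabetes_012) %>%
  summarise(meanAge = mean(Age), meanBMI = mean(BMI), meanIncome = mean(Income))
knitr::kable(summary_table)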
```
This table lets us look for possible correlations between diabetes and high cholesterol, high blood pressure, and the mean values of age, BMI, and income. Through it, we can see how the respondents are distributed across the diabetes classes while also comparing the variables we are measuring. The average age for people with diabetes falls in age-group 9, which corresponds to ages 60-64 in the 13-level age category. We can also see that most respondents do not have diabetes, followed by people with diabetes and, lastly, people with pre-diabetes. This ordering also appears in the responses about high cholesterol and high blood pressure. Another trend is that people without diabetes tend to have a lower average BMI than those with pre-diabetes or diabetes, which is consistent with higher BMI being a known risk factor for type 2 diabetes. The income variable shows that people without diabetes also tend to have a higher income than those with diabetes or pre-diabetes, which might reflect more limited access to healthcare and healthy diets among lower-income respondents.
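
The comparison above also relies on respondent counts and on the cholesterol and blood pressure indicators. A sketch of how those columns might be added to a grouped table, assuming the indicators are stored as 0/1 columns named `HighChol` and `HighBP` (names not confirmed by this report) and using a made-up `toy_df` in place of `diabetes_df`:

```r
library(dplyr)

# Made-up stand-in for diabetes_df; values are illustrative only.
toy_df <- data.frame(
  Diabetes_012 = c(0, 0, 0, 1, 2, 2),
  HighChol = c(0, 1, 0, 1, 1, 1),  # assumed column name, 1 = yes
  HighBP = c(0, 0, 1, 1, 1, 0),    # assumed column name, 1 = yes
  Age = c(5, 7, 6, 9, 10, 8),
  BMI = c(24, 26, 25, 30, 33, 31),
  Income = c(8, 7, 6, 5, 3, 4)
)

sketch_table <- toy_df %>%
  group_by(Diabetes_012) %>%
  summarise(
    respondents = n(),               # number of respondents in each class
    shareHighChol = mean(HighChol),  # share reporting high cholesterol
    shareHighBP = mean(HighBP),      # share reporting high blood pressure
    meanAge = mean(Age),
    meanBMI = mean(BMI),
    meanIncome = mean(Income)
  )
```

Taking the mean of a 0/1 column gives the share of respondents who answered yes, which is easier to compare across classes of very different sizes than raw counts.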

### **Charts**
```{r include = FALSE}
source("chart1.R", local = knitr::knit_global())
print(plot1)
``` 
```{r echo = FALSE}
plot1
```

This chart compares the different levels of income and how they relate to diabetes. Income is coded from 1 (lowest) to 8 (highest), and we can notice some trends across people without diabetes (0), with pre-diabetes (1), and with diabetes (2). As we go from 0 to 2 on the x-axis, the distribution of income levels shifts, which indicates that income is associated with diabetes status. The relationship is negative: people without diabetes are more concentrated in the higher income levels.
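
One way to check the pattern in the chart numerically is to compute, for each diabetes class, the share of respondents at each income level. A minimal sketch using base R and a made-up `toy_df` (the real `chart1.R` is not shown here and may work differently):

```r
# Made-up stand-in for the real data; values are illustrative only.
toy_df <- data.frame(
  Diabetes_012 = c(0, 0, 0, 0, 2, 2, 2, 2),
  Income = c(8, 8, 7, 3, 2, 3, 3, 8)
)

# Rows are diabetes classes, columns are income levels.
counts <- table(toy_df$Diabetes_012, toy_df$Income)

# margin = 1 converts each row to proportions that sum to 1.
income_share <- prop.table(counts, margin = 1)
round(income_share, 2)
```

Comparing rows of `income_share` rather than raw counts keeps the much larger non-diabetic group from dominating the comparison.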

```{r include = FALSE}
source("chart2.R", local = knitr::knit_global())
print(plot2)
``` 
```{r echo = FALSE}
plot2
```

We included this graph to determine which age ranges correspond to the different diabetes groups. The chart shows the age distribution within each diabetes classification. Ages are coded as groups from 1 to 13. Age-group 1 covers ages 18-24, each following group covers a five-year band, and age-group 13 covers ages 80 or older. For groups 2 through 12, the starting age is 15 + 5 × age_group and the band ends four years later; for example, age-group 5 starts at 15 + 5 × 5 = 40, so it covers ages 40-44. We can identify trends from the maximum, minimum, and interquartile range of each box plot. In every classification, the youngest respondents are in age-group 1 and the oldest are in age-group 13. However, the interquartile range shifts upward with each classification: people without diabetes are generally younger, pre-diabetics are older, and diabetics are older still. This is consistent with diabetes developing later in life.
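
The age-group coding can be sketched as a small helper. The function name `age_group_range` is hypothetical and not part of the project scripts; it assumes group 1 is 18-24, groups 2 through 12 are five-year bands starting at 15 + 5 × group, and group 13 is 80 or older:

```r
# Hypothetical helper: convert a single age-group code (1 to 13) to a label.
age_group_range <- function(age_group) {
  stopifnot(length(age_group) == 1, age_group %in% 1:13)
  if (age_group == 1) {
    return("18-24")  # first band is wider and starts at 18
  }
  if (age_group == 13) {
    return("80+")    # last band is open-ended
  }
  start <- 15 + 5 * age_group
  paste0(start, "-", start + 4)
}

age_group_range(5)  # "40-44"
age_group_range(9)  # "60-64"
```

Groups 1 and 13 are handled separately because the formula would give 20 rather than 18 for group 1, and group 13 has no upper bound.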

```{r include = FALSE}
source("chart3.R", local = knitr::knit_global())
print(plot3)
``` 
```{r echo = FALSE}
plot3
```

We included this chart to see how BMI differs between people who don't have diabetes and people who are pre-diabetic or diabetic. Using this graph, we found that the highest average BMI is in the diabetic group, followed by the pre-diabetic group, with the lowest in the group without diabetes. The higher BMI among diabetics is consistent with medical research that identifies excess body weight as a major risk factor for type 2 diabetes, a condition characterized by insulin resistance.