Today I continued preprocessing the data and engineering relevant features. This included merging the obesity, inactivity, and diabetes tables to represent the combined effects of obesity and inactivity on diabetes risk. After the merge, I found that only 354 FIPS codes are common to all three tables (obesity, inactivity, and diabetes), whereas 1,370 FIPS codes are shared between the inactivity and diabetes tables. Because the three-way overlap is so much smaller, I decided to analyze the inactivity and diabetes data first to get a better picture of the merged dataset.
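The overlap counts above come from intersecting the FIPS columns of the tables. A minimal sketch of that step with pandas follows. It assumes each table has a column literally named `FIPS`. The value columns (`pct_inactive`, `pct_diabetic`, `pct_obese`), the helper names, and the toy rows are hypothetical stand-ins for the real data, not its actual schema or values.

```python
from functools import reduce

import pandas as pd


def merge_on_fips(frames, key="FIPS"):
    """Inner-join a list of DataFrames on a shared county FIPS column."""
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="inner"), frames)


def count_common_fips(frames, key="FIPS"):
    """Count the FIPS codes that appear in every DataFrame."""
    common = set(frames[0][key])
    for df in frames[1:]:
        common &= set(df[key])
    return len(common)


# Toy tables with hypothetical column names and made-up values, for illustration only.
diabetes = pd.DataFrame({"FIPS": [1001, 1003, 1005, 1007], "pct_diabetic": [10.0, 9.0, 13.0, 11.0]})
inactivity = pd.DataFrame({"FIPS": [1001, 1003, 1005, 1009], "pct_inactive": [20.0, 18.0, 27.0, 22.0]})
obesity = pd.DataFrame({"FIPS": [1001, 1011], "pct_obese": [32.0, 35.0]})

all_three = count_common_fips([obesity, inactivity, diabetes])  # FIPS shared by all three tables
two_way = count_common_fips([inactivity, diabetes])             # FIPS shared by inactivity and diabetes
merged = merge_on_fips([inactivity, diabetes])                  # one row per shared FIPS
```

One thing worth checking with real data: if one table stores FIPS as zero-padded strings and another as integers, the inner join silently drops matches. Normalizing the key first (for example with `df["FIPS"].astype(str).str.zfill(5)`) rules that out as a cause of a small overlap.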
Building on my initial EDA, I created visualizations to illustrate the relationship between inactivity and diabetes; these will play a central role in communicating our findings. I performed a bivariate analysis and a heatmap analysis of inactivity and diabetes to better understand how the two are related. I also plotted the probability density function (PDF) and cumulative distribution function (CDF) of each variable to understand how the data points are distributed.
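The quantities behind these plots can be computed directly. The sketch below, assuming NumPy and SciPy are available, shows one common way to get each of them. The PDF is estimated with a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`). The CDF is the empirical CDF of the sorted values. The Pearson correlation is the kind of number a correlation heatmap displays. The helper names and the sample arrays are hypothetical, and this is not necessarily how the original plots were produced.

```python
import numpy as np
from scipy.stats import gaussian_kde


def ecdf(values):
    """Return the sorted values and the empirical CDF evaluated at each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)  # fraction of points <= x[i]
    return x, y


def kde_pdf(values, grid):
    """Estimate the PDF on `grid` using a Gaussian kernel density estimate."""
    return gaussian_kde(np.asarray(values, dtype=float))(grid)


def pearson_r(a, b):
    """Pearson correlation between two equal-length arrays."""
    return np.corrcoef(np.asarray(a, dtype=float), np.asarray(b, dtype=float))[0, 1]


# Made-up county percentages, for illustration only.
pct_inactive = np.array([18.0, 22.0, 25.0, 27.0, 30.0, 21.0, 19.0, 28.0])
pct_diabetic = np.array([8.5, 10.1, 11.0, 12.2, 13.9, 9.8, 9.0, 12.5])

x_cdf, y_cdf = ecdf(pct_inactive)
grid = np.linspace(pct_inactive.min() - 10, pct_inactive.max() + 10, 500)
density = kde_pdf(pct_inactive, grid)
r = pearson_r(pct_inactive, pct_diabetic)
```

For plotting, `x_cdf, y_cdf` pair naturally with a step plot (e.g. matplotlib's `plt.step(x_cdf, y_cdf, where="post")`), and `density` with a line plot over `grid`. For a heatmap of a merged DataFrame, `df.corr()` gives the matrix of pairwise Pearson coefficients.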
This week I’m planning to develop machine learning models to predict diabetes risk from inactivity levels, starting with logistic regression as the preliminary model.
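Logistic regression predicts a binary outcome, so the diabetes percentage first needs to be turned into a label such as "high risk" versus "low risk". The sketch below uses scikit-learn's `LogisticRegression`. It rests on one hypothetical choice: a county counts as high risk when its diabetes percentage is above the median. Everything else is illustrative too. The data is synthetic, generated so that diabetes rises with inactivity, and the helper name `fit_risk_model` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_risk_model(pct_inactive, pct_diabetic, threshold=None, seed=0):
    """Fit a logistic regression of 'above-threshold diabetes' on inactivity.

    If no threshold is given, the median diabetes percentage is used
    (a hypothetical cut-off chosen for illustration).
    Returns the fitted model and its accuracy on a held-out split.
    """
    X = np.asarray(pct_inactive, dtype=float).reshape(-1, 1)  # single feature column
    pct_diabetic = np.asarray(pct_diabetic, dtype=float)
    if threshold is None:
        threshold = np.median(pct_diabetic)
    y = (pct_diabetic > threshold).astype(int)  # 1 = high risk, 0 = low risk
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic data (not real county figures): diabetes % increases with inactivity, plus noise.
rng = np.random.default_rng(0)
inactive = rng.uniform(15, 35, size=400)
diabetic = 2 + 0.35 * inactive + rng.normal(0, 1.0, size=400)

model, accuracy = fit_risk_model(inactive, diabetic)
# Predicted probability of "high risk" for a low- and a high-inactivity county.
probs = model.predict_proba(np.array([[15.0], [35.0]]))[:, 1]
```

The median split keeps the two classes balanced. The cut-off itself is a modeling decision, and a clinically or policy-motivated threshold would likely be a better choice once one is agreed on.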