The Cost of Health Care Insurance

Cynthia Fonderson | Dec 12, 2023

The goal of this project is to predict the health insurance costs incurred by individuals based on their age, gender, BMI, number of children, smoking habits and location. The dataset was obtained from Kaggle, where it was curated by user Miri Choi.

Variable | Description
--- | ---
age | Age of the primary beneficiary
sex | Gender of the insurance contractor (female, male)
bmi | Body mass index (kg/m^2): weight in kilograms divided by the square of height in metres. An objective index of whether body weight is relatively high or low for a given height; ideally 18.5 to 24.9
children | Number of children (dependents) covered by the health insurance
smoker | Whether the beneficiary smokes (yes, no)
region | The beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
charges | Individual medical costs billed by health insurance
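As a quick check on the bmi column, the index is weight in kilograms divided by the square of height in metres. The sketch below is illustrative only; the functions bmi and in_normal_range are hypothetical helpers, not part of the original project.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Hypothetical helper: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def in_normal_range(value: float) -> bool:
    """Hypothetical helper: True if a BMI falls in the 18.5-24.9 range quoted above."""
    return 18.5 <= value <= 24.9


# Example: 70 kg at 1.75 m gives 70 / 3.0625
value = bmi(70, 1.75)
print(round(value, 1), in_normal_range(value))  # prints: 22.9 True
```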



Project Outline

  1. Data ingestion and cleaning
  2. Feature engineering
  3. Evaluating linear regression models
  4. Summary

Project Summary

After data ingestion and cleaning, the categorical variables (sex, smoker and region) were encoded as numerical values. Exploratory data analysis was also conducted to better understand the dataset.

[Figure: correlation matrix]
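The encoding and exploratory steps described in the summary can be sketched as follows. The project does not say which encoding it used, so this assumes 0/1 codes for sex and smoker and one-hot columns for region (one-hot avoids implying an order between regions). The records are made-up illustrative values, not rows from the Kaggle file, and encode_row is a hypothetical helper. With pandas, `Series.map` and `pd.get_dummies` would be the more usual tools.

```python
import numpy as np

# Assumed encodings (not confirmed by the project write-up).
SEX_CODES = {"female": 0, "male": 1}
SMOKER_CODES = {"no": 0, "yes": 1}
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_row(age, sex, bmi, children, smoker, region):
    """Hypothetical helper: turn one record into a purely numeric feature vector."""
    one_hot = [1.0 if region == r else 0.0 for r in REGIONS]
    return [float(age), float(SEX_CODES[sex]), float(bmi), float(children),
            float(SMOKER_CODES[smoker]), *one_hot]


# Made-up illustrative records (not taken from the dataset):
# (age, sex, bmi, children, smoker, region, charges)
records = [
    (25, "female", 22.0, 0, "no", "northeast", 3000.0),
    (40, "male", 30.0, 2, "yes", "southeast", 25000.0),
    (55, "female", 27.0, 1, "no", "northwest", 12000.0),
    (33, "male", 35.0, 3, "yes", "southwest", 30000.0),
    (61, "male", 24.0, 0, "no", "southeast", 14000.0),
    (47, "female", 31.0, 2, "no", "southwest", 9000.0),
]

X = np.array([encode_row(*r[:6]) for r in records])
y = np.array([r[6] for r in records])

# Pearson correlation matrix over all encoded features plus charges (last column).
names = ["age", "sex", "bmi", "children", "smoker", *REGIONS]
corr_mat = np.corrcoef(np.column_stack([X, y]), rowvar=False)
for name, c in zip(names, corr_mat[-1, :-1]):
    print(f"{name:>10}: {c:+.2f}")
```

For a regression model with an intercept, one of the region columns would typically be dropped to avoid perfect collinearity among the one-hot columns.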

Next, a linear regression model was used to predict health insurance charges from the relevant features. The linear regression model performed moderately well (accuracy of ~75%), while an artificial neural network trained for 100 epochs performed better, with an accuracy of approximately 85%.

[Figure: regression]
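The summary reports "accuracy" for a regression task. Regression models are more commonly scored with the coefficient of determination, R², and treating the reported figures as R²-like is our assumption, not something the write-up states. A minimal sketch of ordinary least squares with a held-out R² score, using NumPy, is shown below. It runs on synthetic data (not the insurance dataset), the helper names are hypothetical, and the neural network comparison is not reproduced.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Hypothetical helper: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coef, X):
    """Linear prediction: intercept plus weighted sum of features."""
    return intercept + X @ coef


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (not the insurance dataset): charges driven by age, bmi and smoker.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 65, n)
bmi = rng.normal(30, 6, n)
smoker = rng.integers(0, 2, n)
X = np.column_stack([age, bmi, smoker]).astype(float)
y = 250 * age + 300 * bmi + 20000 * smoker - 5000 + rng.normal(0, 4000, n)

# Hold out 50 of the 200 rows (25%) for evaluation.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]
b0, coef = fit_linear_regression(X[train], y[train])
print("test R^2:", round(r2_score(y[test], predict(b0, coef, X[test])), 3))
```

Scoring on held-out rows rather than the training rows gives a fairer comparison between models such as linear regression and a neural network.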

Full Project