This analysis explores a synthetic medical dataset to understand its structure, contents, and data quality. The examination focuses on basic characteristics and missing data patterns.
The initial steps examined the dataset dimensions: the dataset contains X rows (observations) and Y columns (variables).
Column names were listed with a function to identify the main column categories.
Variable types include both numerical and categorical variables.
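The inspection steps above can be sketched in base R. The data frame `patients` and its column names are hypothetical stand-ins for the synthetic medical dataset, and the functions shown are one common way to do this, not necessarily the ones used in the original code.

```r
# Hypothetical toy data standing in for the synthetic medical dataset;
# the column names are illustrative assumptions, not the real ones.
patients <- data.frame(
  age       = c(34, 51, NA, 47, 29),
  sex       = factor(c("F", "M", "F", NA, "M")),
  systolic  = c(118, 135, 142, NA, 121),
  diagnosis = c("A", "B", "B", "A", NA),
  stringsAsFactors = FALSE
)

dim(patients)                          # number of rows and columns
names(patients)                        # column names
col_types <- sapply(patients, class)   # class of each column
col_types
str(patients)                          # compact overview of the structure
```

Here `age` and `systolic` are numeric, while `sex` (a factor) and `diagnosis` (character) are categorical.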
The analysis then focused on missing values. We examined missing data using multiple approaches:
- Column-wise missing value counts
- Summary statistics showing NA distributions
- Comprehensive missing data profiling
Note: This was done without using dedicated missing data packages. We will cover them in subsequent lessons.
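A minimal sketch of these three approaches using only base R, in keeping with the note above. The `patients` data frame and its columns are hypothetical, and the profile table is just one possible layout.

```r
# Hypothetical toy data; column names are illustrative assumptions.
patients <- data.frame(
  age       = c(34, 51, NA, 47, 29),
  sex       = factor(c("F", "M", "F", NA, "M")),
  systolic  = c(118, 135, 142, NA, 121),
  diagnosis = c("A", "B", "B", "A", NA),
  stringsAsFactors = FALSE
)

# Column-wise missing value counts
na_counts <- colSums(is.na(patients))

# Percentage of missing values per column
na_pct <- round(100 * colMeans(is.na(patients)), 1)

# summary() reports NA counts for numeric and factor columns
summary(patients)

# A simple missingness profile, one row per variable
na_profile <- data.frame(
  variable    = names(patients),
  n_missing   = as.integer(na_counts),
  pct_missing = as.numeric(na_pct)
)
na_profile

# Number of rows with at least one missing value
rows_with_na <- sum(!complete.cases(patients))
rows_with_na
```

Counting incomplete rows alongside the per-column counts gives a first hint of whether missing values cluster in the same observations.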
Key descriptive statistics were computed for all variables, including:
- Central tendency measures (mean, median)
- Dispersion metrics (standard deviation, range)
- Distribution characteristics (skewness, kurtosis)
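The metrics listed above can be computed by hand for a single numeric variable, as in the sketch below. The vector `systolic` is hypothetical, and skewness and kurtosis use plain moment-based formulas; package implementations may apply slightly different bias corrections, so their values can differ a little.

```r
# Hypothetical numeric variable with one missing value
systolic <- c(118, 135, 142, NA, 121, 127, 160)
x <- systolic[!is.na(systolic)]   # drop missing values before computing

centred <- x - mean(x)
m2 <- mean(centred^2)             # second central moment

stats <- c(
  mean     = mean(x),
  median   = median(x),
  sd       = sd(x),
  min      = min(x),
  max      = max(x),
  range    = diff(range(x)),
  skewness = mean(centred^3) / m2^1.5,    # moment-based skewness
  kurtosis = mean(centred^4) / m2^2 - 3   # excess kurtosis
)
round(stats, 2)
```

A positive skewness indicates a longer right tail, and an excess kurtosis near zero indicates tails similar to those of a normal distribution.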
Initial Data Inspection
- Verified successful data loading
- Checked variable names and types
Missing Data Evaluation
- Quantified missing values per variable
- Assessed patterns of missingness
Descriptive Statistics
- Generated comprehensive numerical summaries
- Calculated distribution metrics
Data Cleaning (partial)
- Applied basic missing data handling
- Verified cleaning results
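The summary does not say which missing data handling was applied, so the sketch below shows two common basic options as an assumption: dropping incomplete rows and median imputation of numeric columns. The `patients` data frame is hypothetical.

```r
# Hypothetical toy data; column names are illustrative assumptions.
patients <- data.frame(
  age      = c(34, 51, NA, 47, 29),
  systolic = c(118, 135, 142, NA, 121)
)

# Option 1 (assumed): keep only rows with no missing values
complete_rows <- na.omit(patients)

# Option 2 (assumed): replace missing numeric values with the column median
imputed <- patients
for (col in names(imputed)) {
  if (is.numeric(imputed[[col]])) {
    col_median <- median(imputed[[col]], na.rm = TRUE)
    imputed[[col]][is.na(imputed[[col]])] <- col_median
  }
}

# Verify the cleaning results
nrow(complete_rows)       # rows remaining after dropping incomplete cases
colSums(is.na(imputed))   # should be zero for every column
```

Dropping rows is simplest but discards data, while median imputation keeps every row at the cost of reducing the variability of the imputed columns.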
The analysis uses standard R packages (tidyverse, skimr, psych).
All operations are reproducible with the provided code.
The dataset should be placed in the appropriate path before running the code.