This repository demonstrates a complete, reproducible workflow for descriptive statistics and exploratory data analysis (EDA) on a synthetic medical dataset. It uses an R Markdown report to load data, assess structure and missingness, summarize variables, and visualize numerical and categorical patterns. A small utility script (descriptive.R) adds helper functions for streamlined numeric summaries and pairwise plots.
- Descriptive_Statistics.Rmd
Main analysis report (R Markdown) titled “Descriptive Statistics”.
- descriptive.R
Helper functions referenced by the R Markdown (e.g., numeric handlers and plotting utilities).
Synthetic medical dataset loaded from a public URL.
tidyverse — Data wrangling and plotting
mlbench — Example datasets and utilities
DataExplorer — Automated EDA (missingness, histograms, correlations, boxplots)
skimr — Compact data summaries
psych — Descriptive statistics for numeric variables
knitr — Chunk options and report rendering (via R Markdown)
Dataset: medical_synthetic.csv (downloaded directly in the report)
Contents: Demographics (age, sex, race), vitals, labs (e.g., glucose, creatinine), and derived indicators suitable for basic descriptive analysis.
- Project Setup
Loads libraries and sources descriptive.R.
Sets chunk options for reproducibility and clean output.
- Data Import & Structure
Downloads the medical dataset from a public URL.
Prints structure and a compact overview (types, ranges, examples).
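The import and structure check can be sketched as follows. The dataset URL is not reproduced in this README, so the sketch parses a few inline rows instead. The column names follow the contents listed above, and the values are invented purely for illustration.

```r
# Illustrative rows only; the report itself reads medical_synthetic.csv from a URL.
csv_text <- "age,sex,race,glucose,creatinine
34,F,White,92,0.8
58,M,Black,NA,1.1
47,F,Asian,130,0.9
71,M,White,105,1.4"

# read.csv() also accepts a URL string as its first argument instead of text =.
medical <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(medical)      # column types and first values
summary(medical)  # ranges for numeric columns, lengths for character columns
dim(medical)      # number of rows and columns
```

Checking `str()` right after import is a cheap way to catch columns that were parsed with an unexpected type before any summaries are computed.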
- Missingness & Summary
Missingness map: Visualizes proportion and distribution of NAs.
skim() summary: Variable types, completeness, and distribution summaries.
psych::describe(): Descriptive statistics for numeric columns.
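A minimal sketch of these three summaries, assuming a small made-up data frame in place of the real dataset. The exact arguments used in the report may differ; only the default calls are shown here.

```r
library(DataExplorer)
library(skimr)
library(psych)

# Hypothetical stand-in for the medical dataset, with a few missing values.
medical <- data.frame(
  age        = c(34, 58, 47, 71, 29),
  sex        = c("F", "M", "F", "M", "F"),
  glucose    = c(92, NA, 130, 105, NA),
  creatinine = c(0.8, 1.1, NA, 1.4, 0.7)
)

# Share of missing values per column, computed directly for reference.
na_share <- colMeans(is.na(medical))
print(na_share)

plot_missing(medical)  # DataExplorer: missingness by column
skim(medical)          # skimr: types, completeness, distribution summaries

# psych::describe() targets numeric data, so restrict to numeric columns first.
numeric_cols <- medical[sapply(medical, is.numeric)]
psych::describe(numeric_cols)
```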
- Numerical Data Exploration
Histograms for continuous variables.
Boxplots stratified by sex and by race.
Scatter plot for selected numeric features.
Numeric subset extraction for focused analysis.
handle_numeric() — Standardized numeric summaries.
plot_numeric() — Pairwise numeric plots for selected variables.
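The numeric steps above can be approximated in base R as shown below. This is a sketch under stated assumptions: `summarise_numeric()` is a hypothetical helper written for this example and is not the repository's `handle_numeric()`, `pairs()` merely stands in for `plot_numeric()`, and the data are invented.

```r
# Hypothetical stand-in data; column names follow the dataset description.
medical <- data.frame(
  age        = c(34, 58, 47, 71, 29, 63),
  sex        = c("F", "M", "F", "M", "F", "M"),
  glucose    = c(92, 110, 130, 105, NA, 98),
  creatinine = c(0.8, 1.1, 0.9, 1.4, 0.7, 1.2)
)

# Numeric subset extraction: keep only the numeric columns.
numeric_data <- medical[sapply(medical, is.numeric)]

# Hypothetical helper (not the repository's handle_numeric()):
# one row per variable with NA-safe summary statistics.
summarise_numeric <- function(df) {
  data.frame(
    variable = names(df),
    n        = vapply(df, function(x) sum(!is.na(x)), integer(1)),
    mean     = vapply(df, mean, numeric(1), na.rm = TRUE),
    sd       = vapply(df, sd, numeric(1), na.rm = TRUE),
    median   = vapply(df, median, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

numeric_summary <- summarise_numeric(numeric_data)
print(numeric_summary)

hist(medical$age, main = "Age", xlab = "Age (years)")   # distribution
boxplot(glucose ~ sex, data = medical)                   # stratified by sex
pairs(numeric_data[c("age", "creatinine", "glucose")])   # pairwise plot
```

Returning the summary as a data frame with one row per variable makes it straightforward to pass the result to a table renderer in the report.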
- Categorical Data Exploration
Frequency tables for race and sex (ordered factor for race).
Clean display of counts for quick inspection.
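As an illustration of the frequency tables, the sketch below uses made-up values. The race categories and their ordering are assumptions chosen for the example and may not match the levels in the actual dataset.

```r
# Hypothetical values; the race levels and their order are illustrative.
sex  <- c("F", "M", "F", "M", "F", "F")
race <- c("White", "Black", "Asian", "White", "White", "Black")

# An ordered factor fixes the display order of the race categories.
race <- factor(race, levels = c("White", "Black", "Asian"), ordered = TRUE)

sex_counts  <- table(sex)
race_counts <- table(race)

print(sex_counts)
print(race_counts)

# Relative frequencies, rounded for display.
print(round(prop.table(race_counts), 2))
```

Without explicit levels, `table()` would list the categories alphabetically, which is why the factor is defined before counting.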
- Numeric × Categorical Summaries
Grouped means of age and glucose by sex (with NA-safe handling for glucose).
Simple cross-tabulation of sex × race.
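A sketch of the grouped means and the cross-tabulation, assuming dplyr (loaded as part of tidyverse) and an invented data frame. The `na.rm = TRUE` argument is what makes the glucose mean NA-safe; without it, any group containing a missing value would return `NA`.

```r
library(dplyr)

# Hypothetical stand-in data with one missing glucose value.
medical <- data.frame(
  age     = c(34, 58, 46, 70, 30, 64),
  sex     = c("F", "M", "F", "M", "F", "M"),
  race    = c("White", "Black", "Asian", "White", "White", "Black"),
  glucose = c(90, 110, 130, 100, NA, 96)
)

# Mean age and NA-safe mean glucose per sex.
by_sex <- medical %>%
  group_by(sex) %>%
  summarise(
    mean_age     = mean(age),
    mean_glucose = mean(glucose, na.rm = TRUE),
    .groups = "drop"
  )
print(by_sex)

# Cross-tabulation of sex by race.
sex_by_race <- table(medical$sex, medical$race)
print(sex_by_race)
```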
Missingness plot (overview of NAs)
Skim summary and psych descriptives (tabular)
Histograms (numeric distributions)
Boxplots by sex and by race (group comparisons)
Correlation heatmap (numeric relationships)
Pairwise numeric plot for selected variables (e.g., age, creatinine, glucose)
Frequency tables (race, sex)
Grouped means (e.g., mean age/glucose by sex)
Open the R Markdown file in RStudio (or your preferred editor).
Ensure required packages are installed.
Knit/render the report to HTML to reproduce the tables and figures.
The report references descriptive.R. Keep this file in the expected path (as referenced in the YAML/script) to ensure helper functions are available.
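Before knitting, a quick check along these lines can confirm that the packages and the helper script are in place. It is a sketch only: the package list is taken from this README, and the helper path assumes descriptive.R sits next to the .Rmd file, which may not match the path used in the report.

```r
# Packages listed in this README.
required <- c("tidyverse", "mlbench", "DataExplorer", "skimr", "psych", "knitr")

# requireNamespace() returns FALSE instead of erroring when a package is absent.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}

# Assumed location: descriptive.R in the same folder as the report.
helper_path <- "descriptive.R"
if (!file.exists(helper_path)) {
  warning("descriptive.R not found at: ", helper_path)
}
```

Once everything is present, the report can also be rendered from the console with `rmarkdown::render("Descriptive_Statistics.Rmd")` as an alternative to the Knit button.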
Provide a clear template for descriptive statistics on tabular medical data.
Standardize numeric and categorical summaries for quick reporting.
Produce publication-ready figures and tables via automated EDA tools.
Notes
If you relocate files or change folder names, update paths in the report header or at the top of the document.
For larger datasets, consider chunk-wise processing or saving intermediate outputs in a dedicated folder.
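One way to save intermediate outputs is sketched below with `saveRDS()` and `readRDS()`. The folder name `intermediate` and the summary values are hypothetical, and `tempdir()` is used only so the example leaves no files in the project; in practice a folder inside the repository would take its place.

```r
# Hypothetical output folder; replace tempdir() with a project path in practice.
out_dir <- file.path(tempdir(), "intermediate")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Illustrative summary table standing in for an expensive computation.
numeric_summary <- data.frame(
  variable = c("age", "glucose"),
  mean     = c(50.3, 107)
)

# Save once, then reload in later chunks instead of recomputing.
rds_path <- file.path(out_dir, "numeric_summary.rds")
saveRDS(numeric_summary, rds_path)

restored <- readRDS(rds_path)
print(restored)
```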

