This project covers the process of cleaning, transforming, and organizing raw data into a format better suited to analysis, modeling, and visualization.

Fausford/Data_Manipulation_R


Data_Manipulation_R

Medical Data Exploration and Preprocessing

Overview

This repository documents a structured workflow for exploring, cleaning, and transforming a medical dataset (med_data). The project demonstrates step-by-step processes using both base R and the tidyverse for efficient, readable data analysis.

The workflow includes:

  • Initial dataset inspection to understand structure and contents

  • Slicing and subsetting data for focused analysis

  • Renaming variables for clarity, conciseness, and unit specification

  • Filtering based on demographic and clinical conditions

  • Feature engineering to derive new, useful variables

  • Factor handling and visualization for categorical data analysis

Repository Contents

1) Dataset Inspection

Examine row and column counts

View column names, first and last rows, and summary statistics

Establish an initial understanding of dataset structure and variable distributions
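
The inspection steps above can be sketched in base R as follows. The small `med_data` frame built here is a hypothetical stand-in with assumed column names (`id`, `age`, `sex`); the real dataset is not reproduced in this README and may differ.

```r
# Hypothetical stand-in for med_data; real column names and values may differ.
med_data <- data.frame(
  id  = 1:6,
  age = c(34, 61, 25, 72, 48, 39),
  sex = c("Male", "Female", "Male", "Female", "Male", "Female")
)

n_rows <- nrow(med_data)  # number of rows
n_cols <- ncol(med_data)  # number of columns

names(med_data)     # column names
head(med_data, 3)   # first three rows
tail(med_data, 3)   # last three rows
str(med_data)       # column types at a glance
summary(med_data)   # per-column summary statistics
```

With the tidyverse loaded, `dplyr::glimpse(med_data)` gives a view similar to `str()`.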

2) Slicing and Subsetting

Select specific rows and columns using base R and tidyverse syntax

Create subsets for targeted analysis
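
As a minimal sketch of the two syntaxes, the example below selects the same rows and columns once with base R indexing and once with dplyr. The data frame and its column names (`id`, `age`, `sbp`) are assumptions made for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for med_data; real column names may differ.
med_data <- data.frame(
  id  = 1:6,
  age = c(34, 61, 25, 72, 48, 39),
  sbp = c(118, 142, 110, 150, 128, 121)
)

# Base R: rows 1 to 3, columns picked by name
base_subset <- med_data[1:3, c("id", "age")]

# tidyverse: the same selection with slice() and select()
tidy_subset <- med_data %>%
  slice(1:3) %>%
  select(id, age)
```

The `%>%` pipe is used rather than the native `|>` so that the sketch also runs on R 4.0.x.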

3) Renaming Variables

Apply clear, short, and consistent names

Add unit labels where necessary

Use medically relevant abbreviations (e.g., sbp for systolic blood pressure)
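
One possible way to apply these naming rules is shown below. The original column names (`systolic_blood_pressure`, `glucose`) and the chosen unit suffixes are hypothetical; the README does not list the actual names used in the project.

```r
library(dplyr)

# Hypothetical raw column names; the real dataset may use different ones.
med_data <- data.frame(
  systolic_blood_pressure = c(120, 135, 142),
  glucose                 = c(90, 110, 126)
)

# rename() takes new_name = old_name pairs
renamed <- med_data %>%
  rename(
    sbp_mmHg  = systolic_blood_pressure,  # short abbreviation plus unit
    glu_mg_dL = glucose
  )

# Base R equivalent for a single column
base_renamed <- med_data
names(base_renamed)[names(base_renamed) == "glucose"] <- "glu_mg_dL"
```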

4) Filtering

Create subsets for:

  • Male participants

  • Older adults (≥ 60 years)

  • Young adults (20–40 years)

Enable targeted exploratory analysis for specific population groups
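
The three subsets could be built with `dplyr::filter()` roughly as below. This is a sketch under assumptions: the columns are taken to be named `sex` and `age`, males are taken to be coded as the string "Male", and the 20–40 range is treated as inclusive at both ends.

```r
library(dplyr)

# Hypothetical stand-in for med_data; coding of sex is an assumption.
med_data <- data.frame(
  id  = 1:6,
  age = c(34, 61, 25, 72, 48, 19),
  sex = c("Male", "Female", "Male", "Male", "Female", "Male")
)

males        <- med_data %>% filter(sex == "Male")
older_adults <- med_data %>% filter(age >= 60)
# between() is inclusive on both bounds: 20 <= age <= 40
young_adults <- med_data %>% filter(between(age, 20, 40))
```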

5) Feature Engineering

Convert glucose from mg/dL to mmol/L using:

glu (mmol/L) = glu (mg/dL) ÷ 18

Remove redundant variables after conversion

Standardize units for consistency across the dataset
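
A minimal sketch of the conversion and cleanup, assuming the source column holds glucose in mg/dL (the unit for which dividing by 18 is the usual approximation, since glucose has a molar mass of about 180 g/mol). The column names `glu_mg_dL` and `glu_mmol_L` are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for med_data with glucose assumed to be in mg/dL.
med_data <- data.frame(
  id        = 1:3,
  glu_mg_dL = c(90, 126, 180)
)

converted <- med_data %>%
  mutate(glu_mmol_L = glu_mg_dL / 18) %>%  # derive the new variable
  select(-glu_mg_dL)                       # drop the now-redundant column
```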

6) Factor Handling and Visualization

Count category frequencies for race

Convert race to factor type and reorder levels (Asian, Black, White, Other)

Visualize age distribution by race using boxplots

[Figures: "Character variables plot" and "Factor levels plot"]

These plots show the data before and after converting race from a character variable into a factor with explicitly ordered levels.
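
The counting, factor conversion, and boxplot steps might look like the sketch below. The toy data and the column names `age` and `race` are assumptions, and ggplot2 is used here as one plotting option from the tidyverse; the project's own plotting code is not shown in this README.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for med_data; race is stored as character at first.
med_data <- data.frame(
  age  = c(34, 61, 25, 72, 48, 39),
  race = c("White", "Black", "Asian", "Other", "White", "Black")
)

# Frequency of each category (result has columns race and n)
race_counts <- med_data %>% count(race)

# Convert to a factor with an explicit level order; without this,
# character values are plotted in alphabetical order.
med_data <- med_data %>%
  mutate(race = factor(race, levels = c("Asian", "Black", "White", "Other")))

# Boxplot of age by race; the x-axis follows the factor level order
p <- ggplot(med_data, aes(x = race, y = age)) +
  geom_boxplot()
print(p)
```

Note that `factor(..., levels = ...)` only fixes the display and sorting order of the levels; it does not create an ordinal (`ordered = TRUE`) factor.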

Goals

Provide a learning reference for tidyverse-style data preprocessing

Produce publication-ready summaries and visualizations

Encourage consistent naming and unit standardization in medical datasets

Requirements

R (≥ 4.0.0)

Suggested packages:

tidyverse
