Data_Manipulation_R

This project covers the process of cleaning, transforming, and organizing raw data into a format better suited to analysis, modeling, and visualization.

Medical Data Exploration and Preprocessing

Overview

This repository documents a structured workflow for exploring, cleaning, and transforming a medical dataset (med_data). The project demonstrates step-by-step processes using both base R and the tidyverse for efficient, readable data analysis.

The workflow includes:

Initial dataset inspection to understand structure and contents
Slicing and subsetting data for focused analysis
Renaming variables for clarity, conciseness, and unit specification
Filtering based on demographic and clinical conditions
Feature engineering to derive new, useful variables
Factor handling and visualization for categorical data analysis

Repository Contents

1) Dataset Inspection

Examine row and column counts

View column names, first and last rows, and summary statistics

Establish an initial understanding of dataset structure and variable distributions
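The inspection steps above can be sketched in base R. This is a minimal illustration, not the code from Data_Manipulation.Rmd: the small `med_data` frame and its column names (`id`, `age`, `gender`, `glucose`) are hypothetical stand-ins for the real dataset.

```r
# Hypothetical stand-in for med_data; the real dataset's columns may differ.
med_data <- data.frame(
  id      = 1:6,
  age     = c(25, 63, 41, 70, 33, 58),
  gender  = c("Male", "Female", "Male", "Female", "Male", "Female"),
  glucose = c(90, 126, 108, 144, 99, 117)
)

dim(med_data)       # row and column counts together
nrow(med_data)      # number of rows
ncol(med_data)      # number of columns
names(med_data)     # column names
head(med_data, 3)   # first three rows
tail(med_data, 3)   # last three rows
str(med_data)       # column types and a preview of values
summary(med_data)   # per-column summary statistics
```

`str()` and `summary()` together usually show which columns are numeric, which are character, and whether any values look implausible.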

2) Slicing and Subsetting

Select specific rows and columns using base R and tidyverse syntax

Create subsets for targeted analysis
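As a rough sketch of the same subset taken both ways, the example below selects the first three rows and two columns with base R indexing and with dplyr. The data frame and its column names are assumptions made for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for med_data; the real dataset's columns may differ.
med_data <- data.frame(
  id      = 1:6,
  age     = c(25, 63, 41, 70, 33, 58),
  gender  = c("Male", "Female", "Male", "Female", "Male", "Female"),
  glucose = c(90, 126, 108, 144, 99, 117)
)

# Base R: rows by position, columns by name.
base_subset <- med_data[1:3, c("id", "age")]

# dplyr: slice() picks rows by position, select() picks columns.
tidy_subset <- med_data %>%
  slice(1:3) %>%
  select(id, age)
```

The `%>%` pipe is used here rather than the native `|>` because the native pipe requires R 4.1 or later.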

3) Renaming Variables

Apply clear, short, and consistent names

Add unit labels where necessary

Use medically relevant abbreviations (e.g., sbp for systolic blood pressure)
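One possible way to apply these naming rules is shown below. The original column names (`Systolic_Blood_Pressure`, `Age`) and the new names (`sbp_mmhg`, `age_yrs`) are hypothetical; only the `sbp` abbreviation comes from this README.

```r
library(dplyr)

# Hypothetical columns with long, inconsistent names.
med_data <- data.frame(
  Systolic_Blood_Pressure = c(120, 135, 110),
  Age                     = c(25, 63, 41)
)

# dplyr: rename() takes new_name = old_name pairs.
renamed <- med_data %>%
  rename(
    sbp_mmhg = Systolic_Blood_Pressure,  # abbreviation plus unit suffix
    age_yrs  = Age
  )

# Base R equivalent: replace entries of names() by matching the old name.
renamed_base <- med_data
names(renamed_base)[names(renamed_base) == "Systolic_Blood_Pressure"] <- "sbp_mmhg"
names(renamed_base)[names(renamed_base) == "Age"] <- "age_yrs"
```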

4) Filtering

Create subsets for:

Male participants

Older adults (≥ 60 years)

Young adults (20–40 years)

Enable targeted exploratory analysis for specific population groups
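The three subsets above could be built as follows. This is a sketch under assumptions: the column names `gender` and `age`, the value `"Male"`, and the inclusive treatment of the 20 and 40 year boundaries are not confirmed by this README.

```r
library(dplyr)

# Hypothetical stand-in for med_data.
med_data <- data.frame(
  id     = 1:6,
  age    = c(25, 63, 41, 70, 33, 60),
  gender = c("Male", "Female", "Male", "Male", "Female", "Female")
)

# Male participants.
males <- med_data %>% filter(gender == "Male")

# Older adults: 60 years and above.
older_adults <- med_data %>% filter(age >= 60)

# Young adults: 20 to 40 years, both ends included (an assumption).
young_adults <- med_data %>% filter(age >= 20, age <= 40)

# Base R equivalent of the first filter.
males_base <- med_data[med_data$gender == "Male", ]
```

Conditions separated by commas inside `filter()` are combined with a logical AND.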

5) Feature Engineering

Convert glucose from mg/dL to mmol/L using:

glu (mmol/L) = glu (mg/dL) ÷ 18

Remove redundant variables after conversion

Standardize units for consistency across the dataset
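A minimal sketch of the conversion and cleanup, assuming the source column holds glucose in mg/dL, which is the unit for which dividing by 18 is the usual approximation (glucose has a molar mass of roughly 180 g/mol). The column names `glu_mgdl` and `glu_mmol` are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for med_data with glucose in mg/dL.
med_data <- data.frame(
  id       = 1:3,
  glu_mgdl = c(90, 126, 180)
)

med_data <- med_data %>%
  mutate(glu_mmol = glu_mgdl / 18) %>%  # derive the mmol/L column
  select(-glu_mgdl)                     # drop the now-redundant original

med_data$glu_mmol
# 5 7 10
```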

6) Factor Handling and Visualization

Count category frequencies for race

Convert race to factor type and reorder levels (Asian, Black, White, Other)

Visualize age distribution by race using boxplots

These plots compare the age-by-race boxplots before and after converting race from a character variable into a factor with reordered levels.
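The factor steps can be sketched as below. The data values are invented for illustration, and the plot is one possible rendering rather than the figure stored in this repository.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for med_data; race starts as a character column.
med_data <- data.frame(
  age  = c(25, 63, 41, 70, 33, 58, 47, 52),
  race = c("White", "Black", "Asian", "Other",
           "White", "Asian", "Black", "White"),
  stringsAsFactors = FALSE
)

# Count category frequencies.
race_counts <- med_data %>% count(race)

# Convert to a factor with an explicit level order.
# Without this, ggplot2 sorts character categories alphabetically.
med_data <- med_data %>%
  mutate(race = factor(race, levels = c("Asian", "Black", "White", "Other")))

levels(med_data$race)

# Boxplot of age by race; the x-axis follows the factor level order.
p <- ggplot(med_data, aes(x = race, y = age)) +
  geom_boxplot() +
  labs(x = "Race", y = "Age (years)")

print(p)
```

Setting `levels` explicitly controls the display order only; it does not make the factor ordinal. That would require `ordered = TRUE`, which is not needed for a boxplot.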

Goals

Provide a learning reference for tidyverse-style data preprocessing

Produce publication-ready summaries and visualizations

Encourage consistent naming and unit standardization in medical datasets

Requirements

R (≥ 4.0.0)

Suggested packages:

tidyverse
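A short setup check along these lines may help before running the notebook. It is a sketch, not part of the repository; the installation line is left commented out so the snippet has no side effects.

```r
# Compare the running R version with the stated minimum.
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) warning("R 4.0.0 or later is recommended for this project.")

# One-time installation, if the package is missing:
# install.packages("tidyverse")

# Load the tidyverse only if it is installed.
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)
if (has_tidyverse) {
  library(tidyverse)
}
```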

