v3.0.0 infrastructure with CCHS master file harmonization #137
base: dev
The number of cigarettes per month wasn't being converted to packs per month, which is done by dividing by 20.
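A minimal sketch of the conversion this commit describes, assuming 20 cigarettes per pack. The helper name `cigs_to_packs` is hypothetical and is not the package's `pack_years_fun`.

```r
# Hypothetical helper (not part of the package):
# convert cigarettes smoked per month into packs per month.
cigs_to_packs <- function(cigs_per_month, cigs_per_pack = 20) {
  cigs_per_month / cigs_per_pack
}

packs <- cigs_to_packs(c(600, 300, 0))
print(packs)  # 30 15 0
```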
The test case and expected values are stored in the `pack_years.csv` file. The test uses a newly added function called `test_derived_function` to run the expectations. This function can also be used by other functions that want to use the same workflow for testing. Added instructions for using the new `test_derived_function` in a README.md file.
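A CSV-driven test workflow of this kind might look roughly like the sketch below. The helper `run_csv_cases`, the column names, and the inline CSV are all hypothetical. The actual `test_derived_function` and the layout of `pack_years.csv` may differ.

```r
# Inline stand-in for a CSV of test cases: one row per case,
# with the input column(s) and the expected result.
csv_text <- "cigs_per_month,expected_packs
600,30
300,15
0,0"
cases <- read.csv(text = csv_text)

# Illustrative function under test (not the package's implementation).
cigs_to_packs <- function(cigs_per_month) cigs_per_month / 20

# Hypothetical runner: apply `fun` to the input column and compare
# each result with the expected column, row by row.
run_csv_cases <- function(cases, fun, input_col, expected_col) {
  actual <- fun(cases[[input_col]])
  expected <- cases[[expected_col]]
  data.frame(
    row = seq_len(nrow(cases)),
    expected = expected,
    actual = actual,
    pass = abs(actual - expected) < 1e-8
  )
}

results <- run_csv_cases(cases, cigs_to_packs, "cigs_per_month", "expected_packs")
print(results)
```

Keeping the cases in a table means a new expectation is a new row rather than new test code.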
- Convert SVG logo text to paths to eliminate font dependency issues
- Add docs/ and ..Rcheck/ to .gitignore to exclude generated content
- Update DESCRIPTION with additional package dependencies
- Update _pkgdown.yml configuration
- Regenerate all favicon files from corrected logo
Fix logo font rendering issues and regenerate favicons
Packyears fix
- Add schema validation system with cross-platform compatibility
- Add CSV standardization tools for git collaboration
- Add metadata schemas for variables and variable_details
- Foundation for v2.2.0 enhancements

- Add 28 new variables with full metadata
- Enhance 91 existing variables with _i cycle database support
- Add systematic version tracking for all variables
- Maintain backward compatibility

- Add 3 new functions: DemPoRT_ICES_code.R, adl_score_6.R, missing-data-helpers.R
- Major enhancement to smoking.R (1547 changes) - improved _i cycle support
- Substantial updates to bmi.R (509 changes) - enhanced database compatibility
- Significant improvements to adl.R (264 changes) - expanded functionality
- Enhanced alcohol.R (193 changes) - better cycle support
- Updated utility functions for v2.2.0 compatibility

- Add test-csv-helpers.R for CSV standardization validation
- Add test-yaml-validation.R for schema testing
- Add test-dependency-helpers.R for dependency analysis
- Add test-missing-data-helpers.R for missing data handling
- Enhance helper-utils.R with v2.2.0 testing infrastructure
- Add CHANGELOG_v2.2.0.md documenting all enhancements

- Update DESCRIPTION to version 2.2.0 with current date
- Add yaml and readr dependencies for validation infrastructure
- Remove DemPoRT_ICES_code.R (not needed for this release)
- Package ready for comprehensive testing and validation

- Change title from "Recodeflow Schema Validation System" to "Schema Validation"
- Update @name from "recodeflow_schema_validation" to "schema_validation"
- Generalize description for broader applicability
- Bug fixes for required field extraction as noted in session status

- Update variable_details.csv with 3,577 comprehensive entries
- Update variables.csv with enhanced metadata tracking
- Add version tracking, harmonization status, and review notes
- Implement structured metadata framework for v2.2.0

- Update BMI functions (bmi_fun, adjusted_bmi_fun) with v2.2.0 @note metadata
- Update ADL functions (adl_fun, adl_score_5_fun, adl_score_6_fun) with versioning
- Update alcohol functions (ALCDTTM, binge_drinker_fun, low_drink_score_fun, ALCDTYP_A) with metadata
- Update smoking functions (SMKDSTY_fun, time_quit_smoking_fun, smoke_simple_fun, pack_years_fun, pack_years_fun_cat) with versioning
- All 14 functions include machine-readable @note format: v2.2.0, last updated: 2025-06-30, status: active

- Update schema validation with improved required field extraction
- Enhance templates.yaml with comprehensive versioning framework
- Add metadata validation utilities for function versioning
- Improve error handling and validation consistency

- Update @note metadata in all 14 versioned functions to v3.0.0
- Rename CHANGELOG_v2.2.0.md to CHANGELOG_v3.0.0.md
- Update schema files with v3.0.0 versioning
- Reflect major version due to breaking changes (_s deprecation, function modernization)

- Create modern tidyverse development vignette with v3.0.0 patterns
- Document copy-paste functionality across scalar, vector, and rec_with_table() contexts
- Include complex case_when patterns with missing data handling examples
- Add comprehensive input validation and data checking framework
- Provide complete documentation standards with transformation warnings
- Establish function versioning system with structured @note metadata
- Based on smoking function modernization as reference implementation
…ntions
- Update BMI functions (bmi_fun, adjusted_bmi_fun, bmi_fun_cat) to standard roxygen2 patterns
- Update alcohol binge_drinker_fun documentation following community standards
- Remove custom formatting (bold headings, non-standard sections)
- Add mandatory rec_with_table() examples as primary usage pattern
- Standardize @return documentation with itemized missing data handling
- Convert transformation warnings to plain text @details sections
- Preserve legacy functions in backup files for validation
- Document identified issues for team discussion
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

…ace issue
- Update low_drink_short_fun and low_drink_long_fun to R/Tidyverse standards
- Remove custom formatting and add mandatory rec_with_table() examples
- Fix critical namespace issue: tagged_na() → haven::tagged_na() in physical activity functions
- Standardize @return documentation with itemized missing data handling
- Add comprehensive @examples, @seealso, and @references sections
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

- Mark physical activity namespace issue as completed
- Mark function organization strategy as completed
- Add documentation standardization completion status
- Update priority tracking for remaining items
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

- Rename BMI functions: bmi_fun → calculate_bmi, adjusted_bmi_fun → adjust_bmi, bmi_fun_cat → categorize_bmi
- Rename ADL functions: adl_fun → assess_adl, adl_score_5_fun → score_adl, adl_score_6_fun → score_adl_6
- Rename alcohol functions: binge_drinker_fun → assess_binge_drinking, low_drink_short_fun → assess_drinking_risk_short, low_drink_long_fun → assess_drinking_risk_long
- Rename physical activity: energy_exp_fun → calculate_energy_expenditure
- Update all internal function references and @seealso links
- Update development guide with naming standards and migration mapping
- Follow verb-first naming pattern: calculate_*, assess_*, categorize_*, score_*
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

- Update variable_details.csv with all new function names:
  • Func::adl_fun → Func::assess_adl
  • Func::adl_score_5_fun → Func::score_adl
  • Func::adl_score_6_fun → Func::score_adl_6
  • Func::binge_drinker_fun → Func::assess_binge_drinking
  • Func::low_drink_short_fun → Func::assess_drinking_risk_short
  • Func::low_drink_long_fun → Func::assess_drinking_risk_long
  • Func::energy_exp_fun → Func::calculate_energy_expenditure
  • Func::bmi_fun → Func::calculate_bmi
  • Func::adjusted_bmi_fun → Func::adjust_bmi
  • Func::bmi_fun_cat → Func::categorize_bmi
- Rename test files: test-bmi-enhanced.R → test-calculate-bmi.R, test-adl-enhanced.R → test-assess-adl.R
- Update all function calls in test files to use new naming conventions
- Maintain consistency across metadata, functions, and tests
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>

- Create deprecated aliases for all 11 renamed functions
- Include comprehensive deprecation warnings with migration guidance
- Functions will be removed in v4.0.0
- Maintains full backward compatibility during v3.x series
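One common way to build such an alias in R is with base `.Deprecated()`. The sketch below is an assumption about the general pattern, not the package's actual code: the body of `calculate_bmi` is simplified and the warning text is invented for illustration.

```r
# Simplified stand-in for the renamed function (illustration only).
calculate_bmi <- function(height_m, weight_kg) {
  weight_kg / height_m^2
}

# Deprecated alias: signal a deprecation warning, then forward all
# arguments to the new name so existing code keeps working.
bmi_fun <- function(...) {
  .Deprecated(
    new = "calculate_bmi",
    msg = "bmi_fun() is deprecated and will be removed in v4.0.0; use calculate_bmi() instead."
  )
  calculate_bmi(...)
}

# The old name still returns the same value as the new one.
old_result <- suppressWarnings(bmi_fun(1.75, 70))
new_result <- calculate_bmi(1.75, 70)
identical(old_result, new_result)
```

Forwarding through `...` keeps the alias free of its own argument list, so it cannot drift out of sync with the new function's signature.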
- Remove incorrect source() calls from enhanced test files
- Enhanced functions loaded via devtools::load_all()
- Allows tests to run in proper package environment

- Regenerate NAMESPACE with new function exports
- Update all function documentation with new names
- Add documentation for new modernized functions
- Remove documentation for old energy_exp_fun
- Include generated vignette HTML
Total Additions: 46 rows

Variables Created:
1. SMK_09A_B_cont - Time since stopped smoking daily (former daily smokers)
2. SMKG09C_cont - Years since stopped smoking daily (former daily smokers)
3. SMKG203_A_cont - Age started smoking daily (current daily smokers)
4. SMKG207_A_cont - Age started smoking daily (former daily smokers)

Mapping Types Implemented:

Categorical → Continuous Mappings (27 rows):
- SMK_09A_B_cont: 1→0.5, 2→1.5, 3→2.5 years
- SMKG09C_cont: 1→4, 2→8, 3→12 years
- SMKG203_A_cont: 1→8, 2→13, 3→16, 4→18.5, 5→22, 6→27, 7→32, 8→37, 9→42, 10→47 years
- SMKG207_A_cont: Same age mappings as SMKG203_A_cont

Continuous → Continuous Copy Operations (7 rows):
- cchs2022_i: SPU_25I → smoking variables (proper cont-to-cont mapping)
- Other databases: Range validation copies with [0,80] valid range

NA Value Mappings (12 rows):
- NA::a (not applicable): recStart values like 6, 996
- NA::b (missing): recStart values like [7,9], [997,999], else
- Unique handling with _7 and _e suffixes for multiple rules per NA category

🗃️ Database Coverage:
- cchs2003_p through cchs2023_i - Comprehensive CCHS cycle coverage
- Public (p), Shared (s), ICES (i) - All database types supported
- Special handling for cchs2022_i - Uses SPU_25I continuous source

✨ Key Features Implemented:
- Smart dummyVariable naming with recStart identifiers (_7, _e)
- Range validation entries for all continuous variables [0,80]
- Consistent variableStartShortLabel system (stpd_cat, stpdy_cont, etc.)
- Clean notes field with special characters removed
- Zero duplicates - all variable+database+source combinations unique

🎨 DummyVariable Patterns Created:
- Categorical: SMK_09A_B_cont_05, SMKG09C_cont_4
- Copy cchs2022_i: SMK_09A_B_copy_cont_cchs2022_i
- Copy others: SMKG203_A_cont_copy
- NA mappings: SMK_09A_B_cont_NAb_cchs2003_p_7, SMKG09C_cont_NAa_cchs2022_i
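To make the categorical-to-continuous mapping concrete, here is a hedged sketch of the `SMK_09A_B_cont` rule written as a plain function with `dplyr::case_when()` and `haven::tagged_na()`. The commit implements these rules as metadata rows, so the function name `recode_smk_09a_b_cont` is hypothetical. It also assumes code 6 is the only "not applicable" value and that every other code is "missing".

```r
library(dplyr)
library(haven)

# Hypothetical hand-written version of the mapping rows:
# 1 -> 0.5, 2 -> 1.5, 3 -> 2.5 years; 6 -> NA::a; anything else -> NA::b.
recode_smk_09a_b_cont <- function(code) {
  case_when(
    code == 1 ~ 0.5,
    code == 2 ~ 1.5,
    code == 3 ~ 2.5,
    code == 6 ~ tagged_na("a"),  # not applicable
    TRUE      ~ tagged_na("b")   # 7-9, unlisted codes, and NA input
  )
}

years <- recode_smk_09a_b_cont(c(1, 2, 3, 6, 7, NA))

# Both kinds of missing value are NA to ordinary R code;
# haven::na_tag() can be used to tell them apart.
is.na(years)
na_tag(years)
```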
Smoking Status Function Reorganization:
- Add comprehensive documentation and examples

Test Suite Added:
- Add 6 new test functions covering all SMKDSTY_A categories (1-6)
- Test missing data handling (tagged_na patterns)
- Test vector input processing and edge cases
- Add CCHS codebook validation tests
- Add legacy compatibility tests with detailed descriptions

Bug Fix - Legacy Compatibility:
- Fix condition order for "Never smoked" classification
- SMK_005=3 & SMK_01A=2 → category 6 (regardless of SMK_030)
- All smoking status tests pass (139 total test assertions)
- Maintains 100% legacy compatibility for smoking history generator models

- Smoking status functions (SMKDSTY_A, SMKDSTY_B, SMKDSTY_cat5, SMKDSTY_cat3) are complete
- All 148 tests passing
- Enhanced roxygen examples for all smoking status functions with rec_with_table() workflows
- Added missing data and edge case examples showing CCHS code handling
- Fixed smoke_simple boundary condition for 5-year threshold
- Updated assessment documentation and working guide with comprehensive examples

- work-in-progress for adding smoking initiation to smoking.R
- clean variable_details.csv for these variables. More cleaning needed.
updated function working. Tests all working.
…on-infrastructure
Update recFrom and recTo. recFrom usually doesn't have a defined range. recTo is defined from the Smoking History Generator models.
- Add regex constraints for recEnd field validation (prevents issues like "5+" in categorical data)
- Document proper recStart N/A usage guidelines for derived variables
- Add CCHS-specific data consistency requirements
- Update variable_details.csv schema compliance
- Establish standardized formatting rules for categorical values
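A regex constraint of this kind can be sketched in base R as follows. The pattern is an assumption made for illustration: it accepts plain numbers, `NA::` tags, `copy`, and `Func::` references. It is not the pattern stored in the package's schema.

```r
# Hypothetical recEnd pattern (the real schema constraint may differ):
# a number, an NA tag such as NA::a, the keyword "copy", or Func::name.
rec_end_pattern <- "^(-?[0-9]+(\\.[0-9]+)?|NA::[a-z]|copy|Func::[A-Za-z_][A-Za-z0-9_.]*)$"

rec_end_values <- c("1", "2.5", "NA::a", "copy", "5+", "")

# TRUE for values the pattern accepts; "5+" and the empty string are rejected.
is_valid <- grepl(rec_end_pattern, rec_end_values)
data.frame(recEnd = rec_end_values, valid = is_valid)
```

Anchoring the pattern with `^` and `$` is what rejects "5+": the leading digit matches, but the trailing `+` leaves the string only partially matched.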
- New validate_csv_comprehensive() function for structured validation checks
- R CMD check style output with clear pass/fail status indicators
- Three-layer validation system (basic, verbose, full investigation)
- Complete usage examples and integration documentation
- Helper functions and dependencies for team workflows
- Ready for development team adoption and testing

Enables teams to validate variable_details.csv and variables.csv files with consistent, reliable feedback for data quality assurance.
…on-infrastructure
CSV Validation Infrastructure (Draft - Feedback Requested)
🛠️ New Development Tool: CSV Validation System
A CSV validation infrastructure to help review and validate
📋 Quick Start Example
# Validate your CSV files
source("development/csv-validation-improvements/validate_csv_comprehensive.R")
# Basic validation (recommended for daily use)
validate_csv_comprehensive("inst/extdata/variable_details.csv")
# Verbose output for troubleshooting
validate_csv_comprehensive("inst/extdata/variable_details.csv", verbose = …
📚 Documentation & Integration
Key Features
🔄 Status: Draft Infrastructure - Feedback needed
Objective of the validation function:
Try testing these tools in your workflows and share feedback.
This update adds 218 new rows to variable_details.csv,
🚬 Smoking variable infrastructure updates
New Continuous Smoking Variables:
Enhanced Range Specifications:
Range implementation:
This addresses the issue where many categorical-to-continuous
Pack years enhancement
Updated pack_years_der dependencies to include all new continuous smoking
Data quality & validation Improvements
Fixed Invalid recEnd Formatting:
Clean number_conditions variable:
General Data Cleaning:
✅ Validation
The updated CSV file now passes comprehensive validation:
I believe this was a typo that Doug had previously pointed out and asked me to look into. The ADL_01 variable was reformatted to a 4-category variable in the 2015-16 survey cycles, so including those cycles in these rows would be incorrect. Additional rows had been added to variable_details to account for their harmonization back to a two-category variable, as well as rows to have them coded as their original 4-category variable. There also should not be 2015-2016 or 2017-18 master cycles for these variables, as the module wasn't mandatory and therefore was not collected in Ontario for those cycles.
1fe0a47 to
c625a2d
Compare
Oral health updates to variables/variable-details
Summary
Complete v3.0.0 transformation featuring CCHS Master file harmonization, smoking variable updates to 2024 standards, function naming modernization, and comprehensive tidyverse enhancement.
🔥 Breaking Changes (v3.0.0)
Function Naming Modernization ✨ NEW
- `bmi_fun()` → `calculate_bmi()`
- `adl_fun()` → `assess_adl()`
- `binge_drinker_fun()` → `assess_binge_drinking()`
- `low_drink_*_fun()` → `assess_drinking_risk_*()`
- `energy_exp_fun()` → `calculate_energy_expenditure()`
- … `rec_with_table()` examples
CCHS Data Coverage Standardization
- `_s` suffixes → standardized `_m` approach for all cycles
- `_s` database references need updating to `_m`
- … (`_s`) and master (`_m`) files
Enhanced Function Ecosystem
- `if_else2()` → `dplyr::if_else()` for type safety
- `if_else2()` usage will generate deprecation warnings
🎯 Key Features
1. Function Naming Modernization ✨ NEW
- `calculate_*` for computational functions (BMI, energy expenditure)
- `assess_*` for evaluation functions (ADL, drinking risk, binge drinking)
- `categorize_*` for classification functions (BMI categories)
- `score_*` for scoring functions (ADL scores)
2. CCHS Data Coverage Enhancement
Standardized master file approach across all cycles:
Introduced continuous variables where available:
- `DHHGAGE_cont` (age by single year vs categories)
- `HWTGHTM`, `HWTGWTK` (height/weight continuous)
- `SMKG203_cont`, `SMKG207_cont` (smoking quantity continuous)
- … `HWTDBMI`, `HWTDHTM`, `HWTDWTK`)
Expanded categorical variables with greater granularity:
- … `score_adl_6()`)
3. Updated Smoking Variables (2024 Standards)
4. Tidyverse Modernization
- `haven::tagged_na()` implementation for semantic missing data
- `dplyr::case_when()` logic replacing legacy patterns
5. Expanded Testing Infrastructure
📊 Technical Impact
Data Coverage Enhancement:
Code Quality:
🔧 Infrastructure Improvements
Data Coverage Standardization
- `_m` suffix approach for consistent database referencing
Versioning System
- `@note v3.0.0` metadata for all enhanced functions
🔗 External References
ODESI Shared Files: https://search.odesi.ca/ (2009, 2010, 2012 comprehensive coverage)
Statistics Canada Methodologies: 2024 research standards compliance
R/Tidyverse Style Guide: Function naming conventions adopted
✅ Migration Guide
Function Naming (v3.0.0) ✨ NEW
All old function names remain available with deprecation warnings:
Migration timeline:
For Data Coverage Standardization:
Cycles 2009, 2010, 2012:
- … `cchs20XX_s` to `cchs20XX_p`
All cycles:
For `if_else2()` users:
- … `if_else2()` calls with `dplyr::if_else()`
📋 Testing & Validation
Ready for Review - Complete v3.0.0 transformation with modern R conventions, enhanced data coverage, and comprehensive documentation.