During the study period, this project collected and analyzed datasets from different spheres of life, visualized them, and tested a range of statistical hypotheses using statistical tests.

BorDch/Data-analysis-project

Data-analysis-project

Description

This project was carried out as part of a computer technology workshop training course. The task was to analyze arbitrary datasets and to test various statistical hypotheses. All of the work was done in Jupyter Notebook, so the final version is provided in .ipynb and .html formats. Separate source code files are also provided.

Documentation

  • Directories:
    • 1-2 steps: main directory with the final implementations of the two project steps. Step 1 is data analysis and visualization; step 2 is the testing of various statistical hypotheses.
    • Scripts: directory with the source code of the main functions used in the two project steps.
  • Methods (Functions):
    • Python:

      • extract_sport(string): extracts the sport category;

      • grubbs_test(array): implements the Grubbs test;

      • q_dixon_test(array): implements the Dixon Q test;

      • plot_ecdf(array, label, ax): plots the ECDF with seaborn;

      • custom_ecdf(array): computes the ECDF of the data;

      • envelope method(array, n): implements the envelope method using a bootstrap algorithm and the function 'custom_ecdf(data)';

      • perform_normality_tests(array, name): runs tests of the hypothesis that the data are normally distributed;

      • f_test_variance(array_x, array_y, alpha): implements the F test of the hypothesis that two variances are equal;

      • compute_chi2_statistic(table)(tble_name): calculates the chi-square statistic;

      • fit_polynomial_regression(degree): fits a polynomial regression of the given degree.

    • R:

      • envelope_ecdf <- function(data) {...}: creates the polygon for the envelope method from the ECDF;
      • the other methods are the same as the Python ones.
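The ECDF and bootstrap envelope entries above can be illustrated with a minimal sketch. This is not the project's code: the names `ecdf_at` and `bootstrap_envelope`, the parameters `n_boot`, `alpha` and `seed`, and the choice of a pointwise percentile band are all assumptions made for illustration. Only the Python standard library is used.

```python
import bisect
import random


def ecdf_at(sample, grid):
    """Evaluate the empirical CDF of `sample` at every point of `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many observations are <= x
    return [bisect.bisect_right(ordered, x) / n for x in grid]


def bootstrap_envelope(sample, n_boot=500, alpha=0.05, seed=0):
    """Pointwise (1 - alpha) bootstrap band around the ECDF (hypothetical helper)."""
    rng = random.Random(seed)
    grid = sorted(set(sample))
    # Resample with replacement and evaluate each resample's ECDF on the same grid
    curves = [
        ecdf_at(rng.choices(sample, k=len(sample)), grid)
        for _ in range(n_boot)
    ]
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    lower, upper = [], []
    for j in range(len(grid)):
        # Sort the bootstrap ECDF values at this grid point and take percentiles
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return grid, lower, upper


sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
grid, lower, upper = bootstrap_envelope(sample)
```

The `lower` and `upper` lists can then be drawn as a shaded band around the ECDF curve, which is roughly what the polygon in the R entry suggests.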

The main implementations are in the .ipynb files in the 1-2 steps directory.
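As a reading aid for the chi-square entry in the function list, here is a minimal sketch of the Pearson chi-square statistic for a contingency table. It is an illustration under assumptions, not the notebook code: the name `chi2_statistic`, the list-of-rows input and the returned degrees of freedom are all hypothetical.

```python
def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (hypothetical input format).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi2_statistic([[10, 20], [20, 10]])
```

The statistic is compared with the chi-square distribution with `dof` degrees of freedom to decide whether to reject the hypothesis of independence.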

Developers

Additional useful links

All files for download on Yandex Disk

Python documentation

R documentation

Test of Hypotheses using statistics

License

MIT license
