Unsupervised_Learning

Apply Unsupervised learning algorithms to two different datasets. I applied two clustering algorithms along with different dimensionality reduction algorithms and evaluated their performance in classifying the data correctly.

Avocado Dataset

This dataset is comprised of information regarding avocado sales [Date, AveragePrice, TotalVolume, SmHass, LgHass, ExLgHass, TotalBags, SmallBags, LargeBags, XLargeBags, Type, Year, Region]. The data is particularly interesting because the avocado Type feature which indicates whether an avocado is standard or organic is relatively simple (binary) and easily separable as seen in previous assignments. Unsupervised learning should be able to learn this feature without explicitly knowing the labels. Some features such as volume are also separated into multiple highly related columns (small bags sold and large bags sold) which indicate dimensionality reduction may perform well. Prior to implementing clustering and dimensionality reduction, the categorical features (region sold and avocado type) were removed. I experimented with including them after one-hot encoding and standardizing the data but clustering performed best, as expected, with strictly the numerical data.

Abalone Dataset

The Abalone dataset consists of physical measurements of abalone in Tasmania from the North Coast and Islands of Bass Strait provided by the Marine Resources Division, Marine Research Laboratories, and the Department of Primary Industry and Fisheries. The dataset contains the following attributes: [Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, Shell weight, Rings] Unsupervised Learning methods will be using the information available about the abalone to correctly predict its sex [‘m’, ‘f’, ‘infant’]. This dataset is interesting, in comparison to the avocado problem, because it is easier to approach and interpret. There are only 8 predictor variables and all but one of the predictor categories are continuous.

Environment Setup

Language: Python 2

IDE I used: Jupyter (Python)

OS I used: Mac OS

How to run code: Make sure you have python 2.7 installed: Python 2.7 :: Anaconda custom (x86_64) iPython 6.1.0 Homebrew and pip

Necessary Libraries (can also be installed with conda): Numpy - pip install numpy Pandas - pip install pandas Seaborn - pip install seaborn Matplotlib - brew install pkg-config, - pip install matplotlib Scikit-learn - pip install -U scikit-learn SciPy - pip install scipy

Dataset Links: https://archive.ics.uci.edu/ml/datasets/abalone https://www.kaggle.com/neuromusic/avocado-prices/home

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
PS3_Avocado_Data_Clustering.ipynb		PS3_Avocado_Data_Clustering.ipynb
PS3_Code_Abalone_Data_Clustering.ipynb		PS3_Code_Abalone_Data_Clustering.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised_Learning

Avocado Dataset

Abalone Dataset

Environment Setup

About

Uh oh!

Releases

Packages

Languages

ConnieC14/Unsupervised_Learning

Folders and files

Latest commit

History

Repository files navigation

Unsupervised_Learning

Avocado Dataset

Abalone Dataset

Environment Setup

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages