diff --git a/Sumit_Gaur_Numpy and Pandas.ipynb b/Sumit_Gaur_Numpy and Pandas.ipynb new file mode 100644 index 00000000..110ba857 --- /dev/null +++ b/Sumit_Gaur_Numpy and Pandas.ipynb @@ -0,0 +1,1239 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5dacfd59", + "metadata": {}, + "source": [ + "# ANALYTICAL PYTHON" + ] + }, + { + "cell_type": "markdown", + "id": "dab926de", + "metadata": {}, + "source": [ + "## NUMPY" + ] + }, + { + "cell_type": "markdown", + "id": "2c38f678", + "metadata": {}, + "source": [ + "NumPy is the fundamental library for scientific computing in Python. It provides a multidimensional array object and is an extension module for Python whose core is written in C. Because the heavy computation runs in compiled C code, operations are fast and processing times are low, so computing with NumPy arrays is much faster than computing with built-in Python lists. In NumPy, dimensions are called axes.\n", + "\n", + "### Features of NumPy\n", + "- NumPy arrays have a fixed size at creation; unlike Python lists they cannot grow in place, and changing the size creates a new array.\n", + "- All elements of a NumPy array must have the same data type, and therefore the same size in memory.\n", + "- NumPy arrays store large amounts of numeric data more compactly than Python lists and process it much more efficiently.\n", + "- It is an open source library.\n", + "- It offers efficient implementations of mathematical functions that operate on arrays and matrices."
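, + "\n", + "\n", + "The cell below is a small sketch of two of these features (the variable names are illustrative). First, arithmetic applies to every element of an array without an explicit Python loop. Second, because an array's size is fixed, `np.append` returns a new, larger array and leaves the original unchanged." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e111", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "values=[1, 2, 3, 4]\n", + "arr=np.array(values)\n", + "\n", + "# Vectorized: one expression multiplies every element, no Python loop needed\n", + "print(arr * 2)  # [2 4 6 8]\n", + "\n", + "# The list equivalent needs an explicit loop or comprehension\n", + "print([v * 2 for v in values])  # [2, 4, 6, 8]\n", + "\n", + "# Fixed size: np.append returns a NEW array; arr itself is unchanged\n", + "bigger=np.append(arr, 5)\n", + "print(arr.size, bigger.size)  # 4 5" + ] + }, + { + "cell_type": "markdown", + "id": "a7c3e112", + "metadata": {}, + "source": [ + "`np.append` copies the whole array on every call, so growing an array one element at a time in a loop is slow. It is usually faster to collect the values in a list and convert it once with `np.array`."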
+ ] + }, + { + "cell_type": "markdown", + "id": "23e96f96", + "metadata": {}, + "source": [ + "### Installation of NumPy\n", + "#### Installation procedure for Windows, macOS and Linux systems\n", + "- For beginners, the recommended way to get NumPy is to install the Anaconda distribution, which includes NumPy, from the [Anaconda website](https://www.anaconda.com)\n", + "- Choose the Python 3 graphical installer for your operating system and download it\n", + "- After the download finishes, open the downloaded file to start the installation\n", + "- Click Next\n", + "- Read the licensing terms and click I Agree\n", + "- Select an install for \"Just Me\" unless you want to install the program for all users\n", + "- Select the destination folder where you want it to be installed\n", + "- Select Add Anaconda to my PATH environment variable, which lets you use Anaconda from the command prompt, and click Install\n", + "- Click Finish" + ] + }, + { + "cell_type": "markdown", + "id": "78fe0269", + "metadata": {}, + "source": [ + "#### Alternatively, NumPy can be installed with pip:" + ] + }, + { + "cell_type": "markdown", + "id": "bb0be93d", + "metadata": {}, + "source": [ + "> pip install numpy" + ] + }, + { + "cell_type": "markdown", + "id": "ab8c9227", + "metadata": {}, + "source": [ + "##### Arrays in NumPy\n", + "- NumPy's main object is the homogeneous multidimensional array (`ndarray`): a table of elements, all of the same type, indexed by a tuple of non-negative integers.\n", + "- It is not the same as the standard Python library class `array.array`, which only handles one-dimensional arrays and offers less functionality." + ] + }, + { + "cell_type": "code", + "execution_count": 233, + "id": "4f0321b3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.20.1\n" + ] + } + ], + "source": [ + "# To get the version of NumPy\n", + "import numpy as np\n", + "print(np.__version__)" + ] + }, + { + "cell_type": "code", + "execution_count": 232, + "id": "48d0b390", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'numpy.ndarray'>\n" + ] + } + ], + "source": [ + "# Defining an array in NumPy\n", + "import numpy as np # import NumPy under its conventional alias np\n", + "arr=np.array([1,2,3,4])\n", + "print(type(arr))" + ] + }, + { + "cell_type": "code", + "execution_count": 230, + "id": "0f47a625", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[ 1 2 3 4 5]\n", + " [ 6 7 8 9 10]]\n" + ] + } + ], + "source": [ + "# reshape rearranges the 10 elements into 2 rows of 5\n", + "b=np.array([1,2,3,4,5,6,7,8,9,10])\n", + "new_arr=b.reshape(2,5)\n", + "print(new_arr)" + ] + }, + { + "cell_type": "markdown", + "id": "5e8c7533", + "metadata": {}, + "source": [ + "### Dimensions of arrays\n", + "- The number of dimensions of an array is the number of indices needed to select an individual element\n", + "- An array can have one, two or more dimensions" + ] + }, + { + "cell_type": "code", + "execution_count": 235, + "id": "b5db9153", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1 2 3 4]\n" + ] + } + ], + "source": [ + "# Defining a 1-dimensional array using NumPy\n", + "print(np.array([1,2,3,4]))" + ] + }, + { + "cell_type": "code", + "execution_count": 249, + "id": "58960bac", + "metadata": {}, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 2 3 4]\n", + " [5 6 7 8]]\n" + ] + } + ], + "source": [ + "# Defining a 2-dimensional array using NumPy\n", + "two_dimension_array=np.array([[1,2,3,4],[5,6,7,8]])\n", + "print(two_dimension_array)" + ] + }, + { + "cell_type": "code", + "execution_count": 254, + "id": "584a1489", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[1 2]\n", + "  [3 4]]\n", + "\n", + " [[5 6]\n", + "  [7 8]]]\n" + ] + } + ], + "source": [ + "# Defining a 3-dimensional array using NumPy: two blocks of 2x2\n", + "three_dimension_array=np.array([[[1,2],[3,4]],[[5,6],[7,8]]])\n", + "print(three_dimension_array)" + ] + }, + { + "cell_type": "markdown", + "id": "ea21f9aa", + "metadata": {}, + "source": [ + "###### The number of dimensions of an array can be checked with the `ndim` attribute, and the length of each dimension with `shape`" + ] + }, + { + "cell_type": "code", + "execution_count": 257, + "id": "0b866123", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "(2, 2, 2)\n" + ] + } + ], + "source": [ + "dimensions=three_dimension_array.ndim\n", + "print(dimensions)\n", + "print(three_dimension_array.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "7dae6b38", + "metadata": {}, + "source": [ + "##### The total number of elements in an array can be checked with the `size` attribute" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "id": "99e52976", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10\n" + ] + } + ], + "source": [ + "b=np.array([1,2,3,4,5,6,7,8,9,10])\n", + "new_arr=b.reshape(2,5)\n", + "print(new_arr.size)" + ] + }, + { + "cell_type": "markdown", + "id": "49af86d7", + "metadata": {}, + "source": [ + "#### Checking the data type of an array\n", + "- The `dtype` attribute gives the type shared by all elements. The default integer type depends on the platform: with NumPy 1.x it is `int32` on Windows (as in the output below) and `int64` on most Linux and macOS systems." + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "id": "b5c124cf", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "int32\n" + ] + } + ], + "source": [ + "b=np.array([1,2,3,4,5,6,7,8,9,10])\n", + "print(b.dtype)" + ] + }, + { + "cell_type": 
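"markdown", + "id": "a7c3e101", + "metadata": {}, + "source": [ + "The next cell sketches how NumPy settles on a single dtype for every element (the variable names are illustrative). Mixing integers and floats upcasts the whole array to float. The `dtype=` argument and the `astype` method request a type explicitly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e102", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Mixing ints and floats: every element is upcast to a common dtype\n", + "mixed=np.array([1, 2, 3.5])\n", + "print(mixed.dtype)  # float64\n", + "\n", + "# Request a type explicitly when creating the array\n", + "small=np.array([1, 2, 3], dtype=np.int8)\n", + "print(small.dtype, small.itemsize)  # int8 1 (one byte per element)\n", + "\n", + "# astype returns a converted copy; the fractional part is truncated\n", + "print(mixed.astype(np.int64))  # [1 2 3]" + ] + }, + { + "cell_type": 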
"markdown", + "id": "db5f81ff", + "metadata": {}, + "source": [ + "#### zeros, ones and empty in NumPy\n", + "- `np.zeros` creates an array with every element set to zero\n", + "- `np.ones` creates an array with every element set to one\n", + "- `np.empty` creates an array without initializing its entries, so it holds whatever values are already in memory (they may happen to be zeros, as in the output below)" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "cb88d8fe", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[0. 0. 0. 0. 0.]\n", + " [0. 0. 0. 0. 0.]]\n" + ] + } + ], + "source": [ + "a=np.zeros((2,5))\n", + "print(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "id": "3472a7bb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 1 1 1 1 1]\n", + " [1 1 1 1 1 1]]\n" + ] + } + ], + "source": [ + "a=np.ones((2,6),dtype=np.int16)\n", + "print(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 124, + "id": "d8377547", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[0. 0. 0. 0. 0.]\n", + " [0. 0. 0. 0. 0.]]\n" + ] + } + ], + "source": [ + "a=np.empty((2,5))\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "id": "e55d7003", + "metadata": {}, + "source": [ + "#### arange in NumPy\n", + "- `np.arange` creates a one-dimensional array of evenly spaced values within a given range\n", + "- Combined with `reshape`, it can be used to build multidimensional arrays" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "b004b428", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0 1 2 3 4 5 6 7 8 9]\n", + "[[0 1 2 3 4]\n", + " [5 6 7 8 9]]\n" + ] + } + ], + "source": [ + "# For a 1-dimensional array\n", + "a=np.arange(10)\n", + "print(a)\n", + "\n", + "# For a 2-dimensional array\n", + "a=np.arange(10).reshape(2,5)\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "id": "04045bc6", + "metadata": {}, + "source": [ + "#### Mathematical operations in NumPy\n", + "- Basic arithmetic operators (`+`, `-`, `*`, `/`) are applied element-wise to NumPy arrays" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "id": "71c4802b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[35 46 54 61]\n", + "[25 34 46 59]\n", + "[150 240 200 60]\n" + ] + } + ], + "source": [ + "a=np.array([30,40,50,60])\n", + "b=np.array([5,6,4,1])\n", + "\n", + "print(a+b)\n", + "print(a-b)\n", + "print(a*b)" + ] + }, + { + "cell_type": "markdown", + "id": "4f1afbf1", + "metadata": {}, + "source": [ + "#### Other operations: sum, maximum and minimum" + ] + }, + { + "cell_type": "code", + "execution_count": 129, + "id": "2c3c4766", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "105\n", + "14\n", + "0\n" + ] + } + ], + "source": [ + "a=np.arange(15)\n", + "print(a.sum())\n", + "print(a.max())\n", + "print(a.min())" + ] + }, + { + "cell_type": "markdown", + "id": "cb5e553f", + "metadata": {}, + "source": [ + "### Matrix operations in NumPy\n", + "- NumPy performs matrix operations quickly and efficiently" + 
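, "\n", + "\n", + "The code below is a short sketch, using illustrative arrays `A` and `B`, of a true matrix product of two 2-D arrays. It uses the `@` operator, which is equivalent to `np.matmul`, and contrasts it with `*`, which multiplies element by element." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e103", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Two 2x2 matrices (illustrative values)\n", + "A=np.array([[1, 2], [3, 4]])\n", + "B=np.array([[5, 6], [7, 8]])\n", + "\n", + "# Matrix product: rows of A combined with columns of B\n", + "print(A @ B)\n", + "# [[19 22]\n", + "#  [43 50]]\n", + "\n", + "# Element-wise product: NOT a matrix multiplication\n", + "print(A * B)\n", + "# [[ 5 12]\n", + "#  [21 32]]\n", + "\n", + "# np.matmul gives the same result as the @ operator\n", + "print(np.array_equal(A @ B, np.matmul(A, B)))  # True" + ] + }, + { + "cell_type": "markdown", + "id": "a7c3e104", + "metadata": {}, + "source": [ + "For 2-D arrays, `np.dot(A, B)` returns the same result as `A @ B`. For 1-D arrays, `np.dot` computes the inner product, as in the next example." + 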
] + }, + { + "cell_type": "markdown", + "id": "2f7c9ac1", + "metadata": {}, + "source": [ + "### Dot product of vectors\n", + "- For two 1-D arrays, `np.dot` multiplies matching elements and sums the products: `30*5 + 40*6 + 50*4 + 60*1 = 650`" + ] + }, + { + "cell_type": "code", + "execution_count": 224, + "id": "60a6e84f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "650\n" + ] + } + ], + "source": [ + "a=np.array([30,40,50,60])\n", + "b=np.array([5,6,4,1])\n", + "dot_multiplication=np.dot(a,b)\n", + "print(dot_multiplication)" + ] + }, + { + "cell_type": "markdown", + "id": "a5c74a46", + "metadata": {}, + "source": [ + "### Square root of the elements of an array" + ] + }, + { + "cell_type": "code", + "execution_count": 163, + "id": "e923422b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[5.47722558 6.32455532 7.07106781 7.74596669]\n", + "[2.23606798 2.44948974 2. 1. ]\n" + ] + } + ], + "source": [ + "# np.sqrt is applied to every element\n", + "print(np.sqrt(a))\n", + "print(np.sqrt(b))" + ] + }, + { + "cell_type": "markdown", + "id": "7cc9ebef", + "metadata": {}, + "source": [ + "### Transpose of a matrix\n", + "- Transposing swaps rows and columns. For a 1-D array, `.T` returns the array unchanged, so a 2-D array is used here" + ] + }, + { + "cell_type": "code", + "execution_count": 166, + "id": "42537bff", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 2 3]\n", + " [4 5 6]]\n", + "[[1 4]\n", + " [2 5]\n", + " [3 6]]\n" + ] + } + ], + "source": [ + "m=np.array([[1,2,3],[4,5,6]])\n", + "print(m)\n", + "print(m.T)" + ] + }, + { + "cell_type": "markdown", + "id": "2b23de07", + "metadata": {}, + "source": [ + "# Pandas" + ] + }, + { + "cell_type": "markdown", + "id": "31496b63", + "metadata": {}, + "source": [ + "- Pandas is a fast, open source tool used for data science and data analytics.\n", + "- It makes working with “relational” or “labeled” data easy and intuitive.\n", + "- Pandas is used for reading and handling large amounts of data.\n", + "- It can handle missing data with ease."
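, "\n", + "\n", + "The following cell is a small sketch of that missing-data support, with illustrative values. Pandas stores a missing entry as `NaN`, `isna` flags it, and reductions such as `mean` skip it by default." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e105", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# None becomes NaN (missing) in a numeric Series\n", + "marks=pd.Series([10, None, 30])\n", + "print(marks.isna().tolist())  # [False, True, False]\n", + "\n", + "# mean() skips NaN by default: (10 + 30) / 2\n", + "print(marks.mean())  # 20.0" + ] + }, + { + "cell_type": "markdown", + "id": "a7c3e106", + "metadata": {}, + "source": [ + "Pass `skipna=False` to `mean` if a missing value should make the result `NaN` instead."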
+ ] + }, + { + "cell_type": "markdown", + "id": "f9ff92ad", + "metadata": {}, + "source": [ + "### Installation\n", + "- Pandas can be installed from PyPI by running the following command in a command prompt" + ] + }, + { + "cell_type": "markdown", + "id": "5614c353", + "metadata": {}, + "source": [ + "> pip install pandas" + ] + }, + { + "cell_type": "markdown", + "id": "6033fdc4", + "metadata": {}, + "source": [ + "### Series\n", + "- A Series is a one-dimensional labeled array that can hold any type of data\n", + "- A Series can hold integers, strings, floats and other Python objects" + ] + }, + { + "cell_type": "markdown", + "id": "4e4837b6", + "metadata": {}, + "source": [ + "### Defining a Series" + ] + }, + { + "cell_type": "code", + "execution_count": 226, + "id": "2783c4a7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Joseph 10\n", + "Daniel 40\n", + "Merry 20\n", + "dtype: int64\n", + "<class 'pandas.core.series.Series'>\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# The values come from the list; the optional index labels each value\n", + "marks=pd.Series([10,40,20],index=[\"Joseph\",\"Daniel\",\"Merry\"])\n", + "print(marks)\n", + "print(type(marks))" + ] + }, + { + "cell_type": "markdown", + "id": "60840f6d", + "metadata": {}, + "source": [ + "### DataFrames\n", + "- A DataFrame is a two-dimensional labeled data structure whose columns can hold different types of data.\n", + "- Each column of a DataFrame is a Series" + ] + }, + { + "cell_type": "markdown", + "id": "0e7025cd", + "metadata": {}, + "source": [ + "#### Defining a DataFrame using a dictionary" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "id": "20b19e09", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " names class\n", + "0 Joseph 10\n", + "1 Robert 40\n", + "2 Elly 20\n", + "<class 'pandas.core.frame.DataFrame'>\n" + ] + } + ], + "source": [ + "# Each dictionary key becomes a column name\n", + "information={\n", + " \"names\":[\"Joseph\",\"Robert\",\"Elly\"],\n", + " \"class\":[10,40,20]\n", + "}\n", + "student=pd.DataFrame(information)\n", + 
"print(student)\n", + "print(type(student))" + ] + }, + { + "cell_type": "markdown", + "id": "0708f450", + "metadata": {}, + "source": [ + "### Operations that can be performed on a DataFrame or Series\n", + "- `max()` returns the maximum value\n", + "- `min()` returns the minimum value\n", + "- `describe()` returns summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) of the numeric columns" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "id": "1be07789", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "58\n", + "5\n" + ] + } + ], + "source": [ + "# Maximum value of the Series\n", + "numbers = pd.Series([10, 5, 58])\n", + "print(numbers.max())\n", + "\n", + "# Minimum value of the Series\n", + "print(numbers.min())" + ] + }, + { + "cell_type": "markdown", + "id": "f7b03770", + "metadata": {}, + "source": [ + "- To get the first 3 rows, use `head(3)`" + ] + }, + { + "cell_type": "code", + "execution_count": 144, + "id": "ecfd16ff", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + " names class\n", + "0 Joseph 10\n", + "1 Robert 40\n", + "2 Elly 20" + ] + }, + "execution_count": 144, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "student.head(3)" + ] + }, + { + "cell_type": "markdown", + "id": "8561a86a", + "metadata": {}, + "source": [ + "- To see a technical summary of the DataFrame (index, column types, non-null counts and memory usage), use `info()`" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "id": "d85e394d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 3 entries, 0 to 2\n", + "Data columns (total 2 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 names 3 non-null object\n", + " 1 class 3 non-null int64 \n", + "dtypes: int64(1), object(1)\n", + "memory usage: 176.0+ bytes\n" + ] + } + ], + "source": [ + "student.info()" + ] + }, + { + "cell_type": "markdown", + "id": "f2c91c45", + "metadata": {}, + "source": [ + "### Filtering specific data from a DataFrame\n", + "- Rows of a DataFrame can be filtered with a boolean condition on one of its columns" + ] + }, + { + "cell_type": "markdown", + "id": "1373dcaf", + "metadata": {}, + "source": [ + "#### Finding students whose class is greater than 10" + ] + }, + { + "cell_type": "code", + "execution_count": 208, + "id": "db321d6b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " names class\n", + "1 Robert 40\n", + "2 Elly 20\n" + ] + } + ], + "source": [ + "# The condition builds a True/False mask; indexing with it keeps the True rows\n", + "print(student[student[\"class\"]>10])" + ] + }, + { + "cell_type": "markdown", + "id": "f02a79a6", + "metadata": {}, + "source": [ + "### Getting the number of rows and columns\n", + "- `shape` returns a `(rows, columns)` tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "id": "129c32dd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3 2\n" + ] + } + ], + "source": [ + "rows,columns=student.shape\n", + "print(rows,columns)" + ] + }, + { + "cell_type": "markdown", + 
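"id": "a7c3e107", + "metadata": {}, + "source": [ + "This sketch extends the filtering example; the `scores` DataFrame and its values are hypothetical. Several conditions can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. `.loc` can then select only some columns of the matching rows." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e108", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# Hypothetical data for illustration\n", + "scores=pd.DataFrame({\n", + "    \"names\":[\"Asha\",\"Ben\",\"Chen\",\"Dara\"],\n", + "    \"class\":[10,40,20,50]\n", + "})\n", + "\n", + "# & means 'and', | means 'or'; each condition needs its own parentheses\n", + "mask=(scores[\"class\"] > 10) & (scores[\"class\"] < 50)\n", + "print(scores[mask])\n", + "\n", + "# .loc[rows, column] keeps only the names of the matching rows\n", + "print(scores.loc[mask, \"names\"].tolist())  # ['Ben', 'Chen']" + ] + }, + { + "cell_type": "markdown", + 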
"id": "aa556f9f", + "metadata": {}, + "source": [ + "### Converting a DataFrame to a different format\n", + "- A DataFrame can be written to multiple formats such as CSV, JSON and Excel\n", + "- `to_csv()` writes a DataFrame to a CSV file and `to_excel()` writes it to an Excel file" + ] + }, + { + "cell_type": "code", + "execution_count": 211, + "id": "7af8351f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " countries medals\n", + "0 Germany 32\n", + "1 Australia 36\n", + "2 United States of America 77\n", + "3 China 70\n", + "4 Great Britain 48\n", + "5 France 25\n" + ] + } + ], + "source": [ + "medals={\n", + " \"countries\":[\"Germany\",\"Australia\",\"United States of America\",\"China\",\"Great Britain\",\"France\"],\n", + " \"medals\":[32,36,77,70,48,25]\n", + "}\n", + "\n", + "medals_data_1=pd.DataFrame(medals)\n", + "print(medals_data_1)\n", + "student.to_csv(\"updated_data.csv\") # Write the student DataFrame to a CSV file\n", + "student.to_excel(\"updated_data.xlsx\") # Write it to an Excel file (needs an Excel writer package such as openpyxl)" + ] + }, + { + "cell_type": "markdown", + "id": "a45daac5", + "metadata": {}, + "source": [ + "### More operations on a DataFrame" + ] + }, + { + "cell_type": "markdown", + "id": "7f923b14", + "metadata": {}, + "source": [ + "- Calculating the mean" + ] + }, + { + "cell_type": "code", + "execution_count": 212, + "id": "f275cdd8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "48.0\n" + ] + } + ], + "source": [ + "mean=medals_data_1[\"medals\"].mean()\n", + "print(mean)" + ] + }, + { + "cell_type": "markdown", + "id": "bb04cf44", + "metadata": {}, + "source": [ + "- Getting summary statistics of the numeric columns with `describe()`" + ] + }, + { + "cell_type": "code", + "execution_count": 215, + "id": "b4ca9809", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " medals\n", + "count 6.000000\n", + "mean 48.000000\n", + "std 21.232051\n", + "min 25.000000\n", + "25% 33.000000\n", + "50% 42.000000\n", + "75% 64.500000\n", + "max 77.000000\n" + ] + } + ], + "source": [ + "# describe must be called with parentheses; without them it only returns the method itself\n", + "des=medals_data_1.describe()\n", + "print(des)" + ] + }, + {
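"cell_type": "markdown", + "id": "a7c3e109", + "metadata": {}, + "source": [ + "The next cell sketches a CSV round trip using a small illustrative `sample_medals` DataFrame. It writes to an in-memory `io.StringIO` buffer instead of a file, so the example does not depend on the file system. `index=False` stops `to_csv` from writing the row labels as an extra column, so `read_csv` gets back an equal DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7c3e110", + "metadata": {}, + "outputs": [], + "source": [ + "import io\n", + "import pandas as pd\n", + "\n", + "sample_medals=pd.DataFrame({\n", + "    \"countries\":[\"Germany\",\"Australia\",\"France\"],\n", + "    \"medals\":[32,36,25]\n", + "})\n", + "\n", + "# Write CSV text to an in-memory buffer instead of a file on disk\n", + "buffer=io.StringIO()\n", + "sample_medals.to_csv(buffer, index=False)  # index=False: skip the 0..n-1 row labels\n", + "print(buffer.getvalue())\n", + "\n", + "# Rewind the buffer and parse it back into a DataFrame\n", + "buffer.seek(0)\n", + "restored=pd.read_csv(buffer)\n", + "print(restored.equals(sample_medals))  # True: same columns, values and dtypes" + ] + }, + {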
"cell_type": "markdown", + "id": "8aa8f050", + "metadata": {}, + "source": [ + "### Combining data from multiple DataFrames\n", + "- `pd.concat` stacks DataFrames vertically. With `join=\"outer\"` it keeps every column and fills the gaps with NaN, which is why the integer columns are shown as floats" + ] + }, + { + "cell_type": "code", + "execution_count": 199, + "id": "083a64ed", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " names class age\n", + "0 raju 10.0 NaN\n", + "1 bala 40.0 NaN\n", + "2 ralo 20.0 NaN\n", + "0 Facebook NaN 10.0\n", + "1 google NaN 40.0\n", + "2 yahoo NaN 20.0\n", + "3 linkedin NaN 30.0\n", + "4 netflix NaN 50.0\n", + "5 apple NaN 35.0\n" + ] + } + ], + "source": [ + "student_data_1=pd.DataFrame({\"names\":[\"raju\",\"bala\",\"ralo\"],\"class\":[10,40,20]})\n", + "student_data_2=pd.DataFrame({\"names\":[\"Facebook\",\"google\",\"yahoo\",\"linkedin\",\"netflix\",\"apple\"],\"age\":[10,40,20,30,50,35]})\n", + "new_df=pd.concat([student_data_1,student_data_2], join=\"outer\")\n", + "print(new_df)" + ] + }, + { + "cell_type": "markdown", + "id": "9c18bb87", + "metadata": {}, + "source": [ + "### Reading a tabular data file with Pandas\n", + "- Pandas can read tabular data files such as CSV.\n", + "- `read_csv` parses the file and returns its contents as a DataFrame.\n", + "- If the file is not in the current working directory, pass its full path" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56470220", + "metadata": {}, + "outputs": [], + "source": [ + "# Assumes a file named Medals.csv exists in the current working directory\n", + "medal_info=pd.read_csv(\"Medals.csv\")\n", + "print(medal_info)" + ] + }, + { + "cell_type": "markdown", + "id": "b19d037b", + "metadata": {}, + "source": [ + "### How Pandas compares with conventional methods\n", + "- Conventional methods use manual file handling, for example the csv module, to read data from a file\n", + "- Pandas makes it easier to read a file and perform operations on its contents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e51bb3ea", + "metadata": {}, + "outputs": [], + "source": [ + "import csv\n", + "\n", + "# Assumes Medals.csv has a header row and the medal count in its second column\n", + "medal=[]\n", + "with open(\"Medals.csv\", newline=\"\") as data_file:\n", + "    data=csv.reader(data_file)\n", + "    next(data)  # skip the header row\n", + "    for row in data:\n", + "        medal.append(int(row[1]))\n", + "print(medal)" + ] + }, + { + "cell_type": "markdown", + "id": "0fca54ce", + "metadata": {}, + "source": [ + "### Checking for duplicate rows in a DataFrame\n", + "- `duplicated()` returns True for every row that repeats an earlier row" + ] + }, + { + "cell_type": "code", + "execution_count": 222, + "id": "926c3adb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 False\n", + "1 False\n", + "2 False\n", + "3 True\n", + "4 False\n", + "5 False\n", + "dtype: bool\n" + ] + } + ], + "source": [ + "information={\n", + " \"names\":[\"raju\",\"bala\",\"ralo\",\"raju\",\"Elli\",\"Lilly\"],\n", + " \"age\":[10,40,20,10,50,35]\n", + "}\n", + "student=pd.DataFrame.from_dict(information)\n", + "print(student.duplicated())" + ] + }, + { + "cell_type": "markdown", + "id": "b81d8a0f", + "metadata": {}, + "source": [ + "### Checking the percentage change of data\n", + "- `DataFrame.pct_change()` computes the fractional change between each element and the previous one. With `axis='columns'` each year is compared with the year before, e.g. for Jacob `(92 - 98) / 98 = -0.061224`" + ] + }, + { + "cell_type": "code", + "execution_count": 228, + "id": "4743ebe5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2011 2012 2013\n", + "Jacob NaN -0.061224 -0.163043\n", + "Joseph NaN 0.113636 -0.112245\n", + "Angela NaN 0.155844 0.011236\n" + ] + } + ], + "source": [ + "change_in_marks=pd.DataFrame({\n", + " '2011':[98, 88,77],\n", + " '2012':[92, 98,89], \n", + " '2013':[77,87,90]},\n", + " index = ['Jacob','Joseph','Angela'])\n", + "\n", + "print(change_in_marks.pct_change(axis='columns'))" + ] + }, + { + "cell_type": "markdown", + "id": "190bf3ae", + "metadata": {}, + "source": [ + "### Replacing NaN values in a DataFrame\n", + "- NaN marks a missing value in the dataset\n", + "- `fillna(0)` replaces NaN values with zeros so that later calculations are not affected by them; `dropna()` instead removes the rows or columns that contain them" + ] + }, + { + "cell_type": "code", + "execution_count": 227, + "id": "b8d08c46", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2011 2012 2013\n", + "Jacob 0.0 -0.061224 -0.163043\n", + "Joseph 0.0 0.113636 -0.112245\n", + "Angela 0.0 0.155844 0.011236\n" + ] + } + ], + "source": [ + "change_in_marks=pd.DataFrame({\n", + " '2011':[98, 88,77],\n", + " '2012':[92, 98,89], \n", + " '2013':[77,87,90]},\n", + " index = ['Jacob','Joseph','Angela'])\n", + "\n", + "print(change_in_marks.pct_change(axis='columns').fillna(0))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}