diff --git a/Sumit_Gaur_Numpy and Pandas.ipynb b/Sumit_Gaur_Numpy and Pandas.ipynb
new file mode 100644
index 00000000..110ba857
--- /dev/null
+++ b/Sumit_Gaur_Numpy and Pandas.ipynb
@@ -0,0 +1,1239 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "5dacfd59",
+ "metadata": {},
+ "source": [
+ "# ANALYTICAL PYTHON"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dab926de",
+ "metadata": {},
+ "source": [
+ "## NUMPY"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2c38f678",
+ "metadata": {},
+ "source": [
+ "NumPy is the most widely used library for fundamental scientific computing in Python. It provides a multidimensional array object. It is an extension module for Python, and much of it is written in C, which makes its computations fast and keeps processing times low. It offers far better computational speed than built-in Python lists. Dimensions in NumPy are called axes.\n",
+ "\n",
+ "### Features of NumPy\n",
+ "- NumPy arrays have a fixed size at creation; unlike Python lists, they cannot grow or shrink in place.\n",
+ "- All elements in a NumPy array must be of the same data type, so each element takes the same amount of memory.\n",
+ "- NumPy arrays can store large amounts of numeric data; they use less memory than Python lists and offer much better efficiency.\n",
+ "- It is an open source library.\n",
+ "- It offers efficient computation of mathematical functions that operate on arrays and matrices."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "23e96f96",
+ "metadata": {},
+ "source": [
+ "### Installation of NumPy\n",
+ "#### Installation procedure for Windows, macOS and Linux systems\n",
+ "- For beginners, the recommended way to use NumPy is to first install the Anaconda installer from the [Anaconda Website](www.anaconda.com).\n",
+ "- Choose the Python 3.6 graphical installer and download the file.\n",
+ "- After the download finishes, click on the downloaded file to start the installation.\n",
+ "- Click Next.\n",
+ "- Read the licensing terms and click on I Agree.\n",
+ "- Select an install for \"Just Me\" unless you want the program for all users.\n",
+ "- Select the destination where you want it to be installed.\n",
+ "- Select \"Add Anaconda to my PATH environment variable\" so that Anaconda can be used from the command prompt, then click Install.\n",
+ "- Click Finish."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "78fe0269",
+ "metadata": {},
+ "source": [
+ "#### Alternatively, NumPy can be installed with pip:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb0be93d",
+ "metadata": {},
+ "source": [
+ "> pip install numpy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab8c9227",
+ "metadata": {},
+ "source": [
+ "##### Arrays in NumPy\n",
+ "- An array is a collection of elements of the same data type.\n",
+ "- NumPy arrays offer more features than the standard Python library class array.array, which only handles one-dimensional arrays. Elements of a NumPy array are indexed by a tuple of non-negative integers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 233,
+ "id": "4f0321b3",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1.20.1\n"
+ ]
+ }
+ ],
+ "source": [
+ "#To get the version of NumPy\n",
+ "import numpy as np\n",
+ "print(np.__version__)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 232,
+ "id": "48d0b390",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'numpy.ndarray'>\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Defining an array in NumPy\n",
+ "import numpy as np #importing numpy as np\n",
+ "arr=np.array([1,2,3,4])\n",
+ "print(type(arr))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 230,
+ "id": "0f47a625",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[ 1 2 3 4 5]\n",
+ " [ 6 7 8 9 10]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "b=np.array([1,2,3,4,5,6,7,8,9,10])\n",
+ "new_arr=b.reshape(2,5)\n",
+ "print(new_arr)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e8c7533",
+ "metadata": {},
+ "source": [
+ "### Dimensions of arrays\n",
+ "- The number of dimensions of an array is the number of indices needed to specify an individual element of the array.\n",
+ "- An array can have multiple dimensions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 235,
+ "id": "b5db9153",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1 2 3 4]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Defining a 1 Dimension array using NumPy\n",
+ "print(np.array([1,2,3,4]))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 249,
+ "id": "58960bac",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[1 2 3 4]\n",
+ " [5 6 7 8]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Defining a 2 Dimension array using NumPy\n",
+ "two_dimension_array=np.array([[1,2,3,4],[5,6,7,8]])\n",
+ "print(two_dimension_array)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 254,
+ "id": "584a1489",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(2, 2, 3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Defining a 3-dimensional array using NumPy (2 blocks of 2 rows and 3 columns)\n",
+ "three_dimension_array=np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])\n",
+ "print(three_dimension_array.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ea21f9aa",
+ "metadata": {},
+ "source": [
+ "###### The number of dimensions of an array can be checked using the ndim attribute"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 257,
+ "id": "0b866123",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "3\n"
+ ]
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "dimensions=three_dimension_array.ndim\n",
+ "print(dimensions)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7dae6b38",
+ "metadata": {},
+ "source": [
+ "##### Total number of elements in an array can be checked by using size"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 117,
+ "id": "99e52976",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "10\n"
+ ]
+ }
+ ],
+ "source": [
+ "b=np.array([1,2,3,4,5,6,7,8,9,10])\n",
+ "new_arr=b.reshape(2,5)\n",
+ "print(new_arr.size)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "49af86d7",
+ "metadata": {},
+ "source": [
+ "#### Creation of arrays in numpy"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 118,
+ "id": "b5c124cf",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "int32\n"
+ ]
+ }
+ ],
+ "source": [
+ "b=np.array([1,2,3,4,5,6,7,8,9,10])\n",
+ "print(b.dtype)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db5f81ff",
+ "metadata": {},
+ "source": [
+ "#### Zeros, ones and empty in NumPy\n",
+ "- zeros creates a NumPy array in which every element is zero\n",
+ "- ones creates a NumPy array in which every element is one\n",
+ "- empty creates a NumPy array without initializing its entries, so its contents are arbitrary (whatever was already in memory) and should not be relied on"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 121,
+ "id": "cb88d8fe",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0.]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.zeros((2,5))\n",
+ "print(a)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 123,
+ "id": "3472a7bb",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[1 1 1 1 1 1]\n",
+ " [1 1 1 1 1 1]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.ones((2,6),dtype=np.int16)\n",
+ "print(a)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 124,
+ "id": "d8377547",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0.]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.empty((2,5))\n",
+ "print(a)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e55d7003",
+ "metadata": {},
+ "source": [
+ "#### arange in NumPy\n",
+ "- It creates an array containing a sequence of evenly spaced numbers\n",
+ "- Combined with reshape, it can be used to generate multidimensional arrays"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 125,
+ "id": "b004b428",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0 1 2 3 4 5 6 7 8 9]\n",
+ "[[0 1 2 3 4]\n",
+ " [5 6 7 8 9]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#For 1 Dimension array\n",
+ "a=np.arange(10)\n",
+ "print(a)\n",
+ "\n",
+ "#For 2 Dimension array\n",
+ "a= np.arange(10).reshape(2,5)\n",
+ "print(a)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04045bc6",
+ "metadata": {},
+ "source": [
+ "#### Mathematical operations in NumPy\n",
+ "- Basic mathematical operations on NumPy arrays are performed element-wise"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 126,
+ "id": "71c4802b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[35 46 54 61]\n",
+ "[25 34 46 59]\n",
+ "[150 240 200 60]\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.array([30,40,50,60])\n",
+ "b=np.array([5,6,4,1])\n",
+ "\n",
+ "print(a+b)\n",
+ "print(a-b)\n",
+ "print(a*b)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4f1afbf1",
+ "metadata": {},
+ "source": [
+ "#### Other operations that can be performed"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 129,
+ "id": "2c3c4766",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "105\n",
+ "14\n",
+ "0\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.arange(15)\n",
+ "print(a.sum())\n",
+ "print(a.max())\n",
+ "print(a.min())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb5e553f",
+ "metadata": {},
+ "source": [
+ "### Matrix operations on NumPy\n",
+ "- NumPy can perform matrix operations efficiently and in less time"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f7c9ac1",
+ "metadata": {},
+ "source": [
+ "### Dot multiplication of matrices"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 224,
+ "id": "60a6e84f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "650\n"
+ ]
+ }
+ ],
+ "source": [
+ "a=np.array([30,40,50,60])\n",
+ "b=np.array([5,6,4,1])\n",
+ "dot_multiplication=np.dot(a,b)\n",
+ "print(dot_multiplication)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a5c74a46",
+ "metadata": {},
+ "source": [
+ "### Square root of elements of Matrices"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 163,
+ "id": "e923422b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[5.47722558 6.32455532 7.07106781 7.74596669]\n",
+ "[2.23606798 2.44948974 2. 1. ]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(np.sqrt(a))\n",
+ "print(np.sqrt(b))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7cc9ebef",
+ "metadata": {},
+ "source": [
+ "### Transpose of Matrix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 166,
+ "id": "42537bff",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[30 40 50 60]\n",
+ "[5 6 4 1]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(a.T)\n",
+ "print(b.T)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b23de07",
+ "metadata": {},
+ "source": [
+ "## PANDAS"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31496b63",
+ "metadata": {},
+ "source": [
+ "- Pandas is a fast, open source tool used for data science and data analytics.\n",
+ "- It makes working with “relational” or “labeled” data easier and more intuitive.\n",
+ "- Pandas is used for reading and handling large amounts of data.\n",
+ "- It can handle missing data with ease."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f9ff92ad",
+ "metadata": {},
+ "source": [
+ "### Installation\n",
+ "- Pandas can be installed from PyPI by running pip at the command prompt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5614c353",
+ "metadata": {},
+ "source": [
+ "> pip install pandas"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6033fdc4",
+ "metadata": {},
+ "source": [
+ "### Series\n",
+ "- A Series is a one-dimensional labeled array that can hold any type of data\n",
+ "- A Series can hold integer, string, float and other types of data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e4837b6",
+ "metadata": {},
+ "source": [
+ "### Defining a series"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 226,
+ "id": "2783c4a7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0    Joseph\n",
+ "1    Daniel\n",
+ "2     Merry\n",
+ "dtype: object\n",
+ "<class 'pandas.core.series.Series'>\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd #importing pandas as pd\n",
+ "names=pd.Series([\"Joseph\",\"Daniel\",\"Merry\"])\n",
+ "print(names)\n",
+ "print(type(names))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "60840f6d",
+ "metadata": {},
+ "source": [
+ "### DataFrames\n",
+ "- A DataFrame is a two-dimensional labeled data structure whose columns can hold different types of data.\n",
+ "- Each column of a DataFrame is a Series."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0e7025cd",
+ "metadata": {},
+ "source": [
+ "#### Defining a DataFrame using dictionary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 141,
+ "id": "20b19e09",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " names class\n",
+ "0 raju 10\n",
+ "1 bala 40\n",
+ "2 ralo 20\n",
+ "<class 'pandas.core.frame.DataFrame'>\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "information={\n",
+ " \"names\":[\"raju\",\"bala\",\"ralo\"],\n",
+ " \"class\":[10,40,20]\n",
+ "}\n",
+ "student=pd.DataFrame(information)\n",
+ "print(student)\n",
+ "print(type(student))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0708f450",
+ "metadata": {},
+ "source": [
+ "### Operations that can be performed on a DataFrame or Series\n",
+ "- To calculate the maximum value, the max() function can be used\n",
+ "- To calculate the minimum value, the min() function can be used\n",
+ "- To show summary statistics about the DataFrame, the describe() function can be used"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 142,
+ "id": "1be07789",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "58\n",
+ "5\n"
+ ]
+ }
+ ],
+ "source": [
+ "#For calculating maximum number\n",
+ "numbers = pd.Series([10, 5, 58])\n",
+ "print(numbers.max())\n",
+ "\n",
+ "#For calculating minimum number\n",
+ "\n",
+ "print(numbers.min())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7b03770",
+ "metadata": {},
+ "source": [
+ "- To get first 3 rows"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 144,
+ "id": "ecfd16ff",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
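+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>names</th>\n",
+ " <th>class</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>raju</td>\n",
+ " <td>10</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>bala</td>\n",
+ " <td>40</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>ralo</td>\n",
+ " <td>20</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"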
+ ],
+ "text/plain": [
+ " names class\n",
+ "0 raju 10\n",
+ "1 bala 40\n",
+ "2 ralo 20"
+ ]
+ },
+ "execution_count": 144,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "student.head(3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8561a86a",
+ "metadata": {},
+ "source": [
+ "- To see the technical summary of the DataFrame"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 145,
+ "id": "d85e394d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 3 entries, 0 to 2\n",
+ "Data columns (total 2 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 names 3 non-null object\n",
+ " 1 class 3 non-null int64 \n",
+ "dtypes: int64(1), object(1)\n",
+ "memory usage: 176.0+ bytes\n"
+ ]
+ }
+ ],
+ "source": [
+ "student.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2c91c45",
+ "metadata": {},
+ "source": [
+ "### Filter specific data from DataFrame\n",
+ "- Data from a DataFrame can be easily filtered out for any specific condition"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1373dcaf",
+ "metadata": {},
+ "source": [
+ "#### Finding students with class greater than 10"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 208,
+ "id": "db321d6b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " names class\n",
+ "1 bala 40\n",
+ "2 ralo 20\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(student[student[\"class\"]>10])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f02a79a6",
+ "metadata": {},
+ "source": [
+ "### To get number of rows and columns"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 150,
+ "id": "129c32dd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "3 2\n"
+ ]
+ }
+ ],
+ "source": [
+ "rows,columns= student.shape\n",
+ "print(rows,columns)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aa556f9f",
+ "metadata": {},
+ "source": [
+ "### Converting DataFrame to different format\n",
+ "- A DataFrame can be converted into multiple formats such as CSV, JSON and Excel\n",
+ "- to_csv() is used to write a DataFrame in CSV format"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 211,
+ "id": "7af8351f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " countries medals\n",
+ "0 Germany 32\n",
+ "1 Australia 36\n",
+ "2 United States of America 77\n",
+ "3 China 70\n",
+ "4 Great Britain 48\n",
+ "5 France 25\n"
+ ]
+ }
+ ],
+ "source": [
+ "medals={\n",
+ " \"countries\":[\"Germany\",\"Australia\",\"United States of America\",\"China\",\"Great Britain\",\"France\"],\n",
+ " \"medals\":[32,36,77,70,48,25]\n",
+ "}\n",
+ "\n",
+ "medals_data_1=pd.DataFrame(medals)\n",
+ "print(medals_data_1)\n",
+ "student.to_csv(\"updated_data.csv\") #Write the DataFrame to a CSV file\n",
+ "student.to_excel(\"updated_data.xlsx\") #Write the DataFrame to an Excel file (needs an Excel writer such as openpyxl)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a45daac5",
+ "metadata": {},
+ "source": [
+ "### More operations on DataFrame"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7f923b14",
+ "metadata": {},
+ "source": [
+ "- Calculating Mean"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 212,
+ "id": "f275cdd8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "48.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "mean=medals_data_1[\"medals\"].mean()\n",
+ "print(mean)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb04cf44",
+ "metadata": {},
+ "source": [
+ "- Getting more information about the DataFrame"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 215,
+ "id": "b4ca9809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "des=medals_data_1[[\"countries\",\"medals\"]].describe()\n",
+ "print(des)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8aa8f050",
+ "metadata": {},
+ "source": [
+ "### Combine data of multiple DataFrames"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 199,
+ "id": "083a64ed",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " names class age\n",
+ "0 raju 10.0 NaN\n",
+ "1 bala 40.0 NaN\n",
+ "2 ralo 20.0 NaN\n",
+ "0 Facebook NaN 10.0\n",
+ "1 google NaN 40.0\n",
+ "2 yahoo NaN 20.0\n",
+ "3 linkedin NaN 30.0\n",
+ "4 netflix NaN 50.0\n",
+ "5 apple NaN 35.0\n"
+ ]
+ }
+ ],
+ "source": [
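+ "student_data_1=pd.DataFrame({\"names\":[\"raju\",\"bala\",\"ralo\"],\"class\":[10,40,20]}) #inputs reconstructed from the printed output\n",
+ "student_data_2=pd.DataFrame({\"names\":[\"Facebook\",\"google\",\"yahoo\",\"linkedin\",\"netflix\",\"apple\"],\"age\":[10,40,20,30,50,35]})\n",
+ "new_df=pd.concat([student_data_1,student_data_2], join=\"outer\")\n",
+ "print(new_df)"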
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9c18bb87",
+ "metadata": {},
+ "source": [
+ "### Reading a tabular data file with Pandas\n",
+ "- Pandas can be used to read tabular data.\n",
+ "- Pandas reads the tabular data and converts it into a DataFrame.\n",
+ "- If the file is not in the directory from which the notebook is run, the full path of the file must be given."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 218,
+ "id": "56470220",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "FileNotFoundError",
+ "evalue": "[Errno 2] No such file or directory: 'Medals.csv'",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[1;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mpandas\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mmedal_info\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mpandas\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"Medals.csv\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmedal_info\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread_csv\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)\u001b[0m\n\u001b[0;32m 608\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkwds_defaults\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 609\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 610\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 611\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 612\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m 460\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 461\u001b[0m \u001b[1;31m# Create the parser.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 462\u001b[1;33m \u001b[0mparser\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 463\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 464\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mchunksize\u001b[0m \u001b[1;32mor\u001b[0m \u001b[0miterator\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[0;32m 817\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"has_index_names\"\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"has_index_names\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 818\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 819\u001b[1;33m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 820\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 821\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_make_engine\u001b[1;34m(self, engine)\u001b[0m\n\u001b[0;32m 1048\u001b[0m )\n\u001b[0;32m 1049\u001b[0m \u001b[1;31m# error: Too many arguments for \"ParserBase\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1050\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mmapping\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mengine\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mf\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# type: ignore[call-arg]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1051\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1052\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0m_failover_to_python\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, src, **kwds)\u001b[0m\n\u001b[0;32m 1865\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1866\u001b[0m \u001b[1;31m# open handles\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1867\u001b[1;33m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_open_handles\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1868\u001b[0m \u001b[1;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mhandles\u001b[0m \u001b[1;32mis\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1869\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mkey\u001b[0m \u001b[1;32min\u001b[0m \u001b[1;33m(\u001b[0m\u001b[1;34m\"storage_options\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"encoding\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"memory_map\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"compression\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_open_handles\u001b[1;34m(self, src, kwds)\u001b[0m\n\u001b[0;32m 1360\u001b[0m \u001b[0mLet\u001b[0m \u001b[0mthe\u001b[0m \u001b[0mreaders\u001b[0m \u001b[0mopen\u001b[0m \u001b[0mIOHanldes\u001b[0m \u001b[0mafter\u001b[0m \u001b[0mthey\u001b[0m \u001b[0mare\u001b[0m \u001b[0mdone\u001b[0m \u001b[1;32mwith\u001b[0m \u001b[0mtheir\u001b[0m \u001b[0mpotential\u001b[0m \u001b[0mraises\u001b[0m\u001b[1;33m.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1361\u001b[0m \"\"\"\n\u001b[1;32m-> 1362\u001b[1;33m self.handles = get_handle(\n\u001b[0m\u001b[0;32m 1363\u001b[0m \u001b[0msrc\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1364\u001b[0m \u001b[1;34m\"r\"\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\io\\common.py\u001b[0m in \u001b[0;36mget_handle\u001b[1;34m(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)\u001b[0m\n\u001b[0;32m 640\u001b[0m \u001b[0merrors\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m\"replace\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 641\u001b[0m \u001b[1;31m# Encoding\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 642\u001b[1;33m handle = open(\n\u001b[0m\u001b[0;32m 643\u001b[0m \u001b[0mhandle\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 644\u001b[0m \u001b[0mioargs\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmode\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+ "\u001b[1;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'Medals.csv'"
+ ]
+ }
+ ],
+ "source": [
+ "medal_info=pd.read_csv(\"Medals.csv\")\n",
+ "print(medal_info)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b19d037b",
+ "metadata": {},
+ "source": [
+ "### How Pandas is better than conventional methods\n",
+ "- Conventional methods use file handling to read data from a file\n",
+ "- The Pandas library makes it easier to read a file and perform suitable operations on it"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e51bb3ea",
+ "metadata": {},
+ "outputs": [],
+ "source": [
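+ "import csv\n",
+ "with open(\"Medals.csv\") as data_file:\n",
+ "    data=csv.reader(data_file)\n",
+ "    next(data) #skip the header row, assuming the file has one\n",
+ "    medal=[]\n",
+ "    for row in data:\n",
+ "        medal.append(int(row[1])) #assumes the medal count is in the second column\n",
+ "print(medal)"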
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0fca54ce",
+ "metadata": {},
+ "source": [
+ "### Checking Duplicate rows in a DataFrame"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 222,
+ "id": "926c3adb",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0 False\n",
+ "1 False\n",
+ "2 False\n",
+ "3 True\n",
+ "4 False\n",
+ "5 False\n",
+ "dtype: bool\n"
+ ]
+ }
+ ],
+ "source": [
+ "information={\n",
+ " \"names\":[\"raju\",\"bala\",\"ralo\",\"raju\",\"Elli\",\"Lilly\"],\n",
+ " \"age\":[10,40,20,10,50,35]\n",
+ "}\n",
+ "student=pd.DataFrame.from_dict(information)\n",
+ "print(student.duplicated())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b81d8a0f",
+ "metadata": {},
+ "source": [
+ "### Checking percentage change of Data\n",
+ "- DataFrame.pct_change() calculates the fractional change between each element and the previous one"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 228,
+ "id": "4743ebe5",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 2011 2012 2013\n",
+ "Jacob NaN -0.061224 -0.163043\n",
+ "Joseph NaN 0.113636 -0.112245\n",
+ "Angela NaN 0.155844 0.011236\n"
+ ]
+ }
+ ],
+ "source": [
+ "change_in_marks=pd.DataFrame({\n",
+ " '2011':[98, 88,77],\n",
+ " '2012':[92, 98,89], \n",
+ " '2013':[77,87,90]},\n",
+ " index = ['Jacob','Joseph','Angela'])\n",
+ "\n",
+ "print(change_in_marks.pct_change(axis='columns'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "190bf3ae",
+ "metadata": {},
+ "source": [
+ "### Replacing NaN values in a DataFrame\n",
+ "- NaN values represent missing values in the dataset\n",
+ "- NaN values can be converted into zeros in order to avoid errors, using the function fillna(0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 227,
+ "id": "b8d08c46",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 2011 2012 2013\n",
+ "Jacob 0.0 -0.061224 -0.163043\n",
+ "Joseph 0.0 0.113636 -0.112245\n",
+ "Angela 0.0 0.155844 0.011236\n"
+ ]
+ }
+ ],
+ "source": [
+ "change_in_marks=pd.DataFrame({\n",
+ " '2011':[98, 88,77],\n",
+ " '2012':[92, 98,89], \n",
+ " '2013':[77,87,90]},\n",
+ " index = ['Jacob','Joseph','Angela'])\n",
+ "\n",
+ "print(change_in_marks.pct_change(axis='columns').fillna(0))"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}