diff --git a/.ipynb_checkpoints/harry-potter-04a-checkpoint.ipynb b/.ipynb_checkpoints/harry-potter-04a-checkpoint.ipynb new file mode 100644 index 0000000..2080e43 --- /dev/null +++ b/.ipynb_checkpoints/harry-potter-04a-checkpoint.ipynb @@ -0,0 +1,811 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "841c0957-15dd-47c6-b886-3a621d8960e2", + "metadata": {}, + "source": [ + "# The Gemika's Magical Guide to Sorting Hogwarts Students using the Decision Tree Algorithm (Part #4A)\n", + "\n", + "![machine-learning-03.jpg](images/machine-learning-28.jpg)" + ] + }, + { + "cell_type": "markdown", + "id": "adecb9d3-c42f-4ebf-933f-7fa0c9f03cfc", + "metadata": {}, + "source": [ + "## 4A. Unveiling the Mysteries: Data Exploration (EDA) 🔍" + ] + }, + { + "cell_type": "markdown", + "id": "8d13604a-37c9-4864-9852-2250abca3aba", + "metadata": {}, + "source": [ + "Our first step is to take a glimpse at the first few rows of our dataset, much like opening the [Marauder's Map](https://harrypotter.fandom.com/wiki/Marauder%27s_Map) for the first time. This will give us an initial understanding of the structure and contents of our data. And dear sorcerers, if you wish to follow along on this magical journey, fork the repository from my GitHub account and download the dataset to your local machine from this [magical address](https://github.com/leonism/the-gemikas-magical-guide-to-sorting-hogwarts-students-using-the-decision-tree-algorithm)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "57263cab-ffd4-4220-85b6-8256c9430ebd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty \\\n", + "0 Harry Potter Male 11 England Defense Against the Dark Arts \n", + "1 Hermione Granger Female 11 England Transfiguration \n", + "2 Ron Weasley Male 11 England Chess \n", + "3 Draco Malfoy Male 11 England Potions \n", + "4 Luna Lovegood Female 11 Ireland Creatures \n", + "\n", + " house blood_status pet wand_type patronus \\\n", + "0 Gryffindor Half-blood Owl Holly Stag \n", + "1 Gryffindor Muggle-born Cat Vine Otter \n", + "2 Gryffindor Pure-blood Rat Ash Jack Russell Terrier \n", + "3 Slytherin Pure-blood Owl Hawthorn NaN \n", + "4 Ravenclaw Half-blood NaN Fir Hare \n", + "\n", + " quidditch_position boggart favorite_class \\\n", + "0 Seeker Dementor Defense Against the Dark Arts \n", + "1 NaN Failure Arithmancy \n", + "2 Keeper Spider Charms \n", + "3 Seeker Lord Voldemort Potions \n", + "4 NaN Her mother Creatures \n", + "\n", + " house_points \n", + "0 150.0 \n", + "1 200.0 \n", + "2 50.0 \n", + "3 100.0 \n", + "4 120.0 \n" + ] + } + ], + "source": [ + "import pandas as pd # pandas provides read_csv and the DataFrame\n", + "\n", + "# Inspecting the first few rows of the dataset\n", + "dataset_path = 'data/hogwarts-students.csv' # Path to our dataset\n", + "hogwarts_df = pd.read_csv(dataset_path)\n", + "print(hogwarts_df.head())" + ] + }, + { + "cell_type": "markdown", + "id": "ef99dd67-61fc-492d-a76a-15d2331231fe", + "metadata": {}, + "source": [ + "As we peer into these rows, we see a variety of features such as **student names**, **house affiliations**, and **various traits**. Each **row is a story**, each **column a chapter**. We might notice, for example, that **Harry, Hermione, and Ron** are all in **_Gryffindor_**, characterized by their bravery and determination. This initial inspection helps us understand the scope and scale of our dataset."
+ ] + }, + { + "cell_type": "markdown", + "id": "62d15617-d3e1-44ed-bf86-306496b8efa5", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "903cf4e8-6ea5-4739-9210-2acc83acfa5a", + "metadata": {}, + "source": [ + "### **4A.2 Checking Dataset Features**" + ] + }, + { + "cell_type": "markdown", + "id": "5562cdb9-db13-43d5-8582-b361ece6e983", + "metadata": {}, + "source": [ + "Next, we delve deeper into the columns of our DataFrame, much as Hermione would meticulously study her textbooks. Each column represents a different feature of our students, from their house to their magical abilities.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f20408f5-2446-442c-ae99-dbee06e14636", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "# Displaying the columns of the dataset\n", + "print(hogwarts_df.columns)\n" + ] + }, + { + "cell_type": "markdown", + "id": "73dee7e5-64f6-4d32-bb57-19884fe896d8", + "metadata": {}, + "source": [ + "Once the spell has finished its wizardry, it reveals the following **hidden artifacts**."
+ ] + }, + { + "cell_type": "markdown", + "id": "42290e5a-6013-4b6c-859e-8dda5aef026c", + "metadata": {}, + "source": [ + "```\n", + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b1a72bb3-6691-4edf-9cfe-80e0e4e02dee", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(52, 14)\n" + ] + } + ], + "source": [ + "# Displaying how many rows and columns the dataset has, as (rows, columns)\n", + "print(hogwarts_df.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "1233f472-dc69-4ddf-8458-7410b6db1ff8", + "metadata": {}, + "source": [ + "And as you've rightly guessed, sorcerers, the dataset consists of 52 rows and 14 columns. ✨🌟" + ] + }, + { + "cell_type": "markdown", + "id": "d23e0b4c-ca27-4b0b-b577-59f2871b5263", + "metadata": {}, + "source": [ + "Let us explore these features, each as significant as a spell component in a well-crafted incantation:\n", + "\n", + "- **Name**: The full name of our witch or wizard, from the illustrious Harry Potter to the enigmatic Luna Lovegood. 🌟\n", + "- **Gender**: Whether they are a young wizard or witch, reflecting the diversity of Hogwarts.\n", + "- **Age**: Their age as recorded in the dataset (from 11 to 18), for even the youngest students have their place in the castle's storied history.\n", + "- **Origin**: The place they hail from, be it the rolling hills of England, the rugged highlands of Scotland, or the enchanting isles of Ireland.
🏞️\n", + "- **Specialty**: Their area of magical expertise, such as Potions, Transfiguration, or Defense Against the Dark Arts, much like Professor Snape’s mastery of the subtle art of potion-making.\n", + "- **House**: The revered house to which they belong, whether Gryffindor, Hufflepuff, Ravenclaw, or Slytherin, each with its own rich traditions and values (a few visiting students are listed under Durmstrang or Beauxbatons).\n", + "- **Blood Status**: Whether they are Pure-blood, Half-blood, or Muggle-born (one No-mag also appears), a detail that, while significant in the wizarding world, never diminishes their magical potential.\n", + "- **Pet**: Their chosen magical companion, be it an owl, a cat, or a toad, reminiscent of Harry's loyal Hedwig or Hermione's clever Crookshanks. 🦉🐈\n", + "- **Wand Type**: The wood of their wand (the dataset records only the wood, not the core), the very tool of their magical prowess.\n", + "- **Patronus**: The form their Patronus takes, a magical manifestation of their innermost self, like Harry's proud stag or Snape's ethereal doe. 🦌\n", + "- **Quidditch Position**: Their role in the beloved wizarding sport, whether Seeker, Chaser, Beater, or Keeper, or perhaps no position at all.\n", + "- **Boggart**: The form their Boggart takes, a glimpse into their deepest fears.\n", + "- **Favorite Class**: The subject they excel in or enjoy the most, akin to Hermione's love for Arithmancy or Neville's talent in Herbology.\n", + "- **House Points**: Points they have contributed to their house, reflecting their achievements and misadventures alike.\n", + "\n", + "With this compendium of magical features, we craft our dataset with the precision of a spell-wright composing a new enchantment. Each character's details are meticulously recorded, ensuring that our data is as rich and detailed as the tapestry of Hogwarts itself.🧙‍♂️🏰\n", + "\n", + "By examining these features, we gain a deeper understanding of the dataset's richness, much like a wizard learning about the different properties of magical creatures.
As we assemble this treasure trove of information, we prepare ourselves for the next step in our magical journey: transforming these attributes into the foundations upon which our **Decision Tree** algorithm will cast its spell. Let us proceed, dear sorcerers, for the magic is only just beginning.✨🧙‍♂️ \n" + ] + }, + { + "cell_type": "markdown", + "id": "0e0fce39-4e08-42e3-b06c-3b3fdf12a3f9", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "f1029ad6-d57e-405c-8463-4384f2552d92", + "metadata": {}, + "source": [ + "### **4A.3 Inspecting Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "894243b9-0a21-4bd0-8977-79e807e3ea7e", + "metadata": {}, + "source": [ + "With a clear understanding of our features, we now turn our attention to the data types. This step is akin to examining the ingredients of a potion, ensuring each component is appropriate for its intended use." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "0705d956-ad45-43ba-9834-6b3e5ec2927f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name object\n", + "gender object\n", + "age int64\n", + "origin object\n", + "specialty object\n", + "house object\n", + "blood_status object\n", + "pet object\n", + "wand_type object\n", + "patronus object\n", + "quidditch_position object\n", + "boggart object\n", + "favorite_class object\n", + "house_points float64\n", + "dtype: object\n" + ] + } + ], + "source": [ + "# Checking the data types of each column\n", + "print(hogwarts_df.dtypes)\n" + ] + }, + { + "cell_type": "markdown", + "id": "173547d1-6acc-43d8-bb21-a89bc06aa5c6", + "metadata": {}, + "source": [ + "In return, dear sorcerers, the spell yields the following results."
+ ] + }, + { + "cell_type": "markdown", + "id": "78459af8-f7bb-4d5e-8b82-81d3229707d1", + "metadata": {}, + "source": [ + "```\n", + "name object\n", + "gender object\n", + "age int64\n", + "origin object\n", + "specialty object\n", + "house object\n", + "blood_status object\n", + "pet object\n", + "wand_type object\n", + "patronus object\n", + "quidditch_position object\n", + "boggart object\n", + "favorite_class object\n", + "house_points float64\n", + "dtype: object\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "ad0e5ffc-cf1d-494d-be40-6c4a18ab41a4", + "metadata": {}, + "source": [ + "_Wow_, would you look at that! The data types tell us whether each column contains numerical values, text, or other forms of data. `age` is already numeric (`int64`) and `house_points` is `float64` (a column with missing `NaN` values cannot be stored as plain `int64`), but every other column, including `name` and `house`, is stored as the generic `object` (text) type. Many of these text columns, such as `gender` or `house`, hold only a handful of repeated values and are better treated as categories. Ensuring these types are correct is crucial for our subsequent analyses and visualizations." + ] + }, + { + "cell_type": "markdown", + "id": "7d0acfb5-63e5-4f55-ab71-cdaeb4373443", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "5fd0ec0f-6ea3-4e76-8908-6a96f0ea070f", + "metadata": {}, + "source": [ + "### **4A.4 Incorrect Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "71cfaade-7760-420b-8a79-e44fe176cb31", + "metadata": {}, + "source": [ + "Occasionally, we may find discrepancies in the data types, much like finding a rogue ingredient in a potion. Correcting these mismatches is essential to ensure the accuracy of our spells (or analyses). So let's spin our wands (or should I say, JupyterLab) and fix them."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "82ff3822-150a-4536-9876-e9f5291564fc", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "# Converting data types if necessary\n", + "# First, let's check the columns again to identify the correct names\n", + "print(hogwarts_df.columns)" + ] + }, + { + "cell_type": "markdown", + "id": "99d46b29-34c6-4f26-9bf6-58046beb0bfc", + "metadata": {}, + "source": [ + "One requirement for any magical data sorcery is a clean dataset whose column names follow a consistent convention and are easy to work with. Now let's convert each column to the data type that matches the nature of its values, so that the dataset is easier to navigate with our upcoming enchantments."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "be7658f1-0415-4951-ba34-36df08136656", + "metadata": {}, + "outputs": [], + "source": [ + "# Assuming we identified 'age' as the correct column name for age\n", + "hogwarts_df['age'] = pd.to_numeric(hogwarts_df['age'], errors='coerce') # Ensure Age is numeric\n", + "\n", + "# Ensuring 'gender' is categorical\n", + "hogwarts_df['gender'] = hogwarts_df['gender'].astype('category') # Ensure Gender is categorical\n", + "\n", + "# Ensuring 'specialty' is categorical\n", + "hogwarts_df['specialty'] = hogwarts_df['specialty'].astype('category') # Ensure specialty is categorical\n", + "\n", + "# Ensuring 'house' is categorical\n", + "hogwarts_df['house'] = hogwarts_df['house'].astype('category') # Ensure house is categorical\n", + "\n", + "# Ensuring 'blood_status' is categorical\n", + "hogwarts_df['blood_status'] = hogwarts_df['blood_status'].astype('category') # Ensure blood_status is categorical\n", + "\n", + "# Ensuring 'pet' is categorical\n", + "hogwarts_df['pet'] = hogwarts_df['pet'].astype('category') # Ensure pet is categorical\n", + "\n", + "# Ensuring 'wand_type' is categorical\n", + "hogwarts_df['wand_type'] = hogwarts_df['wand_type'].astype('category') # Ensure wand_type is categorical\n", + "\n", + "# Ensuring 'quidditch_position' is categorical\n", + "hogwarts_df['quidditch_position'] = hogwarts_df['quidditch_position'].astype('category') # Ensure quidditch_position is categorical\n", + "\n", + "# Ensuring 'favorite_class' is categorical\n", + "hogwarts_df['favorite_class'] = hogwarts_df['favorite_class'].astype('category') # Ensure favorite_class is categorical" + ] + }, + { + "cell_type": "markdown", + "id": "eb267f93-3737-4ac8-be45-a7bab5b1389c", + "metadata": {}, + "source": [ + "By casting these spells, we ensure that each column is of the appropriate type, ready for further exploration and manipulation. 
This step is much like Snape meticulously adjusting the ingredients of a complex potion to achieve the perfect brew.\n", + "\n", + "Now let's verify that the previous spell has worked its magic on our dataset by invoking the following spell again." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "64a9433f-c3a6-4d20-b953-3495596c9a0a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name object\n", + "gender category\n", + "age int64\n", + "origin object\n", + "specialty category\n", + "house category\n", + "blood_status category\n", + "pet category\n", + "wand_type category\n", + "patronus object\n", + "quidditch_position category\n", + "boggart object\n", + "favorite_class category\n", + "house_points float64\n", + "dtype: object\n" + ] + } + ], + "source": [ + "# Verify the data types after conversion\n", + "print(hogwarts_df.dtypes)" + ] + }, + { + "cell_type": "markdown", + "id": "08c6180b-7c4f-41e6-8f44-4bba847ee3d7", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "7fce1d76-726d-43bc-9190-9bf2e57d1dd8", + "metadata": {}, + "source": [ + "### **4A.5 Spells and Charms to Convert Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "68fa51a1-3a2d-4321-a000-9df7f0ce7010", + "metadata": {}, + "source": [ + "In case you are wondering, dear sorcerers, which data types pandas supports, the following table lists the most common ones and how to convert a column to each of them."
+ ] + }, + { + "cell_type": "markdown", + "id": "b16321bb-9662-4b25-ad2a-475153c691eb", + "metadata": {}, + "source": [ + "| Data Type | Description | Example Values | Conversion Method |\n", + "|-----------|-------------|----------------|--------------------|\n", + "| **int64** | Integer values | 1, 2, 3, -5, 0 | `pd.to_numeric(df['column'])` |\n", + "| **float64** | Floating point numbers | 1.0, 2.5, -3.4, 0.0 | `pd.to_numeric(df['column'])` |\n", + "| **bool** | Boolean values | True, False | `df['column'].astype('bool')` |\n", + "| **object** | Text (string) or mixed values | 'apple', 'banana', '123' | `df['column'].astype('str')` |\n", + "| **datetime64[ns]** | Date and time values | '2024-07-17', '2023-01-01 12:00' | `pd.to_datetime(df['column'])` |\n", + "| **timedelta64[ns]** | Differences between datetimes | '1 days 00:00:00', '2 days 03:04:05' | `pd.to_timedelta(df['column'])` |\n", + "| **category** | Categorical data | 'A', 'B', 'C' | `df['column'].astype('category')` |\n" + ] + }, + { + "cell_type": "markdown", + "id": "dd9fdfaf-f2cf-4d4e-a5ec-681f615b7756", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "84135da3-5bc2-47bd-bfb3-54f7e6979868", + "metadata": {}, + "source": [ + "### **4A.6 Reinvestigating the Data Types in the Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "2e050f58-03f4-4117-bcd8-36f718e1ef7a", + "metadata": {}, + "source": [ + "Having ensured the correctness of our data types, it's time to take a more comprehensive look at our dataset. This step is akin to casting a revealing charm over a hidden room, allowing us to see everything at once." + ] + }, + { + "cell_type": "markdown", + "id": "d905275b-c727-4327-bbd4-2204e49cc96b", + "metadata": {}, + "source": [ + "By summarizing the whole dataset with `info()`, we gain a holistic view of its structure: every column's name, non-null count, and data type.
This comprehensive overview helps us identify any remaining inconsistencies or areas that require further attention, much like a careful sweep of the castle grounds to ensure everything is in order. Here are the results." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "395a08a3-fb49-4d56-85f8-938a9837f94a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 52 entries, 0 to 51\n", + "Data columns (total 14 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 52 non-null object \n", + " 1 gender 52 non-null category\n", + " 2 age 52 non-null int64 \n", + " 3 origin 52 non-null object \n", + " 4 specialty 52 non-null category\n", + " 5 house 52 non-null category\n", + " 6 blood_status 52 non-null category\n", + " 7 pet 27 non-null category\n", + " 8 wand_type 52 non-null category\n", + " 9 patronus 50 non-null object \n", + " 10 quidditch_position 10 non-null category\n", + " 11 boggart 52 non-null object \n", + " 12 favorite_class 51 non-null category\n", + " 13 house_points 50 non-null float64 \n", + "dtypes: category(8), float64(1), int64(1), object(4)\n", + "memory usage: 6.8+ KB\n" + ] + } + ], + "source": [ + "# Displaying a concise summary: column names, non-null counts and dtypes\n", + "# info() prints its report itself and returns None, so no print() is needed\n", + "hogwarts_df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "a33551cd-3280-4d5a-b571-6bf347844daf", + "metadata": {}, + "source": [ + "### **4A.7 Detailed Summary of the Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "6e9b92d3-29db-43a0-a143-e973bc782f22", + "metadata": {}, + "source": [ + "And here's the interesting part: this time the spell gives us a high-level overview of every column in the dataset. It's a bit statistical for sure, but fear not, dear sorcerers: as you scroll on, you'll notice a couple of other stunning facts about Hogwarts students."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "48a5ec2a-7863-4063-bf9d-23c62a05f4c6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty house \\\n", + "count 52 52 52.000000 52 52 52 \n", + "unique 52 2 NaN 9 24 6 \n", + "top Harry Potter Male NaN England Charms Gryffindor \n", + "freq 1 27 NaN 35 7 18 \n", + "mean NaN NaN 14.942308 NaN NaN NaN \n", + "std NaN NaN 2.492447 NaN NaN NaN \n", + "min NaN NaN 11.000000 NaN NaN NaN \n", + "25% NaN NaN 13.250000 NaN NaN NaN \n", + "50% NaN NaN 16.000000 NaN NaN NaN \n", + "75% NaN NaN 17.000000 NaN NaN NaN \n", + "max NaN NaN 18.000000 NaN NaN NaN \n", + "\n", + " blood_status pet wand_type patronus quidditch_position boggart \\\n", + "count 52 27 52 50 10 52 \n", + "unique 4 9 28 15 5 11 \n", + "top Half-blood Owl Fir Non-corporeal Seeker Failure \n", + "freq 25 11 4 34 5 40 \n", + "mean NaN NaN NaN NaN NaN NaN \n", + "std NaN NaN NaN NaN NaN NaN \n", + "min NaN NaN NaN NaN NaN NaN \n", + "25% NaN NaN NaN NaN NaN NaN \n", + "50% NaN NaN NaN NaN NaN NaN \n", + "75% NaN NaN NaN NaN NaN NaN \n", + "max NaN NaN NaN NaN NaN NaN \n", + "\n", + " favorite_class house_points \n", + "count 51 50.000000 \n", + "unique 21 NaN \n", + "top Charms NaN \n", + "freq 8 NaN \n", + "mean NaN 119.200000 \n", + "std NaN 54.129097 \n", + "min NaN 10.000000 \n", + "25% NaN 72.500000 \n", + "50% NaN 125.000000 \n", + "75% NaN 160.000000 \n", + "max NaN 200.000000 \n" + ] + } + ], + "source": [ + "print(hogwarts_df.describe(include='all')) # Providing a detailed summary of the dataset" + ] + }, + { + "cell_type": "markdown", + "id": "c31ccb10-a332-4d83-9fa1-2d0386c023ea", + "metadata": {}, + "source": [ + "From the summary, we can infer several interesting points:\n", + "\n", + "- **House Distribution**: _Gryffindor_ has the highest count with 18 students, showing its prominence.\n", + "- **Age**: The average age of students is around 
**14.94** years, with the youngest being 11 and the oldest 18.\n", + "- **Gender**: The dataset includes **27 males** and **25 females**, showing a fairly balanced gender distribution.\n", + "- **Blood Status**: _Half-bloods_ are the most common of the 4 recorded blood statuses, with 25 occurrences.\n", + "- **Wands and Pets**: There are 28 unique wand woods and 9 different pet types (only 27 students have a pet), reflecting the unique personalities and preferences of the students.\n", + "- **Quidditch**: Only 10 of the 52 students have a Quidditch position, and **Seeker** is the most common one, held by 5 of them.\n", + "- **Favorite Class**: Charms is the most favored class among students, with 8 mentions.\n", + "- **House Points**: Across the 50 students with a recorded value, the average is 119.2 points, with a standard deviation of 54.13 and a range from 10 to 200, indicating a wide spread of performance." + ] + }, + { + "cell_type": "markdown", + "id": "3ce92d49-04cb-4686-b2d9-9058eab44b97", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "e1130888-301c-4e93-a81f-b999f690c852", + "metadata": {}, + "source": [ + "### **4A.8 Previewing the Whole Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "2b6b61ea-a9bb-4b24-8250-fb341b6ee173", + "metadata": {}, + "source": [ + "For curious minds whose thoughts fly as fast as their broomsticks, here's the magic spell to display every value in the dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "b597bf14-6321-48e3-96c7-6cdbacdc6ee0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty house blood_status pet wand_type patronus quidditch_position boggart favorite_class house_points\n", + "0 Harry Potter Male 11 England Defense Against the Dark Arts Gryffindor Half-blood Owl Holly Stag Seeker Dementor Defense Against the Dark Arts 150.0\n", + "1 Hermione Granger Female 11 England Transfiguration Gryffindor Muggle-born Cat Vine Otter NaN Failure Arithmancy 200.0\n", + "2 Ron Weasley Male 11 England Chess Gryffindor Pure-blood Rat Ash Jack Russell Terrier Keeper Spider Charms 50.0\n", + "3 Draco Malfoy Male 11 England Potions Slytherin Pure-blood Owl Hawthorn NaN Seeker Lord Voldemort Potions 100.0\n", + "4 Luna Lovegood Female 11 Ireland Creatures Ravenclaw Half-blood NaN Fir Hare NaN Her mother Creatures 120.0\n", + "5 Neville Longbottom Male 11 England Herbology Gryffindor Pure-blood Toad Cherry Non-corporeal NaN Severus Snape Herbology 70.0\n", + "6 Ginny Weasley Female 11 England Defense Against the Dark Arts Gryffindor Pure-blood Owl Yew Horse Chaser Tom Riddle Defense Against the Dark Arts 140.0\n", + "7 Cedric Diggory Male 15 England Quidditch Hufflepuff Pure-blood NaN Ash Non-corporeal Seeker Failure Defense Against the Dark Arts 160.0\n", + "8 Cho Chang Female 14 Scotland Charms Ravenclaw Half-blood Owl Hazel Swan Seeker Failure Charms 110.0\n", + "9 Severus Snape Male 16 England Potions Slytherin Half-blood NaN Elm Doe NaN Lily Potter Potions 90.0\n", + "10 Albus Dumbledore Male 17 England Transfiguration Gryffindor Half-blood Phoenix Elder Phoenix NaN Ariana's death Transfiguration 200.0\n", + "11 Minerva McGonagall Female 16 Scotland Transfiguration Gryffindor Half-blood Cat Fir Cat NaN Failure Transfiguration 190.0\n", + "12 Bellatrix Lestrange Female 15 England Dark Arts Slytherin Pure-blood NaN 
Walnut NaN Azkaban Dueling 80 NaN\n", + "13 Nymphadora Tonks Female 14 Wales Metamorphmagus Hufflepuff Half-blood Owl Blackthorn Wolf NaN Failure Defense Against the Dark Arts 130.0\n", + "14 Remus Lupin Male 16 England Defense Against the Dark Arts Gryffindor Half-blood Dog Cypress Non-corporeal NaN Full Moon Defense Against the Dark Arts 150.0\n", + "15 Sirius Black Male 16 England Transfiguration Gryffindor Pure-blood Owl Chestnut Dog Beater Full Moon Defense Against the Dark Arts 140.0\n", + "16 Horace Slughorn Male 16 England Potions Slytherin Half-blood NaN Cedar Non-corporeal NaN Failure Potions 100.0\n", + "17 Filius Flitwick Male 17 England Charms Ravenclaw Half-blood NaN Hornbeam Non-corporeal NaN Failure Charms 180.0\n", + "18 Pomona Sprout Female 16 England Herbology Hufflepuff Pure-blood Cat Pine Non-corporeal NaN Failure Herbology 170.0\n", + "19 Helena Ravenclaw Female 17 Scotland Charms Ravenclaw Pure-blood NaN Rowan Non-corporeal NaN Her mother Charms 160.0\n", + "20 Godric Gryffindor Male 17 England Dueling Gryffindor Pure-blood NaN Sword Lion NaN Failure Dueling 200.0\n", + "21 Helga Hufflepuff Female 17 Wales Herbology Hufflepuff Pure-blood NaN Cedar Non-corporeal NaN Failure Herbology 190.0\n", + "22 Rowena Ravenclaw Female 17 Scotland Charms Ravenclaw Pure-blood NaN Maple Eagle NaN Failure Charms 180.0\n", + "23 Salazar Slytherin Male 17 England Dark Arts Slytherin Pure-blood NaN Ebony Serpent NaN Failure Dark Arts 200.0\n", + "24 Molly Weasley Female 16 England Household Charms Gryffindor Pure-blood Owl Pine Non-corporeal NaN Failure Household Charms 80.0\n", + "25 Arthur Weasley Male 16 England Muggle Artifacts Gryffindor Pure-blood NaN Hornbeam Non-corporeal NaN Failure Muggle Studies 60.0\n", + "26 Lucius Malfoy Male 16 England Dark Arts Slytherin Pure-blood Owl Elm Non-corporeal NaN Failure Dark Arts 90.0\n", + "27 Narcissa Malfoy Female 15 England Potions Slytherin Pure-blood NaN Hawthorn Non-corporeal NaN Failure Potions 70.0\n", + "28 
Pansy Parkinson Female 11 England Gossip Slytherin Pure-blood Cat Birch Non-corporeal NaN Failure Gossip 40.0\n", + "29 Vincent Crabbe Male 11 England Strength Slytherin Pure-blood NaN Oak Non-corporeal NaN Failure Strength 50.0\n", + "30 Gregory Goyle Male 11 England Strength Slytherin Pure-blood NaN Alder Non-corporeal NaN Failure Strength 50.0\n", + "31 Lily Evans Female 11 England Charms Gryffindor Muggle-born NaN Willow Doe NaN Failure Charms 150.0\n", + "32 James Potter Male 11 England Dueling Gryffindor Pure-blood Owl Walnut Stag Chaser Failure Dueling 160.0\n", + "33 Peter Pettigrew Male 11 England Transformation Gryffindor Half-blood Rat Ash Non-corporeal NaN Failure Transformation 30.0\n", + "34 Gilderoy Lockhart Male 15 England Memory Charms Ravenclaw Half-blood NaN Cherry Non-corporeal NaN Failure Memory Charms 70.0\n", + "35 Dolores Umbridge Female 15 England Dark Arts Slytherin Half-blood Cat Hemlock Non-corporeal NaN Failure Dark Arts 60.0\n", + "36 Newt Scamander Male 17 England Magical Creatures Hufflepuff Half-blood Demiguise Chestnut Non-corporeal NaN Failure Creatures 160.0\n", + "37 Tina Goldstein Female 17 USA Auror Hufflepuff Half-blood Owl Ash Non-corporeal NaN Failure Defense Against the Dark Arts 140.0\n", + "38 Queenie Goldstein Female 17 USA Legilimency Ravenclaw Half-blood Owl Cypress Non-corporeal NaN Failure Legilimency 130.0\n", + "39 Jacob Kowalski Male 17 USA Baking Hufflepuff No-mag NaN Birch Non-corporeal NaN Failure Baking 10.0\n", + "40 Theseus Scamander Male 17 England Auror Gryffindor Half-blood Dog Elder Non-corporeal NaN Failure Defense Against the Dark Arts 150.0\n", + "41 Leta Lestrange Female 16 England Potions Slytherin Pure-blood Cat Ebony Non-corporeal NaN Failure Potions 100.0\n", + "42 Nagini Female 18 Indonesia Transformation Slytherin Half-blood Snake Teak Non-corporeal NaN Failure Transformation 90.0\n", + "43 Grindelwald Male 18 Europe Dark Arts Slytherin Pure-blood NaN Elder Non-corporeal NaN Failure Dark Arts 
200.0\n", + "44 Bathilda Bagshot Female 17 England History of Magic Ravenclaw Half-blood Cat Willow Non-corporeal NaN Failure NaN NaN\n", + "45 Aberforth Dumbledore Male 17 England Goat Charming Gryffindor Half-blood Goat Oak Non-corporeal NaN Failure Goat Charming 70.0\n", + "46 Ariana Dumbledore Female 14 England Obscurus Gryffindor Half-blood NaN Fir Non-corporeal NaN Failure Obscurus 20.0\n", + "47 Victor Krum Male 17 Bulgaria Quidditch Durmstrang Pure-blood NaN Hawthorn Non-corporeal Seeker Failure Quidditch 180.0\n", + "48 Fleur Delacour Female 17 France Charms Beauxbatons Half-blood NaN Rosewood Non-corporeal NaN Failure Charms 140.0\n", + "49 Gabrielle Delacour Female 14 France Charms Beauxbatons Half-blood NaN Alder Non-corporeal NaN Failure Charms 80.0\n", + "50 Olympe Maxime Female 17 France Strength Beauxbatons Half-blood NaN Fir Non-corporeal NaN Failure Strength 110.0\n", + "51 Igor Karkaroff Male 18 Europe Dark Arts Durmstrang Half-blood NaN Yew Non-corporeal NaN Failure Dark Arts 90.0\n" + ] + } + ], + "source": [ + "# Displaying every row and column of the dataset (to_string() avoids truncation)\n", + "print(hogwarts_df.to_string())" + ] + }, + { + "cell_type": "markdown", + "id": "e1306469-b93f-44a0-85de-48d246700eab", + "metadata": {}, + "source": [ + "Now that we've corrected the data types, it's time to save the dataset so that it is ready for our next set of adventures."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "c702a322-f764-4c2a-a390-9dce31229928", + "metadata": {}, + "outputs": [], + "source": [ + "# Saving the updated dataset for the next parts of our journey\n", + "hogwarts_df.to_csv('data/hogwarts-students-01.csv')" + ] + }, + { + "cell_type": "markdown", + "id": "fcd50386-eb52-4bfa-893f-cd1b324e9719", + "metadata": {}, + "source": [ + "Now, dear sorcerers, once you've invoked the previous spell, check your `data` directory: you should find a new CSV file named `hogwarts-students-01.csv`, which we'll use throughout the rest of our magical journey. Keep in mind that a CSV file stores only plain text, so the `category` data types we just set are not saved with it and will need to be re-applied after the file is read back in." + ] + }, + { + "cell_type": "markdown", + "id": "9c54eb86-582b-43ea-aa35-ed027961c67d", + "metadata": {}, + "source": [ + "### **4A.9 Gemika's Pop-Up Quiz: Unveiling the Mysteries**" + ] + }, + { + "cell_type": "markdown", + "id": "046f8112-fbb5-45e9-9a23-644e321af449", + "metadata": {}, + "source": [ + "And now, young wizards and witches, my son Gemika Haziq Nugroho appears with a sparkle in his eye and a quiz at the ready. Are you prepared to test your newfound knowledge and prove your prowess in data exploration?\n", + "\n", + "1. **What function do we use to display the first few rows of a DataFrame?**\n", + "2. **Why is it important to check the data types of each column in our dataset?**\n", + "3. **How can we convert a column to a numeric type if it's not already?**\n", + "\n", + "Answer these questions with confidence, and you will demonstrate your mastery of the initial steps in data exploration. With our dataset now fully understood and prepared, we are ready to dive even deeper into its mysteries. Onward, to greater discoveries! 🌟✨🧙‍♂️\n", + "\n", + "By now, you should feel like a true data wizard, ready to uncover the hidden patterns and secrets within any dataset. Let us continue our journey with confidence and curiosity, for there is much more to discover in the magical world of data science!
🌌🔍" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2bb28730-c64b-4f95-bb95-0fbe8e2b7d9f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/.ipynb_checkpoints/harry-potter-04b-checkpoint.ipynb b/.ipynb_checkpoints/harry-potter-04b-checkpoint.ipynb new file mode 100644 index 0000000..363fcab --- /dev/null +++ b/.ipynb_checkpoints/harry-potter-04b-checkpoint.ipynb @@ -0,0 +1,6 @@ +{ + "cells": [], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/harry-potter-04a.ipynb b/harry-potter-04a.ipynb new file mode 100644 index 0000000..2080e43 --- /dev/null +++ b/harry-potter-04a.ipynb @@ -0,0 +1,811 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "841c0957-15dd-47c6-b886-3a621d8960e2", + "metadata": {}, + "source": [ + "# The Gemika's Magical Guide to Sorting Hogwarts Students using the Decision Tree Algorithm (Part #4A)\n", + "\n", + "![machine-learning-03.jpg](images/machine-learning-28.jpg)" + ] + }, + { + "cell_type": "markdown", + "id": "adecb9d3-c42f-4ebf-933f-7fa0c9f03cfc", + "metadata": {}, + "source": [ + "## 4A. Unveiling the Mysteries: Data Exploration (EDA) 🔍" + ] + }, + { + "cell_type": "markdown", + "id": "8d13604a-37c9-4864-9852-2250abca3aba", + "metadata": {}, + "source": [ + "Our first step is to take a glimpse at the first few rows of our dataset, much like opening the [Marauder's Map](https://harrypotter.fandom.com/wiki/Marauder%27s_Map) for the first time. This will give us an initial understanding of the structure and contents of our data. 
And dear sorcerers, if you wish to follow along on this magical journey, fork the repository from my GitHub account and download the dataset to your local machine from this [magical address](https://github.com/leonism/the-gemikas-magical-guide-to-sorting-hogwarts-students-using-the-decision-tree-algorithm). The code below assumes pandas has already been imported as `pd` (for example, with `import pandas as pd`)." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "57263cab-ffd4-4220-85b6-8256c9430ebd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty \\\n", + "0 Harry Potter Male 11 England Defense Against the Dark Arts \n", + "1 Hermione Granger Female 11 England Transfiguration \n", + "2 Ron Weasley Male 11 England Chess \n", + "3 Draco Malfoy Male 11 England Potions \n", + "4 Luna Lovegood Female 11 Ireland Creatures \n", + "\n", + " house blood_status pet wand_type patronus \\\n", + "0 Gryffindor Half-blood Owl Holly Stag \n", + "1 Gryffindor Muggle-born Cat Vine Otter \n", + "2 Gryffindor Pure-blood Rat Ash Jack Russell Terrier \n", + "3 Slytherin Pure-blood Owl Hawthorn NaN \n", + "4 Ravenclaw Half-blood NaN Fir Hare \n", + "\n", + " quidditch_position boggart favorite_class \\\n", + "0 Seeker Dementor Defense Against the Dark Arts \n", + "1 NaN Failure Arithmancy \n", + "2 Keeper Spider Charms \n", + "3 Seeker Lord Voldemort Potions \n", + "4 NaN Her mother Creatures \n", + "\n", + " house_points \n", + "0 150.0 \n", + "1 200.0 \n", + "2 50.0 \n", + "3 100.0 \n", + "4 120.0 \n" + ] + } + ], + "source": [ + "# Inspecting the first few rows of the dataset\n", + "dataset_path = 'data/hogwarts-students.csv' # Path to our dataset\n", + "hogwarts_df = pd.read_csv(dataset_path)\n", + "print(hogwarts_df.head())" + ] + }, + { + "cell_type": "markdown", + "id": "ef99dd67-61fc-492d-a76a-15d2331231fe", + "metadata": {}, + "source": [ + "As we peer into these rows, we see a variety of features such as **student names**, **house affiliations**, and **various traits**.
Each **row is a story**, each **column a chapter**. We might notice, for example, that **Harry, Hermione, and Ron** are all in **_Gryffindor_**, characterized by their bravery and determination. This initial inspection gives us a first feel for the kind of information our dataset holds." + ] + }, + { + "cell_type": "markdown", + "id": "62d15617-d3e1-44ed-bf86-306496b8efa5", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "903cf4e8-6ea5-4739-9210-2acc83acfa5a", + "metadata": {}, + "source": [ + "### **4A.2 Checking Dataset Features**" + ] + }, + { + "cell_type": "markdown", + "id": "5562cdb9-db13-43d5-8582-b361ece6e983", + "metadata": {}, + "source": [ + "Next, we delve deeper into the columns of our DataFrame, much like Hermione meticulously studying her textbooks. Each column represents a different feature of our students, from their house to their magical abilities.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f20408f5-2446-442c-ae99-dbee06e14636", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "# Displaying the columns of the dataset\n", + "print(hogwarts_df.columns)\n" + ] + }, + { + "cell_type": "markdown", + "id": "73dee7e5-64f6-4d32-bb57-19884fe896d8", + "metadata": {}, + "source": [ + "Once the spell has finished its wizardry, it reveals the following **hidden artifacts**:"
+ ] + }, + { + "cell_type": "markdown", + "id": "42290e5a-6013-4b6c-859e-8dda5aef026c", + "metadata": {}, + "source": [ + "```\n", + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b1a72bb3-6691-4edf-9cfe-80e0e4e02dee", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(52, 14)\n" + ] + } + ], + "source": [ + "# Displaying how many rows and columns are in the dataset\n", + "print(hogwarts_df.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "1233f472-dc69-4ddf-8458-7410b6db1ff8", + "metadata": {}, + "source": [ + "As you may have guessed, sorcerers, the dataset consists of 52 rows and 14 columns. ✨🌟" + ] + }, + { + "cell_type": "markdown", + "id": "d23e0b4c-ca27-4b0b-b577-59f2871b5263", + "metadata": {}, + "source": [ + "Let us explore these features, each as significant as a spell component in a well-crafted incantation:\n", + "\n", + "- **Name**: The full name of our witch or wizard, from the illustrious Harry Potter to the enigmatic Luna Lovegood. 🌟\n", + "- **Gender**: Whether they are a young wizard or witch, reflecting the diversity of Hogwarts.\n", + "- **Age**: Their age as recorded in our dataset (from 11 to 18), for even the youngest students have their place in the castle's storied history.\n", + "- **Origin**: The place they hail from, be it the rolling hills of England, the rugged highlands of Scotland, or the enchanting isles of Ireland.
🏞️\n", + "- **Specialty**: Their area of magical expertise, such as Potions, Transfiguration, or Defense Against the Dark Arts, much like Professor Snape’s mastery of the subtle art of potion-making.\n", + "- **House**: The revered house to which they belong (Gryffindor, Hufflepuff, Ravenclaw, or Slytherin), each with its own rich traditions and values. A few visiting students from Beauxbatons and Durmstrang appear in our dataset as well.\n", + "- **Blood Status**: Whether they are Pure-blood, Half-blood, or Muggle-born, a detail that, while significant in the wizarding world, never diminishes their magical potential. (Our dataset also lists one No-mag, Jacob Kowalski.)\n", + "- **Pet**: Their chosen magical companion, be it an owl, a cat, or a toad, reminiscent of Harry's loyal Hedwig or Hermione's clever Crookshanks. 🦉🐈\n", + "- **Wand Type**: The wood their wand is made from, the very tool of their magical prowess.\n", + "- **Patronus**: The form their Patronus takes, a magical manifestation of their innermost self, like Harry's proud stag or Snape's ethereal doe. 🦌\n", + "- **Quidditch Position**: Their role in the beloved wizarding sport, whether Seeker, Chaser, Beater, or Keeper, or perhaps no position at all.\n", + "- **Boggart**: The form their Boggart takes, a glimpse into their deepest fears.\n", + "- **Favorite Class**: The subject they excel in or enjoy the most, akin to Hermione's love for Arithmancy or Neville's talent in Herbology.\n", + "- **House Points**: Points they have contributed to their house, reflecting their achievements and misadventures alike.\n", + "\n", + "With this compendium of magical features, we craft our dataset with the precision of a spell-wright composing a new enchantment. Each character's details are meticulously recorded, ensuring that our data is as rich and detailed as the tapestry of Hogwarts itself.🧙‍♂️🏰\n", + "\n", + "By examining these features, we gain a deeper understanding of the dataset's richness, much like a wizard learning about the different properties of magical creatures.
As we assemble this treasure trove of information, we prepare ourselves for the next step in our magical journey: transforming these attributes into the foundations upon which our **Decision Tree** algorithm will cast its spell. Let us proceed, dear sorcerers, for the magic is only just beginning.✨🧙‍♂️ \n" + ] + }, + { + "cell_type": "markdown", + "id": "0e0fce39-4e08-42e3-b06c-3b3fdf12a3f9", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "f1029ad6-d57e-405c-8463-4384f2552d92", + "metadata": {}, + "source": [ + "### **4A.3 Inspecting Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "894243b9-0a21-4bd0-8977-79e807e3ea7e", + "metadata": {}, + "source": [ + "With a clear understanding of our features, we now turn our attention to the data types. This step is akin to examining the ingredients of a potion, ensuring each component is appropriate for its intended use." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "0705d956-ad45-43ba-9834-6b3e5ec2927f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name object\n", + "gender object\n", + "age int64\n", + "origin object\n", + "specialty object\n", + "house object\n", + "blood_status object\n", + "pet object\n", + "wand_type object\n", + "patronus object\n", + "quidditch_position object\n", + "boggart object\n", + "favorite_class object\n", + "house_points float64\n", + "dtype: object\n" + ] + } + ], + "source": [ + "# Checking the data types of each column\n", + "print(hogwarts_df.dtypes)\n" + ] + }, + { + "cell_type": "markdown", + "id": "173547d1-6acc-43d8-bb21-a89bc06aa5c6", + "metadata": {}, + "source": [ + "In return, the spell yields the following results, dear sorcerers:"
+ ] + }, + { + "cell_type": "markdown", + "id": "78459af8-f7bb-4d5e-8b82-81d3229707d1", + "metadata": {}, + "source": [ + "```\n", + "name object\n", + "gender object\n", + "age int64\n", + "origin object\n", + "specialty object\n", + "house object\n", + "blood_status object\n", + "pet object\n", + "wand_type object\n", + "patronus object\n", + "quidditch_position object\n", + "boggart object\n", + "favorite_class object\n", + "house_points float64\n", + "dtype: object\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "ad0e5ffc-cf1d-494d-be40-6c4a18ab41a4", + "metadata": {}, + "source": [ + "_Wow_, would you look at that: several columns could use a better data type. The data types tell us whether each column contains numerical values, text, or other forms of data. For instance, `age` is already numeric (`int64`), and `house_points` is `float64` because it contains missing values. However, columns such as `gender` and `house` are stored as generic `object` (text) columns, even though each holds only a handful of distinct categories. Ensuring these types are correct is crucial for our subsequent analyses and visualizations." + ] + }, + { + "cell_type": "markdown", + "id": "7d0acfb5-63e5-4f55-ab71-cdaeb4373443", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "5fd0ec0f-6ea3-4e76-8908-6a96f0ea070f", + "metadata": {}, + "source": [ + "### **4A.4 Incorrect Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "71cfaade-7760-420b-8a79-e44fe176cb31", + "metadata": {}, + "source": [ + "Occasionally, we may find discrepancies in the data types, much like finding a rogue ingredient in a potion. Correcting these mismatches is essential to ensure the accuracy of our spells (or analyses). So let's wave our wands (or should I say, fire up JupyterLab) and fix them."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "82ff3822-150a-4536-9876-e9f5291564fc", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['name', 'gender', 'age', 'origin', 'specialty', 'house', 'blood_status',\n", + " 'pet', 'wand_type', 'patronus', 'quidditch_position', 'boggart',\n", + " 'favorite_class', 'house_points'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "# Converting data types if necessary\n", + "# First, let's check the columns again to identify the correct names\n", + "print(hogwarts_df.columns)" + ] + }, + { + "cell_type": "markdown", + "id": "99d46b29-34c6-4f26-9bf6-58046beb0bfc", + "metadata": {}, + "source": [ + "One requirement for any magical data sorcery is a clean dataset whose column names follow a consistent convention and are easy to work with. Now, let's convert each column to the data type that matches its nature, so that the dataset is easier to navigate in our upcoming enchantments."
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "be7658f1-0415-4951-ba34-36df08136656", + "metadata": {}, + "outputs": [], + "source": [ + "# Assuming we identified 'age' as the correct column name for age\n", + "hogwarts_df['age'] = pd.to_numeric(hogwarts_df['age'], errors='coerce') # Ensure Age is numeric\n", + "\n", + "# Ensuring 'gender' is categorical\n", + "hogwarts_df['gender'] = hogwarts_df['gender'].astype('category') # Ensure Gender is categorical\n", + "\n", + "# Ensuring 'specialty' is categorical\n", + "hogwarts_df['specialty'] = hogwarts_df['specialty'].astype('category') # Ensure specialty is categorical\n", + "\n", + "# Ensuring 'house' is categorical\n", + "hogwarts_df['house'] = hogwarts_df['house'].astype('category') # Ensure house is categorical\n", + "\n", + "# Ensuring 'blood_status' is categorical\n", + "hogwarts_df['blood_status'] = hogwarts_df['blood_status'].astype('category') # Ensure blood_status is categorical\n", + "\n", + "# Ensuring 'pet' is categorical\n", + "hogwarts_df['pet'] = hogwarts_df['pet'].astype('category') # Ensure pet is categorical\n", + "\n", + "# Ensuring 'wand_type' is categorical\n", + "hogwarts_df['wand_type'] = hogwarts_df['wand_type'].astype('category') # Ensure wand_type is categorical\n", + "\n", + "# Ensuring 'quidditch_position' is categorical\n", + "hogwarts_df['quidditch_position'] = hogwarts_df['quidditch_position'].astype('category') # Ensure quidditch_position is categorical\n", + "\n", + "# Ensuring 'favorite_class' is categorical\n", + "hogwarts_df['favorite_class'] = hogwarts_df['favorite_class'].astype('category') # Ensure favorite_class is categorical" + ] + }, + { + "cell_type": "markdown", + "id": "eb267f93-3737-4ac8-be45-a7bab5b1389c", + "metadata": {}, + "source": [ + "By casting these spells, we ensure that each column is of the appropriate type, ready for further exploration and manipulation. 
This step is much like Snape meticulously adjusting the ingredients of a complex potion to achieve the perfect brew.\n", + "\n", + "Now let's verify that the previous spell has worked its magic on our dataset by invoking the following spell again." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "64a9433f-c3a6-4d20-b953-3495596c9a0a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name object\n", + "gender category\n", + "age int64\n", + "origin object\n", + "specialty category\n", + "house category\n", + "blood_status category\n", + "pet category\n", + "wand_type category\n", + "patronus object\n", + "quidditch_position category\n", + "boggart object\n", + "favorite_class category\n", + "house_points float64\n", + "dtype: object\n" + ] + } + ], + "source": [ + "# Verify the data types after conversion\n", + "print(hogwarts_df.dtypes)" + ] + }, + { + "cell_type": "markdown", + "id": "08c6180b-7c4f-41e6-8f44-4bba847ee3d7", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "7fce1d76-726d-43bc-9190-9bf2e57d1dd8", + "metadata": {}, + "source": [ + "### **4A.5 Spells and Charms to Convert Data Types**" + ] + }, + { + "cell_type": "markdown", + "id": "68fa51a1-3a2d-4321-a000-9df7f0ce7010", + "metadata": {}, + "source": [ + "In case you're wondering, dear sorcerers, which data types pandas supports, here are the most common ones, along with how to convert a column to each of them."
+ ] + }, + { + "cell_type": "markdown", + "id": "b16321bb-9662-4b25-ad2a-475153c691eb", + "metadata": {}, + "source": [ + "| Data Type | Description | Example Values | Conversion Method |\n", + "|-----------|-------------|----------------|--------------------|\n", + "| **int64** | Integer values | 1, 2, 3, -5, 0 | `pd.to_numeric(df['column'])` |\n", + "| **float64** | Floating-point numbers | 1.0, 2.5, -3.4, 0.0 | `pd.to_numeric(df['column'])` |\n", + "| **bool** | Boolean values | True, False | `df['column'].astype('bool')` |\n", + "| **object** | Strings or mixed Python objects | 'apple', 'banana', '123' | `df['column'].astype('str')` |\n", + "| **datetime64[ns]** | Date and time values | '2024-07-17', '2023-01-01 12:00' | `pd.to_datetime(df['column'])` |\n", + "| **timedelta64[ns]** | Differences between datetimes | '1 days 00:00:00', '2 days 03:04:05' | `pd.to_timedelta(df['column'])` |\n", + "| **category** | Categorical data | 'A', 'B', 'C' | `df['column'].astype('category')` |\n" + ] + }, + { + "cell_type": "markdown", + "id": "dd9fdfaf-f2cf-4d4e-a5ec-681f615b7756", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "84135da3-5bc2-47bd-bfb3-54f7e6979868", + "metadata": {}, + "source": [ + "### **4A.6 Reinvestigating the Data Types in the Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "2e050f58-03f4-4117-bcd8-36f718e1ef7a", + "metadata": {}, + "source": [ + "Having ensured the correctness of our data types, it's time to take a more comprehensive look at our dataset. This step is akin to casting a revealing charm over a hidden room, allowing us to see everything at once." + ] + }, + { + "cell_type": "markdown", + "id": "d905275b-c727-4327-bbd4-2204e49cc96b", + "metadata": {}, + "source": [ + "By summarizing the whole dataset with `info()`, we gain a holistic view of its structure: the number of rows, each column's data type, and how many non-null values each column holds.
This comprehensive overview helps us identify any remaining inconsistencies or areas that require further attention, much like a careful sweep of the castle grounds to ensure everything is in order, as the following results show." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "395a08a3-fb49-4d56-85f8-938a9837f94a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 52 entries, 0 to 51\n", + "Data columns (total 14 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 52 non-null object \n", + " 1 gender 52 non-null category\n", + " 2 age 52 non-null int64 \n", + " 3 origin 52 non-null object \n", + " 4 specialty 52 non-null category\n", + " 5 house 52 non-null category\n", + " 6 blood_status 52 non-null category\n", + " 7 pet 27 non-null category\n", + " 8 wand_type 52 non-null category\n", + " 9 patronus 50 non-null object \n", + " 10 quidditch_position 10 non-null category\n", + " 11 boggart 52 non-null object \n", + " 12 favorite_class 51 non-null category\n", + " 13 house_points 50 non-null float64 \n", + "dtypes: category(8), float64(1), int64(1), object(4)\n", + "memory usage: 6.8+ KB\n", + "None\n" + ] + } + ], + "source": [ + "# Displaying a concise summary of the dataset (dtypes and non-null counts)\n", + "print(hogwarts_df.info())" + ] + }, + { + "cell_type": "markdown", + "id": "a33551cd-3280-4d5a-b571-6bf347844daf", + "metadata": {}, + "source": [ + "### **4A.7 Detailed Summary of Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "6e9b92d3-29db-43a0-a143-e973bc782f22", + "metadata": {}, + "source": [ + "And here's the interesting part: where `info()` gave us a high-level overview, this spell, `describe(include='all')`, summarizes the values in every column. It's a bit statistical, for sure, but fear not, dear sorcerers: as you scroll on, you'll notice a couple of other stunning facts about Hogwarts students."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "48a5ec2a-7863-4063-bf9d-23c62a05f4c6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty house \\\n", + "count 52 52 52.000000 52 52 52 \n", + "unique 52 2 NaN 9 24 6 \n", + "top Harry Potter Male NaN England Charms Gryffindor \n", + "freq 1 27 NaN 35 7 18 \n", + "mean NaN NaN 14.942308 NaN NaN NaN \n", + "std NaN NaN 2.492447 NaN NaN NaN \n", + "min NaN NaN 11.000000 NaN NaN NaN \n", + "25% NaN NaN 13.250000 NaN NaN NaN \n", + "50% NaN NaN 16.000000 NaN NaN NaN \n", + "75% NaN NaN 17.000000 NaN NaN NaN \n", + "max NaN NaN 18.000000 NaN NaN NaN \n", + "\n", + " blood_status pet wand_type patronus quidditch_position boggart \\\n", + "count 52 27 52 50 10 52 \n", + "unique 4 9 28 15 5 11 \n", + "top Half-blood Owl Fir Non-corporeal Seeker Failure \n", + "freq 25 11 4 34 5 40 \n", + "mean NaN NaN NaN NaN NaN NaN \n", + "std NaN NaN NaN NaN NaN NaN \n", + "min NaN NaN NaN NaN NaN NaN \n", + "25% NaN NaN NaN NaN NaN NaN \n", + "50% NaN NaN NaN NaN NaN NaN \n", + "75% NaN NaN NaN NaN NaN NaN \n", + "max NaN NaN NaN NaN NaN NaN \n", + "\n", + " favorite_class house_points \n", + "count 51 50.000000 \n", + "unique 21 NaN \n", + "top Charms NaN \n", + "freq 8 NaN \n", + "mean NaN 119.200000 \n", + "std NaN 54.129097 \n", + "min NaN 10.000000 \n", + "25% NaN 72.500000 \n", + "50% NaN 125.000000 \n", + "75% NaN 160.000000 \n", + "max NaN 200.000000 \n" + ] + } + ], + "source": [ + "print(hogwarts_df.describe(include='all')) # Providing a detailed summary of the dataset" + ] + }, + { + "cell_type": "markdown", + "id": "c31ccb10-a332-4d83-9fa1-2d0386c023ea", + "metadata": {}, + "source": [ + "From the summary, we can infer several interesting points:\n", + "\n", + "- **House Distribution**: _Gryffindor_ has the highest count with 18 students, showing its prominence.\n", + "- **Age**: The average age of students is around 
**14.94** years, with the youngest being 11 and the oldest 18.\n", + "- **Gender**: The dataset includes **27 males** and **25 females**, showing a fairly balanced gender distribution.\n", + "- **Blood Status**: _Half-bloods_ are the most common of the 4 recorded statuses, with 25 of the 52 students.\n", + "- **Wands and Pets**: There are 28 unique wand types and 9 different pet types (only 27 students have a pet recorded), reflecting the unique personalities and preferences of the students.\n", + "- **Quidditch**: Only 10 of the 52 students have a Quidditch position recorded, and **Seeker** is the most common, with 5 of them.\n", + "- **Favorite Class**: Charms is the most favored class among students, with 8 mentions.\n", + "- **House Points**: The average house points are 119.2, with a standard deviation of 54.13 and values ranging from 10 to 200, indicating a wide range of performance." + ] + }, + { + "cell_type": "markdown", + "id": "3ce92d49-04cb-4686-b2d9-9058eab44b97", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "e1130888-301c-4e93-a81f-b999f690c852", + "metadata": {}, + "source": [ + "### **4A.8 Preview the Whole Dataset**" + ] + }, + { + "cell_type": "markdown", + "id": "2b6b61ea-a9bb-4b24-8250-fb341b6ee173", + "metadata": {}, + "source": [ + "For curious minds whose thoughts fly as fast as their broomsticks, here's the magic spell to display every value in the dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "b597bf14-6321-48e3-96c7-6cdbacdc6ee0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name gender age origin specialty house blood_status pet wand_type patronus quidditch_position boggart favorite_class house_points\n", + "0 Harry Potter Male 11 England Defense Against the Dark Arts Gryffindor Half-blood Owl Holly Stag Seeker Dementor Defense Against the Dark Arts 150.0\n", + "1 Hermione Granger Female 11 England Transfiguration Gryffindor Muggle-born Cat Vine Otter NaN Failure Arithmancy 200.0\n", + "2 Ron Weasley Male 11 England Chess Gryffindor Pure-blood Rat Ash Jack Russell Terrier Keeper Spider Charms 50.0\n", + "3 Draco Malfoy Male 11 England Potions Slytherin Pure-blood Owl Hawthorn NaN Seeker Lord Voldemort Potions 100.0\n", + "4 Luna Lovegood Female 11 Ireland Creatures Ravenclaw Half-blood NaN Fir Hare NaN Her mother Creatures 120.0\n", + "5 Neville Longbottom Male 11 England Herbology Gryffindor Pure-blood Toad Cherry Non-corporeal NaN Severus Snape Herbology 70.0\n", + "6 Ginny Weasley Female 11 England Defense Against the Dark Arts Gryffindor Pure-blood Owl Yew Horse Chaser Tom Riddle Defense Against the Dark Arts 140.0\n", + "7 Cedric Diggory Male 15 England Quidditch Hufflepuff Pure-blood NaN Ash Non-corporeal Seeker Failure Defense Against the Dark Arts 160.0\n", + "8 Cho Chang Female 14 Scotland Charms Ravenclaw Half-blood Owl Hazel Swan Seeker Failure Charms 110.0\n", + "9 Severus Snape Male 16 England Potions Slytherin Half-blood NaN Elm Doe NaN Lily Potter Potions 90.0\n", + "10 Albus Dumbledore Male 17 England Transfiguration Gryffindor Half-blood Phoenix Elder Phoenix NaN Ariana's death Transfiguration 200.0\n", + "11 Minerva McGonagall Female 16 Scotland Transfiguration Gryffindor Half-blood Cat Fir Cat NaN Failure Transfiguration 190.0\n", + "12 Bellatrix Lestrange Female 15 England Dark Arts Slytherin Pure-blood NaN 
Walnut NaN Azkaban Dueling 80 NaN\n", + "13 Nymphadora Tonks Female 14 Wales Metamorphmagus Hufflepuff Half-blood Owl Blackthorn Wolf NaN Failure Defense Against the Dark Arts 130.0\n", + "14 Remus Lupin Male 16 England Defense Against the Dark Arts Gryffindor Half-blood Dog Cypress Non-corporeal NaN Full Moon Defense Against the Dark Arts 150.0\n", + "15 Sirius Black Male 16 England Transfiguration Gryffindor Pure-blood Owl Chestnut Dog Beater Full Moon Defense Against the Dark Arts 140.0\n", + "16 Horace Slughorn Male 16 England Potions Slytherin Half-blood NaN Cedar Non-corporeal NaN Failure Potions 100.0\n", + "17 Filius Flitwick Male 17 England Charms Ravenclaw Half-blood NaN Hornbeam Non-corporeal NaN Failure Charms 180.0\n", + "18 Pomona Sprout Female 16 England Herbology Hufflepuff Pure-blood Cat Pine Non-corporeal NaN Failure Herbology 170.0\n", + "19 Helena Ravenclaw Female 17 Scotland Charms Ravenclaw Pure-blood NaN Rowan Non-corporeal NaN Her mother Charms 160.0\n", + "20 Godric Gryffindor Male 17 England Dueling Gryffindor Pure-blood NaN Sword Lion NaN Failure Dueling 200.0\n", + "21 Helga Hufflepuff Female 17 Wales Herbology Hufflepuff Pure-blood NaN Cedar Non-corporeal NaN Failure Herbology 190.0\n", + "22 Rowena Ravenclaw Female 17 Scotland Charms Ravenclaw Pure-blood NaN Maple Eagle NaN Failure Charms 180.0\n", + "23 Salazar Slytherin Male 17 England Dark Arts Slytherin Pure-blood NaN Ebony Serpent NaN Failure Dark Arts 200.0\n", + "24 Molly Weasley Female 16 England Household Charms Gryffindor Pure-blood Owl Pine Non-corporeal NaN Failure Household Charms 80.0\n", + "25 Arthur Weasley Male 16 England Muggle Artifacts Gryffindor Pure-blood NaN Hornbeam Non-corporeal NaN Failure Muggle Studies 60.0\n", + "26 Lucius Malfoy Male 16 England Dark Arts Slytherin Pure-blood Owl Elm Non-corporeal NaN Failure Dark Arts 90.0\n", + "27 Narcissa Malfoy Female 15 England Potions Slytherin Pure-blood NaN Hawthorn Non-corporeal NaN Failure Potions 70.0\n", + "28 
Pansy Parkinson Female 11 England Gossip Slytherin Pure-blood Cat Birch Non-corporeal NaN Failure Gossip 40.0\n", + "29 Vincent Crabbe Male 11 England Strength Slytherin Pure-blood NaN Oak Non-corporeal NaN Failure Strength 50.0\n", + "30 Gregory Goyle Male 11 England Strength Slytherin Pure-blood NaN Alder Non-corporeal NaN Failure Strength 50.0\n", + "31 Lily Evans Female 11 England Charms Gryffindor Muggle-born NaN Willow Doe NaN Failure Charms 150.0\n", + "32 James Potter Male 11 England Dueling Gryffindor Pure-blood Owl Walnut Stag Chaser Failure Dueling 160.0\n", + "33 Peter Pettigrew Male 11 England Transformation Gryffindor Half-blood Rat Ash Non-corporeal NaN Failure Transformation 30.0\n", + "34 Gilderoy Lockhart Male 15 England Memory Charms Ravenclaw Half-blood NaN Cherry Non-corporeal NaN Failure Memory Charms 70.0\n", + "35 Dolores Umbridge Female 15 England Dark Arts Slytherin Half-blood Cat Hemlock Non-corporeal NaN Failure Dark Arts 60.0\n", + "36 Newt Scamander Male 17 England Magical Creatures Hufflepuff Half-blood Demiguise Chestnut Non-corporeal NaN Failure Creatures 160.0\n", + "37 Tina Goldstein Female 17 USA Auror Hufflepuff Half-blood Owl Ash Non-corporeal NaN Failure Defense Against the Dark Arts 140.0\n", + "38 Queenie Goldstein Female 17 USA Legilimency Ravenclaw Half-blood Owl Cypress Non-corporeal NaN Failure Legilimency 130.0\n", + "39 Jacob Kowalski Male 17 USA Baking Hufflepuff No-mag NaN Birch Non-corporeal NaN Failure Baking 10.0\n", + "40 Theseus Scamander Male 17 England Auror Gryffindor Half-blood Dog Elder Non-corporeal NaN Failure Defense Against the Dark Arts 150.0\n", + "41 Leta Lestrange Female 16 England Potions Slytherin Pure-blood Cat Ebony Non-corporeal NaN Failure Potions 100.0\n", + "42 Nagini Female 18 Indonesia Transformation Slytherin Half-blood Snake Teak Non-corporeal NaN Failure Transformation 90.0\n", + "43 Grindelwald Male 18 Europe Dark Arts Slytherin Pure-blood NaN Elder Non-corporeal NaN Failure Dark Arts 
200.0\n", + "44 Bathilda Bagshot Female 17 England History of Magic Ravenclaw Half-blood Cat Willow Non-corporeal NaN Failure NaN NaN\n", + "45 Aberforth Dumbledore Male 17 England Goat Charming Gryffindor Half-blood Goat Oak Non-corporeal NaN Failure Goat Charming 70.0\n", + "46 Ariana Dumbledore Female 14 England Obscurus Gryffindor Half-blood NaN Fir Non-corporeal NaN Failure Obscurus 20.0\n", + "47 Victor Krum Male 17 Bulgaria Quidditch Durmstrang Pure-blood NaN Hawthorn Non-corporeal Seeker Failure Quidditch 180.0\n", + "48 Fleur Delacour Female 17 France Charms Beauxbatons Half-blood NaN Rosewood Non-corporeal NaN Failure Charms 140.0\n", + "49 Gabrielle Delacour Female 14 France Charms Beauxbatons Half-blood NaN Alder Non-corporeal NaN Failure Charms 80.0\n", + "50 Olympe Maxime Female 17 France Strength Beauxbatons Half-blood NaN Fir Non-corporeal NaN Failure Strength 110.0\n", + "51 Igor Karkaroff Male 18 Europe Dark Arts Durmstrang Half-blood NaN Yew Non-corporeal NaN Failure Dark Arts 90.0\n" + ] + } + ], + "source": [ + "# Displaying every row and column of the dataset, without truncation\n", + "print(hogwarts_df.to_string())" + ] + }, + { + "cell_type": "markdown", + "id": "e1306469-b93f-44a0-85de-48d246700eab", + "metadata": {}, + "source": [ + "Now that we've converted the data types, it's time to save the dataset so that it's ready for our next set of adventures."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "c702a322-f764-4c2a-a390-9dce31229928", + "metadata": {}, + "outputs": [], + "source": [ + "hogwarts_df.to_csv('data/hogwarts-students-01.csv', index=False)" + ] + }, + { + "cell_type": "markdown", + "id": "fcd50386-eb52-4bfa-893f-cd1b324e9719", + "metadata": {}, + "source": [ + "Now, dear sorcerers, once you've invoked the previous spell, check your `data` directory: you should find a new CSV file named `hogwarts-students-01.csv`, which we'll use throughout the rest of our magical journey." + ] + }, + { + "cell_type": "markdown", + "id": "9c54eb86-582b-43ea-aa35-ed027961c67d", + "metadata": {}, + "source": [ + "### **4A.9 Gemika's Pop-Up Quiz: Unveiling the Mysteries**" + ] + }, + { + "cell_type": "markdown", + "id": "046f8112-fbb5-45e9-9a23-644e321af449", + "metadata": {}, + "source": [ + "And now, young wizards and witches, my son Gemika Haziq Nugroho appears with a sparkle in his eye and a quiz at the ready. Are you prepared to test your newfound knowledge and prove your prowess in data exploration?\n", + "\n", + "1. **Which method do we use to display the first few rows of a DataFrame**?\n", + "2. **Why is it important to check the data types of each column in our dataset**?\n", + "3. **How can we convert a column to a numeric type if it isn't numeric already**?\n", + "\n", + "Answer these questions with confidence, and you will demonstrate your mastery of the initial steps in data exploration. With our dataset now fully understood and prepared, we are ready to dive even deeper into its mysteries. Onward, to greater discoveries! 🌟✨🧙‍♂️\n", + "\n", + "By now, you should feel like a true data wizard, ready to uncover the hidden patterns and secrets within any dataset. Let us continue our journey with confidence and curiosity, for there is much more to discover in the magical world of data science! 
🌌🔍" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2bb28730-c64b-4f95-bb95-0fbe8e2b7d9f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/harry-potter-04b.ipynb b/harry-potter-04b.ipynb new file mode 100644 index 0000000..f4fa6b9 --- /dev/null +++ b/harry-potter-04b.ipynb @@ -0,0 +1,138 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "f8e6e25c-2fcb-4b1b-b47f-1534a1e5f4b4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unnamed: 0 0\n", + "name 0\n", + "gender 0\n", + "age 0\n", + "origin 0\n", + "specialty 0\n", + "house 0\n", + "blood_status 0\n", + "pet 25\n", + "wand_type 0\n", + "patronus 2\n", + "quidditch_position 42\n", + "boggart 0\n", + "favorite_class 1\n", + "house_points 2\n", + "dtype: int64\n", + "Unnamed: 0 0\n", + "name 0\n", + "gender 0\n", + "age 0\n", + "origin 0\n", + "specialty 0\n", + "house 0\n", + "blood_status 0\n", + "pet 0\n", + "wand_type 0\n", + "patronus 2\n", + "quidditch_position 0\n", + "boggart 0\n", + "favorite_class 0\n", + "house_points 2\n", + "dtype: int64\n" + ] + } + ], + "source": [ + "# Importing necessary libraries\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# Loading the dataset\n", + "dataset_path = 'data/hogwarts-students-01.csv' # Path to our dataset\n", + "hogwarts_df = pd.read_csv(dataset_path)\n", + "\n", + "# Checking for missing values\n", + "print(hogwarts_df.isnull().sum())" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": 
"5c11b874-4824-4287-beff-d19ca93d505d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unnamed: 0 0\n", + "name 0\n", + "gender 0\n", + "age 0\n", + "origin 0\n", + "specialty 0\n", + "house 0\n", + "blood_status 0\n", + "pet 0\n", + "wand_type 0\n", + "patronus 2\n", + "quidditch_position 0\n", + "boggart 0\n", + "favorite_class 0\n", + "house_points 2\n", + "dtype: int64\n" + ] + } + ], + "source": [ + "# Filling missing numerical values with the mean (assigning back instead of a chained inplace update)\n", + "hogwarts_df['age'] = hogwarts_df['age'].fillna(hogwarts_df['age'].mean())\n", + "\n", + "# Filling missing categorical values with the mode\n", + "hogwarts_df['house'] = hogwarts_df['house'].fillna(hogwarts_df['house'].mode()[0])\n", + "hogwarts_df['gender'] = hogwarts_df['gender'].fillna(hogwarts_df['gender'].mode()[0])\n", + "hogwarts_df['specialty'] = hogwarts_df['specialty'].fillna(hogwarts_df['specialty'].mode()[0])\n", + "hogwarts_df['blood_status'] = hogwarts_df['blood_status'].fillna(hogwarts_df['blood_status'].mode()[0])\n", + "hogwarts_df['pet'] = hogwarts_df['pet'].fillna(hogwarts_df['pet'].mode()[0])\n", + "hogwarts_df['wand_type'] = hogwarts_df['wand_type'].fillna(hogwarts_df['wand_type'].mode()[0])\n", + "hogwarts_df['quidditch_position'] = hogwarts_df['quidditch_position'].fillna(hogwarts_df['quidditch_position'].mode()[0])\n", + "hogwarts_df['favorite_class'] = hogwarts_df['favorite_class'].fillna(hogwarts_df['favorite_class'].mode()[0])\n", + "\n", + "# Checking which missing values remain (patronus and house_points are not imputed in this cell)\n", + "print(hogwarts_df.isnull().sum())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57263cab-ffd4-4220-85b6-8256c9430ebd", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": 
"ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3",
+ "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
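The imputation cell in `harry-potter-04b.ipynb` follows a simple rule: fill numeric gaps with the column mean and categorical gaps with the column mode. Calling `fillna(..., inplace=True)` on a selected column is a chained in-place update, and recent pandas versions may warn about it because, under Copy-on-Write, such an update no longer reaches the parent DataFrame. Below is a minimal sketch of the same mean/mode strategy written with plain assignment. The helper `impute` and the four-row frame are hypothetical illustrations, not part of the original notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame; values are illustrative, not from the real dataset
students = pd.DataFrame({
    "name": ["Student A", "Student B", "Student C", "Student D"],
    "house_points": [150.0, np.nan, 50.0, 100.0],
    "pet": ["Owl", np.nan, "Owl", "Cat"],
    "patronus": ["Stag", "Stag", np.nan, "Hare"],
})


def impute(df, numeric_cols, categorical_cols):
    """Return a copy of df with NaNs filled: mean for numeric columns,
    mode (first value on ties) for categorical columns."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in numeric_cols:
        # mean() skips NaN by default
        out[col] = out[col].fillna(out[col].mean())
    for col in categorical_cols:
        # Series.mode() returns a Series because several values can tie
        out[col] = out[col].fillna(out[col].mode()[0])
    return out


filled = impute(students, ["house_points"], ["pet", "patronus"])
print(filled)
print(filled.isnull().sum())
```

Working on a copy keeps the raw frame available for before/after comparisons. Columns that the notebook cell leaves untouched, such as `patronus` and `house_points`, could be passed to the same helper if imputing them makes sense for the analysis.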
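The missing-value report in `harry-potter-04b.ipynb` lists an `Unnamed: 0` column. This is the kind of artifact `DataFrame.to_csv` produces when the row index is written (its default) and the file is then read back with `read_csv`, which names the blank header `Unnamed: 0`. The small in-memory round trip below, using two rows borrowed from the dataset's `name` and `house` columns, illustrates the effect and two ways to avoid it: pass `index=False` when writing, or `index_col=0` when reading.

```python
import io

import pandas as pd

# Two rows borrowed from the dataset's name/house columns
df = pd.DataFrame({
    "name": ["Harry Potter", "Luna Lovegood"],
    "house": ["Gryffindor", "Ravenclaw"],
})

# Default to_csv() writes the RangeIndex as a first column with an empty header
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # the blank header is read back as 'Unnamed: 0'

# Option 1: skip the index when writing
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(no_index.columns))

# Option 2: keep the file as-is, but use its first column as the index when reading
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(list(restored.columns))
```

Writing with `index=False` is usually the simpler choice when the index is just the default row counter, since the saved CSV then contains only real features.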