-
Notifications
You must be signed in to change notification settings - Fork 0
CaptainStorm21/Hands-on-Python-Ai-Pandas
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
some code sources / files are jumping a coupe of numbers - Those are associated with the udemy course Hands-on Machine Learning Tutorial with Pandas, Numpy, Seaborn, Scikit-learn in Python and ChatGPT: A Complete Work-flow It's a fun class very educational - most chatGPT autogenerated codes do not look exactly the same as in the videos. # Lesson 1 https://chat.openai.com/ Give me a python code to load an excel data Change file name excel_data # Lesson 2 Give me a python code to find out missing values in a data Add Display the head df.head() Give me a python code for simpleimpurer to impute mission values in variabe of a data Imputer is a machine learning function which is used to impute the missing values, or to replace themissing values in a variable in a data by a standard measure. pip install scikit-learn ## Solution to excersize 1 import pandas as pd data = { 'age': [25, 30, None, 40, 35, None], 'income': [50000, 60000, 'XXX', 80000, 70000, 32477], 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York', 'Chicago'] } df = pd.DataFrame(data) duplicate_row = pd.DataFrame({'age': [30, None], 'income': [60000, 'YYY'], 'city': ['Los Angeles', 'Los Angeles']}) df = pd.concat([df, duplicate_row], ignore_index=True) print(df) ## Solution to exercise 2 import pandas as pd from sklearn.impute import SimpleImputer # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000.3, 60000.34, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data)l df['income'] = pd.to_numeric(df['income'], errors='coerce') age_strategy = 'mean' # Choose an appropriate strategy for the 'age' column income_strategy = 'median' # Choose an appropriate strategy for the 'income' column age_imputer = SimpleImputer(strategy=age_strategy) income_imputer = SimpleImputer(strategy=income_strategy) df['age'] = age_imputer.fit_transform(df[['age']]) df['income'] = income_imputer.fit_transform(df[['income']]) df['age'] = df['age'].astype(int) df['income'] = df['income'].astype(float) print(df) # Lesson 3 Give me a python code find the data types of the variables in a data Give me a python code to find out non-numberic value from the numeric values in a variable ## Exercise 3 Write the following code for creating the dataframe: import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data) Then, perform necessary code after this to drop the inconsistent value 'XXX' from income. Solution import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data) # Convert 'income' column to numeric, coerce errors to NaN df['income'] = pd.to_numeric(df['income'], errors='coerce') # Drop rows with NaN values in the 'income' column df = df.dropna(subset=['income']) print(df) In Python, specifically in the context of Pandas, dropna() is a method used to remove missing (NaN) values from a DataFrame or Series object. It allows you to drop either entire rows or entire columns that contain missing values. ## Exercise 4 Removing inconsistent value Write the following code for creating the dataframe: import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data) Then, perform necessary code after this to drop the inconsistent value 'XXX' from income. Solution import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data) # Remove rows with inconsistent value 'XXX' from 'income' column df = df[df['income'] != 'XXX'] print(df) ## Exercise 5 Assigning correct data types Write the following code first: import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } df = pd.DataFrame(data) # Drop rows with 'XXX' in 'income' column df = df[df['income'] != 'XXX'] Now assign correct data type to the variable income. solution import pandas as pd # Create DataFrame data = { 'age': [25, 30, None, 40, 35], # Three missing values 'income': [50000, 60000, 'XXX', 80000, 70000], # Inconsistent value 'XXX' 'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'New York'] } print(data) df = pd.DataFrame(data) # Drop rows with 'XXX' in 'income' column df = df[df['income'] != 'XXX'] # Convert 'income' column to numeric df['income'] = pd.to_numeric(df['income']) ## Lesson 5 Give me a python code to deal with duplicates To handle duplicates in a DataFrame using Pandas, you can use several methods like drop_duplicates(), duplicated(), or groupby(). Here's how you can use these methods: python Copy code import pandas as pd # Sample DataFrame data = { 'name': ['John', 'Alice', 'Bob', 'Alice', 'John'], 'age': [25, 30, 35, 30, 25], 'city': ['New York', 'Los Angeles', 'Chicago', 'Los Angeles', 'New York'] } df = pd.DataFrame(data) ### Method 1: Drop duplicates df_dedup = df.drop_duplicates() print("DataFrame after dropping duplicates:") print(df_dedup) ### Method 2: Identify duplicates duplicates_mask = df.duplicated() print("\nMask of duplicated rows:") print(duplicates_mask) ### Method 3: Group by and aggregate (e.g., take the first occurrence) df_grouped = df.groupby(['name', 'age', 'city']).first().reset_index() print("\nDataFrame after grouping and taking first occurrence:") print(df_grouped)
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published