This project is part of the K136 Istanbul Data Science Bootcamp. house_price_1 is the original dataset, and evfiyati is a modified version of it. HousePricesoriginal contains all the processing steps that convert house_price_1 into evfiyati, followed by a modeling part. HousePrices was created to be easier to work with: it uses evfiyati, the already processed data, directly. HousePricePrediction_lit.py contains the Streamlit code.
This repo was developed for the Istanbul Data Science Bootcamp, organized in cooperation with İBB & Kodluyoruz. The house price prediction model was built on the dataset of the Kaggle competition "House Prices - Advanced Regression Techniques".
The dataset is available on Kaggle.
The goal of this project is to predict the price of a house in Ames using the features provided by the dataset.
The dataset contains the following features:
- MSSubClass: Identifies the type of dwelling involved in the sale
- MoSold: Month Sold (MM)
- OverallQual: Rates the overall material and finish of the house
- OverallCond: Rates the overall condition of the house
- YrSold: Year Sold (YYYY)
- LotArea: Lot size in square feet
- LotFrontage: Linear feet of street connected to property
- BsmtUnfSF: Unfinished square feet of basement area
- GrLivArea: Above grade (ground) living area square feet
- GarageArea: Size of garage in square feet
- TotalBsmtSF: Total square feet of basement area
- YearBuilt: Year house was built
- 1stFlrSF: First Floor square feet
- BsmtFinSF1: Type 1 finished square feet
- OpenPorchSF: Open porch square feet
- MasVnrArea: Masonry veneer square feet
- YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
- WoodDeckSF: Wood deck area in square feet
- GarageYrBlt: Year garage was built
- 2ndFlrSF: Second floor square feet
- SalePrice: Sale price
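As a rough illustration of how the columns listed above might be pulled out of a CSV file, here is a minimal sketch using only the Python standard library. The helper name `parse_row`, the treatment of `NA` cells as missing, and the sample values are assumptions made for this example; they are not taken from the project's notebooks.

```python
import csv
import io

# Column names from the feature list above; SalePrice is the prediction target.
FEATURES = [
    "MSSubClass", "MoSold", "OverallQual", "OverallCond", "YrSold",
    "LotArea", "LotFrontage", "BsmtUnfSF", "GrLivArea", "GarageArea",
    "TotalBsmtSF", "YearBuilt", "1stFlrSF", "BsmtFinSF1", "OpenPorchSF",
    "MasVnrArea", "YearRemodAdd", "WoodDeckSF", "GarageYrBlt", "2ndFlrSF",
]
TARGET = "SalePrice"


def parse_row(row):
    """Hypothetical helper: split one CSV row into (features, target).

    Empty or "NA" cells are assumed to mean "missing" and become None.
    """
    def to_number(cell):
        return None if cell in ("", "NA") else float(cell)

    features = {name: to_number(row[name]) for name in FEATURES}
    return features, to_number(row[TARGET])


# Tiny in-memory stand-in for the real CSV file (values are illustrative only).
header = FEATURES + [TARGET]
values = [
    "60", "2", "7", "5", "2008",
    "8450", "NA", "150", "1710", "548",
    "856", "2003", "856", "706", "61",
    "196", "2003", "0", "2003", "854",
    "208500",
]
sample_csv = ",".join(header) + "\n" + ",".join(values) + "\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))
features, price = parse_row(rows[0])
print(len(features), features["LotFrontage"], price)
```

To read an actual file, the `io.StringIO(sample_csv)` argument would be replaced by an opened file object.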
# clone the repo
git clone https://github.com/izelcelikkaya/housepriceprediction.git
# change to the repo directory
cd housepriceprediction
# if virtualenv is not installed, install it
# pip install virtualenv
# create a virtualenv
virtualenv -p python3 venv
# activate virtualenv for LINUX or MACOS
source venv/bin/activate
# activate virtualenv for WINDOWS (PowerShell)
# venv\Scripts\activate.ps1
# troubleshooting for activation errors on Windows
# Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
# install dependencies
pip install -r requirements.txt
# run the script
streamlit run About_🚀.py
The model is based on the XGBoost algorithm.
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# X_train, X_test, y_train and y_test are assumed to be defined beforehand
# by a train/test split of the processed data.
model = XGBRegressor(
    n_estimators=1150,
    max_depth=5,
    eta=0.03,  # learning rate
    subsample=0.5,
    colsample_bytree=0.8,
)
model.fit(X_train, y_train)


def mae(y_true, predictions):
    """Mean absolute error between true and predicted values."""
    y_true, predictions = np.array(y_true), np.array(predictions)
    return np.mean(np.abs(y_true - predictions))


pred = model.predict(X_test)
r_squared = model.score(X_test, y_test)  # R^2 on the test set

print(sqrt(mean_squared_error(y_test, pred)))  # RMSE
print(mae(y_test, pred))  # MAE
print(r_squared)  # R^2
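To make the three reported metrics concrete, the sketch below computes RMSE, MAE and R² by hand on a few made-up prices, using only the standard library. The function names and the sample numbers are assumptions for illustration and are not results of the model; the R² formula is the coefficient of determination that scikit-learn-style regressors return from `score`.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up sale prices and predictions, for illustration only.
y_true = [200000.0, 150000.0, 300000.0]
y_pred = [210000.0, 140000.0, 290000.0]

print(rmse(y_true, y_pred))
print(mae(y_true, y_pred))
print(r2(y_true, y_pred))
```

Because every prediction in this toy example is off by exactly 10000, RMSE and MAE coincide; on real data RMSE is usually larger than MAE, since squaring penalizes large errors more heavily.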
The model is trained on the training set and evaluated on the test set. The results are shown in the Streamlit demo below:
- İzel Çelikkaya - Github - LinkedIn
- Mehmet Özmen - Github - LinkedIn
- Uğur Can Kıvanç - Github - LinkedIn
- Fuat Akdemir - Github - LinkedIn
- Çiğdem Taş - Github - LinkedIn
- Ercan Tuncay - Github - LinkedIn
- Serenay Ardahanlı - Github - LinkedIn
- Ali Haydar Şenyurt - Github - LinkedIn
- Aybüke Akçay - Github - LinkedIn
- Ömer Batuhan Özbay - Github - LinkedIn
- Melisa Gündüz - Github - LinkedIn