In this competition you will work with a challenging time-series dataset consisting of daily sales data, kindly provided by one of the largest Russian software firms - 1C Company. We are asking you to predict total sales for every product and store in the next month. By solving this competition you will be able to apply and enhance your data science skills.
The dataset can be downloaded here: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data
Submissions are evaluated by root mean squared error (RMSE). True target values are clipped to the [0, 20] range.
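As a rough illustration of the metric described above, the sketch below computes RMSE after clipping the true values to [0, 20]. The function name clipped_rmse and its parameters are made up for this example and are not part of any competition tooling. Predictions are left unclipped here, since the description only mentions clipping the true targets.

```python
import numpy as np

def clipped_rmse(y_true, y_pred, low=0.0, high=20.0):
    """RMSE where only the true targets are clipped to [low, high].

    Hypothetical helper for illustration; the range defaults follow the
    [0, 20] clipping stated in the competition description.
    """
    y_true = np.clip(np.asarray(y_true, dtype=float), low, high)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# A true value of 25 is clipped to 20, so predicting 20 gives zero error for it.
score = clipped_rmse([0, 25, 3], [0, 20, 3])
```

Because the targets are clipped, a model has little to gain from predicting values outside the same range, which is why many solutions also clip their predictions before submitting.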
You are provided with daily historical sales data. The task is to forecast the total amount of products sold in every shop for the test set. Note that the list of shops and products changes slightly every month. Creating a robust model that can handle such situations is part of the challenge.
sales_train.csv - the training set. Daily historical data from January 2013 to October 2015.
test.csv - the test set. You need to forecast the sales for these shops and products for November 2015.
sample_submission.csv - a sample submission file in the correct format.
items.csv - supplemental information about the items/products.
item_categories.csv - supplemental information about the item categories.
shops.csv - supplemental information about the shops.

Data fields

ID - an Id that represents a (Shop, Item) tuple within the test set
shop_id - unique identifier of a shop
item_id - unique identifier of a product
item_category_id - unique identifier of item category
item_cnt_day - number of products sold. You are predicting a monthly amount of this measure
item_price - current price of an item
date - date in format dd/mm/yyyy
date_block_num - a consecutive month number, used for convenience. January 2013 is 0, February 2013 is 1,..., October 2015 is 33
item_name - name of item
shop_name - name of shop
item_category_name - name of item category
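Since the training data is daily but the target is a monthly amount, a common first step is to sum item_cnt_day per month, shop and item. The sketch below shows one way to do that with pandas, using the columns described above. The sample rows are invented for illustration (they are not real competition data), and the column name item_cnt_month is an assumption chosen for this example.

```python
import pandas as pd

# Tiny made-up sample with the sales_train.csv columns needed here.
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1, 1],
    "shop_id":        [5, 5, 5, 5, 6],
    "item_id":        [100, 100, 200, 100, 200],
    "item_cnt_day":   [2.0, 30.0, -1.0, 1.0, 4.0],
})

# Sum daily counts into one row per (month, shop, item).
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})  # assumed column name
)

# Clip the monthly target to the same [0, 20] range used by the evaluation.
monthly["item_cnt_month"] = monthly["item_cnt_month"].clip(0, 20)
```

With real data, the same grouping would be applied to the frame loaded from sales_train.csv. Note that (shop, item) pairs with no sales in a month simply do not appear in the result, so they usually need to be added back as zeros before training.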
Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points, but also entire sequences of data.
LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs.
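To make the idea of feedback connections more concrete, here is a minimal NumPy sketch of the forward pass of a single LSTM layer over a sequence. It is an illustration of the standard LSTM equations only, not the model used for this competition. The function names, the gate ordering inside the weight matrices, and the random toy inputs are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, W, U, b):
    """Run one LSTM layer over a sequence.

    xs: (T, D) inputs; W: (4H, D) input weights; U: (4H, H) recurrent
    weights; b: (4H,) biases. Returns the (T, H) hidden states.
    Gate order in the stacked weights (an arbitrary choice here):
    input, forget, output, candidate.
    """
    H = U.shape[1]
    h = np.zeros(H)  # hidden state, fed back at every step
    c = np.zeros(H)  # cell state, the long-term memory
    hidden_states = []
    for x in xs:
        z = W @ x + U @ h + b          # h from the previous step is the feedback
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        o = sigmoid(z[2 * H:3 * H])    # output gate
        g = np.tanh(z[3 * H:4 * H])    # candidate cell update
        c = f * c + i * g              # additive cell-state update
        h = o * np.tanh(c)
        hidden_states.append(h)
    return np.stack(hidden_states)

# Toy run: a sequence of 6 steps with 1 feature and 4 hidden units.
rng = np.random.default_rng(0)
T, D, H = 6, 1, 4
xs = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
hidden = lstm_forward(xs, W, U, b)
```

The additive update of the cell state c is the part that helps gradients survive across many time steps, which is how LSTMs ease the vanishing gradient problem. In practice one would use a deep learning library's LSTM layer rather than hand-written loops.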