Skip to content

This repo provides predictive analysis to estimate shares outstanding for public companies using stock market data from SEC 10K filings. It contains data exploration, feature engineering, model training, and evaluation with Linear, Lasso, Ridge, and XGBoost models. Includes datasets, notebooks, and results.

Notifications You must be signed in to change notification settings

VLTSankalpa/stock-market-data-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stock Market Data Analysis - Estimating Shares Outstanding

Predicting the number of shares outstanding for publicly listed companies is crucial for a variety of financial analyses and investment decisions. Shares outstanding represent the total number of shares held by all of a company's shareholders, including insiders, and are a key metric in understanding a company's ownership structure. By accurately predicting the shares outstanding, investors can assess a company's market capitalization, earnings per share, and other fundamental ratios, making it an essential part of stock valuation and financial planning.

In this repository, we focus on estimating the number of shares outstanding for publicly listed companies using stock market data. We leverage metrics extracted from SEC 10K fillings, as provided in the fundamentals.csv file from the New York Stock Exchange (S&P 500 companies historical prices with fundamental data) dataset available on Kaggle.

Project Objectives

  • Data Exploration: Perform an exploratory data analysis (EDA) to understand the dataset, including visualizations and descriptive statistics.
  • Feature Engineering and Preprocessing: Identify and engineer relevant features, handle missing values, and transform variables for optimal model performance.
  • Model Selection and Training: Evaluate the performance of various regression models, including Linear Regression, Lasso Regression, Ridge Regression, and XGBoost, and train them on the dataset.
  • Model Evaluation: Assess the models' performance using the Mean Absolute Percentage Error (MAPE) metric, and compare the results.
  • Conclusion: Summarize the findings and provide recommendations for more accurate estimation of shares outstanding.

Data

The dataset used in this project is sourced from the New York Stock Exchange (S&P 500 companies historical prices with fundamental data) dataset on Kaggle. It includes daily prices, company descriptions, and metrics extracted from annual SEC 10K fillings (2012-2016).

This analysis is focused solely on the fundamentals.csv file, which contains metrics extracted from SEC 10K fillings.

Contents of the Repository

kotlinCopy code
.
├── README.md
├── data
│   └── fundamentals.csv
├── notebooks
│   └── Shares_Outstanding_Prediction.ipynb
├── requirements.txt
└── results
    ├── model-preformance-comparison.png
    └── true-vs-predicted-values.png
  • README.md: Provides an overview of the project, including objectives, data sources, repository contents, and instructions for getting started.
  • data/: Contains the dataset used for the analysis.
    • fundamentals.csv: Metrics extracted from SEC 10K fillings.
  • notebooks/: Contains the Jupyter notebooks with the full analysis and code.
    • Shares_Outstanding_Prediction.ipynb: Jupyter notebook containing the analysis code.
  • requirements.txt: Lists the required Python packages for the analysis.
  • results/: Contains the results of the analysis, including model performance metrics and visualizations.
    • model-preformance-comparison.png: Visualization comparing the performance of different regression models.
    • true-vs-predicted-values.png: Visualization showing the true vs. predicted values for the best-performing model.

Getting Started

  1. Clone this repository to your local machine:

    git clone https://github.com/VLTSankalpa/stock-market-data-analysis.git
    cd stock-market-data-analysis
    
  2. Create a virtual environment using Conda and install the required packages from requirements.txt:

    conda create -n stock_analysis python=3.8
    conda activate stock_analysis
    pip install -r requirements.txt
    
  3. Open the Jupyter notebook in the notebooks/ directory to start the analysis:

    jupyter notebook notebooks/Shares_Outstanding_Prediction.ipynb
    

Contributing

If you have any suggestions or would like to contribute to this project, feel free to submit an issue or create a pull request.

About

This repo provides predictive analysis to estimate shares outstanding for public companies using stock market data from SEC 10K filings. It contains data exploration, feature engineering, model training, and evaluation with Linear, Lasso, Ridge, and XGBoost models. Includes datasets, notebooks, and results.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published