Skip to content

Explore this repository for a comprehensive year-long analysis of sales data. Leveraging Pandas and data visualization libraries, this project uncovers valuable insights, including top-selling products and peak sales hours. It showcases techniques for data cleaning, exploration, and visualization

License

Notifications You must be signed in to change notification settings

vkquests/Sales_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Sales_Analysis

In this analysis project, we leverage the analytical power of Pandas and Matplotlib to dive deep into a year's worth of sales data. This extensive dataset comprises a wide array of transactional records from our electronics store, encompassing crucial details like product categories, pricing, customer locations, and more.

Our mission is to extract actionable insights, address key business inquiries, and unveil the underlying narrative within this substantial sales dataset. By doing so, we aim to empower informed decision-making, optimize business strategies, and unlock the full potential of our sales data for growth and success. Check the notebook for a detailed explanation.

We commence our data analysis journey with a robust data cleaning phase, where we address crucial tasks such as:

  • Eliminating NaN values from our DataFrame.
  • Removing duplicate rows from the dataset.
  • Removing rows based on specific conditions.
  • Converting column data types to the appropriate formats using methods like to_numeric, to_datetime, and astype.

Once our data is polished and refined, we transition to the data exploration section, where we investigate five key business questions of paramount importance:

  1. What was the best month for sales, and how much was earned during that month?
  2. Which city had the highest product sales?
  3. What is the optimal time for displaying advertisements to maximize the likelihood of customers buying products?
  4. Which products are most frequently sold together?
  5. Which product had the highest sales, and why do you think it achieved that?

To address these inquiries effectively, we employ an array of powerful pandas and Matplotlib methods, including:

  • Combining multiple CSV files into a consolidated DataFrame using pd.concat.
  • Augmenting our dataset with additional columns.
  • Parsing cell data as strings to derive new columns.
  • Leveraging the .apply() method for advanced data manipulation.
  • Employing the groupby operation to perform insightful aggregate analysis.
  • Crafting informative bar charts and line graphs to visualize our findings.
  • Enhancing the clarity of our visualizations with well-placed labels.

Join on this data-driven expedition, where we unlock valuable insights and reveal the compelling narrative hidden within our sales data.

About

Explore this repository for a comprehensive year-long analysis of sales data. Leveraging Pandas and data visualization libraries, this project uncovers valuable insights, including top-selling products and peak sales hours. It showcases techniques for data cleaning, exploration, and visualization

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published