Analysis on School District's funding and SAT scores using Pandas and NumPy libraries.

nicoserrano/School_District_Analysis

School_District_Analysis

Analysis of school district SAT scores using the pandas library.

Overview

In this project I help the State's Board of Education analyze data on student funding and SAT scores. The goal is to uncover trends in student performance and funding so that the budget can be reallocated appropriately.

Resources

  • Data sources:
    • schools_complete.csv
    • students_complete.csv
  • Software:
    • Python 3.6.1
    • Jupyter Notebook
    • Pandas and NumPy Libraries

Analysis

The first thing I needed to do was clean the data. This process consisted of two steps. First, I noticed that some of the student names included professional prefixes and suffixes, so I removed them using the str.replace() method as follows:

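# Prefixes and suffixes to strip from student names.
prefixes_suffixes = ["Dr. ", "Mr. ", "Ms. ", "Mrs. ", "Miss ", " MD", " DDS", " DVM", " PhD"]

for word in prefixes_suffixes:
    # regex=False treats each entry as literal text, so "." is not a wildcard.
    student_data_df["student_name"] = student_data_df["student_name"].str.replace(word, "", regex=False)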
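A minimal sketch of this cleaning step on made-up names (the sample names are hypothetical and do not come from students_complete.csv). Passing regex=False asks pandas to match each entry as a literal string, so the period in "Dr. " is not interpreted as a regular-expression wildcard.

```python
import pandas as pd

# Hypothetical sample names; the real data comes from students_complete.csv.
names = pd.Series(
    ["Dr. Richard Scott", "Mrs. Tracy Oliver MD", "Paul Smith", "Andrew Alexander"]
)

prefixes_suffixes = ["Dr. ", "Mr. ", "Ms. ", "Mrs. ", "Miss ", " MD", " DDS", " DVM", " PhD"]

cleaned = names
for word in prefixes_suffixes:
    # Literal match: the period is a period, not "any character".
    cleaned = cleaned.str.replace(word, "", regex=False)

print(cleaned.tolist())
```

Names without a prefix or suffix pass through unchanged, which is the property worth checking on the real column.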

Second, the data showed evidence of academic dishonesty in the results of 9th graders at Thomas High School. I therefore replaced their math and reading scores with NaNs while keeping the rest of the data intact. Later on I address how this change affected the overall analysis.

# Select 9th graders at Thomas High School and set both score columns to NaN.
thomas_9th = (student_data_df["grade"] == "9th") & (student_data_df["school_name"] == "Thomas High School")

student_data_df.loc[thomas_9th, ["reading_score", "math_score"]] = np.nan
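A small, self-contained sketch of the same replacement on toy data. The rows and the second school name are hypothetical; only the column names and the Thomas High School filter come from the analysis above.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical); column names follow the analysis above.
student_data_df = pd.DataFrame({
    "school_name": ["Thomas High School", "Thomas High School", "Other High School"],
    "grade": ["9th", "10th", "9th"],
    "reading_score": [95.0, 88.0, 70.0],
    "math_score": [99.0, 76.0, 65.0],
})

# Boolean mask: True only for 9th graders at Thomas High School.
thomas_9th = (student_data_df["grade"] == "9th") & (
    student_data_df["school_name"] == "Thomas High School"
)

# Overwrite both score columns for the masked rows in one assignment.
student_data_df.loc[thomas_9th, ["reading_score", "math_score"]] = np.nan

print(student_data_df)
```

Only the first row loses its scores; the 10th grader at the same school and the 9th grader at the other school are untouched.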

Before running the analysis with NaNs in place for 9th graders at Thomas H.S., I expected slight but not significant changes in the results. NaN (Not a Number) values can be tricky. pandas skips them by default when computing sums and averages, but they propagate through ordinary arithmetic such as multiplication and division. In this case, I knew they were not going to affect my approach, as they would simply be skipped when calculating the average math and reading scores for each grade and school. Again, I expected no major differences, but we will see what happened in the results section.
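To make the NaN behavior concrete, here is a short sketch on made-up scores. It relies only on documented pandas defaults: aggregations such as mean() skip NaN unless skipna=False is passed, count() counts only non-missing values, and comparisons against NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Made-up scores; the last one is missing.
scores = pd.Series([80.0, 90.0, np.nan])

mean_skipping_nan = scores.mean()           # NaN is skipped (skipna=True by default)
mean_with_nan = scores.mean(skipna=False)   # NaN propagates into the result
non_missing = scores.count()                # counts only non-NaN values
passing = (scores >= 70).sum()              # NaN >= 70 is False, so it is not counted
scaled = scores * 2                         # element-wise arithmetic keeps the NaN

print(mean_skipping_nan, mean_with_nan, non_missing, passing)
print(scaled.tolist())
```

The difference between len(scores) and scores.count() is the number of missing values, which matters whenever a percentage is computed from a student count.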

Results

  • How is the district summary affected?

Old District Summary: Screen Shot 2021-06-06 at 6 45 09 PM

New District Summary: Screen Shot 2021-06-06 at 6 43 08 PM

As we can observe, there is a slight difference in the district summary. Once I replaced the Thomas High School 9th graders' scores with NaNs, the average scores, passing percentages, and overall passing percentage went down by no more than 0.3. Because the district summary is such a broad analysis, it was not affected significantly: the replaced scores make up only a small fraction of the data for all 15 schools.

  • How is the school summary affected?

Screen Shot 2021-06-07 at 10 13 42 PM

Old School Summary: Screen Shot 2021-06-06 at 7 16 54 PM

New School Summary: Screen Shot 2021-06-06 at 7 17 42 PM

As can be seen, the NaNs produced a significant increase in Thomas High School's passing percentages for math, reading, and overall. This was because a considerable number of its 9th graders had not passed the math or reading exams. Even with the academic dishonesty, the 9th graders kept the school's passing percentages low, in the 60s. After their scores were replaced with NaN, and therefore excluded from the analysis, the passing percentages rose into the 90s, as the rest of the school performed mostly above the passing grade of 70.
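The code used to recompute the passing percentages is not shown here. The sketch below, on made-up scores, only illustrates one point worth keeping in mind when reading these numbers: once some scores are NaN, the percentage depends on whether the denominator counts all students or only the students who still have a score. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up math scores for one school; the first two are replaced with NaN.
math_scores = pd.Series([np.nan, np.nan, 85.0, 65.0, 90.0, 78.0])

passing_count = (math_scores >= 70).sum()   # NaN never counts as passing

# Denominator 1: every student in the school, including those with NaN scores.
pct_of_all_students = passing_count / len(math_scores) * 100

# Denominator 2: only students who still have a score.
pct_of_scored_students = passing_count / math_scores.count() * 100

print(pct_of_all_students, pct_of_scored_students)
```

Excluding the NaN students from the denominator is the choice that matches the statement that their scores were dropped from the analysis.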

  • How does replacing the ninth graders’ math and reading scores affect Thomas High School’s performance relative to the other schools?

Screen Shot 2021-06-07 at 10 42 29 PM

Relative to other schools, Thomas High School's results became much better after the 9th graders' scores were replaced with NaNs. The average scores were still similar, but the passing percentages were among the highest. These are the top 5 schools in the district:

Screen Shot 2021-06-07 at 10 44 11 PM

As we can see above, it reached 2nd place out of the 15 schools in the district with an overall passing percentage of 90.63%. This means that 90.63% of the school's students (excluding 9th graders because of the academic dishonesty) scored above 70 in both math and reading.

How does replacing the ninth-grade scores affect the following:

  • Math and reading scores by grade

Screen Shot 2021-06-07 at 11 28 01 PM Screen Shot 2021-06-07 at 11 28 11 PM

First we have the average math scores per grade, followed by the average reading scores per grade, for each school. In this case, the NaNs did not affect the other results at all; they simply appear as NaN for ninth graders at Thomas High School. The rest of the data remained intact.
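A brief sketch of why the ninth-grade cell shows NaN, using made-up rows. It assumes the per-grade averages are computed with a groupby followed by mean(), which is a common approach but is not confirmed here. When every score in a group is NaN, there is nothing left to average after the NaNs are skipped, so the group's mean is NaN.

```python
import numpy as np
import pandas as pd

# Made-up rows: both 9th-grade scores are NaN, the 10th-grade scores are intact.
df = pd.DataFrame({
    "school_name": ["Thomas High School"] * 4,
    "grade": ["9th", "9th", "10th", "10th"],
    "math_score": [np.nan, np.nan, 80.0, 90.0],
})

# Average math score per school and grade.
math_by_grade = df.groupby(["school_name", "grade"])["math_score"].mean()

print(math_by_grade)
```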

  • Scores by school spending

Screen Shot 2021-06-07 at 11 33 11 PM

These are the results categorized by budget per student, meaning the money the school is theoretically investing in each student. Here the NaNs do not noticeably affect the results, because a couple of other schools fall in the same per-student budget range, so the average scores and passing percentages barely change. The per-student budget is calculated by dividing the school budget by its number of students. Thomas High School still has 1,635 students and a $1,043,130 budget, which places it in the third per-student budget interval at $638 per student.
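The per-student figure can be checked directly from the two numbers quoted for Thomas High School. The variable names are illustrative only.

```python
# Figures quoted above for Thomas High School.
school_budget = 1_043_130
student_count = 1_635

# Per-student budget = school budget / number of students.
per_student_budget = school_budget / student_count

print(f"${per_student_budget:,.2f}")
```

Because the student count is unchanged by the NaN replacement, this value, and therefore the school's spending category, stays the same.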

  • Scores by school size

Screen Shot 2021-06-07 at 11 33 22 PM

Again, Thomas High School still has 1,635 students, which makes it a medium-sized school along with 4 others. Replacing the ninth graders' scores with NaNs leaves the data almost untouched; there are no significant changes in the results for this category.

  • Scores by school type

Screen Shot 2021-06-07 at 11 33 33 PM

Last but not least, the results by school type did not change at all. Thomas High School is categorized as a Charter school, along with 7 other schools. Its ninth graders are a small group relative to all the students in Charter schools, so the overall results stayed the same.

Summary

In conclusion, replacing the ninth graders' results at Thomas High School with NaNs produced four changes in the updated School District Analysis. First, the district summary changed slightly. Because these students make up only a small portion of the student population across 15 schools, the averages and passing percentages decreased by no more than 0.3. Second, the per-school summary changed considerably, since each school is analyzed independently. At Thomas H.S., the 9th graders whose scores were replaced make up almost 30% of the student population. The school's passing percentages increased substantially because most 9th graders had grades below 70 and had been pulling the averages and passing percentages down; the passing percentages moved from the 60s to the 90s once the NaNs were in place. Third, the top schools graph was affected, as Thomas H.S. was now in second place with an overall passing percentage of 90.6%. Lastly, the graph of average math and reading scores per grade was affected: the mean could not be calculated for 9th graders at Thomas H.S. because all of their scores had been replaced with NaNs, so the graph displays NaN for them.
