Statistics

Python Vs. Others

Python

  • Best all-in-one package
  • Python is probably the best language for machine learning and big data.
  • Pandas and statsmodels are an acceptable substitute for R most of the time.
  • Python is relatively simple to use, although MATLAB is the easiest.
  • GUIs aren't as easy to build as in MATLAB, but they run on any platform.

R

  • By far the most comprehensive collection of statistical methods. If you want a complicated statistical test that hardly anyone has heard of, I can almost guarantee R has it.
  • Very good at exploration. If you're good at R, you can make a lot of graphs really fast.
  • Limited usability for anything other than stats.

Matlab

  • Expensive
  • Great at data manipulation
  • Almost as good as Python for multi-purpose scientific programming, but with serious caveats: support for advanced GUIs is almost non-existent, and if you want to dig in and modify a built-in function, it's almost impossible.
  • Professional Support. If you can't figure out how to use a function, they have professional staff on hand.

Statsmodels

With that fun comparison of languages out of the way, let's begin our exploration of statsmodels, Python's "R substitute". Let's look at the documentation:
https://www.statsmodels.org/stable/user-guide.html
This link opens the user guide. The statsmodels home page has some extended examples of usage.
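As a small illustration of the "R substitute" idea, statsmodels accepts R-style model formulas through `statsmodels.formula.api` when you pass it a pandas DataFrame. This is a minimal sketch that assumes numpy, pandas and statsmodels are installed. The data are synthetic and exist only for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic example data: y depends linearly on x, plus a little noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(20, dtype=float)})
df["y"] = 2.0 + 3.0 * df["x"] + rng.normal(scale=0.5, size=len(df))

# R-style formula, like lm(y ~ x, data = df) in R
model = smf.ols("y ~ x", data=df).fit()
print(model.params)     # estimated Intercept and slope for x
print(model.summary())  # regression table, similar to R's summary(lm(...))
```

The formula string names DataFrame columns directly, and the intercept is added automatically, as it is in R.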

Here is the task:

  • Find a function that implements Welch's t-test, a common task in scientific analysis.
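One function that fits the task is `ttest_ind` in `statsmodels.stats.weightstats`. Passing `usevar="unequal"` switches it from the pooled-variance Student's t-test to Welch's version. This is a sketch that assumes numpy and statsmodels are installed. The samples are randomly generated purely for illustration, so check the parameter names against the documentation for your installed version.

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

# Two synthetic samples with clearly different spreads (illustration only)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=1.0, size=30)
group_b = rng.normal(loc=11.0, scale=4.0, size=25)

# usevar="unequal" estimates each group's variance separately (Welch's t-test);
# the default usevar="pooled" gives the classic equal-variance Student's t-test.
t_stat, p_value, dof = ttest_ind(
    group_a, group_b, alternative="two-sided", usevar="unequal"
)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, df = {dof:.1f}")
```

As a cross-check, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` should report the same t statistic and p-value.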

Hints:
! Welch's t-test is a variation on the standard two-sample (Student's) t-test that does not assume the two groups have equal variances.
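To see what dropping the equal-variance assumption means in practice, here is a minimal from-scratch sketch using only the Python standard library. `welch_t` and `pooled_t` are hypothetical helper names for illustration, not statsmodels functions. Welch's version uses each sample's own variance in the standard error and approximates the degrees of freedom with the Welch-Satterthwaite formula. The pooled version assumes one shared variance and uses n1 + n2 - 2 degrees of freedom.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1


def welch_t(sample1, sample2):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard error of each mean, from that sample's own variance
    se1 = variance(sample1) / n1
    se2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(sample1, sample2):
    """Hypothetical helper: Student's t statistic with one pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


a = [1, 2, 3, 4, 5]           # small spread
b = [2, 4, 6, 8, 10, 12, 14]  # much larger spread
print("Welch: ", welch_t(a, b))   # df comes out below the pooled df of 10
print("Pooled:", pooled_t(a, b))
```

With spreads this different, the two tests give noticeably different t statistics and degrees of freedom. That difference is why Welch's test is preferred when you can't assume equal variances.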