- Best all-in-one package
- Python is probably the best option for machine learning and big data.
- pandas and statsmodels are an acceptable substitute for R most of the time.
- Python is relatively simple to use, although Matlab is the easiest.
- GUIs aren't as easy to build as in Matlab, but they work anywhere.
- R has by far the most comprehensive statistics. If you want a complicated statistical test hardly anyone has heard of, I almost guarantee R will have it.
- Very good at exploration. If you're good at R, you can make a lot of graphs really fast.
- Limited usability for anything other than stats.
- Expensive
- Great at data manipulation
- Almost as good as Python for multi-purpose science programming, but with serious caveats: advanced GUI support is almost non-existent, and digging in to modify a built-in function is almost impossible.
- Professional Support. If you can't figure out how to use a function, they have professional staff on hand.
With that fun comparison of languages out of the way, let's begin our exploration of statsmodels, Python's "R substitute". Let's look at the documentation:
https://www.statsmodels.org/stable/user-guide.html
Here we are presented with the user guide. The home page has some extended examples of usage.
Here is the task:
- Find a function that implements Welch's t-test, a common task in scientific analysis.
Hints:
! Welch's t-test is a variation on the standard two-sample t-test that does not assume the two groups have equal variances.
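
To make the hint concrete, here is a minimal sketch of what Welch's t-test computes, written with only the standard library so the formula is visible. The helper name `welch_t` and the two sample lists are made up for illustration. The last few lines show one place I believe statsmodels exposes the same test, `statsmodels.stats.weightstats.ttest_ind` with `usevar="unequal"`; treat that as one possible answer to the task rather than the only one, and check it against the user guide.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; not part of statsmodels.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean: sample variance / n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Unlike the pooled t-test, the two variances are never averaged together.
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up data: group_b is deliberately more spread out than group_a.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t(group_a, group_b)
print(f"by hand:     t = {t_stat:.4f}, df = {df:.2f}")

# Optional cross-check against statsmodels, skipped if it is not installed.
try:
    from statsmodels.stats.weightstats import ttest_ind

    sm_t, sm_p, sm_df = ttest_ind(group_a, group_b, usevar="unequal")
    print(f"statsmodels: t = {sm_t:.4f}, df = {sm_df:.2f}, p = {sm_p:.4f}")
except ImportError:
    print("statsmodels not installed; skipping the comparison")
```

The hand-rolled version stops at the t statistic and degrees of freedom because the standard library has no t-distribution; the p-value is the part you actually need a stats package for.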