Tutorials
The tutorials at PyCon Canada have a very limited capacity. If you're interested in attending one of the PyCon Canada tutorials, please sign up in advance below. Note that you must already have a ticket to PyCon Canada 2012 to attend.
We will also be using this list to send out any tutorial requirements (things that you should have installed on your laptop before arriving at the tutorial).
HDF5 is a hierarchical, binary database format that has become a de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays) it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations.
This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making both the code and the data faster and smaller. With such powerful features at the developer's disposal, what is not to love?!
This tutorial is targeted at a more advanced audience with prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required. This tutorial should be taken in conjunction with David Warde-Farley's Introduction to Numerical and Scientific Computing with Python tutorial below.
Recent versions of PyTables and its dependencies HDF5, NumPy, and numexpr are required for this tutorial. Matplotlib and IPython are also highly recommended. For browsing HDF5 files, either ViTables (http://vitables.org/) or HDFView (http://www.hdfgroup.org/hdf-java-html/hdfview/) is quite useful.
To aid in the installation process, there are several free distributions of Python that include all or most of the packages covered in a one-click installer format. Anaconda CE (Community Edition) from Continuum Analytics is a good choice, as it includes PyTables (and its dependencies) out of the box. EPD Free contains NumPy, IPython and Matplotlib, but the other packages will need to be installed by the user. The full EPD is also available for free for academics and contains everything needed for this tutorial. Windows users may also have some luck with Python(x, y), though the instructor has no experience with this distribution.
- Anthony Scopatz (Instructor)
- Stefan Wiechula
- Nasser M. Abukhdeir
- Christopher Ing
- Greg Wilson
- Fernando Perez
- Scott Rostrup
- David Warde-Farley
- Ram Venkat
- Ramsey D'silva
- Zhuyi Xue
- Eric Anderson
- Edwin Frondozo
- Taavi Burns
Intended audience: beginning to intermediate Python programmers, and anyone who has written Python applications and feels they might need number-crunching facilities.
This tutorial will offer an introduction to numerical data processing and scientific computing using Python. The tutorial will revolve around NumPy, the fundamental package for scientific computing with Python, and introduce users to its use not only in implementing numerical algorithms but in interfacing with legacy systems and C libraries. We will touch on scientific visualization with matplotlib as well as other "general interest" packages such as SciPy and scikit-learn, and discuss and demonstrate strategies for writing high-performance numerical code that can be easily integrated into larger Python-based applications.
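As a rough illustration of the kind of strategy this description refers to (not taken from the tutorial material), the sketch below computes a root-mean-square value twice: once with a plain Python loop and once with whole-array NumPy operations. The function names `rms_loop` and `rms_numpy` are hypothetical and chosen only for this example.

```python
import numpy as np


def rms_loop(values):
    """Root-mean-square computed with a plain Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return (total / len(values)) ** 0.5


def rms_numpy(values):
    """The same calculation expressed as whole-array NumPy operations."""
    arr = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(arr ** 2)))


if __name__ == "__main__":
    data = [3.0, 4.0]
    # Both functions should agree up to floating-point rounding.
    print(rms_loop(data), rms_numpy(data))
```

On large arrays the NumPy version usually runs much faster, because the per-element work happens in compiled code rather than in the interpreter loop.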
Recent versions of NumPy and Matplotlib will suffice for most of the tutorial. Cython will also be covered and will be useful for the later, more advanced portions (note that it requires a C compiler). IPython and its optional dependencies pyzmq and libzmq are highly recommended.
There are several free distributions of Python that include all or most of the packages covered in a one-click installer format. The most comprehensive of these is Anaconda CE (Community Edition) from Continuum Analytics: it includes all the packages touched on in this tutorial as well as PyTables, required for Anthony Scopatz's HDF5 tutorial above. It is available for Windows, Mac and Linux.
EPD Free, a free community edition of the Enthought Python Distribution, contains everything in this tutorial except for Cython. High school and postsecondary students and staff are also eligible for a free academic license for the full version of EPD courtesy of Enthought. Both options are available for Windows, Mac and Linux.
Windows users may also have some luck with Python(x, y), though the instructor has no experience with this distribution.
- David Warde-Farley (Instructor)
- Ashwin Panchapakesan
- Andrey Paramonov
- Gerrat Rickert
- Vid Ayer
- Chris Cooper
- Jonathan Dobson
- Mike Pettypiece
- Nasser M. Abukhdeir
- Annika Hillebrandt
- Yanshuai Cao
- David Kua
- Chris Fournier
- Jeremy Banks
- Fernando Perez
- Jason Cornell
- Simon Ditner
- Ram Venkat
- Edwin Frondozo
- Olivier Yiptong
- Ramsey D'silva
- Mahmoud Hashim
- Terence Lo
Find and fix your performance bottlenecks. Where should you spend your time so your users don't have to spend theirs waiting for your code? Topics covered: algorithmic complexity ("big O" notation); using the right algorithm for the job; profiling to find the hot spots; micro-optimization tricks; caching vs. computing; storage hierarchies; and when/how you should turn to C.
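To make two of the listed topics concrete (profiling and caching vs. computing), here is a minimal sketch using only the standard library. It is not taken from the tutorial itself; the names `fib_naive`, `fib_cached` and `profile_calls` are hypothetical and exist only for this illustration. It profiles a deliberately wasteful recursive function, then the same function with its results cached via `functools.lru_cache`.

```python
import cProfile
import functools
import io
import pstats


def fib_naive(n):
    """Exponential-time Fibonacci: recomputes the same subproblems repeatedly."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@functools.lru_cache(maxsize=None)
def fib_cached(n):
    """Same recurrence, but each value is computed once and then cached."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def profile_calls(func, arg):
    """Run func(arg) under cProfile; return (result, total calls recorded)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, arg)
    # Send any report output to a throwaway buffer; only the call count is used.
    stats = pstats.Stats(profiler, stream=io.StringIO())
    return result, stats.total_calls


if __name__ == "__main__":
    for func in (fib_naive, fib_cached):
        value, calls = profile_calls(func, 20)
        print(func.__name__, value, calls)
```

The profiler records far fewer calls for the cached version, which is the usual trade: memory spent on stored results in exchange for work not repeated.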
- Python interpreter (2.7 or 3.2; bonus points if you have both)
- text editor
- Cython (optional)
- Greg Ward (Instructor)
- Javier de la Rosa (@versae)
- Cameron Davidson-Pilon
- Todd Whiteman
- Martine Vong
- Julien Vong
- Matt Okura
- Mike Pettypiece
- Yanshuai Cao
- Ashwin Panchapakesan
- Chris Boothe
- Stefan Wiechula
- Trevor Bekolay
- Steve Singer
- Fernando Perez
- Alan Boudreault
- Jason Cornell
- David Warde-Farley
- Evan Hicks
- Anthony Scopatz
- Ramsey D'silva
- Zach Aysan
- Russell Warren
- Amrik Singh
- Matt Ruten
- Zhuyi Xue
- Terence Lo
- Ye Liu