Frequently Asked Questions
Here are some of the more common questions asked about PyFluxPro and their answers.
- What is PyFluxPro? PyFluxPro is a program for the quality control, post-processing, gap filling and partitioning of data from flux towers. It also offers several powerful data visualisation tools and routines for converting file formats, and it can be run in interactive or batch mode. It is open source.
- What does it do? PyFluxPro takes data recorded at flux towers, averaged over 30 or 60 minutes, and produces a clean, gap filled product, with NEE partitioned into GPP and ER, that can be used by researchers interested in process-based studies and by modellers for parameterisation and validation. The user can:
- Select from 7 different quality control tests to automatically remove bad data.
- Apply a range of common corrections (WPL, ground heat flux storage term, CO2 storage).
- Use data from external sources such as the Bureau of Meteorology (BoM) Automatic Weather Station (AWS) network, the BoM ACCESS-G numerical weather prediction model and the ECMWF Reanalysis product (ERA5) to gap fill meteorological, radiation and soil data.
- Use Marginal Distribution Sampling (MDS) or a neural network (SOLO) to gap fill flux data, with the ability to use MODIS data (EVI and NDVI) to express the seasonal changes in dynamic ecosystems.
- Use 3 methods (SOLO, Lloyd-Taylor and Lasslop) to partition NEE into GPP and ER, and provide monthly and annual summaries of carbon, water and energy budgets (a Lloyd-Taylor sketch follows this list).
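For the Lloyd-Taylor option, the idea is to fit a temperature response to nighttime NEE (when GPP is zero, so NEE equals ER), apply the fitted model to all hours to estimate ER, and then compute GPP by difference. Below is a minimal sketch of that idea in Python, not PyFluxPro's actual code; the parameter values are placeholders that would normally be fitted to site data.

```python
import numpy as np

def lloyd_taylor(T_k, E0=200.0, R_ref=2.0, T_ref=283.15, T0=227.13):
    """Lloyd & Taylor (1994) respiration model, temperatures in Kelvin.
    E0 and R_ref are site-specific fitted parameters; the values here
    are placeholders, not PyFluxPro defaults."""
    return R_ref * np.exp(E0 * (1.0 / (T_ref - T0) - 1.0 / (T_k - T0)))

# With the micrometeorological sign convention (NEE negative = uptake),
# NEE = ER - GPP, so GPP = ER - NEE.
T_k = np.array([280.0, 290.0, 300.0])   # air temperature, K
nee = np.array([-2.0, -5.0, -8.0])      # umol CO2 m-2 s-1
er = lloyd_taylor(T_k)                  # nighttime model applied to all hours
gpp = er - nee
```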
- Does it use 10 Hz data or 30 minute data? PyFluxPro uses 30 or 60 minute data. EddyPro, EdiRe and TK3 use 10 or 20 Hz data.
- Why should I use PyFluxPro and not EdiRe, EasyFlux or EddyPro? PyFluxPro and EddyPro, EasyFlux, EdiRe and TK3 do different things. EddyPro, EdiRe and the like only process 10 or 20 Hz data; they can do simple quality control but cannot do post-processing, gap filling or partitioning.
- What are the six levels (L1, L2, ...)? The six levels are (see also Isaac et al. 2016, Biogeosciences):
- L1 - read input data, combine it with global and variable metadata and write it to a netCDF file (see the sketch after this list)
- L2 - quality control
- L3 - post-processing
- L4 - gap fill meteorological data
- L5 - gap fill fluxes
- L6 - partition NEE into GPP and ER
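To illustrate what L1 produces, here is a minimal sketch (not PyFluxPro code) of writing a time series plus global and variable metadata to a netCDF file with the netCDF4 library; the site and variable names here are hypothetical.

```python
import numpy as np
from netCDF4 import Dataset

with Dataset("ExampleSite_L1.nc", "w") as nc:
    # Global metadata is stored as attributes on the file itself.
    nc.site_name = "Example Site"
    nc.time_step = "30"
    nc.createDimension("time", None)
    time = nc.createVariable("time", "f8", ("time",))
    time.units = "days since 2022-01-01 00:00:00"
    fc = nc.createVariable("Fc", "f4", ("time",))
    # Variable metadata travels with the data from L1 onwards.
    fc.long_name = "CO2 flux"
    fc.units = "umol/m^2/s"
    time[:] = np.arange(48) / 48.0      # one day of 30 minute steps
    fc[:] = np.random.default_rng(0).normal(-2.0, 1.0, 48)
```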
- Why does every stage have its own control file? Quality control, post-processing, gap filling and partitioning of fluxes form a complex data path with many options that the user needs to specify. In addition, every flux tower is different, and that site-to-site variability has to be captured somewhere. Any processing path needs a system to organise and store the options and individual site characteristics. In PyFluxPro, this is done with control files that the user can edit (a rough sketch of the idea follows).
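Control files are plain text files with named sections and key = value settings; the exact sections and keys are documented elsewhere in this wiki. Purely as an illustration of the idea (the section names and keys below are hypothetical, not the real PyFluxPro syntax), an INI-style file can be parsed with Python's standard configparser:

```python
import configparser

# Hypothetical control-file fragment, for illustration only.
text = """
[Files]
file_path = /data/ExampleSite/
in_filename = ExampleSite_L1.nc
out_filename = ExampleSite_L2.nc

[Options]
irga_type = Li-7500
"""

cfg = configparser.ConfigParser()
cfg.read_string(text)
print(cfg["Files"]["in_filename"])   # ExampleSite_L1.nc
```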
- Why do I have to create the control files myself? Is there not a standard protocol that delivers half-decent flux data processing? For a beginner it can be overwhelming to make decisions on that many parameters. There are three parts to this answer:
- PyFluxPro comes with a range of control file templates at all levels that users can take and modify to suit their situation. For example, L1 templates are available for "standard" OzFlux output, for EddyPro output and for Campbell's EasyFlux output. For a basic flux tower, modifying one of these templates for your site might take half an hour, after which you can reuse it again and again.
- There is no standard protocol. That is why PyFluxPro has to do what it does.
- It is daunting for beginners. That is why TERN EP and OzFlux run the data workshop every year and why EPCN staff spend hours on Zoom with all kinds of users. By far the biggest hurdle is not the complexity of PyFluxPro but people's lack of knowledge about processing flux tower data. Often, we have to teach them the basics, sometimes even the basics of using a computer, before we even get to setting up PyFluxPro. It's OK to be a beginner, we all were at some point, but processing flux data is complex. Don't expect it to be easy or to only take an hour or so.
- Which gap filling procedure is best for which situation? It is not possible to give a blanket statement on which method is best, because this changes from site to site and from case to case within a single site. The best we can do is make some general statements:
- Short gaps can be filled by interpolation. PyFluxPro uses Akima 1D interpolation for gaps of up to 3 hours (see the sketch after this list).
- For meteorology, the best method is to use alternate data from AWS, ACCESS or ERA5 if that is available (see Isaac et al. 2016 for details). AWS is best for meteorology but does not have radiation or soil data. ACCESS has meteorology, radiation and soil data and works really well. ERA5 has the same data as ACCESS but is not quite as good.
- For fluxes, MDS (used by FluxNet) works well for gaps up to a few days but quickly degrades for longer gaps and can be very bad for gaps of a week or longer, especially if the ecosystem is dynamic (e.g. tropical savanna, crops).
- For fluxes, the SOLO neural network works very well for gaps up to a couple of months. When you add MODIS data (EVI, NDVI, ...) to introduce a driver with seasonal information, it works well for long gaps of several months, even when those gaps span a rapid change in the ecosystem (see the Dry River 2016 gap, from mid dry season to mid wet season).
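As a sketch of the short-gap case (a generic illustration, not PyFluxPro's actual routine), the scipy Akima interpolator can be used to fill only those runs of missing records that are 3 hours or shorter:

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator

def fill_short_gaps(t, y, max_gap=6):
    """Fill gaps of at most max_gap consecutive missing records
    (6 half-hourly records = 3 hours) using Akima 1D interpolation.
    A generic sketch, not PyFluxPro's actual code."""
    y = y.copy()
    good = ~np.isnan(y)
    akima = Akima1DInterpolator(t[good], y[good])
    i, n = 0, len(y)
    while i < n:
        if np.isnan(y[i]):
            j = i
            while j < n and np.isnan(y[j]):
                j += 1
            if (j - i) <= max_gap:       # fill short gaps only
                y[i:j] = akima(t[i:j])
            i = j
        else:
            i += 1
    return y

t = np.arange(48, dtype=float)           # half-hourly index over one day
y = np.sin(2 * np.pi * t / 48)
y[10:13] = np.nan                        # a 1.5 hour gap
filled = fill_short_gaps(t, y)
```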
- Why do I have to manually copy stuff from a data logger file into various Excel files? It is common to compile an L1 workbook from the data logger output. This can be as simple as opening a CSV file (a TOA5 file from Campbell loggers, a full_output file from EddyPro) in Excel and then appending to it as new data comes in from the site. That is all that is required. One of the reasons PyFluxPro still requires the user to manually construct their own L1 input file is that we haven't had the time to build this code for Campbell systems or EddyPro output. It's on the to-do list and if you want to help, give us a call. As an aside, the format of the EddyPro output files almost never changes. In contrast, site PIs love playing with their data logger programs, adding this, taking away that, and forgetting (or not bothering) to document the changes. The code required to make a generic input routine that can deal with an arbitrary level of complexity will itself be complex, challenging to write and time consuming to test. We'd love to do it but would have to drop something else.
- Why can't you copy the data directly from the logger into PyFluxPro? Surely it is easy to automate the data transfer from logger to PyFluxPro without a detour via Excel? PyFluxPro will read a TOA5 file output by a Campbell data logger, avoiding the use of Excel (see the sketch below). It will also read a full_output file from EddyPro without going via Excel. But all the data you want has to be in one CSV file (i.e. one output table from the logger, or no separate biomet data) and the format of the CSV file (i.e. which data is in which column) has to be the same for your entire data set. Otherwise you are straight back to the previous question. Yes, a generic routine could be written (again, see the previous question). Lots of people are asking for this, so we'll have a look at doing it.
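As an illustration of reading a TOA5 file directly without Excel: a TOA5 file has four header lines (file info, variable names, units and aggregation type), so it can be loaded with pandas along the following lines. The file name is hypothetical, and the timestamp column name follows the usual Campbell convention.

```python
import pandas as pd

# A TOA5 file has 4 header lines: file info, variable names, units and
# the aggregation type (Avg, Smp, Tot, ...). Keep the names row as the
# header and skip the other three.
df = pd.read_csv(
    "ExampleSite_flux.dat",        # hypothetical file name
    skiprows=[0, 2, 3],            # keep row 2 (names) as the header
    parse_dates=["TIMESTAMP"],     # standard Campbell timestamp column
    na_values=["NAN"],             # Campbell missing-value marker
)
df = df.set_index("TIMESTAMP")
```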