Update soils data used for surface dataset #1303
ISRIC is suggesting they produce 1 km and 5 km SoilGrids products in (web-optimized) geotiff format. Is this something we can use in the toolchain @slevisconsulting and @negin513? How hard should I push for a 3 arc-minute product and a .nc format instead? |
I'm pretty sure we're going to want netcdf for our toolchain: even if we could read geotiff directly, I'm not sure it's a good idea to have different raw data in different formats; I think that's going to cause pain long-term. That said, I don't have strong feelings on whether we ask them to produce netcdf or we convert their geotiff file to netcdf as an initial, one-time thing.

Regarding resolution: First, I realized that our existing 1km file may not actually be uniform 1km. Looking at the file name and metadata, I'm remembering that @swensosc merged 10' data from some regions with 1km data from most of the globe; my sense (maybe wrong) is that the resulting dataset is therefore an unstructured mix of resolutions.

Regarding 5km vs. 3 arc-minute: Maybe we need to discuss as a group how much to push for conformity to a few standard resolutions vs. accepting whatever we get. I suspect that, if we use 5km, it will be the only dataset on this exact grid, somewhat increasing the time it takes to go through the toolchain, though probably not too badly for 5km (as opposed to 1km, which would be worse). |
Yes. We can discuss uniformity, but my guess is that the reality is that uniformity of resolutions is going to be challenging going forward. So I would probably rather not put the burden on data providers; if we really need something on a specific grid, we can do a one-time regridding to that grid when we get the data. |
And with the long term in mind, it's probably best to accept the highest resolution that they have to offer. Then, as @billsacks and @dlawrenncar said, we can spend the time once to get the data in the exact form that we can work with. |
While I agree, Sam, they have a 250 m product that's published and ready to go. This doesn't seem like where we want to start... |
New 1km and 5km resolution products are now available from SoilGrids. You can find the data here: https://files.isric.org/soilgrids/latest/data_aggregated/ The metadata (including the DOI for citations) can be found here: https://data.isric.org/ The data producers have asked for input on these data products, which I am happy to provide. What should be our workflow to start testing these data in new surface datasets? |
Is the data in a format that could be used directly by mksrfdata? If it is, then I think a straightforward test of SoilGrids vs. the existing data, where only soil texture is changed, would be the next step. Perhaps a good topic for discussion at the next software meeting. |
files are in geotiff format. I'm assuming we'll need to merge datasets into a single .nc file first. |
Yes, we'll need to convert to NetCDF and make sure we have the fields needed by mksurfdata_map on them. So @wwieder, are there several data files for different global regions? If so, as you suggest, we'd need to merge them into one global dataset. | All of the datasets are in one global file. |
The variables of interest include data on clay, sand, and soil organic C that we need now, but also data on soil N, pH, CEC, bulk density, etc. that may be useful down the road. I'm somewhat inclined to include more fields than we need in generating our 'raw' dataset. Each variable comes as 6 tiff files (one for each soil layer: 0-5, 5-15, ... 100-200 cm). These should be concatenated with a depth coordinate. We'll just have to maintain the metadata, or adjust units as appropriate, because my recollection is that the units, especially for soil C, are kind of odd. Translating the .tif files into .nc seems pretty trivial. |
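The per-layer stacking described above can be sketched as follows. This is a minimal illustration using NumPy arrays as stand-ins for the six GeoTIFF rasters (in practice one would read each .tif with rioxarray or rasterio before writing NetCDF); the depth intervals are the standard SoilGrids layers, assumed from the comment.

```python
import numpy as np

# One GeoTIFF per depth interval; the six standard SoilGrids intervals
# (cm) are assumed here. Small random arrays stand in for the rasters.
layer_bounds_cm = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]

ny, nx = 4, 6  # stand-in raster shape (the real 5 km grid is much larger)
layers = [np.random.rand(ny, nx) for _ in layer_bounds_cm]

# Stack along a new leading "depth" axis and use the interval midpoints
# as the depth coordinate, keeping the bounds as separate metadata.
data = np.stack(layers, axis=0)  # shape (6, ny, nx)
depth_mid_cm = np.array([(lo + hi) / 2 for lo, hi in layer_bounds_cm])

assert data.shape == (len(layer_bounds_cm), ny, nx)
print(list(depth_mid_cm))  # midpoints: 2.5, 10.0, 22.5, 45.0, 80.0, 150.0
```

From here the stacked array and the depth coordinate would be wrapped in an xarray Dataset and written with `to_netcdf`, carrying the per-variable units over from the SoilGrids metadata.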
This isn't a finished product, as I need to bring in metadata somehow (it's listed elsewhere on the SoilGrids website), and a bunch of other detailed things, but here's my first attempt at converting a geotiff into a .nc projection for sand that seems reasonable. This projection is not wall to wall (lat != -90 to 90). Does this matter for mksrf? What other considerations need to be made? |
Looks good! In principle I think it's OK for mksurfdata_map that it doesn't cover the entire globe; the mapping will be done for the part of the grid that it does cover. I thought it might be a problem that it doesn't cover Antarctica, but neither does the current file we use, so I guess that's OK. Another thing that will need to be done is to create a SCRIP grid file that describes the grid and its vertices for each gridcell. The current file just has the center grid coordinates. Since it's almost exactly a regular grid, we can calculate the vertices. |
OK, here's a full 5000m dataset with soil properties from SoilGrids. We can add additional metadata and talk about where to put my notebook that generated these plots. |
Notebook with code can be found here https://github.com/wwieder/ctsm_py/blob/master/notebooks/tiff2nc.ipynb |
Sorry, I'm still struggling to understand what's needed here. There are a bunch of ways to reproject the orig. tiff data (see this website), but I can't really find anything that would be better than what's already provided. Moreover, the spacing for lon seems pretty regular, and the lats are identical. Below are two adjacent longitude spacings: -0.04551960876054295, -0.04551960876057137. From here, can't we calculate the corners of each gridcell? |
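The corner calculation suggested above can be sketched in plain Python: for (nearly) regularly spaced cell centers, the cell edges are midpoints between neighboring centers, extrapolating half a cell at each end. The spacing and center values below are illustrative, not taken from the actual file.

```python
def centers_to_edges(centers):
    """Return len(centers)+1 edge coordinates bracketing each center."""
    edges = [centers[0] - (centers[1] - centers[0]) / 2]
    for a, b in zip(centers, centers[1:]):
        edges.append((a + b) / 2)
    edges.append(centers[-1] + (centers[-1] - centers[-2]) / 2)
    return edges

# Made-up centers at roughly the ~0.0455-degree spacing quoted above.
dx = 0.04551960876
lon_centers = [-179.9772 + i * dx for i in range(5)]
lon_edges = centers_to_edges(lon_centers)

assert len(lon_edges) == len(lon_centers) + 1
# Each center sits halfway between its two bracketing edges.
assert all(abs((l + r) / 2 - c) < 1e-9
           for l, r, c in zip(lon_edges, lon_edges[1:], lon_centers))
```

The four corners of gridcell (j, i) are then the combinations of lat edges j, j+1 with lon edges i, i+1, which is the information a SCRIP grid (or ESMF mesh) file stores per cell.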
@swensosc can you have a look at the dataset below to see what we can do to calculate the corners of each gridcell in a way that can be read into mksurfdata_map? /glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc |
@uturuncoglu, @mvertens mentioned that you have a tool that generates a mesh file from a raw dataset. I'm wondering if the dataset below has the information for what your script needs? |
@uturuncoglu - I was referring to the ncl/python code you have to take a lat/lon grid (or logically rectangular grid) and create a mesh file. |
@uturuncoglu - it would be great to make this available to the TSS group - even if its not totally finished. |
Hi All, the Python tool is in my personal Gist repository. You can find it here: https://gist.github.com/uturuncoglu/4fdf7d4253b250dcf3cad2335651f162 The NCL one is here: https://gist.github.com/uturuncoglu/1da852ffe2e0247aa4bb0caf2e79df7a BTW, just note that these do not work for all cases, so let me know if you need anything. |
We could try the tools with |
Thanks @slevisconsulting, that file worked. There is a 5-year climo comparison (10 year trends) here: https://webext.cgd.ucar.edu/I2000/ctsm51sp_ctsm51d090_2deg_GSWP3V1_soiltex_hist/lnd/ctsm51sp_ctsm51d090_2deg_GSWP3V1_soiltex_hist.2005_2009-ctsm51sp_ctsm51d090_2deg_GSWP3V1_hist.2005_2009/setsIndex.html I haven't looked in detail, but I don't see any major differences or problems. |
I also looked through some plots and diagnostics. I agree that nothing seems unusual or unexpected. There is an impact, but for the most part the impact is modest. This is probably good to go, which means we could turn our attention to the organic matter portion. I have a full schedule today and am on PTO tomorrow. Perhaps we could start on this on Monday next week. |
@dlawrenncar @olyson - this is really encouraging news. @slevisconsulting - we should meet to discuss how to integrate this new dataset into the main branch for mksurfdata_esmf. |
Next up: ORGANIC, using the same mapping unit information being used for texture. I'm used to thinking about calculating organic matter stocks (kg C/m2), but the model only cares about organic matter density (kg OM/m3). Technically, this should just be calculated on the fine earth fraction (1 - coarse fragments). The WISE lookup table has all of this information:

Property  units            long_name
ORGC      gC kg^-1 soil    organic carbon content
BULK      g soil cm^-3     bulk density
CFRAG     volumetric %     coarse fragment

Additionally we'll assume 1 g OM = 0.58 gC. NOTE: I have no idea where this conversion factor is from, but I'm assuming it was used for the old calculation of ORGANIC we've been using? Thus:

ORGANIC = ORGC * BULK * (100 - CFRAG)/100 * 1/0.58

This should provide ORGANIC (kg OM m^-3 soil); I think all the units, converting g to kg and cm3 to m3, cancel out. Here are samples for two different profiles (not sure where they are?). @dlawrenncar can you check this matches your expectations (and unit conversions)? Also, I think we're more correct to just use the 'fine earth fraction' by removing the coarse fragments (rocks), but I don't know what you think? |
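A quick sanity check of the unit bookkeeping in ORGANIC = ORGC * BULK * (100 - CFRAG)/100 * 1/0.58, using made-up WISE-style values for a single soil layer (the numbers are illustrative, not real data):

```python
# Illustrative (not real) values for one soil layer.
ORGC = 120.0   # gC per kg fine earth (organic carbon content)
BULK = 0.9     # g soil per cm^3 (bulk density)
CFRAG = 10.0   # volumetric % coarse fragments
C_PER_OM = 0.58  # assumed gC per g organic matter

# gC/kg * g/cm^3 gives kgC/m^3 directly: the factor of 1000 from
# g -> kg cancels the factor of 10^6 / 10^3 from cm^3 -> m^3.
# Scaling by the fine-earth fraction and dividing by 0.58 then
# converts carbon density to organic matter density.
organic_kgom_m3 = ORGC * BULK * (100.0 - CFRAG) / 100.0 / C_PER_OM

print(round(organic_kgom_m3, 1))  # 167.6
```

Note the illustrative result exceeds the 130 kg OM/m3 peat-density cap mentioned below, so in practice values like this would be clipped.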
I think the equation makes sense, but I don't know if the values are
reasonable or not. I know that basically we use this number (organic kg
OM/m3) to calculate the fraction of organic matter in any soil layer. We
use a prescribed value of 130 kg OM/m3 as the maximum organic matter
density, based on standard density of peat soils (I think, need to look
back at where I got that number from). When creating the original organic
dataset, we constrain so that the values cannot be larger than 130 kg
OM/m3. Across most of the Arctic, the top several layers are 130 kg OM/m3,
reflecting the surface organic soils that are prevalent.
Anyway, with this said, I am starting to think that we may be better off
calculating the %ORGANIC (analogous to %SAND, % CLAY) and putting that onto
the surface dataset and then using %ORGANIC rather than doing this
calculation in the code. (Note that %ORGANIC is considered independently
of %SAND, %CLAY and the %SAND, %CLAY values are only used if %ORANIC is not
100). This probably would have been a better way to do this originally,
but ... This would require a new piece of code that uses %ORGANIC if it is
available on the surface dataset instead of using ORGANIC. Probably we
should discuss.
@wwieder @slevisconsulting
I'd like to move both of these to inputdata and put them in the xml file so that we are not pointing into datasets in scratch. |
I'm happy to set up a follow-up meeting, though @wwieder could you share your calendar with me so that I may pick a time that works for all? Or feel free to schedule this next meeting. |
I made a meeting for tomorrow. Hopefully 30 minutes is enough. Before uploading files to inputdata, I also wondered if the metadata on these files is sufficient or if additional information is needed (e.g. the script used to generate the .nc files from the raw data we're getting from WISE)? |
@wwieder we haven't always saved the scripts that create the raw data files. I do think it's always important to save some metadata that describes what was done, though. If the manipulations were straightforward enough that someone could recreate the process later, it's not strictly necessary to save the script. In general we don't have to reconstruct these datasets, but for scientific reproducibility the instructions should be good enough that you could create the dataset again. So if you do lots of complex manipulations that you can't describe easily, it might be best to archive the script. The only other reason to archive the script is if we think we'll use it again fairly soon, which seems unlikely to me. But you've already archived the script in your repo on GitHub, and that's sufficient to me. It doesn't look like you do anything that is unduly complex, and the link above already connects the script to this issue, so it'll be straightforward to find it again. |
In advance of our meeting, and in terms of things I find important to have in filenames:
For mksurfdata we've prepended "mksrf_" to our filenames. Since almost all of the files in that directory have that prefix, I don't think it's very helpful, so moving away from it would be reasonable now. |
Sorry for the delay on this, but here are 3 arctic map units, two from AK and one in Norway. The other suggestion was that we just include some measure of soil organic carbon content ( |
Those all look reasonable, and it is comforting to see that they are all within the ballpark of the assumed 100% organic matter density of 130 kg OM/m3. But with the code as it exists now, just using these values won't work, unfortunately, which is why I am recommending that we switch to calculating %ORGANIC content and then using that directly, if it is possible (I'm not sure how to calculate this). Not sure what the other options are. @wwieder this probably requires a chat. |
Just taking the field ORGC*0.1 will give the units in %C. Are these values you'd expect for organic soils, @dlawrenncar? |
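For the record, the factor of 0.1 is just a unit conversion: ORGC is gC per kg of soil (i.e. per 1000 g), while percent is g per 100 g. A one-line check with a made-up value:

```python
# ORGC is gC/kg soil = gC per 1000 g; percent is g per 100 g,
# hence multiplying by 0.1 yields mass percent carbon.
ORGC = 580.0              # gC/kg, an illustrative organic-soil value
percent_c = ORGC * 0.1

assert percent_c == 58.0  # %C by mass
```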
@mvertens I hope these two files have the right information. Their global attributes are likely worth bringing over into the files you're archiving. The lookup table, used for all resolutions: /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup2.nc and the 30 sec resolution of the mapUnits |
I merged #1732 so closing this issue. |
No. It should be 100% organic matter near the surface in Arctic soils.
|
@slevisconsulting this is completed on the ctsm5.2 branch now correct? |
Yes, and I had closed the issue, BUT I reopened it when I saw @dlawrenncar's post a few lines up from here on Oct 11. |
I'm just noting that one thing we did here is to bring in the soils data in single rather than double precision. This saves space, and obviously the scientific accuracy of the data is far less than even single precision.
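The space saving from single precision can be illustrated with the standard-library array module (the values are arbitrary): each value costs 4 bytes instead of 8, so the soils fields shrink by half.

```python
from array import array

# Arbitrary stand-in values for a raw soils field.
values = [12.5, 33.1, 54.4] * 1000

single = array('f', values)  # 32-bit floats, as on the new raw files
double = array('d', values)  # 64-bit floats

# Single precision stores each value in half the space.
assert single.itemsize == 4 and double.itemsize == 8
assert len(single) * single.itemsize * 2 == len(double) * double.itemsize
```

Single precision still carries about 7 decimal digits, far more than the real accuracy of gridded soil texture data.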
It would be nice to update the soils data we're using to generate the surface dataset to something from this century. This will introduce a number of answer changes to the code, but it seems worth having a discussion about what we need here.
@dlawrenncar suggested using SoilGrids data, which just released version 2.0 of their dataset: https://doi.org/10.5194/soil-2020-65. SoilGrids 2.0 contains information on soil texture, OC content, pH, bulk density, coarse fragments, CEC, and soil N at 250 m resolution for 6 soil layers (0-200 cm). This high-resolution data also includes uncertainty estimates! According to the data providers, v2.0 has changed significantly from previous releases of the dataset, but is currently only available at 250 m resolution.
Laura Poggio and Niels Batjes at ISRIC are interested in and willing to provide a coarser-resolution data product for our purposes and wondered what we wanted. I've basically told them we'd like the whole dataset, but to prioritize texture and soil C information. Is a 5km data product adequate for NWP applications, but not too unwieldy for climate simulations? Do we need 1km resolution mapping files?
I also wondered if we should think about how to generate soil properties for the hillslope model. Does this happen in our own toolchain, or could it be generated in the mapping files from ISRIC? This is likely of secondary concern, but may be worth discussion.