GTC-3055 Added new GHG (green-house gas) analysis #252
base: master
This analysis computes the DLUC (direct land use change) GHG emissions factors for each location, based on the tree loss information for the location and the specified yield or commodity (which gives a default yield value). The emissions are calculated by a 20-year discounted formula, as described in GHGRawDataGroup.scala. We compute the emissions factors for each year in 2020-2023.

The new set of GHG Analysis files is most closely modeled on the ForestChangeDiagnostic analysis files, but is simplified as much as possible. The GHGSummary file has the logic that computes the crop yield for a particular pixel location, using either the primary crop yield datasets or the backup yield CSV file. It also computes the emissions due to tree loss. The GHGRawDataGroup file does the emissions factor computation.

Here are some of the other needed changes:
- Added gross emissions datasets (already in use for some carbon flux analyses) to the Pro dataset catalog.
- Added a new dataset per commodity (currently 6 commodities, will add more later) that gives the average crop yield in each 10km area. Added a generic MapspamYield layer to access these commodity datasets.
- Added a CSV file with "backup" crop yields per GADM2 area, for use when the primary crop yield rasters don't have a value for a particular pixel. This CSV file (which is < 26 Mbytes) is broadcast to all nodes (not each task), so it is available for lookup, and avoids Spark tasks swamping the Data API with requests. The CSV file is specified by a command-line option and will be placed on S3.
- Added a new Feature and FeatureId "gfwpro_ext" that includes "commodity" and "yield" fields, which are needed to specify the exact crop yield or commodity grown at each location.
- Changed ErrorSummaryRDD to allow passing in the featureId to the polygonalSummary code as part of kwargs. This is needed so we can pass the yield and commodity information into the GHGSummary code. We need the yield/commodity during the per-pixel analysis.
- Added a GHG test. The input file for the test case (ghg.tsv) tests a variety of commodities and yields, and even the error case where a default yield for a commodity is not found at all for a location. A partial backup GADM2 yield file is included in the test files (part_yield_spam_gadm2.csv).
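The yield resolution order described above (explicit yield, then the primary yield raster for the commodity, then the backup GADM2 value, then an error) can be sketched as a chain of fallbacks. This is a minimal sketch under stated assumptions, not the code in GHGSummary: `ExtFeatureId`, `resolveYield`, and the two lookup functions are hypothetical names, and treating a yield of 0 as "not specified" is an assumption.

```scala
object YieldResolutionSketch {
  // Hypothetical stand-in for the "commodity" and "yield" fields of the
  // "gfwpro_ext" feature ID.
  final case class ExtFeatureId(commodity: String, yieldValue: Double)

  // Assumption: a yield <= 0 means "not specified, use the commodity default".
  def resolveYield(
      id: ExtFeatureId,
      primaryYield: String => Option[Double], // stand-in for the per-commodity raster
      backupYield: String => Option[Double]   // stand-in for the GADM2 backup CSV
  ): Either[String, Double] =
    if (id.yieldValue > 0.0) Right(id.yieldValue)
    else
      primaryYield(id.commodity)
        .orElse(backupYield(id.commodity))
        .toRight(s"No default yield found for commodity ${id.commodity}")
}
```

Returning an `Either` keeps the "no default yield found" case explicit, which corresponds to the error case that the test input exercises.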
Small nitpicks and suggestions, but otherwise looks good enough to merge to me. Let me know if you take any of these suggestions and I can look again.
// if there are any full window intersections, we only need to calculate
// the summary for the window, and then tie it to each feature ID
val fullWindowIds = fullWindowFeatures.map { case feature => feature.data}.toList
//if (fullWindowIds.size >= 2) {
Is this comment meant to be here still?
Removed it. Left over from a while ago.
val environmentVars = System.getenv().forEach {
case (key, value) => println(s"$key = $value")
// Print out environment variables (if needed for debugging)
if (false) {
is this meant to be here still?
Yes, I left the code in there (with 'if (false)') in case sometime later we want to print out environment variables for debugging purposes.
val treeCoverDensity2000: TreeCoverDensityPercent2000 = TreeCoverDensityPercent2000(gridTile, kwargs)
val grossEmissionsCo2eNonCo2: GrossEmissionsNonCo2Co2eBiomassSoil = GrossEmissionsNonCo2Co2eBiomassSoil(gridTile, kwargs = kwargs)
val grossEmissionsCo2eCo2Only: GrossEmissionsCo2OnlyCo2BiomassSoil = GrossEmissionsCo2OnlyCo2BiomassSoil(gridTile, kwargs = kwargs)
val mapspamCOCOYield: MapspamYield = MapspamYield("COCO", gridTile, kwargs = kwargs)
Does the whole list include multiple commodities, or is it just one commodity type per list? If the latter, it might simplify things to only pass one commodity in the tile, that's just parametrized by the commodity for that analysis.
Each location in the list can have a different commodity (or no commodity, but a yield specified). So, we need all the commodities as sources. But it is all lazy, so they are not opened or fetched for any particular window unless they are actually needed.
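The "declare every commodity source, open only what is used" idea in this reply can be illustrated with Scala's `lazy val`. This is only a sketch of the general technique: how the PR's layers actually defer opening is not shown in this thread, and `YieldSource`, `Sources`, and the "OTHER" commodity code are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

object LazySourcesSketch {
  // Records which sources were actually constructed ("opened").
  val opened: ListBuffer[String] = ListBuffer.empty[String]

  // Hypothetical stand-in for a per-commodity yield layer.
  final class YieldSource(val commodity: String) {
    opened += commodity // side effect marks the (potentially expensive) open
    def valueAt(col: Int, row: Int): Double = 1.0 // placeholder value
  }

  // All commodities are declared, but each is constructed on first access only.
  final class Sources {
    lazy val coco: YieldSource = new YieldSource("COCO")
    lazy val other: YieldSource = new YieldSource("OTHER") // hypothetical code
  }
}
```

With this shape, a window whose locations only reference one commodity never pays for constructing the other sources.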
val efList = for (i <- minLossYear to maxLossYear) yield {
val diff = i - umdTreeCoverLossYear
if (diff >= 0 && diff < 20) {
(i -> ((0.0975 - diff * 0.005) * emissionsCo2e) / (cropYield * totalArea))
nitpick since we use magic numbers elsewhere, but named constants would be good here
Done, replaced them with constants. Thanks for all the comments!
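For readers following along, here is one way the reviewed expression could look with named constants. The constant and function names are hypothetical (the thread does not show the names the author chose), and returning 0.0 outside the 20-year window is an assumption, since the `else` branch is not visible in the hunk. The numeric values are the literals from the diff line above.

```scala
object EmissionsFactorSketch {
  // Hypothetical names; values are the literals from the reviewed line.
  val FirstYearWeight = 0.0975
  val WeightDecreasePerYear = 0.005
  val DiscountYears = 20

  // Linearly decreasing weight for each of the 20 years after tree loss.
  def discountWeight(yearsSinceLoss: Int): Double =
    if (yearsSinceLoss >= 0 && yearsSinceLoss < DiscountYears)
      FirstYearWeight - yearsSinceLoss * WeightDecreasePerYear
    else 0.0 // assumption: no contribution outside the window

  // Emissions factor per reporting year for a single loss year.
  def emissionsFactors(
      lossYear: Int,
      emissionsCo2e: Double,
      cropYield: Double,
      totalArea: Double,
      minYear: Int,
      maxYear: Int
  ): Map[Int, Double] =
    (minYear to maxYear).map { year =>
      year -> discountWeight(year - lossYear) * emissionsCo2e / (cropYield * totalArea)
    }.toMap
}
```

The 20 weights sum to 1 (20 × 0.0975 − 0.005 × 190 = 1.0), so the formula spreads the full emissions of a loss event across the following 20 years, weighted toward the earliest years.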
val gadmAdm1: Integer = raster.tile.gadmAdm1.getData(col, row)
val gadmAdm2: Integer = raster.tile.gadmAdm2.getData(col, row)
val gadmId: String = s"$gadmAdm0.$gadmAdm1.${gadmAdm2}_1"
//println(s"Empty ${featureId.commodity} default yield, checking gadm yield for $gadmId")
remove comment
Done.
val gadmAdm2: Integer = raster.tile.gadmAdm2.getData(col, row)
val gadmId: String = s"$gadmAdm0.$gadmAdm1.${gadmAdm2}_1"
//println(s"Empty ${featureId.commodity} default yield, checking gadm yield for $gadmId")
val backupArray = kwargs("backupYield").asInstanceOf[Broadcast[Array[Row]]].value
nitpick, I'd call this "backupYieldArray" consistently since "backup" can mean a lot of things
Good suggestion, done!
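As a rough illustration of the lookup this hunk performs, the sketch below builds the GADM ID string in the same format as the diff and looks up a backup yield by (GADM ID, commodity). It avoids any Spark dependency: `BackupYieldRow` is a hypothetical stand-in for the Spark `Row` values in the broadcast array, the column layout of the CSV is assumed, and the type of the ADM0 code (a string here) is also an assumption.

```scala
object BackupYieldLookupSketch {
  // Hypothetical row shape; the PR reads Spark Row values from the backup CSV.
  final case class BackupYieldRow(gadmId: String, commodity: String, yieldValue: Double)

  // Same string format as the diff: "<adm0>.<adm1>.<adm2>_1".
  def gadmId(adm0: String, adm1: Int, adm2: Int): String =
    s"$adm0.$adm1.${adm2}_1"

  // Index the rows once so that per-pixel lookups avoid a linear scan.
  def indexRows(rows: Array[BackupYieldRow]): Map[(String, String), Double] =
    rows.map(r => (r.gadmId, r.commodity) -> r.yieldValue).toMap

  def lookup(
      index: Map[(String, String), Double],
      adm0: String,
      adm1: Int,
      adm2: Int,
      commodity: String
  ): Option[Double] =
    index.get((gadmId(adm0, adm1, adm2), commodity))
}
```

Whether the actual code scans the array or indexes it is not visible in this thread; the map is simply one option if the per-pixel lookup ever shows up in profiles.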
val groupKey = GHGRawDataGroup(umdTreeCoverLossYear, cropYield)

// if (umdTreeCoverLossYear > 0) {
remove comment
Done.
case feature =>
getSummaryForGeom(List(feature.data), feature.geom)
}
if (kwargs.get("includeFeatureId").isDefined) {
Maybe this would be messy, but if you wanted to keep the optimization and not split the logic, theoretically every tile should produce the same results per commodity unless the user specifies the yield manually. In which case, couldn't you just apply the yield constant to your summary result per feature after doing runPolygonalSummary?
This is analysis-independent code, so I would not want to put anything in here that is specific to yield/commodity/GHG, etc.
Since GHG specifies a commodity or yield, it must be done on specific farms with specific crops. So, it will not be called on very large areas (which would not all be one farm with one kind of crop). So, the full window optimization would never actually be applicable for GHG anyway.
So, I don't think it would be worth trying to get this optimization to work in a general way in the case the featureId is passed down, since it would never be used in the one case (GHG) that we have.