Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Found GLM file that has a higher number of events then current dytpe (int16) can hold for flash_np['n_points'] in io/mimic_lma.py #102

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

LeviDPC
Copy link

@LeviDPC LeviDPC commented Jan 13, 2025

There exists a GLM file (Goes18 01/12/2025 - 06:33:00) where the number of events (33601) is larger then a signed int16 (as n_points is set to here) can handle (max of 32,767).

LeviDPC$ ncdump OR_GLM-L2-LCFA_G18_s20250120633000_e20250120633200_c20250120633215.nc | head -n 8

netcdf OR_GLM-L2-LCFA_G18_s20250120633000_e20250120633200_c20250120633215 {
dimensions:
	number_of_flashes = UNLIMITED ; // (325 currently)
	number_of_groups = UNLIMITED ; // (29813 currently)
	number_of_events = UNLIMITED ; // (33601 currently)
	number_of_time_bounds = 2 ;
	number_of_field_of_view_bounds = 2 ;
	number_of_wavelength_bounds = 2 ;

When I try to grid that file I get this error:

  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 838, in grid_GLM_flashes
    outputs = list(map(this_proc_each_grid, subgrids))
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 907, in proc_each_grid
    gridder.process_flashes(glm, **process_flash_kwargs_ij)
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/grid/make_grids.py", line 397, in process_flashes
    read_flashes(glm, self.framer, base_date=self.t_ref,
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 111, in read_flashes
    results = list(map(chunk_func,flash_chunks))
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 215, in fast_fixed_grid_read_chunk
    fake_lma = mimic_lma_dataset_lut(flash_data, base_date,
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 970, in mimic_lma_dataset_lut
    gr_flashes = _fake_lma_from_glm_lutgroups(flash_data, basedate)
  File "/data/users/lpfantz/gridded-glm/v1.1-testing/rc4/cspp-geo-gridded-glm-1.1/libexec/python_runtime/lib/python3.9/site-packages/glmtools/io/mimic_lma.py", line 937, in _fake_lma_from_glm_lutgroups
    flash_np['n_points'] = flash_data.number_of_events.shape[0]
OverflowError: Python integer 33601 out of bounds for int16

My commit fixes the issue by changing it to an unsigned int which should be good up to 65,535

To fix this specific issue I only needed to update n_events dtype for [flash_dtype_fixedgridlut] (https://github.com/deeplycloudy/glmtools/blob/master/glmtools/io/mimic_lma.py#L609) but I thought it would make sense to do it for n_events dtype for flash_dtype as well.

I'm happy to change that or switch the dytpe to signed int32 (if it needs to be signed).

@djhoese
Copy link
Contributor

djhoese commented Jan 14, 2025

One thing that may need to be considered with this is if these files go directly to AWIPS or to any other Java software as Java doesn't have unsigned types. I have no say in this library or the CSPP downstream project using it, but Java limitations were the first thing that came to mind when I saw this PR.

@graemely
Copy link

Does it actually affect a datatype in an output file, or is it just an internal counter? If the latter then that should eliminate the Java concern.

Having said that, I would suggest promotion to int32, to head off the same problem happening when there's a case that exceeds 65,535 counts.

@djhoese
Copy link
Contributor

djhoese commented Jan 28, 2025

Good point. They'd probably be translated to something like another AWIPS-compatible NetCDF file. I do know sometimes data file support is added directly to AWIPS, but maybe GLM is too complicated for that to have been done already (ex. groups and flashes), hence the CSPP/Satpy conversion steps.

@LeviDPC
Copy link
Author

LeviDPC commented Jan 28, 2025

Thanks for the input all, (And I didn't realize that Java didn't have unsigned types) I have gone ahead and updated the source branch for this pull request to be signed int32. The branch name is no longer accurate but I guess thats a lesson for me about branch names going forward 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants