refactor(datafile): use len(obj) rather than obj.get_nrecords() #2215

mwtoews · 2024-06-07T12:00:25Z

This PR has a few aims related to data files, including FormattedHeadFile, HeadFile and CellBudgetFile.

These files have a "number of records" property that was implemented with get_nrecords(). This "length of the object" measure is more naturally done with __len__, i.e. len(headsfile).

The len() approach is now preferred, so calls to get_nrecords() on these files emit a DeprecationWarning.

The CellBudgetFile also has a .nrecords property, which is likewise marked with a DeprecationWarning.
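The deprecation pattern described above could look roughly like the following. This is a minimal sketch, not the flopy implementation: the class name DataFileSketch and the warning messages are hypothetical, and the records are held in a plain list for illustration.

```python
import warnings


class DataFileSketch:
    """Hypothetical stand-in for a data file class (not flopy code)."""

    def __init__(self, records):
        # One entry per record (header) found in the file.
        self._records = list(records)

    def __len__(self):
        """Return the number of records (headers), not the number of times."""
        return len(self._records)

    def get_nrecords(self):
        # Old spelling kept for backward compatibility, but flagged.
        warnings.warn(
            "get_nrecords() is deprecated; use len(obj) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return len(self)

    @property
    def nrecords(self):
        # Same treatment for the property form.
        warnings.warn(
            "obj.nrecords is deprecated; use len(obj) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return len(self)
```

With this shape, len(obj) is the single source of truth and both legacy spellings delegate to it, so the three can never disagree.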

This PR also fixes a bug with get_nrecords() shown here:

flopy/flopy/utils/datafile.py, lines 427 to 430 in ea3e475:

    def get_nrecords(self):
        if isinstance(self.recordarray, np.recarray):
            return self.recordarray.shape[0]
        return 0

The bug is that recordarray is a structured array (a plain np.ndarray with a structured dtype), not a record array (np.recarray), so the isinstance check fails and the method always silently returns 0. This bug does not apply to CellBudgetFile, which worked fine and matches obj.nrecords.
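The distinction can be reproduced with NumPy alone. The snippet below is an illustrative sketch: the field names and the standalone function buggy_nrecords are made up for the demonstration and only mirror the logic of the method quoted above.

```python
import numpy as np

# Hypothetical, simplified header dtype; the real one has more fields.
dtype = [("kstp", "i4"), ("kper", "i4"), ("totim", "f8")]
structured = np.zeros(3, dtype=dtype)

# A structured array is a plain ndarray that happens to have named fields.
print(type(structured).__name__)            # ndarray
print(isinstance(structured, np.recarray))  # False


def buggy_nrecords(recordarray):
    # Same logic as the quoted method: only counts true record arrays.
    if isinstance(recordarray, np.recarray):
        return recordarray.shape[0]
    return 0


# Viewing the same data as a recarray makes the isinstance check pass.
as_recarray = structured.view(np.recarray)

print(buggy_nrecords(structured))   # 0
print(buggy_nrecords(as_recarray))  # 3
print(len(structured))              # 3
```

Because np.recarray is a subclass of np.ndarray and not the other way around, len() works for both kinds of array, while the isinstance check only accepts one.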

Fixing this bug caused this test to fail: while get_nrecords() returned 0, the test's for-loop was never entered, so its checks had never actually run. A "todo" note is added, since the reversed header is not the same as the original header.

Another aim of this PR is to re-organize a few CellBudgetFile tests from test_binaryfile.py to test_cellbudgetfile.py. Most of this is copied with perhaps minor simplifications.

Note that none of these changes apply to flopy.utils.swroutputfile.SwrBudget.get_nrecords(), which returns a tuple.

There is still room for discussion on whether get_nrecords() or a dynamic .nrecords property should be preferred over len(obj). This PR can be adjusted accordingly. Opinions welcome!

langevin-usgs · 2024-06-08T12:52:40Z

I'm in favor of len(obj). Sadly, len doesn't correspond to the number of times the data is available, but rather the number of records, which are written by layer with DIS/DISV. Whatever we can do to make our objects as pythonic as possible seems to be a step in the right direction. The recordarray misnomer is something that has been propagated for years due to a misunderstanding on our (my) part regarding the differences between structured arrays and record arrays -- something to keep in mind with a needed refactoring.
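To make the record-versus-time distinction concrete, here is a small sketch with made-up headers for a 3-layer model saved at 2 output times. The dtype is a simplified assumption and not the actual header layout of any flopy file class.

```python
import numpy as np

# Hypothetical, simplified header layout: one record per layer per saved time.
header_dtype = [("kstp", "i4"), ("kper", "i4"), ("totim", "f8"), ("ilay", "i4")]
headers = np.array(
    [
        (1, 1, 1.0, 1), (1, 1, 1.0, 2), (1, 1, 1.0, 3),  # first output time
        (1, 2, 2.0, 1), (1, 2, 2.0, 2), (1, 2, 2.0, 3),  # second output time
    ],
    dtype=header_dtype,
)

nrecords = len(headers)                    # what len(obj) would report
ntimes = len(np.unique(headers["totim"]))  # number of distinct output times

print(nrecords)  # 6
print(ntimes)    # 2
```

Under these assumptions the record count is the number of times multiplied by the number of layers, which is why len(obj) should not be read as a time count.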

wpbonelli · 2024-06-10T15:01:14Z

> Sadly, len doesn't correspond to the number of times the data is available, but rather the number of records, which are written by layer with DIS/DISV.

Agreed, len() suggests to me the number of rows in tabular data, as in numpy and pandas. A couple of thoughts on this (hopefully not too off-base; maybe I have misunderstood something):

- Flopy's approach to model outputs is to provide classes wrapping MODFLOW/etc output files, which may contain several tables of different shapes. This could be seen as a leaky abstraction. Does flopy really need to reproduce MODFLOW output file structure in user-facing APIs?
- Flopy uses "record" to mean "table of some kind of data", whereas it commonly means a single row of tabular data, so "number of records"/"length of the object" really means "number of separate tables in this output file".

Maybe a future flopy could consider static routines to read model outputs e.g.

flopy.read_binary_file("path/to/file.cbc", data="FLOW RIGHT FACE")

File classes could live on under the hood, e.g. for indexing/caching, precision detection, etc, but it seems ideal to try to hide this and return one table at a time? The user would still need to know which sort of data can be found in which output file, but even that seems like it could be hidden with suitable shortcuts on model.output.

In the meantime, I like len(obj) as sugar for get_nrecords(), though I wonder if there is some way to emphasize that this is not a row (i.e. time) count.

mwtoews · 2024-06-11T01:33:13Z

Reply to a few points of discussion...

> Agreed, len() suggests to me the number of rows in tabular data, as in numpy and pandas.

Xref to #2221 which adds a .headers data frame with this number of headers.

> Maybe a future flopy could consider static routines to read model outputs e.g.
> flopy.read_binary_file("path/to/file.cbc", data="FLOW RIGHT FACE")

With CellBudget files, a somewhat expensive component is to build an index of headers, which is ideally done once, then re-used to get data from more than one component. This process wouldn't work well with static routines.
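The trade-off can be sketched as follows. This is a hypothetical illustration, not flopy code: the class IndexedBudgetSketch, its methods, and the (text, totim) header tuples are all assumptions made for the example.

```python
class IndexedBudgetSketch:
    """Hypothetical reader that indexes headers once and reuses the index."""

    def __init__(self, headers):
        # headers: iterable of (text, totim) tuples, one per record.
        self._headers = list(headers)
        self.index_builds = 0
        # The (potentially expensive) scan happens once, at construction.
        self._index = self._build_index()

    def _build_index(self):
        self.index_builds += 1
        index = {}
        for position, (text, _totim) in enumerate(self._headers):
            index.setdefault(text, []).append(position)
        return index

    def __len__(self):
        # Number of records (headers), consistent with the rest of this PR.
        return len(self._headers)

    def positions(self, text):
        # Later lookups reuse the stored index instead of rescanning.
        return self._index.get(text, [])


headers = [
    ("FLOW RIGHT FACE", 1.0),
    ("STORAGE", 1.0),
    ("FLOW RIGHT FACE", 2.0),
    ("STORAGE", 2.0),
]
cbc = IndexedBudgetSketch(headers)
flow_positions = cbc.positions("FLOW RIGHT FACE")
storage_positions = cbc.positions("STORAGE")
```

A purely static function would have to rebuild or otherwise cache this index on every call, which is the cost being pointed out here.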

> In the meantime, I like len(obj) as sugar for get_nrecords(), though I wonder if there is some way to emphasize that this is not a row (i.e. time) count.

I was considering keeping/adding obj.nrecords to be the same as len(obj), but the docstrings for __len__ should be sufficient without adding another property.

wpbonelli · 2024-06-11T18:26:00Z

@mwtoews can you resolve the datafile.py conflict? then we can bring this in

refactor(datafile): use len(obj) rather than obj.get_nrecords()

2e97b8e

mwtoews force-pushed the fix-nrecords-to-len branch from 5025802 to 2e97b8e Compare June 11, 2024 21:31

wpbonelli merged commit e2d16df into modflowpy:develop Jun 11, 2024
24 checks passed

mwtoews deleted the fix-nrecords-to-len branch June 11, 2024 23:25
