Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need general results/reporting ability #13

Open
samhatchett opened this issue Jul 28, 2016 · 27 comments
Open

Need general results/reporting ability #13

samhatchett opened this issue Jul 28, 2016 · 27 comments

Comments

@samhatchett
Copy link
Member

I think I share this need with @ttaxon - we would like to be able to reproduce the epanet-2 style binary file (which contains results from each hydraulic step, not just reporting steps). But we should do so in a way that also allows flexibility in directing the output - for instance: sending results to an SQL/SQLite database, CSV, Python Pickle, etc.

Specific "industry" use might be saving results to a format easily read by existing GIS applications. I know GCWW has expressed interest in this sort of capability. Tom mentioned wanting to test a new water quality algo with the legacy binary format (but results from the E3 solver).

Suggest we use a Delegate pattern to accomplish this. In basic form, we create an interface class with a set of methods that get called whenever something important happens in the simulation:

class SimulationReportingDelegate {
  public:
    SimulationReportingDelegate(Project& p);

    virtual void willInitSolver();
    virtual void didInitSolver();

    virtual void willRunHydSolver(const int time);
    virtual void didIterateHydSolver(const int time);
    virtual void didConvergeHyd(const int time);
    virtual void didRunHydSolver(const int time);

    virtual void willAdvanceSolver(const int time);
    virtual void didStepQual(const int time);
    virtual void didSolveQual(const int time);
    virtual void didAdvanceSolver(const int time);

  protected:
    Project *_project; 
};

So the Project and Solvers just notify the Delegate whenever they have done something, and the specific implementation takes it from there. This allows a high degree of modularity when using the engine, and even will help in making the output code asynchronous. You could imagine these callbacks being essentially non-blocking, allowing the simulation to proceed while the results are being written to a file or database on a background thread.

@LRossman
Copy link
Collaborator

I agree with the concept. Right now there are placeholders in the code where the old Epanet 2 binary hydraulics file is referenced but hasn't been implemented. We should allow users to deploy a more flexible alternative for saving the kinds of results they want at whatever time periods they want.

Can't we do this more simply using an abstract class, say AuxOutputFile, that defines an interface that specific instances must implement. It has the open, init, write, read, and close methods that the current OutputFile class has, except declared as virtual. A specific type of AuxOutputFile gets created with a Project::createAuxOutputFile(string auxFileType) and the virtual methods get called in the same places that the OutputFile functions get called. (The write method gets called with each invocation of Project::runSolver). The specific implementation of an AuxOutputFile determines what gets saved to where.

@samhatchett
Copy link
Member Author

Probably the difference between an abstract OutputFile class and using delegation is pretty subtle. Delegation is a little more generic - just allows the user's code to gather information or intercede in numerous ways at various points in the process. Useful for many tasks that I'm sure haven't been imagined yet. Something like an AuxOutputFile would be more constrained, although the existing OutputFile class does have access to the Network pointer, so it could be a back-door to delegation. Maybe do both? Makes sense to allow client callbacks at various places, not just for writing results.

Another way to think about this could involve a plugin scheme. Provide a way to write separate DLLs for Input and Output, and tell the application which one to use. GDAL has a system that I'm looking closely at: http://www.gdal.org/gdal_drivertut.html

@samhatchett
Copy link
Member Author

I've put a little more thought into this - and the difference is philosophical and semantic, but important. Variable names matter, class and method names matter. I resist the idea of the Project assuming that results are being "written" somewhere. Writing, viewing, analyzing, or ignoring results should totally be the concern of the calling code - and so there's an important distinction between the Project telling a "writer" to "write", versus telling a user-supplied delegate that something important just happened. Delegation is less semantically constrained, and an indication that what the engine-user does at various points along the simulation is not prescribed.

Now, if the "client" in this case is the CLI that is purpose-built to read an inp and produce a binary output file, then that is what the CLI tool's delegate will be designed for, in concert with an appropriate API into the binary file I/O code. But there are many uses of the engine for which "writing" is not what is desired - perhaps the client only wants to intercede in a simulation under certain conditions or gather statistics, and no actual hydraulic results should be stored.

@jamesuber
Copy link
Member

For what it's worth I think this philosophy that Sam is presenting just makes sense. Here at the WDSA symposium in Cartagena, and before in other venues, I recall discussions with folks using epanet who routinely and casually talk of their changes to the code or API extensions they employed to get done what they wanted. Often these relate to getting data into or out of the simulation in ways that are convenient or efficient. Epanet is just way too constrained in this regard and I think many should welcome a broad change in architecture and design philosophy that values those things just as much, if not more, as solver efficiency.

I'd ask the dev committee and others to exert leadership in this area, because many of us do not yet know enough about the alternatives and so can't yet know how much more efficient things can become. As an example during our meeting on the open source project at WDSA the "architecture" discussion often reduced to arcane discussions of sparse matrix solution efficiency and the limits of parallelization. That's interesting for sure, but hardly the scope of discussion needed for getting to a point where epanet is fully supporting a wide range of uses. On the other hand these discussions of design philosophy are exactly that, so hopefully there will continue to be more of them leading up to a complete statement of design philosophy.

Jim Uber • 513-368-6303 • www.citilogics.com

On Jul 29, 2016, at 10:07 AM, Sam Hatchett notifications@github.com wrote:

I've put a little more thought into this - and the difference is philosophical and semantic, but important. Variable names matter, class and method names matter. I resist the idea of the Project assuming that results are being "written" somewhere. Writing, viewing, analyzing, or ignoring results should totally be the concern of the calling code - and so there's an important distinction between the Project telling a "writer" to "write", versus telling a user-supplied delegate that something important just happened. Delegation is less semantically constrained, and an indication that what the engine-user does at various points along the simulation is not prescribed.

Now, if the "client" in this case is the CLI that is purpose-built to read an inp and produce a binary output file, then that is what the CLI tool's delegate will be designed for, in concert with an appropriate API into the binary file I/O code. But there are many uses of the engine for which "writing" is not what is desired - perhaps the client only wants to intercede in a simulation under certain conditions or gather statistics, and no actual hydraulic results should be stored.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@michaeltryby
Copy link

Are we conflating two issues here or maybe I am confused? 1) There is a need to implement a binary hydraulics file for use when performing multiple water quality simulations when the hydraulics don't change. For example, the analysis performed by TEVA SPOT. 2) There is a need to support more flexible access to model output. Results are essentially locked away in the current binary output format.

Because the binary hydraulics file is primarily used by the water quality transport algorithm I would argue that it's format is less important. I see the need to provide flexibility in model output format as a higher priority. Something I have tried to address with the creation of the Output APIs for SWMM and EPANET. Thinking about it though, I don't think the Output API's go far enough.

I agree with @samhatchett and @jamesuber these codes are being used in lots of ways we haven't fully anticipated. For example, not to long ago I got a question about reading and modifying values in SWMM hotstart files. This didn't make any sense until I ran across a paper describing how this is really useful when doing data assimilation.

So just to be clear are we saying that we need a flexible format for every single type of file that these engines create or just the model results output?

@samhatchett
Copy link
Member Author

I think what I'm moving toward is that the "engine" should not be creating files at all. We can build up a useful collection of libraries that create files of different formats for storing/loading the results of computation, and those should include some recognizable "default" format(s). We can offer a CLI or GUI that link those I/O libraries together with the engine, but these should be separate concerns entirely; saving and loading results is just one of many use cases for the engine.

@LRossman
Copy link
Collaborator

@samhatchett said " Writing, viewing, analyzing, or ignoring results should totally be the concern of the calling code". That's exactly how the Project class currently works. It only carries out the functions that the user, through the API encapsulated in epanet3.cpp, tells it to. Users do not have direct access to the Project class or any other class it references.

If the user has no need to save results to the binary output file then they don't call EN_openOutput. If they want to save results after each time step in some particular format (or accumulate values to compute statistics) they can write their own code to do so, immediately after each call to EN_runSolver, using the EN_getxxx functions to retrieve the computed results. This is no different than the way the EPANET 2 toolkit worked.

My take on what Sam is proposing is that we define abstract FileLoader and FileSaver classes that utilize a standard interface (or maybe NetworkLoader/OutputSaver if File is too constricting). There would be a complementary set of functions in epanet3.cpp and in the Project class that call these functions. One of these functions would be a factory method that creates a specific type of FileLoader or FileSaver that has its own internals for reading/saving simulation results, which may even involve making calls to an external library.

This is the same paradigm we use for the various sub-models and solvers in the current code. Want to use the CHOLMOD sparse matrix solver instead of the SPARSPAK solver - then simply add a CholmodSolver class that inherits from MatrixSolver, write methods that implement the MatrixSolver interface (e.g., setDiag(), setRHS(), solve() etc.) and link in the compiled CHOLMOD library. Easy peasy.

@LRossman LRossman reopened this Jul 29, 2016
@LRossman
Copy link
Collaborator

Sorry - i hit the wrong button.

@michaeltryby
Copy link

I think @samhatchett is describing something much different. Something much more decentralized. I think Sam's perspective is quite different than mine our @LRossman after having developed RTX using the EPANET 2 toolkit. And as a result developing a keen sense for it's limitations.

@samhatchett
Copy link
Member Author

I would propose that we remove all loading/saving references and functionality from Project completely. Loading and saving are application-level concerns and not engine-level concerns. Loading and saving should occur at one level up.

@LRossman
Copy link
Collaborator

Now you lost me. If there's no loading of project data how does the engine get populated with the data it needs to work with?

@samhatchett
Copy link
Member Author

There would be a loading of project data, it's just that the Project class shouldn't be involved. That part is really more related to #12 but these issues are connected. Here is a hastily-drawn sequence diagram. Notice that the engine will never call into the loading or saving libraries, although those will call into the Project class to do their jobs.

screen shot 2016-07-29 at 3 42 21 pm

@LRossman
Copy link
Collaborator

OK, so there are two ways to view what you're saying:

  1. The loading library has full access to the class structure of Project and can make calls like
    aProject->network->addElement(NODE, JUNCTION, aNodeID);
    aProject->network->node(aNodeID)->elev = aValue;
  2. The loading library calls the current toolkit functions:
    EN_createNode(aNodeID, EN_JUNCTION, aProject);
    EN_getNodeIndex(aNodeID, &index, aProject):
    EN_setNodeValue(index, EN_ELEVATION, aValue, aProject);
    It seems that the current toolkit can do what you want without exposing the internals of Project to the loading library application.

@samhatchett
Copy link
Member Author

absolutely number 1. Except the internals remain private; the loading library has access to the public interfaces of the Project and Network classes - this is the purpose of object orientation.

@LRossman
Copy link
Collaborator

But then all applications would have to be written in C++ which I thought we were trying to avoid (for reasons that deserve its own topic). Also, the toolkit API is the public interface for Project.

@samhatchett
Copy link
Member Author

Not quite. It's totally possible and defensible to bundle components and expose a C-API to accomplish a "toolkit"-like paradigm that would load files, save binary outputs, even create networks etc. I would argue that a C-API should be a special case that calls into the native C++ class methods for the benefit of those who must use C. It would just be such a shame to limit all users of the library to flat C functions, when it's beautiful C++ under the hood. And there are great C++ wrapper generators for all sorts of languages.

I should add that technically speaking, the public interface for Project is:

class Project {
      public:
        Project();
        ~Project();
        int   load(const char* fname);
        int   save(const char* fname);
        void  clear();
        int   initSolver(bool initFlows);
        int   runSolver(int* t);
        int   advanceSolver(int* dt);
        int   openOutput(const char* fname);
        int   saveOutput();
        int   openReport(const char* fname);
        void  writeSummary();
        void  writeResults(int t);
        int   writeReport();
        void  writeMsg(const std::string& msg);
        void  writeMsgLog(std::ostream& out);
        void  writeMsgLog();
        Network* getNetwork() { return &network; }
...

These methods are accessible to users of the Project class - not just callers internal to the library.

@LRossman
Copy link
Collaborator

I disagree. When the engine is compiled to a library only the functions in the epanet3.h header are accessible to users. If you want to expose the internals of Project and Network you would have to distribute additional header files and might need to do a lot of code restructuring to make sure you have a well designed and encapsulated library.

@samhatchett
Copy link
Member Author

I almost entirely agree with you. The way the build is currently configured limits access to the C++ headers. This is fairly uncommon for C++ projects. Distributing multiple headers allows users to select what they need access to. An example: Microsoft has a HTTP-REST client/server library that i've used recently - look at all the headers! https://github.com/Microsoft/cpprestsdk/tree/master/Release/include/cpprest
All of those files get distributed to users of the library (this is completely different from users of software which depends on the library).

By distributing headers, you are not exposing internals - you are exposing the public interface which you intend users of the library to call. This really is the OO paradigm - to have well designed, encapsulated classes with usable public interfaces.

@LRossman
Copy link
Collaborator

Sorry, I should have said "access" instead of "expose". But there would still have to be a lot of redesign needed. The various classes in the current development version were not written with the idea that they be accessible to users of the API. If the goal is to create a pure C++ class library then maybe the code base I donated is not the way to go and a "start over" is in order.

@michaeltryby
Copy link

@LRossman I beg to differ. I think the code you have contributed is as good a staring point as any for such a library.

@jamesuber
Copy link
Member

Yes for real. Epanet 3, when it is released, will be a community effort. Lew, these sorts of discussions need to be celebrated. I only wish we had been having them for the last year, or maybe three.

Jim Uber • 513-368-6303 • www.citilogics.com

On Jul 29, 2016, at 4:38 PM, Michael Tryby notifications@github.com wrote:

@LRossman I beg to differ. I think the code you have contributed is as good a staring point as any for such a library.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@samhatchett
Copy link
Member Author

Celebrate indeed! These discussions really help to get to the foundation level. My primary interest in this project is being able to use the code as a class library in my commercial software because I think it has the potential to be more elegant, maintainable, and extensible than v2 for the way that we want to embed it. Of course I want many more things from this project - people want to do important research and enforcement work with this kind of software and I think we can make a real difference in those regards. So I'm donating my company's time to debate software architecture 😉

I see now that functioning as a class library was not an original design goal - but I would say that it would be my primary design goal, even if it were the case that this code offered no algorithmic improvements over v2. I suppose I saw a bunch of class declarations with well-described interfaces and jumped to some conclusions. However, I think that we are on a good trajectory and change is the name of the game; if it was certain that the code was done and suitable for everyone's needs, then GitHub would not the place to stash it.

I'm still waiting for others to join the discussion to get a feel for other peoples' wish lists, but in the absence of other voices I'm going to be advocating for my own needs. There are 11 other people getting emails on this thread (unless they've muted us already!) with more likely on the way - please speak up if you've got opinions!

@eladsal
Copy link
Member

eladsal commented Jul 30, 2016

Hi All,

So many things are coming up in this tread. I'm trying for a minute to look from a user stand point looking from the outside of the dll. I think we would want one dll for all users (I know we have different compiles for Windows, Mac and Linux). As a VB programmer I'd like this dll to support my needs and other developers in different languages as well (by the way the current dll is late binding so this is something else we want to think about). Now, the dll should have the option to load an INP file, run the simulation and save a report file. It should also allow to programmatically build the network (as per #12 , run the simulation and retrieve the results. I'm not the expert but I would guess that these needs reflects on how the dll is built from the inside, doesn't it?

Elad

@samhatchett
Copy link
Member Author

@eladsal thanks - i completely agree it's important to keep the C api for many users, to give pre-bundled access to the features you mentioned. Maybe call this the "C Toolkit", which would call on the various I/O and engine libraries to accomplish the "customary" workflow. I'm trying to plot a course that will isolate the "naked core" engine so that I can use it in RTX and other softwares without the extra components that assume a particular I/O paradigm or workflow.

@mariocastrogama
Copy link
Contributor

@ALL, during the WDSA2016 meeting mentioned that we have used in the past standard output files from OGC's WaterML and SensorML.

The advantage is that there are parsers allowing flexibility for the users to communicate with the files and translate into DB's, across servers or for visualization (not about numerical engine topic, I know!).

The first disadvantage of these data files is that are KML (heavy). The second disadvantage is that they do not contain elements such as pipes, nodes, valves ... but these can be added.

Another possibility can be to use NetCDF (also binary with libraries ready and parsers from most languages), although it is an old format, it is still in use for its ability to manage long time series (e.g. meteorology, hydrology, oceanography) and will allow users to include metadata in their result files.

@michaeltryby
Copy link

@mariocastrogama I agree there are a lot of promising formats out there. Lately, I have been looking at HDF5. I think the takeaway point is that we should provide the flexibility to support multiple formats and allow the user to select which format works best for them.

@samhatchett samhatchett mentioned this issue Aug 28, 2016
@goanpeca
Copy link

goanpeca commented Aug 28, 2016

Having an already OO approach to a library (Qt, Open FOAM, come to mind) really bring a lot of advantages that have been probably discussed already, and having a C interface would make it easier to write bindings for different languages (Python is my main concern as I am a heavy Python user).

Having the library be structured in a OO way on C++ (if planned ...) would seamlessly translate to Python (but I guess there are some limitations of SWIG, as big projects have their own wrapper libraries... Qt, PyQt->sip, PySide->Shiboken). Even if the C/C++ code was not structured in such a way, I would aim to write a Pythonic interface library to it, that is, I would write the OO abstraction at the wrapper level on Python (OOPNET comes to mind). This would require the C api to provide enough methods, even if it is only functional.


@samhatchett told me @eladsal was in touch with the OOPNET guys, is there any plan on their side to open up their code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants