-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test suite #30
Comments
A good test suite is something that we are sorely lacking. I'm not too worried about which framework we pick. I've been thinking about this, too, and here are some of my thoughts: One thing that is important (and essentially the basis of any good testing we would do here) is a library of tools that let us make assertions about the spreadsheets, presumably going around POI and operating on the ZIP/XML directly (because it will be more robust). One important assertion would be that workbook is a correct workbook. In general that means that all the linkages are right and there are no orphans and that all formulas point to real things (both on worksheets and on charts). Over time, we can add more checks around structural correctness within the elements. It's probably worth adding POI to this as a separate check of its own, where we open the worksheet with POI and walk the elements, making sure that POI doesn't explode along the way. Other assertions would be about the size of a worksheet, the contents of a block of cells, and the value of formulae (both on worksheets and chart references). Like #28, a POI-independent formula parser would be helpful here, though we certainly can get started without it. |
I agree wholeheartedly with those points, and based on the bugs and human errors that I've encountered so far I think that the quality and sources of the workbooks is going to be a critical part of having good and useful tests. The other interesting side of a testing suite is that we may be able to use it as a starting point to tackle #28 by starting with POI based tests and then transitioning them over to direct manipulation and verification of the ZIP/XML as we figure out how to deal with the different components. As a starting point, here's a way that may work for testing using POI and the current capabilities of the library. Can I read in an excel file with known values, reproduce the file exactly to a temporary location, and read back and verify those same values? This could be extended so that the base testing file (potentially the same file created in different environments and different versions of Excel and Excel alternatives) can cover multiple scenarios (Cell A1 should = 1, Cell A3 should be a formula with a certain value, etc). In this manner, we could potentially extend that one excel file (or create many for different tests) with one feature at a time to build up the tests over time. I'm a little uncomfortable with creating a bunch of temp files as part of the tests, but I don't really see a good way around it when using POI. I would ideally like to build some of the validation into a part of the library that can be used in production - I have some templates where I would like to have the formulas and results of test runs validated before running reports on an ongoing basis to make sure I don't break reports over time. |
Yeah, basically the right points. I think that we can skip the POI-based stuff because implementing the first step straight from the XML is not hard with (a) the tools clojure gives you and (b) the understanding I've gotten about how Excel works inside from working on this. As a start, try running this script on your excel file and then poke around the resulting file tree: https://gist.github.com/tomfaulhaber/4f1db38af6e57ac648e9. I think you'll see that the cells on the worksheets are really easy to find. Using zippers and data.xml I think it's pretty easy to write a library that we can build assertions on. @smahood, are you coming up to ClojureWest this year? If so, we could do a mini-hackathon to get things started. |
@tomfaulhaber Not able to come this year, that would have been a great way to get this started. Any good references to start looking into the internals of Excel? I'd love to get something started, either on the testing side or the non-POI rewrite. |
Hey @smahood, sorry to hear that you won't be in Seattle. We'll try to have fun without you :) Official documentationThe bible of Office XML is the ECMA 376 which has a home page here and the key document is the ECMA-376, 4th Edition, Office Open XML File Formats — Fundamentals and Markup Language Reference which is literally 5000 pages long. Don't worry, though, I haven't really found much need for that, although it is comforting to know I have it available if I need it. (Just now, I discovered that the complete formula grammar is in there starting on page 2043. That might come in handy.) Unbundling an existing spreadsheetIf I were you, the first thing I would do is use the script I mentioned above and blow up one of your interesting excel workbooks (and also maybe the one in excel-templates-example). This turns the excel file into a directory tree and prettifies all the XML so that it's pretty easy to navigate. It also creates a ExploringStart by looking at the layout of the worksheets themselves and see what's going on. You will see rows, cells, values and formulas in there. If you look at a sheet like <c r="Q5" s="5">
<f>N5-H5</f>
<v>0</v>
</c> The The other thing to think about is the linkage between files. For us, the most important linkage is the worksheet -> drawing -> chart relationship. Relationships are specified by an ID, like <drawing r:id="rId1"/> The target of the relationship is specified by a separate file. If you are looking at <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing" Target="../drawings/drawing1.xml"/>
</Relationships> For the case of charts, you need to be aware that sheets have drawings and drawings have the charts. So you need to follow that linkage two steps (not too difficult once you know what you're looking for). Existing XML codeAs I mention elsewhere, I have long ago started down the path of manipulating the XML directly because POI was either not working right or was just in the way. The two cases are:
These two areas will give you some feeling for how I've dealt with the XML issues so far. I think building a capability to do some simple validation and check assertions based on the file structure will not be too hard, despite the extraordinary length of this comment. Let me know if and how I can help you along the way. |
I've been thinking about building a test suite to help validate that things are working for issue #26 (and now it would probably help with issue #28 as well). Have you got any preference as to testing library or example projects that I should follow as far as the test styles?
I'm planning on building some tests anyway to help verify that some of my more critical reports are working over the long term, so if the tests I write don't have an eventual home in excel-templates itself they will still serve a useful purpose for me.
The text was updated successfully, but these errors were encountered: