Preprocessor and executable specifications #823
Comments
Couple questions... For (A), if we were willing to rewrite programs so that multiple statements never appear on the same line, would the problems then be completely solved? (I guess this is asking: Can the inject algorithm really be modified to work only with lines?) Do you have a sense which of (B) and (C) is easier to implement? For (D), what does "manipulates tokens appropriately" mean?
On further thought and inspection, here are some more thoughts:
int x = 0;
int y = 1;
#define CMP x < y
if (CMP) { return 1; } else { return 2; }
The issue here is that if we wanted to rewrite
@kmemarian do you have some thoughts on what might be a good way to proceed?
Personally I can live with wrong column information in error messages. There is a danger here of letting the best be the enemy of the good...
Agreed. I think I am leaning toward option (B), but I'll set up a meeting with the other developers to see what they think, and whether there are other options.
Miscellaneous comments:
There are some unfortunate interactions between the C pre-processor and CN's testing infrastructure (e.g., see #393, #654).
This ticket aims to describe the underlying problem and some potential design choices that we may want to consider.
Status Quo
The following diagram shows the steps that CN takes to produce executable specifications:
Here AIL is a data structure that gives an abstract representation of C code, and the checks are a bunch of C code (currently represented as a mixture of AIL and raw strings) that needs to be added to the original program to validate user assertions. The checks are added to the original program by the inject algorithm.
Problems and Considerations
Incorrect Column Numbers: Because the parser runs after the pre-processor has been applied, the location information we have is not quite right. We are using an external pre-processor, which annotates its output with file and line information but loses column information, so the locations that are stored in the AIL have correct columns for input.i but NOT for input.c.
The effect of this is two-fold:
error messages report incorrect columns
the inject algorithm gets confused, resulting in incorrect output.c
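To make the column drift concrete, here is a small sketch in C. The macro CMP and both line strings are hypothetical, and the expanded line assumes the pre-processor substitutes the macro body verbatim. column_of is a helper introduced only for this illustration; it is not part of CN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the 1-based column at which `token` first appears in `line`,
   or 0 if it does not appear. Hypothetical helper for illustration only. */
size_t column_of(const char *line, const char *token) {
    assert(line != NULL && token != NULL);
    const char *hit = strstr(line, token);
    return hit ? (size_t)(hit - line) + 1 : 0;
}

/* The same source line before and after macro expansion,
   assuming `#define CMP x < y` (an assumed example, not real CN output). */
const char *const line_in_c = "if (CMP) { return 1; }";   /* as in input.c */
const char *const line_in_i = "if (x < y) { return 1; }"; /* as in input.i */
```

With these strings, column_of(line_in_c, "return") is 12 while column_of(line_in_i, "return") is 14, so a column recorded against input.i points two characters too far when applied to the same line of input.c.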
Multiple Files: Because input.c may contain #include directives, we may have to perform injection in multiple files. We do have some code that attempts to do this, but it doesn't seem to work quite correctly. This is a tricky problem, because generally we don't want to modify the original files, but rather a copy of the files. We then have to ensure that those modified files are correctly included in output.c when building it, but we have no easy way to modify the #includes, so we have to achieve it only by changing the flags to the pre-processor. In principle this will never be fully correct: for example, if a file #includes an absolute path, we can't change that. Fortunately, this is not something that usually happens in normal programs.
Potential Solutions
Here are a few possibilities we may want to consider:
(A) Modify the inject algorithm to be line-based and not consider columns. This will only work if the injections can always go on new lines, which, in general, is not the case. Consider, for example, a program that has multiple statements on the same line: we may have to inject stuff in between those statements.
(B) Modify the inject algorithm to inject in input.i (i.e., the pre-processed file) rather than the original C source. To make this work, we'd need to modify the parser to keep track of 2 locations per construct: the location in the original C file, which we'd use for error reporting, and the location in the actual pre-processed file, which we'd use for injection. This should solve the issue of incorrect injections and having to deal with #include. We would still report incorrect columns in error messages, however. Also, if someone wanted to debug the generated code, they'd be looking at the pre-processed file rather than the original code.
(C) Modify the inject algorithm to inject directly in the AIL AST and then pretty-print that. This has similar trade-offs to option (B) but it wouldn't need to track 2 source locations for things. The trade-off is that the generated code would be not only pre-processed but also somewhat desugared, and would miss things like comments, for example. It is unclear if the resulting code would be better or worse than option (B), however, because pretty-printing the code would at least result in a nice regular structure, while injecting directly into the text often leads to code that is not properly indented and is quite hard to read. We may also be able to preserve some of the comments in the source if we added a "comment" constructor to AIL, although all consumers of AIL would then have to deal with it...
(D) Write our own pre-processor, which manipulates tokens appropriately so that we have correct column information. This seems to be the only option that would report correct column information. In theory, this should also make it possible to do injection on the non-pre-processed source file. We would still have to deal with the issues of #includes, but if we had our own pre-processor, we might also be able to use it to modify the original source program to just change the #includes in output.c. One other benefit of having a custom pre-processor is that we could apply it to the CN specifications (the magic comments) to expand constants too (although this has its own issues: for example, numbers in C and numbers in CN do not use the same notation at the moment, so that feature won't be so useful without some further changes to CN, which we may want to do anyway).
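As a rough illustration of the two-location idea in option (B), here is a minimal sketch in C. Every name in it (pos, dual_loc, inject_at) is hypothetical and chosen for this example; it does not reflect how CN's parser or inject algorithm is actually written, only the idea of reporting against one location and injecting at another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A position in some file (hypothetical type, for illustration only). */
struct pos {
    const char *file;
    unsigned line;
    unsigned column; /* 1-based */
};

/* Two locations per construct, as option (B) suggests. */
struct dual_loc {
    struct pos reported; /* in the original C file: used for error messages */
    struct pos physical; /* in the pre-processed file: used for injection */
};

/* Copy `line` into `out`, inserting `text` just before 1-based `column`.
   Returns 0 on success, or -1 if the column is out of range or `out`
   is too small to hold the result. */
int inject_at(const char *line, unsigned column, const char *text,
              char *out, size_t out_size) {
    assert(line != NULL && text != NULL && out != NULL);
    size_t len = strlen(line);
    size_t tlen = strlen(text);
    if (column == 0 || column > len + 1) return -1;
    if (len + tlen + 1 > out_size) return -1;
    size_t at = column - 1;
    memcpy(out, line, at);                            /* text before the point */
    memcpy(out + at, text, tlen);                     /* the injected check */
    memcpy(out + at + tlen, line + at, len - at + 1); /* the rest, with NUL */
    return 0;
}
```

For example, given a construct whose physical location is column 14 of the pre-processed line `if (x < y) { return 1; }`, calling inject_at with that column and the text `CHECK(); ` yields `if (x < y) { CHECK(); return 1; }`. Using the reported column from the original file (12 in this made-up case) on the same line would instead place the check before the opening brace, which is the kind of misplaced injection the issue describes.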