Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of function and label names with non alpha-numeric characters #201

Open
KMR1-1 opened this issue Jun 20, 2024 · 11 comments
Open

Handling of function and label names with non alpha-numeric characters #201

KMR1-1 opened this issue Jun 20, 2024 · 11 comments

Comments

@KMR1-1
Copy link

KMR1-1 commented Jun 20, 2024

V1:

CLR_CompileC#(Code, References="", AppDomain=0, FileName="", CompilerOptions=""){...}

V2 (Converted):

CLR_CompileC#(Code, References="", AppDomain=0, FileName="", CompilerOptions=""){...}

V2 (Expected):

CLR_CompileCSharp(Code, References := "", AppDomain := 0, FileName := "", CompilerOptions := "") {...}

asked chat gpt ahk syntax checker and told me it was supposed to be like that

@Banaanae
Copy link
Collaborator

Very similar to #153
The reason this conversion doesn't work is because of the function name CLR_CompileC#, since the converter only uses a basic \w+\(.*?\) to match functions this won't work as \w doesn't match #, conversion works correctly without this.
The other thing we could do is use a less strict RegEx, but this could cause issues.

Incase you're wondering we will not be changing C# -> CSharp (well beyond scope)

@CoffeeChaton
Copy link

\w+\(.*?\) cannot contain many common function definitions

#Requires AutoHotkey v1.1.33+
; show A B

if (中文() = "¢ 0xa2") {
    MsgBox, % "中文1"
}
MsgBox, % "中文2"


中文() {
    Return "¢ 0xa2"
}
// js regexp
/^[#$@\w\u{A1}-\u{FFFF}]+$/iu

So far I have tested these ahk v1.1 function names that work.
I think this can save you some testing time if necessary.

Code generation and generated results are here.
CoffeeChaton/vscode-autohotkey-NekoHelp#8 (comment)


In addition, labels can accept more text symbols.
I have seen a lot of label: starting with 1) or 2) on the forum.

#Requires AutoHotkey v1.1.33+
; show foo bar

gosub 1)(miro#_r

MsgBox, % "bar"
Return

; label
1)(miro#_r:
MsgBox, % "foo"
Return

@Banaanae
Copy link
Collaborator

Banaanae commented Jun 20, 2024

/^[#$@\w\u{A1}-\u{FFFF}]+$/iu

AFAIK \u can't be used in PCRE. I don't have any idea on how to handle non-ascii functions, maybe [^xxx]* with not allowed characters is the way to go?

I am aware that @andymbody is working on function masking, and may have already found a solution, or atleast find this useful

@Banaanae
Copy link
Collaborator

More Breakage:

My¡Func() {
MsgBox
}

My¡Func()
My¡%% {
MsgBox()
}

My¡%%

The converter treats My¡Func() as Some-Random-String Func() and then converts it
Easier to see with My¡Asc() -> My¡Ord() (Correct conversion: My¡Asc())

@andymbody
Copy link
Contributor

Good info... I just came here to let you know that I am just about ready to implement the masking/separation of functions, classes, etc. But now I see that I must adjust the allowable characters for names as well. I was not aware of these rare characters being valid.

I think it is best to add a new file to the converter file list that exclusively handles the masking. That way any new masking can be added to that file. A simple #include will allow direct access to the masking classes/functions. Does this sound OK to you?

I plan to add support for labels, hotkey blocks, etc so processing of those can be handled separately (trying to preserve scope). We can start with separate processing of classes and functions (as well as global script body). Then add as we go.

Adding this support to the script should be fairly straightforward. Before processing any code, separate the parts, then run each part thru the main convert loop. The order matters somewhat. Within each separate part, sub components can be sub-divided/masked as well.

I am adding needles for many bracketed structures like IF, Switch, loops, while, etc, in case it makes things easier to process them separately in some cases. The plan is to allow everything to become a separate unit for separate processing (available as needed). Not everything requires this, but the goal is to provide the foundation of support just in case).

Btw... the function regex included will this implementation will support nested parentheses, braces, etc as well as trailing comments, mask tags, etc. Just need to look into supporting unusual characters.

@andymbody
Copy link
Contributor

\w+\(.*?\) cannot contain many common function definitions

The new regex is much more supportive. It's currently more than 200 characters long. And will grow to support the unusual characters.

So far I have tested these ahk v1.1 function names that work. I think this can save you some testing time if necessary.
In addition, labels can accept more text symbols. I have seen a lot of label: starting with 1) or 2) on the forum.

Thank you for providing these examples!

@andymbody
Copy link
Contributor

中文() {
Return "¢ 0xa2"
}

Are we sure this is supported in v1? I get an error that says v1 does not recognize as a function

@andymbody
Copy link
Contributor

Actually it might be better to test this new functionality on common code first. No sense making it overly complex until that has been done. I have tested it and it works well, but more testing will be needed to ensure hidden bugs are worked out first.

@Banaanae
Copy link
Collaborator

Does this sound OK to you?

Go ahead! If you have multiple files it might be worth putting them all in something like Convert\Masking

Are we sure this is supported in v1? I get an error that says v1 does not recognize as a function

They showed up as ?? for me, might be an issue with not having the correct language packs installed?

Can't wait for the pull request!

@CoffeeChaton
Copy link

中文() {
Return "¢ 0xa2"
}

Are we sure this is supported in v1? I get an error that says v1 does not recognize as a function

I reinstalled ahk 1.1.37.02

  • ANSI 32-bit -> not recognize as a function
  • Unicode 64-bit -> OK

@Banaanae Banaanae changed the title compile sharp Handling of function and label names with non alpha-numeric characters Jun 21, 2024
@andymbody
Copy link
Contributor

andymbody commented Jul 4, 2024

I did some research and testing and what I found was that characters for v1 label names are ridiculously non-restrictive. They can even begin with a semicolon! Should be fun designing a renaming convention that covers all cases, and yet not resulting in repeated replacement characters or killing the original description of the label. I will make it simple and we can adjust as needed from there.

I may need to replace the current related regex needles because they don't really cover all possible names that coders can come up with. I'm using the thousands of scripts in my library from other coders as a lab and you should see some of the unusual names and characters used for labels. I had no idea.

These are all valid labels in v1 (even a colon is allowed as part of the name)

`;notACommentAnymore:			; begins with semicolon
%A_Desktop%\List.txt: 			; variable de-reference
:a:					; begins with colon
c:\user\desktop\List.txt:		; has colon in middle of name
#Persistent:				; ahk directive
\w+\(.*?\):				; come on...
if(a&&b){:				; really?
IThoughtIWasAFunc(param1:=""){:		; most regex needles would think so too... think again

The only 3 things not allowed in v1 is whitespace, comma, and escape character. Everything else is fair game.
I'm glad v2 is more restrictive.

andymbody added a commit to andymbody/AHK-v2-script-converter that referenced this issue Jul 7, 2024
Fix V1 to V2 label name conversion.
Does not address variable names, that will be handled separately
@Banaanae Banaanae linked a pull request Jul 9, 2024 that will close this issue
@Banaanae Banaanae removed a link to a pull request Jul 9, 2024
@Banaanae Banaanae mentioned this issue Nov 9, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants