Support closed generic types. #53

Open
kekyo opened this issue Nov 19, 2018 · 13 comments

Comments

@kekyo
Owner

kekyo commented Nov 19, 2018

Idea

  • Value Witness Table (came from swift)
  • Share one aggregated implementation when the generic parameters are objref types.
    • How do we analyze and fix up the implementation when only some of the generic parameters are objref arguments?
kekyo added a commit that referenced this issue May 8, 2019
kekyo added a commit that referenced this issue May 8, 2019
@kekyo
Owner Author

kekyo commented May 10, 2019

Found the first problem, in the symbol mangling system.
The current mangling rules break easily when members are appended or changed.

We need stable mangled symbols, so I'll rewrite the type and method symbol name mangling algorithm.

For example: string string.Format(string format, object arg0)
--> System_String* System_String_Format__System_String_System_Object(System_String* format, System_Object* arg0)

For example (planned for closed generic types, not tested): int List<string>.Add(string value)
--> System_Int32 System_Collections_Generic_List__System_String_Add__System_String(System_Collections_Generic_List__System_String* this__, System_String* value)

It's very redundant, but stable and safe. As a final step, I plan to mitigate the redundancy by adding readable, useful alias names.
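
The naming scheme in the two examples above can be sketched roughly as follows. This is only an illustration written for this thread, not IL2C's implementation: `append_mangled` and `mangle_method` are hypothetical names, and the behavior for a method without parameters is an assumption, since the examples do not show that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src to the string in dst (capacity cap), replacing '.' with '_'.
   Returns 0 on success, -1 if dst is too small. */
static int append_mangled(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(dst);
    for (; *src != '\0'; src++) {
        if (len + 1 >= cap) return -1;   /* no room for char + terminator */
        dst[len++] = (*src == '.') ? '_' : *src;
    }
    dst[len] = '\0';
    return 0;
}

/* Builds "<Type>_<Method>__<Arg0>_<Arg1>..." into out.
   Assumption: with no parameters, nothing is appended after the method name. */
static int mangle_method(char *out, size_t cap,
                         const char *type_name, const char *method_name,
                         const char *const *arg_types, size_t arg_count)
{
    size_t i;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (append_mangled(out, cap, type_name) != 0) return -1;
    if (append_mangled(out, cap, "_") != 0) return -1;
    if (append_mangled(out, cap, method_name) != 0) return -1;
    if (arg_count > 0 && append_mangled(out, cap, "_") != 0) return -1;
    for (i = 0; i < arg_count; i++) {
        if (append_mangled(out, cap, "_") != 0) return -1;   /* separator */
        if (append_mangled(out, cap, arg_types[i]) != 0) return -1;
    }
    return 0;
}
```

Calling `mangle_method` with the type `System.String`, the method `Format`, and the parameter types `System.String` and `System.Object` yields the symbol shown in the first example.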

kekyo added a commit that referenced this issue May 10, 2019
Changing apply full overload symbol naming. #53
@kekyo
Owner Author

kekyo commented May 10, 2019

NOTE: How to fit array types into the generic symbol system is an interesting question. I feel array types are easier to understand when treated like this:

int[] --> System.Array<System.Int32> --> System_Array__System_Int32

Of course, System.Array<T> isn't a real type definition. We can handle it as a pseudo type inside the IL2C metadata system.

@kekyo
Owner Author

kekyo commented May 10, 2019

MEMO: Higher Kinded Polymorphism / Generics on Generics
dotnet/csharplang#339

@kekyo
Owner Author

kekyo commented May 11, 2019

Today I redesigned the method override calculation at the Center CLR Try development meetup #8 (in Japanese).

[Image: IMG_20190511_174042]

I'll update CalculateVirtualMethods() and remove the code related to the overload index.

@Sinsjr2

Sinsjr2 commented May 11, 2022

Hello.

I would like to propose an idea for implementing generics.
If you like, please see the following repository.
https://github.com/Sinsjr2/CGenericImpleSample

Implementation idea

  • static generic function
  • generic class/struct instance method
  • generic class virtual method (Partially completed)
  • generic struct virtual method (Partially completed)
  • OpCodes.Constrained
  • generic class/struct static variable

@kekyo
Owner Author

kekyo commented May 12, 2022

Thanks for the sample code. Very interesting implementation.

I see that you use runtime type information to achieve this. We can already get the size (il2c_sizeof()). As for copying the value, there are many possibilities, such as in the case of an objref or when the valuetype contains an objref, but I think it is possible to go this way in the case of a static method.

My idea at the moment (not clearly formed) is to use C macros for the expansion.
The disadvantage of this method is that it could generate a large amount of the same code.
When I thought of this method, I was thinking of relying on VC optimizations (which can remove identical code at the binary level when linking. See /OPT:ICF).
Now that I am planning to drop VC from priority support, I am wondering whether we can do the same thing with gcc or clang instead.

The other problem is that readability will be poor, and I'm not sure I can go any further without resorting to a C++ template. (I have no plans to go literal "IL2C++", the C++ compiler is too slow :)
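
To make the macro idea concrete, here is a minimal sketch of expanding one generic method body per type argument. The macro name `DEFINE_GENERIC_PASS_THROUGH` and the emitted function names are hypothetical and not what IL2C generates. Each expansion produces a separate copy of the body, which is the code-size cost mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: stamps out one copy of the generic body per type argument. */
#define DEFINE_GENERIC_PASS_THROUGH(T)                      \
    static T GenericPassThrough__##T(T x)                   \
    {                                                       \
        T local_0 = x;   /* ldarg.0; stloc.0 */             \
        return local_0;  /* ldloc.0; ret */                 \
    }

/* A value type used as a type argument. */
typedef struct Point { int32_t x; int32_t y; } Point;

/* Two closed instantiations: GenericPassThrough<int32_t> and GenericPassThrough<Point>. */
DEFINE_GENERIC_PASS_THROUGH(int32_t)
DEFINE_GENERIC_PASS_THROUGH(Point)
```

With this approach the C compiler sees ordinary typed code, so plain assignment works for every T, but the bodies are only merged if the toolchain folds identical functions.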

If you think you can fill in the rest of your idea, you could try applying it directly to IL2C.Core. I am currently planning to work on Release 0.5, and the rest of the work will mainly be to improve the build environment and fix the documentation. Therefore, I do not plan to do much work on IL2C.Core for a while. (The Core unit test code will be significantly modified in relation to #100.)

(This does not mean that I want to include your code in Release 0.5. I'm not in a hurry, so take it easy on me ;)

@Sinsjr2

Sinsjr2 commented May 15, 2022

Thank you for your reply.

I see that you use runtime type information to achieve this. I can already get the size (il2c_sizeof()). As for copying the value, I think it is possible to go for a static method, given the details, like in the case of an objref or when the valuetype contains an objref. I think I can go for the static method.

This means changing from "TypeInfo" to "IL2C_RUNTIME_TYPE", and the "IL2C_RUNTIME_TYPE" can be obtained from "il2c_get_header__".

/* System_Object* */void* obj;
IL2C_RUNTIME_TYPE generic_T = il2c_get_header__(obj)->type;

void Extensions_GenericPassThrough_T(IL2C_RUNTIME_TYPE generic_T, void *result, void *x) {
    // .locals init (
    //     [0] !!T
    // )
    void *local_0;
    void *stack_0;
    void *stack_1;
    uint32_t runtimeSize_T;

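    runtimeSize_T = il2c_sizeof__(generic_T);

    // Each slot of type T needs its own storage of runtimeSize_T bytes.
    local_0 = alloca(runtimeSize_T);
    stack_0 = alloca(runtimeSize_T);
    stack_1 = alloca(runtimeSize_T);

    // IL_0000: nop
    // IL_0001: ldarg.0
    memcpy(stack_0, x, runtimeSize_T);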
    // IL_0002: stloc.0
    memcpy(local_0, stack_0, runtimeSize_T);
    // IL_0003: br.s IL_0005

    // IL_0005: ldloc.0
    memcpy(stack_1, local_0, runtimeSize_T);
    // IL_0006: ret
    memcpy(result, stack_1, runtimeSize_T);
}

void Extensions_GenericPassThroughTest() {
    // .locals init (
    //     [0] int32 a
    // )
    System_Int32 a_System_Int32;
    System_Int32 stack_0_0;

    // IL_0000: nop
    // IL_0001: ldc.i4.s 10
    stack_0_0 = 10;
    // IL_0003: call !!0 Extensions::GenericPassThrough<int32>(!!0)
    Extensions_GenericPassThrough_T(il2c_typeof(System_Int32), &stack_0_0, &stack_0_0);
    // IL_0008: stloc.0
    a_System_Int32 = stack_0_0;
    // IL_0009: ret
}
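
For readers who want to try the idea without the IL2C headers, the pass-through can be reduced to the following sketch. `TypeInfo` is a hypothetical stand-in for IL2C_RUNTIME_TYPE that models only the body size, and `malloc` replaces `alloca` purely for portability. Neither choice reflects what IL2C would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for IL2C_RUNTIME_TYPE: only name and body size are modeled. */
typedef struct TypeInfo { const char *name; size_t body_size; } TypeInfo;

/* T GenericPassThrough<T>(T x): one compiled body shared by every T.
   Returns 0 on success, -1 if the temporary storage could not be allocated. */
static int generic_pass_through(const TypeInfo *generic_T, void *result, const void *x)
{
    size_t size;
    void *local_0;
    assert(generic_T != NULL && generic_T->body_size > 0);
    size = generic_T->body_size;
    local_0 = malloc(size);         /* storage for the !!T local */
    if (local_0 == NULL) return -1;
    memcpy(local_0, x, size);       /* ldarg.0; stloc.0 */
    memcpy(result, local_0, size);  /* ldloc.0; ret */
    free(local_0);
    return 0;
}
```

For a value type the caller passes the address of the value. For an objref the caller passes the address of the pointer, with a body size of `sizeof(void*)`, so only the pointer itself is copied.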

My plan is as follows.

  1. Translate the "Implementation idea" items above manually,
    because that is easier than translating the code with IL2C.
  2. Check the manually translated code to prevent obvious mistakes in the implementation policy.
  3. Write the code in IL2C.Core and check it with unit test code.

@kekyo
Owner Author

kekyo commented May 15, 2022

(Sorry it's a bit long. Since you seemed to be Japanese, I'll put the manuscript I wrote in Japanese on gist. You can reply there, but it would be helpful if you could also add the English translation here so that others can refer to it. deepl is also fine :)


That code is fine for how to get runtime type information. (Perhaps you should define a macro in il2c.h.)

I still don't understand all the sample code you wrote, but to get the member offsets of the structure, you can do the following.

  • When you run the unit tests, the partially translated test code will be output in the test-artifacts directory for your reference.
  • You can refer to the MultipleInsideValueType type in the GarbageCollection of IL2C.Tests.RuntimeSystems as an example structure. This type is defined in C# as follows:
public struct MultipleInsideValueTypeType
{
    public string Value1;
    public ObjRefInsideValueTypeType Value2;
    public ObjRefInsideObjRefType Value3;

    public MultipleInsideValueTypeType(string value1, string value2, string value3)
    {
        this.Value1 = value1;
        this.Value2 = new ObjRefInsideValueTypeType(value2);
        this.Value3 = new ObjRefInsideObjRefType(value3);
    }
}
  • Output under test-artifacts/Debug/net48/RuntimeSystems/GarbageCollection/MultipleInsideValueType_0/.
  • If you look at the end of MultipleInsideValueTypeType.c, you will find the following code:
//////////////////////
// [7] Runtime helpers:

// [7-10-1] VTable (Not defined, same as System.ValueType)

// [7-8] Runtime type information
IL2C_RUNTIME_TYPE_BEGIN(IL2C_RuntimeSystems_MultipleInsideValueTypeType, "IL2C.RuntimeSystems.MultipleInsideValueTypeType", IL2C_TYPE_VALUE, sizeof(IL2C_RuntimeSystems_MultipleInsideValueTypeType), System_ValueType, 3, 0)
    IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, Value1)
    IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, IL2C_RuntimeSystems_ObjRefInsideValueTypeType, Value2)
    IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, Value3)
IL2C_RUNTIME_TYPE_END();

This macro defines the runtime type information. Its three inner lines, IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE() and IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE(), describe the objref fields and the valuetype field, respectively.
For example, look at the definition of IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE() (il2c.h):

#define IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE(typeName, fieldTypeName, fieldName) \
    (uintptr_t)il2c_typeof(fieldTypeName), \
    offsetof(typeName, fieldName),

which corresponds to markTargets[] in IL2C_RUNTIME_TYPE_DECL (il2c_private.h):

typedef const struct IL2C_MARK_TARGET_DECL
{
    const IL2C_RUNTIME_TYPE valueType;
    const uintptr_t offset;
} IL2C_MARK_TARGET;

struct IL2C_RUNTIME_TYPE_DECL
{
    const char* pTypeName;
    const uintptr_t flags;
    const uintptr_t bodySize;       // uint32_t
    const IL2C_RUNTIME_TYPE baseType;
    const void* vptr0;
    const uintptr_t markTarget;     // mark target count / custom mark handler (only variable type)
    const uintptr_t interfaceCount;
    //IL2C_MARK_TARGET markTargets[markTarget];
    //IL2C_IMPLEMENTED_INTERFACE interfaces[interfaceCount];
};

In other words, code like IL2C_RUNTIME_TYPE->markTargets[index].offset will give you the offset of the structure member.
For now, IL2C uses this information only for garbage collector tracking, but I have a feeling it could be used for this method as well.

  • You can refer to the il2c_mark_handler_recursive__() area for the specific formula.

Since this calculation is very much internal to IL2C, I did not define a macro for it, but you may define one if necessary.
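
As a rough, self-contained illustration of looking up a member through a table of offsets, consider the sketch below. The `MarkTarget` struct here is a simplified, hypothetical version that keeps only the offset. The real IL2C_MARK_TARGET also carries the field's runtime type, and the real table is laid out after the type descriptor rather than in a separate array.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Inner { const char *text; } Inner;
typedef struct Outer { int id; const char *value1; Inner value2; } Outer;

/* Hypothetical simplified mark-target entry: just the field offset. */
typedef struct MarkTarget { size_t offset; } MarkTarget;

/* One entry per tracked field, computed by the compiler via offsetof(). */
static const MarkTarget outer_mark_targets[] = {
    { offsetof(Outer, value1) },
    { offsetof(Outer, value2) },
};

/* Returns the address of the field described by targets[index] inside body. */
static void *field_address(void *body, const MarkTarget *targets, size_t index)
{
    assert(body != NULL && targets != NULL);
    return (unsigned char *)body + targets[index].offset;
}
```

A generic method body that only has a `void*` and the runtime type could use the same lookup to reach a field without knowing the concrete struct type at compile time.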

Now, besides the performance issues with memcpy(), you need to be careful whether you can do pure copying or not.

  • If it is valuetype, no problem. (If you include an objref, you need to be able to track it, so you need to insert start and end codes that bind the EXECUTION_FRAME. This can be considered later.)
  • If IL2C_RUNTIME_TYPE points to an objref, copying it from pReference does not mean you copied it correctly. This is because IL2C_REF_HEADER is placed before pReference (at a negative offset):
    // +----------------------+ <-- pHeader
    // | IL2C_REF_HEADER      |
    // +----------------------+ <-- pReference   -------
    // |          :           |                    ^
    // | (Instance body)      |                    | bodySize
    // |          :           |                    v
    // +----------------------+                  -------

I still don't understand the need for the copy, but I have a feeling that the way to handle this depends on why the copy is needed.
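
The layout in the diagram can be mimicked in a few lines of plain C. This is a sketch only: `RefHeader`, `alloc_object`, `get_header` and `free_object` are hypothetical names, the header contents are invented for the example, and the body is assumed to need no stricter alignment than the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the real IL2C_REF_HEADER has different members. */
typedef struct RefHeader { const char *type_name; size_t body_size; } RefHeader;

/* Allocates header + body in one block and returns the pointer to the body. */
static void *alloc_object(const char *type_name, size_t body_size)
{
    RefHeader *header = malloc(sizeof(RefHeader) + body_size);
    if (header == NULL) return NULL;
    header->type_name = type_name;
    header->body_size = body_size;
    memset(header + 1, 0, body_size);   /* zero-initialize the instance body */
    return header + 1;                  /* corresponds to pReference */
}

/* Steps back from the body pointer to the header at the negative offset. */
static RefHeader *get_header(void *reference)
{
    assert(reference != NULL);
    return (RefHeader *)reference - 1;
}

static void free_object(void *reference) { free(get_header(reference)); }
```

Copying `body_size` bytes starting at the body pointer duplicates the fields but not the header, which is why a plain memcpy of an objref's body does not produce a valid object.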

@Sinsjr2

Sinsjr2 commented May 19, 2022

As you can see, I'm Japanese.
I will also write in English so that many people can read this.

Generic T GC implementation

Thank you for your description of GC mark method.

Here is a brief description of the GC implementation for generic T.
The problem with implementing GC for generic T is that T can be either an objref or a value type, decided dynamically.

So, as shown below, I assign generic_T to x_type__.
When the GC runs, it determines whether x_value_ptr__ refers to an objref or a value type
and switches the function to call.

for (index = 0; il2c_likely__(index < pCurrentFrame->valueCount__); index++, pValueDesc++)
{
    il2c_assert(pValueDesc->ptr_value != NULL);
    il2c_assert(pValueDesc->type_value != NULL);
    il2c_assert((pValueDesc->type_value->flags & IL2C_TYPE_VALUE) == IL2C_TYPE_VALUE);
    il2c_runtime_debug_log_format(
        L"il2c_step2_mark_gcmark__ [5]: pCurrentFrame=0x{0:p}, index={1:u}, type={2:s}, pValue={3:p}",
        pCurrentFrame,
        index,
        pValueDesc->type_value->pTypeName,
        pValueDesc->ptr_value);
    // Mark for this value.
    il2c_mark_handler_for_value_type__((void*)pValueDesc->ptr_value, pValueDesc->type_value);
}

  • objref: il2c_mark_handler_for_objref__(*(System_Object**)x_value_ptr__)
  • value type (existing implementation): il2c_mark_handler_recursive__(pAdjustedReference, pHeader->type, offset);
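
The switch between the two handlers could look roughly like this. Everything in the sketch is hypothetical and much simpler than il2c_gc.c: `TYPE_FLAG_VALUE` only imitates IL2C_TYPE_VALUE, and the two mark functions just record that they were called.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_FLAG_VALUE = 1 };   /* hypothetical flag, modeled on IL2C_TYPE_VALUE */

typedef struct TypeInfo { unsigned flags; } TypeInfo;
typedef struct Object { int marked; } Object;

static int value_marks;   /* counts how many value-type slots were visited */

/* Stand-in for the objref handler: marks the referenced object, if any. */
static void mark_objref(Object *obj) { if (obj != NULL) obj->marked = 1; }

/* Stand-in for the value-type handler: only counts the visit. */
static void mark_value(void *body, const TypeInfo *type)
{
    (void)body;
    (void)type;
    value_marks++;
}

/* Dispatches on the runtime type stored next to the slot pointer. */
static void mark_generic_slot(const TypeInfo *type, void *value_ptr)
{
    assert(type != NULL && value_ptr != NULL);
    if (type->flags & TYPE_FLAG_VALUE)
        mark_value(value_ptr, type);          /* slot holds the value body */
    else
        mark_objref(*(Object **)value_ptr);   /* slot holds a pointer to the object */
}
```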

Case of Generic T is Objref

If T is an objref type, the class fields are not copied.
Only the pointer itself is copied with memcpy.

  • If it is valuetype, no problem. (If you include an objref, you need to be able to track it, so you need to insert start and end codes that bind the EXECUTION_FRAME. This can be considered later.)
  • If IL2C_RUNTIME_TYPE points to an objref, copying it from pReference does not mean you copied it correctly. This is because IL2C_REF_HEADER is placed before pReference (at a negative offset):

Case of assign

System_Object* x;
System_Object* local_0;
local_0 = x;

Case of memcpy

System_Object* x;
System_Object** arg_x;
System_Object** local_0;

arg_x = &x;
local_0 = alloca(sizeof(System_Object*));

memcpy(local_0, arg_x, sizeof(System_Object*));

OutputCode

typedef struct Extensions_GenericPassThroughTest_EXECUTION_FRAME_DECL
{
    const IL2C_EXECUTION_FRAME* pNext__;
    const uint16_t objRefCount__;
    const uint16_t valueCount__;
    //-------------------- objref
    //-------------------- value type
    const IL2C_RUNTIME_TYPE x_type__; // generic type
    const void* x_value_ptr__;
    const IL2C_RUNTIME_TYPE local_0_type__; // generic type
    const void* local_0_value_ptr__;
    const IL2C_RUNTIME_TYPE local_1_type__; // generic type
    const void* local_1_value_ptr__;
    const IL2C_RUNTIME_TYPE local_2_type__; // generic type
    const void* local_2_value_ptr__;

} Extensions_GenericPassThroughTest_T_EXECUTION_FRAME__;

void Extensions_GenericPassThrough_T(IL2C_RUNTIME_TYPE generic_T, void *result, void *x) {
    // .locals init (
    //     [0] !!T
    // )
    uint32_t runtimeSize_T;
    runtimeSize_T = il2c_sizeof__(generic_T);
    Extensions_GenericPassThroughTest_T_EXECUTION_FRAME__ frame = {
        ...
        generic_T, // x
        alloca(runtimeSize_T),
        generic_T, // local_0
        alloca(runtimeSize_T),
        generic_T, // local_1
        alloca(runtimeSize_T),
        generic_T,// local_2
        alloca(runtimeSize_T)
    };
    // IL_0001: ldarg.0
    // T is value type: copy member fields to the new instance.
    // T is object reference type: copy the pointer to the new local variable with memcpy.
    //    so, this is not Object.MemberwiseClone https://docs.microsoft.com/ja-jp/dotnet/api/system.object.memberwiseclone?view=net-6.0
    memcpy(frame.stack_0, x, runtimeSize_T);
    ...
}

void Extensions_GenericPassThroughTestObj() {
    // .locals init (
    //     [0] object a
    // )

    // IL_0000: nop
    System_Object* a_System_Object;
    System_Object* stack_0_0;

    // IL_0001: newobj instance void [System.Runtime]System.Object::.ctor()
    stack_0_0 = il2c_get_uninitialized_object(System_Object);
    System_Object__ctor(stack_0_0);
    // IL_0006: call !!0 C::GenericPassThrough<object>(!!0)
    // pass pointer of pointer
    Extensions_GenericPassThrough_T(il2c_typeof(System_Object), &stack_0_0, &stack_0_0);
    // IL_000b: stloc.0
    a_System_Object = stack_0_0;
    // IL_000c: ret
}

@kekyo
Owner Author

kekyo commented May 30, 2022

Hold a field in the execution frame with a raw pointer to the instance x_value_ptr__ (which may point to a pointer to an objref, or to the body of a valuetype) and the runtime type information x_type__:

  • If T is an objref, then :
    • treat x_value_ptr__ as if it were a System_Object* (reinterpret_cast).
    • Let the GC traverse the reference tracking as it is (implement it in il2c_gc.c as a handler for the third variable element, or use [pReference in the execution frame](https://github.com/kekyo/IL2C/blob/4c3b4097de29f119a01e9b4499d319eca773003e/IL2C.Runtime/src/il2c_private.h#L77) to handle it well...). There seems to be a trade-off between footprint and readability.
  • If T is a valuetype, then :
    • Treat x_value_ptr__ like a pointer to the target value type body.
    • When accessing runtime type information with box opcode and etc., refer to x_type__.
    • If GC reference tracing is required (IsRequiredTraverse), put it in valueDescriptors__ of the execution frame, and leave it out if you don't need it...? (It might be better to create a helper function and have it decide in there.)

Maybe your initial concern about using memcpy can be offset by optimizations in the C compiler. At least when I verified it with optimization enabled in VC++ before, it generated exactly the same code with memcpy and assignment expressions in C language. Of course, I suppose it depends on the conditions...


Generic type argument constraints

We haven't examined instance member access yet, but accesses like System.Object.ToString() for T :

public static string foo<T>(T value) =>
  value.ToString();

Or access with T constraint:

public static void bar<T>(T value)
  where T : IDisposable =>
  value.Dispose();

Assuming a managed compiler like C# has (correctly) checked the constraints, IL2C might be able to access the member by simply casting the pointer (a reinterpret_cast to the VTABLE layout type of System.Object for ToString(), or of IDisposable).

In the case of interfaces, we need to calculate adjustor offset, but if we can determine which interface the specified member (Dispose()) belongs to by cecil from T, I think it would be possible to convert the code to calculate adjustor offset statically.
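
The pointer cast for the System.Object.ToString() case can be illustrated with a deliberately simplified object model. The vtable layout, `Dog`, and `foo_objref` below are invented for this sketch and do not match IL2C's generated structures. The interface case with its adjustor offset is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;

/* Hypothetical vtable with a single virtual slot. */
typedef struct ObjectVTable { const char *(*ToString)(Object *self); } ObjectVTable;
struct Object { const ObjectVTable *vptr; };

/* A derived type whose first member is the base, so the vptr sits at offset 0. */
typedef struct Dog { Object base; } Dog;

static const char *Dog_ToString(Object *self) { (void)self; return "Dog"; }
static const ObjectVTable dog_vtable = { Dog_ToString };

/* string foo<T>(T value) => value.ToString(); for the case where T is an objref. */
static const char *foo_objref(void *value)
{
    Object *obj = value;   /* the reinterpret_cast step: trust the managed compiler's checks */
    assert(obj != NULL && obj->vptr != NULL);
    return obj->vptr->ToString(obj);
}
```

The generic body never needs to know the concrete type: it only relies on every objref starting with a pointer to a vtable whose layout begins with the System.Object slots.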

@Sinsjr2

Sinsjr2 commented Jun 15, 2022

I have worked out how to convert the following items and report it below.

  • generic static variable
  • generic virtual function
  • generic class/struct fields

Now that I have a rough idea of how to implement generics using memcpy in handwritten C code, I would like to think about the details while actually implementing it in IL2C (output in C89).

First, I will try to implement it for value types that do not require gc in static methods.
I don't fully understand how the process is divided by objref value when tracking with gc, so I will think about it later.

As for memcpy optimization, I'm not that worried about it in major compilers (gcc, clang) including msvc.
I am a little worried about how far the compiler for microcontrollers (cc-rx, cc-rl, rx gcc) will optimize it.
However, it is no use thinking about it before implementation, so I will think about it after implementation is done.

https://github.com/Sinsjr2/CGenericImpleSample/blob/29a64b9642910ed5cf67d0cc4f332d92f3b02af5/README.md

@Sinsjr2

Sinsjr2 commented Jun 19, 2022

In the generic implementation, I need to add result and generic_T to the function arguments, and
with the current mangling process there is a possibility of name conflicts.

class C
{
  static void F<T>(int result, int generic_T) {}
}

Current conversion process

void C_F_T(void* result, IL2C_RUNTIME_TYPE generic_T, int result, int generic_T) {}

So, I will try to escape strings used for type and variable names with the following rules.
The following process is reversible, so unescape is possible and names will not conflict.

Escaping rules

. => __ // Currently converted to _, but that does not treat _ as an escape character. The existing IL2C.Runtime needs to be modified.
_ => _i_ // _ is often used as a separator and as a discard. i looks like a vertical bar, so the separation is easy to see.
[a-zA-Z0-9] => no conversion
Names reserved by IL2C after conversion (ex: this, frame) => suffix with _sr_ (ex: this_sr_, frame_sr_, generic_T_sr_, result_sr_)
  (_sr_) stands for System Reserved
Other characters => convert to _ux○○○○○○○○ (8-digit hexadecimal UTF-32)
  ux stands for unicode hex
  IL can use Unicode characters such as kanji as identifiers, so conversion is necessary (most systems allow only ASCII characters and _ in C identifiers. Some systems can use universal character names).
< => _d_ // For generics. This notation is easy to write by hand in C.
> => _b_ // For generics. This notation is easy to write by hand in C.
Local variables used in methods that collide with C names (@if, malloc, @void, NULL; C# can use reserved words as identifiers by prefixing @) => add _l_ as suffix (ex: if_l_, malloc_l_, void_l_, NULL_l_)
  Without a suffix, the name would be expanded as a macro or conflict with a C reserved word, resulting in a compile error.
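
A minimal sketch of the character-level rules might look like the following. `escape_identifier` is a hypothetical helper, not part of IL2C. For brevity it treats each input byte as one code point (so it is only meaningful for ASCII input) and leaves out the _sr_ and _l_ suffix rules, which need knowledge of reserved names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escapes name into out following the character rules listed above.
   Returns 0 on success, -1 if out (capacity cap) is too small. */
static int escape_identifier(const char *name, char *out, size_t cap)
{
    size_t len = 0;
    assert(name != NULL && out != NULL && cap > 0);
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        char piece[16];
        size_t n;
        if (c == '.')       strcpy(piece, "__");
        else if (c == '_')  strcpy(piece, "_i_");
        else if (c == '<')  strcpy(piece, "_d_");
        else if (c == '>')  strcpy(piece, "_b_");
        else if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            piece[0] = (char)c;   /* alphanumerics pass through unchanged */
            piece[1] = '\0';
        }
        else sprintf(piece, "_ux%08X", (unsigned)c);   /* 8 hex digits per code point */
        n = strlen(piece);
        if (len + n + 1 > cap) return -1;   /* keep room for the terminator */
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

For example, `System.Collections.Generic.List<System.Int32>` becomes `System__Collections__Generic__List_d_System__Int32_b_`, and `my_var` becomes `my_i_var`.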

I plan to make the above fixes separately from the generic implementation, but may I implement them?

@kekyo
Owner Author

kekyo commented Jun 26, 2022

Sorry for the late reply.


  • How do you give the value of alignment for il2c_adjustAlignment?

    • I'm thinking the safe thing to do would be to #define it in a platform specific header file, but
    • It would be better if we could have the compiler calculate it (I haven't come up with the specifics, but maybe have it use offsetof()...)
    • size_t is quite hard to use on some platforms, so uint8_t might be better, assuming that the alignment never exceeds 256 (even in IL2C, there are some places where we have compromised and stopped using size_t).
  • I think we need some kind of generic dictionary function...

    • I'm reluctant to add it to the runtime library because it increases the footprint, but I guess it can't be helped.
    • I wonder if the cost could be avoided when generic types aren't used, as long as the code isn't linked in...
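
On the alignment question, one way to let the compiler compute the value with offsetof() is a probe struct, sketched below. `DECLARE_ALIGN_PROBE`, `ALIGN_OF` and `adjust_alignment` are hypothetical names (the latter is only a stand-in for il2c_adjustAlignment). The offset of the probed member matches the type's alignment on common compilers, but the C standard does not strictly guarantee that, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declares a probe struct whose second member lands on T's alignment boundary. */
#define DECLARE_ALIGN_PROBE(name, T) struct align_probe_##name { char pad; T member; }

/* Reads the alignment back as a uint8_t, assuming it never exceeds 255. */
#define ALIGN_OF(name) ((uint8_t)offsetof(struct align_probe_##name, member))

DECLARE_ALIGN_PROBE(char, char);
DECLARE_ALIGN_PROBE(int, int);
DECLARE_ALIGN_PROBE(double, double);

/* Rounds offset up to the next multiple of alignment (must be a power of two). */
static size_t adjust_alignment(size_t offset, uint8_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1u) & ~((size_t)alignment - 1u);
}
```

Because the probe is evaluated at compile time, no platform-specific header is needed, at the price of one extra struct declaration per probed type.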

However, I can't get IL2C_RUNTIME_TYPE_DECL from this when it comes to regular function calls.
Therefore, we separate the function for virtual calls and the function for normal calls.

Yes indeed, in the case of value type, I have no problem without vptr (pReference) (this is in consideration of allowing direct pointer references on the native side during interop)


In the generic implementation, I need to add result and generic_T to the function arguments, and
With the current mangling process, there is a possibility of name conflicts.

Noted :)

In particular, you are absolutely right about the conversion of . into _, which is problematic even when generic type arguments are not involved.

  • When I looked into it before, something like escaping Unicode points (\x1234) is not available in the preprocessor macro.
  • There is no stable special character other than _ as a preprocessor macro symbol (although there may be one on some processors).

So, I was holding off.

With the method you suggest, I would have to modify the translator as well as the existing runtime implementation. Until we make this modification, we should try to increase the runtime implementation as little as possible.

After reserved (ex: this, frame) converted by IL2C => suffix with _sr_ (ex: this_sr_, frame_sr_, generic_T_sr_, result_sr_)

I think it's good (I was thinking it might be better to make it dirtier for the current method).

Other characters => convert to _ux○○○○○○○○ (8-digit hexadecimal UTF-32)

Do you want to use UTF-8? There are readability issues, but realistically I don't think there's much use of CJK or anything like that, just enough to be a target if umlauts or something like that is used. Although not a symbol, you might want to keep in mind bug #124 that we recently picked up, if you haven't seen it yet. I mistakenly put in wchar_t thinking it was 16-bit.

  • The translator side (maybe) cannot use string literals. I believe it has to output the raw values like uint16_t str[] = { ... };, or emit UTF-8 like char str[] = "...";.
  • If you decide to use UTF-8 in the above, you will need to modify the runtime implementation, especially around System_String.

< => _d_ // For using generic, this notation can easily be written in c if written by hand.

> => _b_ // For using generic, this notation can easily be written in c if written by hand.
Local variables used in methods (@if, malloc, @void, NULL) c# can describe reserved words by adding @) => add _l_ as suffix (ex: if_l_, malloc_l_, void_l_, NULL_l_)

I think it is good.

Or maybe it would be better to have it macro-expanded (though I'm not sure it would contribute to readability). For example:

// System.Collections.Generic.List<System.Int32>

// this won't work.
#define GENERIC_ARG(args) _d_##args##_b_
System__Collections__Generic GENERIC_ARG(System__Int32)

// not so good...
#define GENERIC_TYPE(type, args) type##_d_##args##_b_
GENERIC_TYPE(System__Collections__Generic, System__Int32)

Something like this?
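
For reference, a token-pasting version of the second macro that does compile is sketched below. It assumes `##` is the intended operator and adds two hypothetical helper macros, `STRINGIFY_` and `STRINGIFY`, only to make the resulting identifier visible. The struct body is a placeholder.

```c
#include <assert.h>
#include <string.h>

/* Pastes the type name, the _d_ marker, the argument, and the _b_ marker into one token. */
#define GENERIC_TYPE(type, args) type##_d_##args##_b_

/* Two-level stringification so the argument is expanded before it is quoted. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Declares a struct named after the closed generic type List<Int32>. */
typedef struct GENERIC_TYPE(System__Collections__Generic__List, System__Int32) {
    int count;   /* placeholder member for the sketch */
} GENERIC_TYPE(System__Collections__Generic__List, System__Int32);
```

The first variant fails because a macro result cannot be glued onto a neighboring token that sits outside the macro invocation, so the whole name has to be passed through a single macro as in `GENERIC_TYPE`.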
