I’m working on a transpiler that converts a source language into C++ without violating its semantics or other attributes of the language.
This language is VM/bytecode based, and it has a lot of “intermediate” memory values that its compiler generates. These cause a lot of bloat even after transpilation into C++, because the C++ compiler is not able to resolve the origin of the memory and perform compile-time optimizations.
“Memory” is stored in a contiguous memory block, and all interaction goes through it. Since the source language is an embedded one, it relies on a sophisticated reflection system that collects the attributes of values so that they can be accessed from C++. The reflection system has the following class (pseudocode):
class Property
{
int Offset;
int Alignment;
// ... etc.
};
And the code generator reads values from the “Memory” like this:
Property* Prop = reinterpret_cast<Property*>(Bytecode + 32); // Offset of the FProperty* is known to the code generator, so it is a literal integer here
uint8* PropertyAddress = Memory + Prop->Offset; // Offset of PropertyAddress is not known, so I have to read it from reflection. PropertyAddress is used by the generated code multiple times, and it is copied around as a pointer to implement the "stack" logic.
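Put together, the generated access pattern looks roughly like this minimal, self-contained sketch (`ReadFlag` and the buffer layout are illustrative assumptions, not the real generated code); the point is that `Prop->Offset` is a run-time load, so the compiler cannot fold accesses through `PropertyAddress`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical minimal reflection record, mirroring the Property class above.
struct Property
{
    int Offset;
    int Alignment;
};

// Sketch of what the generated code does: the location of the Property inside
// the bytecode is a compile-time literal (+32 here), but Property->Offset is
// only known at run time, so every access goes through two dependent loads
// that the optimizer cannot see through.
bool ReadFlag(const uint8_t* Bytecode, uint8_t* Memory)
{
    // In real code a Property object must actually live at this address;
    // this sketch ignores strict-aliasing/lifetime subtleties.
    const Property* Prop = reinterpret_cast<const Property*>(Bytecode + 32);
    uint8_t* PropertyAddress = Memory + Prop->Offset; // run-time offset
    return *PropertyAddress != 0;
}
```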
Handwritten C++ version of the code:
class NativeClass
{
bool SomeValue;
bool Function() { return (SomeValue && (3 > 2)) || (5 > 1); }
};
Clang output:
Function(NativeClass&): # @Function(NativeClass&)
mov al, 1
ret
Clang output for the same exact code from code generator’s output:
TranspiledCode(stack&, void*): # @TranspiledCode(stack&, void*)
mov rax, qword ptr [rdi + 8]
mov rcx, qword ptr [rdi + 16]
movsxd rdx, dword ptr [rax + 10]
movsxd rsi, dword ptr [rax + 40]
movsxd r8, dword ptr [rax + 70]
mov byte ptr [rcx + rdx], 0
mov byte ptr [rcx + rsi], 1
mov r9, qword ptr [rdi]
movsxd r10, dword ptr [rax + 88]
cmp byte ptr [r9 + r10], 0
setne r9b
cmp byte ptr [rcx + rdx], 0
setne dl
and dl, r9b
mov byte ptr [rcx + r8], dl
movsxd r8, dword ptr [rax + 108]
cmp byte ptr [rcx + rsi], 0
setne sil
or sil, dl
mov byte ptr [rcx + r8], sil
mov rdx, qword ptr [rdi + 32]
movsxd rsi, dword ptr [rax + 155]
add rax, 147
lea r8, [rcx + rsi]
movzx esi, byte ptr [rcx + rsi]
mov byte ptr [rdx], sil
mov qword ptr [rdi + 40], rax
mov qword ptr [rdi + 48], r8
mov qword ptr [rdi + 56], rcx
ret
If there is a “virtually free” operation defined in the source language, like comparing a few bools or summing and subtracting values, writing the same code in C++ often outputs just a few lines of assembly. Meanwhile the transpiled version outputs roughly 30x more code. That is still faster than evaluating the same code in the VM, but I am able to optimize a lot of this away if I do something like this instead:
bool SomeValue; // I know this specific value is completely local and won't be used elsewhere
uint8* PropertyAddress = &SomeValue;
Clang output when using a locally constructed bool value for the same bytecode function:
TranspiledCodeOptimized(stack&, void*): # @TranspiledCodeOptimized(stack&, void*)
mov rax, qword ptr [rdi + 32]
mov byte ptr [rax], 1
ret
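For completeness, the localized pattern can be reproduced in a tiny self-contained sketch (`TranspiledLocalized` and the opcode comments are illustrative assumptions, not real generated output). Because every write and read of `SomeValue` is visible to Clang, the whole expression folds to a constant store:

```cpp
#include <cstdint>

// Sketch of the "localized" variant: the transpiler knows this particular
// slot never escapes, so it emits a real local instead of a Memory slot.
bool TranspiledLocalized(uint8_t* Result)
{
    bool SomeValue = false;                 // was: Memory + Prop->Offset
    uint8_t* PropertyAddress = reinterpret_cast<uint8_t*>(&SomeValue);
    *PropertyAddress = 1;                   // opcode: store into the slot
    // opcode: evaluate the boolean expression through the slot
    bool Folded = (*PropertyAddress != 0 && (3 > 2)) || (5 > 1);
    *Result = Folded ? 1 : 0;               // opcode: copy the result out
    return Folded;
}
```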
However, this is not always possible, because a single function defined in the source language is spread across multiple C++ functions (one function per n opcodes, depending on how much the transpiler could fold/combine). So bool SomeValue can’t be local to the function body where I declare it; its value has to be carried across those functions.
I’m looking for a way to hint to the compiler that the content of the “Memory” can be optimized at compile time, without using the actual types declared in C++.
I tried to create a struct that contains the values that I can “localize” from the “Memory” (pseudocode):
struct LocalizedMemoryElements
{
bool SomeValue;
int SomeFunctionsReturnValue;
int SomeFunctionsInputValue;
};
and passing this struct to the generated opcode function makes Clang generate faster code:
uint8* PropertyAddress = reinterpret_cast<uint8*>(&Locals.SomeValue); // Locals is the LocalizedMemoryElements& parameter
Clang output:
TranspiledCodeWithLocalizedMemoryElements(stack&, void*, LocalizedMemoryElements&): # @TranspiledCodeWithLocalizedMemoryElements(stack&, void*, LocalizedMemoryElements&)
mov word ptr [rcx + 4], 256
mov byte ptr [rcx + 6], 0
mov byte ptr [rcx + 16], 1
mov rax, qword ptr [rdi + 32]
mov byte ptr [rax], 1
ret
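To illustrate how the struct survives the per-opcode function split, here is a hedged sketch (`Opcode_Store`, `Opcode_Compute`, and the arithmetic are made-up names, not the real generated signatures); each generated function takes the same struct by reference, so the value is carried across functions while staying visible to the optimizer:

```cpp
#include <cstdint>

// Hypothetical localized slots, mirroring the struct above.
struct LocalizedMemoryElements
{
    bool SomeValue;
    int  SomeFunctionsReturnValue;
    int  SomeFunctionsInputValue;
};

// First split: an opcode that writes the slot through a byte pointer,
// exactly as the generated code does with PropertyAddress.
void Opcode_Store(LocalizedMemoryElements& Locals)
{
    uint8_t* PropertyAddress = reinterpret_cast<uint8_t*>(&Locals.SomeValue);
    *PropertyAddress = 1;
}

// Second split: a later opcode that reads the same slot. After inlining,
// Clang can trace the value through both functions.
int Opcode_Compute(LocalizedMemoryElements& Locals)
{
    Locals.SomeFunctionsReturnValue =
        Locals.SomeValue ? Locals.SomeFunctionsInputValue + 1 : 0;
    return Locals.SomeFunctionsReturnValue;
}
```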
But there are problems with this:
- I end up duplicating memory by creating another struct in local scope or on the heap, because the “Memory” is already allocated by the VM.
- Reinterpret-casting the “Memory” to LocalizedMemoryElements is not possible either, because I am unable to localize/nativize all types of data. The optimization I’m after mostly works for POD types.
- I also tried to exploit __restrict a lot, but with no luck.