I have a long-term project: a DIY computer with various processors. I want to build not only the hardware, but the software too.
So I started with an assembler/disassembler for Linux, even though there are already a lot of Z80 assemblers. I want to implement all known instructions, like LD D, RES 0, (IX + n), even if they will never be used.
I have never hacked on other people's assembler source code, as I want to do it the way I imagine it. I want to know whether my idea is weird, wrong, bad by design, or not so bad.
I created a table describing each instruction I found on the Net:
    typedef struct
    {
        uint32_t opcode;
        uint8_t data_size;
        bool reljmp;
        char *mnemo;
        char *hash;
    } opcode_table;
An entry looks like this:
    {.opcode=0x10, .mnemo="DJNZ %#.2x", .data_size=1, .reljmp=true },
The table itself is the same for assembler and disassembler.
Assembler
The source code is parsed twice with GNU Bison/Flex. The first pass is only a dry run to calculate label offsets, expand macros, and so on. Addresses and user variables are held in hash tables (UThash is used).
Each instruction the user wrote is converted to a canonical form, like DJNZ %#.2x in this example (if Bison was able to form the string and it is correct). It is then passed to the handler function:
    int handle_instruction ( char* instruction, intmax_t data, size_t size )
This is followed by a simple linear search by string (the first optimisation I see is to use string hashes and a binary search):
    const opcode_table* new_opc = find_opcode ( instruction );
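A minimal sketch of the binary-search variant, assuming the table can be sorted once at startup. It sorts and searches on the mnemonic string itself with the standard `qsort`/`bsearch`, not on a precomputed hash. The names `opcode_entry`, `demo_table`, `sort_demo_table` and `find_opcode_sorted` are hypothetical, and the four entries only stand in for the real table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed copy of opcode_table (no hash field). */
typedef struct
{
    uint32_t opcode;
    uint8_t data_size;
    bool reljmp;
    const char *mnemo;
} opcode_entry;

/* Tiny illustrative table, deliberately unsorted. */
static opcode_entry demo_table[] = {
    {.opcode = 0x10, .mnemo = "DJNZ %#.2x", .data_size = 1, .reljmp = true},
    {.opcode = 0x00, .mnemo = "NOP", .data_size = 0, .reljmp = false},
    {.opcode = 0x18, .mnemo = "JR %#.2x", .data_size = 1, .reljmp = true},
    {.opcode = 0x76, .mnemo = "HALT", .data_size = 0, .reljmp = false},
};

#define DEMO_COUNT (sizeof demo_table / sizeof demo_table[0])

/* Orders entries by mnemonic template; shared by qsort and bsearch. */
int compare_entries(const void *a, const void *b)
{
    const opcode_entry *ea = a;
    const opcode_entry *eb = b;
    return strcmp(ea->mnemo, eb->mnemo);
}

/* Must run once before the first lookup. */
void sort_demo_table(void)
{
    qsort(demo_table, DEMO_COUNT, sizeof demo_table[0], compare_entries);
}

/* O(log n) replacement for the linear search; returns NULL if not found. */
const opcode_entry *find_opcode_sorted(const char *instruction)
{
    assert(instruction != NULL);
    opcode_entry key = {.mnemo = instruction};
    return bsearch(&key, demo_table, DEMO_COUNT, sizeof demo_table[0],
                   compare_entries);
}
```

Calling `sort_demo_table()` once and then `find_opcode_sorted("DJNZ %#.2x")` returns the entry with opcode 0x10. Whether this is noticeably faster than the linear search depends on the table size and would need measuring.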
On the first pass, if there is a label, it is substituted with INTMAX_MIN on the Bison side. So if INTMAX_MIN appears as the argument on the second pass, I suppose I should look in the hash table for the previously calculated address.
As I use preformatted mnemonic in the table, it is very simple to print it out:
    if ( PASS2 == run_pass && verbose )
    {
        printf ( "%#.4x: ", PC );
        printf ( new_opc->mnemo, ( uint16_t ) data );
        puts ( "" );
    }
I don't need to format it manually; it is all automatic.
Disassembler
It is very simple: read the input byte by byte and check whether each byte is a prefix or not, then look up the opcode in the table, get the corresponding mnemonic, and print it.
    char* compile_string ( const char* format, ... )
    {
        char* string;
        va_list args;
        va_start ( args, format );
        if ( 0 > vasprintf ( &string, format, args ) ) string = NULL;
        va_end ( args );
        return string;
    }
The resulting binary is fairly large (~300 KB) because of the table, but that doesn't matter. The main advantage of a table is how easy it makes adding support for a new CPU. I want to implement a Forth on the same base.
The main problem I have run into is bug detection. At the moment there are bugs in the logic.
As the main test for the assembler, I am trying to assemble Monitor 48, which was written for TASM. It does this poorly, getting only about 30-40% right, because of relative jump calculation issues.
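For reference, here is a minimal sketch of how the displacement byte for a Z80 relative jump (JR, DJNZ) can be computed and range-checked. On the Z80 the displacement is relative to the address of the instruction that follows the two-byte jump. The function name `rel_displacement` is hypothetical, and the sketch assumes the address passed in is that of the first byte of the jump instruction, which may not match how PC is tracked in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * instr_addr: address of the first byte of the JR/DJNZ instruction (assumed).
 * target:     absolute address of the label.
 * out:        receives the two's-complement displacement byte.
 * Returns false when the target is out of range for a relative jump.
 * Wrap-around across the 0xFFFF/0x0000 boundary is not handled here.
 */
bool rel_displacement(uint16_t instr_addr, uint16_t target, uint8_t *out)
{
    assert(out != NULL);

    /* The CPU adds the displacement to the address after the jump (+2). */
    int32_t delta = (int32_t)target - ((int32_t)instr_addr + 2);

    if (delta < -128 || delta > 127)
        return false;

    *out = (uint8_t)(int8_t)delta;
    return true;
}
```

With this convention, a jump to its own address encodes as 0xFE, and the reachable range is 126 bytes backwards to 129 bytes forwards from the start of the instruction. Reporting an error when the function returns false, instead of silently truncating, tends to surface this class of bug early.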
What is a better way of testing this kind of program? I can assemble every instruction individually, but real code may still expose issues. I can compare the result with other assemblers, but only in a very limited way, as they don't support TASM-like macros, for example, or some instructions.
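One option that does not depend on other assemblers is a round-trip test: since the same table drives both directions, every entry can be encoded, decoded again, and compared with the expected text. The sketch below is only an illustration under strong assumptions: single-byte opcodes, at most one operand byte, and hypothetical names (`demo_entry`, `demo_encode`, `demo_decode`, `demo_round_trip_failures`) standing in for the real assembler and disassembler entry points.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint8_t opcode;    /* single-byte opcodes only in this sketch */
    uint8_t data_size; /* 0 or 1 operand bytes */
    const char *mnemo; /* printf-style template, as in the real table */
} demo_entry;

static const demo_entry demo_table[] = {
    {0x00, 0, "NOP"},
    {0x10, 1, "DJNZ %#.2x"},
    {0x18, 1, "JR %#.2x"},
    {0x3E, 1, "LD A, %#.2x"},
    {0x76, 0, "HALT"},
};

#define DEMO_COUNT (sizeof demo_table / sizeof demo_table[0])

/* Hypothetical encoder: opcode byte followed by the operand byte, if any. */
size_t demo_encode(const demo_entry *e, uint8_t operand, uint8_t *out)
{
    assert(e->data_size <= 1);
    out[0] = e->opcode;
    if (e->data_size == 1)
        out[1] = operand;
    return 1u + e->data_size;
}

/* Hypothetical decoder: returns bytes consumed, or 0 if nothing matches. */
size_t demo_decode(const uint8_t *bytes, size_t len, char *text, size_t size)
{
    for (size_t i = 0; len > 0 && i < DEMO_COUNT; i++) {
        const demo_entry *e = &demo_table[i];
        if (bytes[0] != e->opcode)
            continue;
        if (len < 1u + e->data_size)
            return 0; /* truncated input */
        snprintf(text, size, e->mnemo,
                 (unsigned)(e->data_size ? bytes[1] : 0));
        return 1u + e->data_size;
    }
    return 0;
}

/* Encodes and decodes every entry with every operand value; counts mismatches. */
unsigned demo_round_trip_failures(void)
{
    unsigned failures = 0;
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        const demo_entry *e = &demo_table[i];
        unsigned last = e->data_size ? 0xFF : 0;
        for (unsigned operand = 0; operand <= last; operand++) {
            uint8_t bytes[2];
            char expected[32], actual[32];
            snprintf(expected, sizeof expected, e->mnemo, operand);
            size_t n = demo_encode(e, (uint8_t)operand, bytes);
            size_t used = demo_decode(bytes, n, actual, sizeof actual);
            if (used != n || strcmp(expected, actual) != 0) {
                printf("round-trip failed: %s\n", expected);
                failures++;
            }
        }
    }
    return failures;
}
```

A test like this catches duplicate opcodes, wrong `data_size` values and mismatched templates in the table, but it cannot catch errors that are symmetric in both directions, so comparing a few known-good binaries against another assembler's output is still worthwhile where the syntax overlaps.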