SIRTX VM
Information on the SIRTX VM
The SIRTX VM is the part of SIRTX that runs code that is neither part of the image nor an interpreted command line (see line command interpreter).
The SIRTX VM supports code in its own optimised format. This special format natively supports the SIRTX API and therefore can be used to compile to a small binary size. The following outlines the syntax for the reference assembler.
The syntax follows many standard assembler patterns. However, opcodes may be different from other architectures.
Each line of the file may contain a comment. A comment starts with ;, //, or # and extends to the end of the line. If the line also contains a directive there must be at least one space between the directive and the comment.
Every line may start or end with any amount of whitespace, which is ignored. Between values there must be at least one whitespace character, a comma, or both.
Each directive may have zero or more values. A value can be of different types, depending on the directive. The first one is normally the destination, output, or target, and the second one the source or input.
It is possible to mark each opcode with a bang (!) to select autodie mode. This will set the autodie flag of the opcode (if any) or insert an autodie directive right after the opcode (if it has no autodie flag of its own).
Some directives (marked †) listed below are synthetic. This means they don't exist as their own opcode or variant of one but are generated by the assembler. They may be multiple opcodes long. The assembler may optimise them based on its knowledge.
Some (non-synthetic) directives require or prefer a specific register. Those registers generally don't need to be mapped, which reduces the stress on the mapping. Common examples include the use of arg in control, which can be undefined most of the time and can be kept in an undefined and unmapped state, and the combination of compare and conditional jump, which both use out. Automatic mapping is aware of those cases.
Please also see the best practice section below.
BOOL: Boolean, true or false.
INT: Integer, 123 for decimal, 0xabc for hex, 0123 for octal, or 0b0101 for binary. Also, each value can be prefixed with a + or - to make it relative. What it is relative to depends on the opcode.
STRING: String, "..." or U+1234 (single character Unicode string). If in double quotes, backslashes are used to escape. \\ escapes a single backslash, \0 escapes a single 0-byte, \n escapes LF/new line (U+000A), \t escapes HT/horizontal tab (U+0009), \e escapes ESC/escape (U+001B). Also, the form \xAB can be used to escape any byte. Note that in most cases those bytes must form correct UTF-8 strings. For example U+00FC is correctly escaped as \xC3\xBC, not \xFC.
LREG: Logical register, rN with N being 0 to 7.
PREG: Physical registers such as user7 or program_text.
REG: General register syntax, r* and user*.
ID: Identifier sni, e.g. sni:80. Only numerical values are currently supported. Some values from the raes space are however automatically converted to raen.
VAR: Variable. Author-defined ones start with $.
NAME: An alias name. This is the part after the last $ in a VAR.
Below is a list of directives and opcodes. Note that directives and opcodes may generate a variable amount of actual code depending on their parameters and context.
.align INT
Aligns the current location to at least INT bytes.
.align 8
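As a rough illustration of the arithmetic behind .align, the Python sketch below rounds a position up to the next multiple of the alignment. This assumes that "at least INT bytes" means padding up to the next multiple; the helper name align_up is hypothetical and not part of the assembler.

```python
def align_up(position: int, alignment: int) -> int:
    """Return the smallest multiple of `alignment` that is >= `position`.

    Assumption: this mirrors what `.align INT` does to the current location.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    remainder = position % alignment
    if remainder == 0:
        return position  # already aligned, no padding needed
    return position + (alignment - remainder)


print(align_up(13, 8))
print(align_up(16, 8))
```

Under this assumption, a position of 13 with .align 8 would move to 16, while a position that is already a multiple of 8 stays where it is.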
.autosection SECTIONTYPE
Automatically creates a section of the given type. The content of the section is taken from other directives and defaults.
.autosection header
.autosectionstart SECTIONTYPE
Starts a section with automatically generated content. This is the same as .autosection but the section is not yet ended with .endsection. This is useful if additional content is to be added to the section.
.autosectionstart header
; more content
.endsection
.bug
Marks a point in code as only reachable by means of a bug. Generates code that will throw an exception.
.bug
.bytes INT [, INT [, INT [, ...]]]
Outputs the given bytes.
.bytes 0x56, 0x4D
.cat STRING [, STRING [, STRING [, ...]]]
Includes files as-is into the output. No translation is performed when including files. This is mostly useful to include large binary objects (blobs).
.cat "help.txt"
.cold PREG [, PREG [, PREG [, ...]]]
Marks a given physical register as cold. Cold registers are unlikely to be used/referenced. This allows the assembler to optimise better.
.cold arg
.endfunction
Ends a function started with .function.
This will also emit a corresponding end$function$FUNCTION and size$function$FUNCTION alias.
If the function did not end with a returning opcode a void return is inserted before closing the function.
.endfunction
.endlabel
Generates an end mark for a label. This can later be referenced using end$label$LABEL.
.endlabel
.endsection
Ends a section started with .section.
.endsection
.force_mapped REG [, REG [, REG [, ...]]]
Force the automatic mapping of registers (see .regmap_auto) to map a given set of registers. This is normally not needed, but can be used to move mappings outside small loops to improve performance.
.force_mapped io, user7
.function NAME
Marks the start of a function. This emits a function$NAME alias. It also informs the assembler that the code is only reachable via a function call. This changes the assembler's understanding of the state of the registers. Functions are normally to be terminated using .endfunction.
.function main
.hot PREG [, PREG [, PREG [, ...]]]
Marks a given physical register as hot. Hot registers are likely to be used frequently. This allows the assembler to optimise better.
.hot user3, out
.include STRING [, STRING [, STRING [, ...]]]
Includes source files and translates them. The .include directive does not create a new context of any kind. The files are included as if their content were in the main file.
.include "lib.vmv0-asm"
.label NAME
Adds an alias for the given label. It can later be referenced using label$NAME.
.label blubber
.lukewarm PREG [, PREG [, PREG [, ...]]]
Marks physical registers as lukewarm. This tells the assembler that they are neither specifically hot nor cold. This is the default temperature for registers.
.lukewarm user2, error
.map LREG, PREG
Marks a logical register to be mapped to a physical register in the assembler's internal state without actually emitting an opcode. This is useful to set a correct state e.g. after a jump when the assembler can no longer know the actual mapping. To actually emit an opcode use map.
.map r0 user0
.mine REG [, REG [, REG [, ...]]]
This marks registers as owned by the author. If a register is owned by the author the assembler will not try to use it for automatic code generation (such as auto mapping). This is the default for registers.
.mine r0, r1, r2, out
.noops INT
Inserts the given number of noops.
.noops 4
.not_implemented
Marks a section of code as not (yet) implemented. Generates code that throws a corresponding exception.
.not_implemented
.org INT
Seeks the output to the given position. If the value is relative, the position moves relative to the current position.
.org 0x100
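The interaction between INT literals and relative values can be sketched in Python as follows. This is only an illustration based on the INT description in this document (decimal, 0x hex, leading-zero octal, 0b binary, and a + or - prefix marking a relative value); the function names parse_int and resolve_org are hypothetical and do not describe the reference assembler's internals.

```python
from typing import Tuple


def parse_int(token: str) -> Tuple[int, bool]:
    """Parse an INT literal and return (value, is_relative)."""
    if not token:
        raise ValueError("empty INT literal")
    relative = token[0] in "+-"          # a sign prefix makes the value relative
    sign = -1 if token[0] == "-" else 1
    digits = (token[1:] if relative else token).lower()
    if digits.startswith("0x"):
        value = int(digits[2:], 16)      # hex, e.g. 0xabc
    elif digits.startswith("0b"):
        value = int(digits[2:], 2)       # binary, e.g. 0b0101
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits[1:], 8)       # octal, e.g. 0123
    else:
        value = int(digits, 10)          # decimal, e.g. 123
    return sign * value, relative


def resolve_org(current: int, token: str) -> int:
    """Return the new output position for `.org token` (assumed semantics)."""
    value, relative = parse_int(token)
    return current + value if relative else value


print(resolve_org(0x40, "0x100"))
print(resolve_org(0x40, "+16"))
```

With a current position of 0x40, the absolute form seeks to 0x100, while +16 would move 16 bytes forward from the current position.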
.popname VAR [, VAR [, VAR [, ...]]]
Pops a name off the name stack. See .pushname for details.
.popname $x, $y
.pushname VAR = VALUE [, VAR = VALUE [, VAR = VALUE [, ...]]]
Pushes name-value-pairs onto the name stack. Each name must start with a single $. The value can be any value (register, constant, ...) but another name. This mechanism can be used to assign more memorable names to registers or other constants. Those names can be used at any point. Names are pushed onto the stack, meaning that if a name is pushed again it shadows the old content. The old content will become visible again after .popname.
.pushname $x = 5, $y = user7
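The shadowing behaviour of .pushname and .popname can be modelled with a small Python sketch. It assumes that each name keeps its own stack of values, which is one plausible reading of the description above; the NameStack class and its methods are hypothetical and only illustrate the idea.

```python
class NameStack:
    """Toy model of the assembler's name stack (assumed per-name stacks)."""

    def __init__(self):
        self._values = {}  # name -> list of values, most recent last

    def push(self, name, value):
        # Each name must start with a single $.
        if not name.startswith("$") or name.startswith("$$"):
            raise ValueError("a name must start with a single $")
        self._values.setdefault(name, []).append(value)

    def pop(self, name):
        stack = self._values.get(name)
        if not stack:
            raise KeyError(name)
        stack.pop()  # the previously shadowed value becomes visible again
        if not stack:
            del self._values[name]

    def lookup(self, name):
        stack = self._values.get(name)
        if not stack:
            raise KeyError(name)
        return stack[-1]


names = NameStack()
names.push("$x", "5")
names.push("$x", "user7")   # shadows the old content of $x
print(names.lookup("$x"))
names.pop("$x")
print(names.lookup("$x"))
```

In this model the second push hides the value 5 until the matching pop, after which $x resolves to 5 again.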
.quit
Quits the assembler, the rest of the file is ignored.
.quit
.regattr REG ATTR [, ATTR [, ...]]
Sets register attributes. The following attributes are defined:
mine, yours, theirs: Sets register ownership
cold, lukewarm, hot: Sets register temperature
.regattr r0 mine, hot
.regattr user3 theirs, cold
.regmap_auto BOOL
Enables or disables automatic register mapping by the assembler. In order to work as expected some logical registers must be declared .yours. The number of registers that should be declared .yours depends on the code. A general starting point would be three or four.
.regmap_auto true
.section SECTIONTYPE [, STRING ]
Starts a section of the given type. If another section is already open it first needs to be ended with .endsection. Optionally a magic can be given as a second argument. Consider starting sections with .autosectionstart. This will include automatically generated and default content into the section.
.section header
.synthetic_auto_unref BOOL
Enables (default) or disables automatic unref-ing of temporary registers in synthetic opcodes. If enabled the assembler will try to unref all temporary registers immediately. This is normally the safer option, but might not be the most performant one. This option can be used to skip this step.
.synthetic_auto_unref true
.theirs REG [, REG [, REG [, ...]]]
Marks a register as owned by neither the author nor the assembler. This can be useful if it is for example owned by a library or other module.
.theirs deep, user4
.utf8 STRING [, STRING [, STRING [, ...]]]
Writes the given strings as-is to the output. No final null-byte is inserted. However it can be manually added using \0. Any escaped characters within must be valid UTF-8.
Note that the source file must be in UTF-8 (it always must). If it is not, strange things might happen.
.utf8 "Hello World\n", "Good afternoon!\0", U+1F981
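To find the \xAB escapes that keep a string valid UTF-8, the bytes of the encoded character can be listed. The Python sketch below does this for U+00FC, the example used in the STRING description; the helper names utf8_escape and is_valid_utf8 are hypothetical and exist only for this illustration.

```python
def utf8_escape(char: str) -> str:
    """Return the \\xAB escape sequence for the UTF-8 encoding of `char`."""
    return "".join("\\x{:02X}".format(byte) for byte in char.encode("utf-8"))


def is_valid_utf8(raw: bytes) -> bool:
    """Check whether a byte sequence decodes as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(utf8_escape("\u00fc"))            # escape form for U+00FC
print(is_valid_utf8(b"\xc3\xbc"))       # the two-byte form is valid
print(is_valid_utf8(b"\xfc"))           # a lone 0xFC byte is not
```

This matches the rule that U+00FC is written as \xC3\xBC rather than \xFC.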
.yours REG [, REG [, REG [, ...]]]
Marks a register as owned by the assembler. The assembler is free to use it however it sees fit. The author should avoid using those registers after being declared yours.
.yours r4, r5, r6, r7
.yours r*, arg, user7
add out, REG, REG † add out, REG, INT †
Adds the given two values, storing the result in out.
add out, user0, user1
add out, user0, 12
autodie
Dies with the error in error if error is not undef and not RoarAudio error code 0 (none).
autodie
byte_transfer! REG, REG, INT byte_transfer REG, REG, INT
Transfers the given number of bytes from the source to the destination register. Optionally dies if the number of bytes actually transferred is less than the number of bytes requested. The transferred length is stored in out.
byte_transfer! r0, r2, 15
call? REG, REG call! REG, REG call REG † call INT † call ID † call REG, INT † call REG, ID †
Calls the function (second argument) using the given context (first argument). Before the call the filter-output port is reset and the filter-input port is set to in. After successful return out is set to the filter-output port.
If only one argument is passed, the sole argument is the function to call. It will be run in a fresh context. This might have performance implications (both ways).
call! r0, r1
compare out, REG, REG [, with FLAG [, FLAG [, FLAG [, ...]]]]
Compares two registers, storing the result in out. Optionally a number of flags can be passed. The result is a number that is smaller than zero if the first input is smaller than the second, zero if they are equal, and bigger than zero if the first input is bigger than the second. The following flags are supported:
icase: Compare case insensitive
asciz: Compare up to a 0-byte
prefix: The first input is in the second but may have a prefix in front
suffix: The first input is in the second but may have a suffix after it
nulls_distinct: Null values are distinct
nulls_equal: Null values are equal
nulls_first: Null values are considered smaller than any other value
nulls_last: Null values are considered bigger than any other value
seekback_start: Performs a seekback on the second input to match the start of the submatch of the first input
seekback_end: Performs a seekback on the second input to match the end of the submatch of the first input
subject: True if the subjects match, even if the handles are distinct handles
compare out, user0, user4
compare out, user0, user4 with icase
compare out, user0, user4 with nulls_first, prefix
contents REG, REG contents REG, INT † contents REG, ID †
Performs a call to the function (second argument) for every content element of the given handle (first argument). The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.
contents user0, user3
contents user0, function$perentry
control REG, REG control REG, REG, REG control REG, sni:NNN control REG, sni:NNN, REG control REG, sni:NNN, sni:MMM control REG, sni:NNN, INT
Performs a control call on the target register. The second argument is the command, the third is in if any. arg is taken from the arg register. And out is always the out register.
See also SIRTX numerical identifiers (sni) for details on sni values.
control! user0, sni:31, user1
die REG
Dies with the error in the given register. The value is stored in the error register.
die r0
div out, REG, REG † div out, REG, INT †
Divides the second parameter by the third, storing the result in out.
div out, user0, user1
div out, user0, 7
exit REG
Exits the program (not just returns from the function). Stores the value from the given register in out.
exit r0
filesize INT
Emits a filesize directive. This is generally only useful in the header section.
The value provided is in bytes, but it may be encoded in other units. The only useful value is size$out$.
filesize size$out$
jump REG jump INT jump INT {if|unless} out OP VAL [or [out OP VAL [or [out OP VAL [...]]]]]
Jumps to the given position.
If the integer form is used the target must be within range of a relative jump (which it is for all files smaller than 64KiB).
A condition can be provided in some forms beginning with if or unless followed by triplets that are or-ed.
The following conditions are supported:
out is valid: True if the handle is defined and valid
out is true: True if the value has a true truth value (default: false)
out is notfine: True if the value indicates a not so fine state
out < 0, out <= -1: True if the value is a negative number
out == 0: True if the value is zero
out > 0, out >= 1: True if the value is a positive number
jump r0
jump label$repeat
jump label$good if out is true
magic
Writes a format specific magic. Normally this is used as the very first opcode for format detection. If it is encountered anywhere in the code it is equivalent to a noop.
magic
map LREG, PREG
Maps a logical register to a physical register.
map r0 user0
metadata REG, REG, REG
Calls the output function (first argument) for every metadata of tag (second argument), and relation (third argument). The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.
metadata user0, user1, user2
mod out, REG, REG † mod out, REG, INT †
Calculates the modulus (integer remainder) of the second value divided by the third, storing the result in out.
mod out, user0, user1
mod out, user0, 8
move REG, REG
Moves a value from one register to another. The source register becomes undef by the move. To copy a value (like many assembler languages do with mov) use replace.
move r0, r1
mul out, REG, arg † mul out, REG, REG † mul out, REG, INT †
Multiplies two values, storing the result in out. Using arg as the second factor results in smaller and faster code.
mul out, user0, arg
mul out, user0, 4
noop
Does nothing (no operation).
noop
open REG, INT open REG, ID open REG, ns, INT open REG, ns, INT, INT open REG, undef †
Opens a handle and stores the value in the given register. The forms using ns will resolve the given id using the type given in ns. If two INTs are given the first one will be used to resolve to a type and the second to do the final resolver step.
open r0, 7
open user3, sni:80
open user3, ns, 13
open_context REG
Creates a new context and stores it in the target register. The context will be a child context of the current context (in context).
open_context r0
open_function REG, INT
Creates a handle for the given function.
open_function r0, function$print
relations REG, REG, REG, undef relations REG, undef, REG, REG
Calls the output function (first argument) for every relation of tag (second register), relation (third register), and related (fourth register).
One of tag or related must be undef.
The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.
relations r0, r1, r2, undef
replace REG, REG
Replaces a value in a register with the value of another register. The source register is unchanged.
replace r0, r2
return return REG return undef †
Returns from the current function. If a register is given the value is copied into out before return. If no value (or undef) is given, out is unref-ed.
return
return user4
rewind REG
Rewinds (seeks to the beginning) the handle in the given register.
rewind r0
seek REG, REG
Seeks the value in the left register to the position in the right register.
seek r0, r1
sub out, REG, REG † sub out, REG, INT †
Subtracts the third value from the second, storing the result in out.
sub out, user0, user1
sub out, user0, 4
substr REG, REG, INT, INT
Opens a substring of the input register in the output register. The offset and end are given. If the end is given as a relative integer it is understood as the length.
substr rodata, program_text, 0x100, +13
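How the end argument of substr is interpreted can be shown with a short Python sketch. It assumes the rule stated above (a relative end is a length, an absolute end is a position); the function substr_range and its parameters are hypothetical names for this illustration.

```python
def substr_range(offset: int, end: int, end_is_relative: bool):
    """Return (start, stop) of the substring as absolute positions."""
    stop = offset + end if end_is_relative else end
    if stop < offset:
        raise ValueError("end lies before the offset")
    return offset, stop


# substr rodata, program_text, 0x100, +13 : offset 0x100, length 13
print(substr_range(0x100, 13, True))
# The same range written with an absolute end position
print(substr_range(0x100, 0x10D, False))
```

Both calls describe the same 13 bytes starting at offset 0x100.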
tell REG, REG
Stores the current position of the input register in the output register.
tell r0, r3
transfer REG, REG
Transfers the subject/object from the source register to the target register.
transfer r0, r2
unref REG
Un-references (undefines) the given register.
unref r3
This defines a few best practice rules to follow when writing vmv0 assembler code.
Code between .XXX and .endXXX should be indented by four spaces.
This means that a function body (normally inside a .section and .function) is indented by eight spaces most of the time.
Function calls generally happen by creating a context first, adding required data to the context, performing the actual call, and then reading the results from the context and error before the context is finally destroyed. Some of those steps might be skipped or hidden by synthetic directives. Function calls are also used to catch exceptions (see die).
Function calls can be near (to local code) or far (to code with a large relative offset or outside the image). It is best to assume that all calls are far. Calls might be encoded as near calls by the assembler if it considers them a performance gain without side effects. So from a developer perspective one can consider all calls to be far with all rules for them applying. This also means that calls to outside the image are generally no more or only slightly more expensive than calls within the same image.
Data is passed to the function and back via the provided context. Some synthetic calls will automatically allocate a context and destroy it afterwards. This is very useful if no direct access is needed. Access to in, out, and error is still possible.
On call the value from in is transparently copied into the filter-input port of the context. On return the value from the filter-output is transparently copied into out.
Inside the function the value from filter-input is transparently copied to in. On return the value from out is copied into filter-output. return allows setting out without mapping it first. Functions that return no value can use the void return. .endfunction automatically inserts a void return if the last directive was not a form of return. So the return for functions not returning a value can be skipped.
If more or other values besides the filter ports are required, additional ports must be declared on the context. It is important that those ports match their usage. If needed new tags need to be defined. Data that is available to the current context is also available to the newly created context as the current context is used as a parent context to the new context. See open_context.
The VM v0 binary format defines the outline for native code that can be executed on the SIRTX VM.
The file structure is very flexible to allow for images to be very compact in size. Therefore, many features and structures are optional or may appear in different orders.
However, unless otherwise defined, each file begins with a magic (see magic) or a header section with the corresponding magic (see .section).
Zero or more headers may follow. In most cases the headers are followed by the main part of the program text (code).
Optionally non-executable parts are present at the end such as a rodata section.
The file may end with a trailer section containing additional metadata not relevant to execution.
The magic for VM v0 files contains 8 bytes in all cases (if a magic is present).
For this and all future versions it is guaranteed that the magic is never valid UTF-8 (and therefore also never valid ASCII).
This is so that it is easy for tools to detect the files as binary.
The magic is 0x00 0b00###111 0x56 0x4d 0x0d 0x0a 0xc0 0x0a with ### being any three bit combination.
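A tool that wants to detect VM v0 files could test the magic roughly as in the following Python sketch. It only uses the byte pattern given above; the function name has_vmv0_magic is hypothetical, and the sketch does not cover files that start with a header section instead of a bare magic.

```python
MAGIC_TAIL = bytes([0x56, 0x4D, 0x0D, 0x0A, 0xC0, 0x0A])


def has_vmv0_magic(data: bytes) -> bool:
    """Return True if `data` starts with the 8 byte VM v0 magic."""
    if len(data) < 8:
        return False
    if data[0] != 0x00:
        return False
    # Second byte is 0b00###111: mask out the three free bits before comparing.
    if (data[1] & 0b11000111) != 0b00000111:
        return False
    return data[2:8] == MAGIC_TAIL


sample = bytes([0x00, 0b00101111]) + MAGIC_TAIL
print(has_vmv0_magic(sample))

# The 0xC0 byte can never occur in valid UTF-8, so decoding the magic fails.
try:
    sample.decode("utf-8")
    print("decoded as UTF-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The failed decode illustrates why the magic makes it easy for tools to classify the file as binary.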
Opcodes (see VM v0 assembler syntax) always have a size that is a multiple of two bytes and must also be two byte aligned. Data can be at any offset and alignment. Still, a well-formed file should be a multiple of 2 bytes long. If all data is contained within a section (see .section) this is automatically the case. An image file is considered valid if only a magic is present, so the minimum filesize (unless otherwise defined) is 8 bytes. VM v0 implementations are required to handle image sizes of at least up to 32512 bytes (2^15 - 2^8) if storage permits.
All values are stored in network byte order (most significant bits first), and all string values as UTF-8 unless otherwise noted.
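Reading 16 bit values in network byte order can be sketched in Python with the standard struct module. The helper read_words is a hypothetical name; it simply splits a byte string into big-endian 16 bit words and is not tied to any particular section layout.

```python
import struct


def read_words(data: bytes):
    """Split `data` into big-endian (network byte order) 16 bit words."""
    if len(data) % 2 != 0:
        raise ValueError("length must be a multiple of 2 bytes")
    count = len(data) // 2
    return list(struct.unpack(">{}H".format(count), data))


# The bytes 0x56 0x4D 0x0D 0x0A become the words 0x564D and 0x0D0A.
print([hex(word) for word in read_words(b"\x56\x4d\x0d\x0a")])
```

The most significant byte of each word comes first in the file, so 0x56 0x4D is read as 0x564D.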
All opcodes are 16 bits long, optionally extended with additional 16 bit values called extra. The general format is as follows.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
code | P | codeX | S | T | extra |
code represents the general group for the opcode.
codeX is used as a subgroup. It often encodes the number of registers used in the opcode.
P, S, T are used to store the register number or additional bits depending on the opcode.
The following rules are used to decode the number of registers.
If one register is used it is encoded in P.
If two registers are used they are encoded in P and T.
If three registers are used they are encoded in P, T, and S.
0 ≤ code ≤ 3: The number of registers used is in codeX
code == 4: The number of registers is 3
The size of extra is calculated using the following list. The values are always in units of 2 bytes.
0 ≤ code ≤ 1 and 0 ≤ codeX ≤ 1: The number is in the lower two bits of T
0 ≤ code ≤ 1 and codeX == 3: The number is 0
code == 1 and codeX == 2 and (S & 4) == 4: The number is 1
code == 1 and codeX == 2 and S == (0 + 3): The number is 2
code == 1 and codeX == 2 and S == (0 + 1): The number is 0
code == 0 and codeX == 2: The number is 0
code == 3: The number is 1
code == 4: The number is 0
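The two lists above can be turned into a small decoder. The Python sketch below takes the already extracted fields code, codeX, S, and T as integers, because the exact bit positions are not restated here; the function names are hypothetical, and combinations not covered by the lists return None rather than a guessed value.

```python
from typing import Optional


def register_count(code: int, codex: int) -> Optional[int]:
    """Number of registers used by an opcode, per the first list."""
    if 0 <= code <= 3:
        return codex
    if code == 4:
        return 3
    return None  # not covered by the rules above


def extra_words(code: int, codex: int, s: int, t: int) -> Optional[int]:
    """Size of extra in units of 2 bytes, per the second list."""
    if code in (0, 1) and codex in (0, 1):
        return t & 0b11          # lower two bits of T
    if code in (0, 1) and codex == 3:
        return 0
    if code == 1 and codex == 2:
        if (s & 4) == 4:
            return 1
        if s == 3:
            return 2
        if s == 1:
            return 0
        return None              # S values 0 and 2 are not listed
    if code == 0 and codex == 2:
        return 0
    if code == 3:
        return 1
    if code == 4:
        return 0
    return None                  # e.g. code == 2 is not listed


print(register_count(1, 2), extra_words(1, 2, 3, 0))
print(register_count(4, 0), extra_words(4, 0, 0, 0))
```

The total opcode length in bytes would then be 2 plus twice the returned extra size, following the statement that extra values are 16 bits each.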