SIRTX VM

Information on the SIRTX VM

The SIRTX VM is the part of SIRTX that runs code that is neither part of the image nor an interpreted command line (see line command interpreter).

VM v0 assembler syntax

The SIRTX VM supports code in its own optimised format. This special format natively supports the SIRTX API and therefore can be used to compile to a small binary size. The following outlines the syntax for the reference assembler.

The syntax follows many standard assembler patterns. However, opcodes may be different from other architectures.

Each line of the file may contain comments. A comment starts with ;, //, or # and extends to the end of the line. If the line also contains a directive, there must be at least one space between the directive and the comment.

Every line may start or end with any amount of whitespace, which is ignored. Values must be separated by at least one whitespace character, a comma, or both.

Each directive may have zero or more values. A value can be of different types, depending on the directive. The first one is normally the destination, output, or target, and the second one the source or input.

It is possible to mark each opcode with a bang (!) to select autodie mode. This will set the autodie flag of the opcode (if any) or insert an autodie directive right after the opcode (if it has no autodie flag of its own).

Some directives (marked †) listed below are synthetic. This means they don't exist as their own opcode or variant of one but are generated by the assembler. They may be multiple opcodes long. The assembler may optimise them based on its knowledge.

Some (non-synthetic) directives require a specific register or prefer one. Those registers generally don't need to be mapped, which reduces the stress on the mapping. Common examples include the use of arg in control, which can be undefined most of the time and can be kept in an undefined and unmapped state, and the combination of compare and conditional jump, which both use out. Automatic mapping is aware of those cases.

Please also see the best practice section below.

Types

BOOL: Boolean
true or false.
INT: Integer
Each integer might be in format 123 for decimal, 0xabc for hex, 0123 for octal, or 0b0101 for binary. Also, each value can be prefixed with a + or - to make it relative. What it is relative to depends on the opcode.
STRING: String
A string is in the form "..." or U+1234 (single character Unicode string). If in double quotes, backslashes are used to escape. \\ escapes a single backslash, \0 escapes a single 0-byte, \n escapes LF/new line (U+000A), \t escapes HT/horizontal tab (U+0009), \e escapes ESC/escape (U+001B). Also, the form \xAB can be used to escape any byte. Note that in most cases those bytes must form correct UTF-8 strings. For example, U+00FC is correctly escaped as \xC3\xBC, not \xFC.
LREG: Logical register
Logical registers must be given as rN with N being 0 to 7.
PREG: Physical registers
Physical registers are given by their full name, e.g. user7 or program_text.
REG: General register syntax
In most cases where a register is given, both a logical and a physical register name can be used. Unless automapping is active, physical names can often only be used if they are currently mapped to a logical register. See map. In directives that accept register lists one can also use r* and user*.
ID: Identifier
Identifiers are constants that are given using a type and a value separated by a colon. Common are identifiers with a type of sni, e.g. sni:80. Only numerical values are currently supported. Some values from the raes space are however automatically converted to raen.
VAR: Variable
Variables are used to allow more symbolic programming. They can be set to any other kind of type from this list. Variable names contain a $, author defined ones start with it.
NAME: Alias name
This is the part after the last $ in a VAR.
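
The INT and STRING rules above can be illustrated with a small Python sketch. This is not the reference assembler's parser. The function names parse_int and decode_string_body are hypothetical, and the strictness choices (lowercase 0x and 0b prefixes only, rejecting escapes that are not listed above) are assumptions made for the example.

```python
import re

# Documented INT forms: optional +/- prefix (marks the value as relative),
# then 0x hex, 0b binary, leading-zero octal, or plain decimal.
# Assumption: prefixes are lowercase only, as in the documented examples.
_INT_RE = re.compile(
    r"^(?P<sign>[+-]?)"
    r"(?:0x(?P<hex>[0-9a-fA-F]+)"
    r"|0b(?P<bin>[01]+)"
    r"|0(?P<oct>[0-7]+)"
    r"|(?P<dec>0|[1-9][0-9]*))$"
)

def parse_int(text):
    """Return (value, is_relative) for an INT literal."""
    m = _INT_RE.match(text)
    if m is None:
        raise ValueError(f"not an INT literal: {text!r}")
    if m.group("hex") is not None:
        value = int(m.group("hex"), 16)
    elif m.group("bin") is not None:
        value = int(m.group("bin"), 2)
    elif m.group("oct") is not None:
        value = int(m.group("oct"), 8)
    else:
        value = int(m.group("dec"), 10)
    sign = m.group("sign")
    if sign == "-":
        value = -value
    # Any explicit sign makes the value relative.
    return value, sign != ""

# Single-character escapes listed for double-quoted strings.
_SIMPLE_ESCAPES = {"\\": 0x5C, "0": 0x00, "n": 0x0A, "t": 0x09, "e": 0x1B}
_HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_string_body(body):
    """Decode the text between the double quotes of a STRING into bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(c not in _HEX_DIGITS for c in digits):
                raise ValueError("\\x needs two hex digits")
            out.append(int(digits, 16))
            i += 4
        else:
            # Assumption: escapes not listed above are rejected.
            raise ValueError(f"unknown escape: \\{esc}")
    return bytes(out)
```

For instance, parse_int("+13") yields the value 13 marked as relative, and decode_string_body applied to the six characters \xC3\xBC yields the two bytes of the UTF-8 encoding of U+00FC.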

Directives and opcodes

Below is a list of directives and opcodes. Note that directives and opcodes may generate a variable amount of actual code depending on their parameters and context.

.align

.align INT

Aligns the current location to at least INT bytes

Example
.align 8

.autosection

.autosection SECTIONTYPE

Automatically creates a section of the given type. The content of the section is taken from other directives and defaults.

Example
.autosection header

.autosectionstart

.autosectionstart SECTIONTYPE

Starts a section with automatically generated content. This is the same as .autosection but the section is not yet ended with .endsection. This is useful if additional content is to be added to the section.

Example
.autosectionstart header
; more content
.endsection

.bug

.bug

Marks a point in code as only reachable by means of a bug. Generates code that will throw an exception.

Example
.bug

.byte

.byte INT [, INT [, INT [, ...]]]

Outputs the given bytes.

Example
.byte 0x56, 0x4D

.cat

.cat STRING [, STRING [, STRING [, ...]]]

Includes files as-is into the output. No translation is performed when including files. This is mostly useful to include large binary objects (blobs).

Example
.cat "help.txt"

.cold

.cold PREG [, PREG [, PREG [, ...]]]

Marks the given physical registers as cold. Cold registers are unlikely to be used/referenced. This allows the assembler to optimise better.

Example
.cold arg

.endfunction

.endfunction

Ends a function started with .function. This will also emit a corresponding end$function$FUNCTION and size$function$FUNCTION alias. If the function did not end with a returning opcode a void return is inserted before closing the function.

Example
.endfunction

.endlabel

.endlabel

Generates an end mark for a label. It can later be referenced using end$label$LABEL.

Example
.endlabel

.endsection

.endsection

Ends a section started with .section

Example
.endsection

.force_mapped

.force_mapped REG [, REG [, REG [, ...]]]

Force the automatic mapping of registers (see .regmap_auto) to map a given set of registers. This is normally not needed, but can be used to move mappings outside small loops to improve performance.

Example
.force_mapped io, user7

.function

.function NAME

Marks the start of a function. This emits a function$NAME alias. It also informs the assembler that the code is only reachable via a function call. This changes the assembler's understanding of the state of the registers. Functions are normally to be terminated using .endfunction.

Example
.function main

.hot

.hot PREG [, PREG [, PREG [, ...]]]

Marks a given physical register as hot. Hot registers are likely to be used frequently. This allows the assembler to optimise better.

Example
.hot user3, out

.include

.include STRING [, STRING [, STRING [, ...]]]

Includes source files and translates them. The .include directive does not create a new context of any kind. The files are included as if their content were in the main file.

Example
.include "lib.vmv0-asm"

.label

.label NAME

Adds an alias for the given label. It can later be referenced using label$NAME.

Example
.label blubber

.lukewarm

.lukewarm PREG [, PREG [, PREG [, ...]]]

Marks physical registers as lukewarm. This tells the assembler that they are neither specifically hot nor cold. This is the default temperature for registers.

Example
.lukewarm user2, error

.map

.map LREG, PREG

Marks a logical register to be mapped to a physical register in the assembler's internal state without actually emitting an opcode. This is useful to set a correct state e.g. after a jump when the assembler can no longer know the actual mapping. To actually emit an opcode use map.

Example
.map r0 user0

.mine

.mine REG [, REG [, REG [, ...]]]

This marks registers as owned by the author. If a register is owned by the author, the assembler will not try to use it for automatic code generation (such as auto mapping). This is the default for registers.

Example
.mine r0, r1, r2, out

.noops

.noops INT

Inserts the given number of noops.

Example
.noops 4

.not_implemented

.not_implemented

Marks a section of code as not (yet) implemented. Generates code that throws a corresponding exception.

Example
.not_implemented

.org

.org INT

Seeks the output to the given position. If the value is relative, the position moves relative to the current position.

Example
.org 0x100

.popname

.popname VAR [, VAR [, VAR [, ...]]]

Pops names off the name stack. See .pushname for details.

Example
.popname $x, $y

.pushname

.pushname VAR = VALUE [, VAR = VALUE [, VAR = VALUE [, ...]]]

Pushes name-value pairs onto the name stack. Each name must start with a single $. The value can be any value (register, constant, ...) except another name. This mechanism can be used to assign more memorable names to registers or other constants. Those names can be used at any point. Names are pushed onto the stack, meaning that if a name is pushed again it shadows the old content. The old content becomes visible again after .popname.

Example
.pushname $x = 5, $y = user7
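
The shadowing behaviour of .pushname and .popname can be modelled with a short Python sketch. It is only an illustration of the described semantics, not the assembler's implementation. The class NameStack and its methods are hypothetical, and keeping one stack per name is an assumption based on .popname taking the names to pop as arguments.

```python
class NameStack:
    """Toy model of the name stack: one stack of values per name (assumption)."""

    def __init__(self):
        self._stacks = {}

    def push(self, name, value):
        # Each name must start with a single $.
        if not name.startswith("$") or name.startswith("$$"):
            raise ValueError(f"name must start with a single $: {name!r}")
        self._stacks.setdefault(name, []).append(value)

    def pop(self, name):
        stack = self._stacks.get(name)
        if not stack:
            raise KeyError(name)
        stack.pop()
        if not stack:
            # The name is no longer defined once its last value is popped.
            del self._stacks[name]

    def lookup(self, name):
        # The most recently pushed value shadows older ones.
        return self._stacks[name][-1]


names = NameStack()
names.push("$x", 5)
names.push("$x", "user7")   # shadows the value 5
shadowed = names.lookup("$x")
names.pop("$x")             # the old content becomes visible again
restored = names.lookup("$x")
```

After the second push the lookup returns user7. After the pop it returns 5 again, mirroring how the old content becomes visible after .popname.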

.quit

.quit

Quits the assembler, the rest of the file is ignored.

Example
.quit

.regattr

.regattr REG ATTR [, ATTR [, ...]]

Sets register attributes. The following attributes are defined:

  • mine, yours, theirs: Sets register ownership
  • cold, lukewarm, hot: Sets register temperature
Examples
.regattr r0 mine, hot
.regattr user3 theirs, cold

.regmap_auto

.regmap_auto BOOL

Enables or disables automatic register mapping by the assembler. In order to work as expected some logical registers must be declared .yours. The number of registers that should be declared .yours depends on the code. A general starting point would be three or four.

Example
.regmap_auto true

.section

.section SECTIONTYPE [, STRING ]

Starts a section of the given type. If another section is already open it first needs to be ended with .endsection. Optionally a magic can be given as a second argument. Consider starting sections with .autosectionstart. This will include automatically generated and default content into the section.

Example
.section header

.synthetic_auto_unref

.synthetic_auto_unref BOOL

Enables (default) or disables automatic unref-ing of temporary registers in synthetic opcodes. If enabled, the assembler will try to unref all temporary registers immediately. This is normally the safer option, but might not be the most performant one. Disabling this option skips that step.

Example
.synthetic_auto_unref true

.theirs

.theirs REG [, REG [, REG [, ...]]]

Marks registers as owned by neither the author nor the assembler. This can be useful if a register is, for example, owned by a library or other module.

Example
.theirs deep, user4

.utf8

.utf8 STRING [, STRING [, STRING [, ...]]]

Writes the given strings as-is to the output. No final null-byte is inserted. However it can be manually added using \0. Any escaped characters within must be valid UTF-8.

Note that the source file must be in UTF-8 (it always must). If it is not, strange things might happen.

Example
.utf8 "Hello World\n", "Good afternoon!\0", U+1F981

.yours

.yours REG [, REG [, REG [, ...]]]

Marks registers as owned by the assembler. The assembler is free to use them however it sees fit. The author should avoid using those registers after they have been declared yours.

Examples
.yours r4, r5, r6, r7
.yours r*, arg, user7

add

add out, REG, REG 
add out, REG, INT 

Adds the given two values, storing the result in out.

Examples
add out, user0, user1
add out, user0, 12

autodie

autodie

Dies with the error in error if error is not undef and not RoarAudio error code 0 (none).

Example
autodie

byte_transfer

byte_transfer! REG, REG, INT
byte_transfer REG, REG, INT

Transfers the given number of bytes from the source to the destination register. Optionally dies if the number of bytes actually transferred is less than the number of bytes requested. The transferred length is stored in out.

Example
byte_transfer! r0, r2, 15

call

call? REG, REG
call! REG, REG
call REG 
call INT 
call ID 
call REG, INT 
call REG, ID 

Calls the function (second argument) using the given context (first argument). Before the call the filter-output port is reset and the filter-input port is set to in. After successful return out is set to the filter-output port.

If only one argument is passed, it is the function to call. It will be run in a fresh context. This might have performance implications (both ways).

Example
call! r0, r1

compare

compare out, REG, REG [, with FLAG [, FLAG [, FLAG [, ...]]]]

Compares two registers storing the result in out. Optionally a number of flags can be passed. The result is a number that is smaller than zero if the first input is smaller than the second, zero if they are equal, and bigger than zero if the first input is bigger than the second. The following flags are supported:

  • icase: Compare case insensitive
  • asciz: Compare up to a 0-byte
  • prefix: The first input is in the second but may have a prefix in front
  • suffix: The first input is in the second but may have a suffix after it
  • nulls_distinct: Null values are distinct
  • nulls_equal: Null values are equal
  • nulls_first: Null values are considered smaller than any other value
  • nulls_last: Null values are considered bigger than any other value
  • seekback_start: Performs a seekback on the second input to match the start of the submatch of the first input
  • seekback_end: Performs a seekback on the second input to match the end of the submatch of the first input
  • subject: True if the subjects match, even if the handles are distinct handles
Examples
compare out, user0, user4
compare out, user0, user4 with icase
compare out, user0, user4 with nulls_first, prefix

contents

contents REG, REG
contents REG, INT 
contents REG, ID 

Performs a call to the function (second argument) for every content element of the given handle (first argument). The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.

Examples
contents user0, user3
contents user0, function$perentry

control

control REG, REG
control REG, REG, REG
control REG, sni:NNN
control REG, sni:NNN, REG
control REG, sni:NNN, sni:MMM
control REG, sni:NNN, INT

Performs a control call on the target register. The second argument is the command, the third is in if any. arg is taken from the arg register. And out is always the out register.

See also SIRTX numerical identifiers (sni) for details on sni values.

Example
control! user0, sni:31, user1

die

die REG

Dies with the error in the given register. The value is stored in the error register.

Example
die r0

div

div out, REG, REG 
div out, REG, INT 

Divides the second parameter by the third, storing the result in out.

Examples
div out, user0, user1
div out, user0, 7

exit

exit REG

Exits the program (rather than just returning from the function). Stores the value from the given register in out.

Example
exit r0

filesize

filesize INT

Emits a filesize directive. This is generally only useful in the header section. The value provided is in bytes. But it may be encoded in other units. The only useful value is size$out$.

Example
filesize size$out$

jump

jump REG
jump INT
jump INT {if|unless} out OP VAL [or [out OP VAL [or [out OP VAL [...]]]]]

Jumps to the given position. If the integer form is used the target must be within range of a relative jump (which it is for all files smaller than 64KiB). A condition can be provided in some forms beginning with if or unless followed by triplets that are or-ed. The following conditions are supported:

  • out is valid: True if the handle is defined and valid
  • out is true: True if the value has a true truth value (default: false)
  • out is notfine: True if the value indicates a not so fine state
  • out < 0, out <= -1: True if the value is a negative number
  • out == 0: True if the value is zero
  • out > 0, out >= 1: True if the value is a positive number
Examples
jump r0
jump label$repeat
jump label$good if out is true

magic

magic

Writes a format specific magic. Normally this is used as the very first opcode for format detection. If it is encountered anywhere in the code it is equivalent to a noop.

Example
magic

map

map LREG, PREG

Maps a logical register to a physical register.

Example
map r0 user0

metadata

metadata REG, REG, REG

Calls the output function (first argument) for every metadata of tag (second argument), and relation (third argument). The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.

Example
metadata user0, user1, user2

mod

mod out, REG, REG 
mod out, REG, INT 

Calculates the modulus (integer remainder) of the second value divided by the third, storing the result in out.

Examples
mod out, user0, user1
mod out, user0, 8

move

move REG, REG

Moves a value from one register to another. The source register becomes undef as a result of the move. To copy a value (like many assembler languages do with mov) use replace.

Example
move r0, r1

mul

mul out, REG, arg 
mul out, REG, REG 
mul out, REG, INT 

Multiplies two values, storing the result in out. Using arg as the second factor results in smaller and faster code.

Examples
mul out, user0, arg
mul out, user0, 4

noop

noop

Does nothing (no operation).

Example
noop

open

open REG, INT
open REG, ID
open REG, ns, INT
open REG, ns, INT, INT
open REG, undef 

Opens a handle and stores the value in the given register. The forms using ns will resolve the given id using the type given in ns. If two INTs are given, the first one is used to resolve to a type and the second to perform the final resolution step.

Examples
open r0, 7
open user3, sni:80
open user3, ns, 13

open_context

open_context REG

Creates a new context and stores it in the target register. The context will be a child context of the current context (in context).

Example
open_context r0

open_function

open_function REG, INT

Creates a handle for the given function.

Example
open_function r0, function$print

relations

relations REG, REG, REG, undef
relations REG, undef, REG, REG

Calls the output function (first argument) for every relation of tag (second argument), relation (third argument), and related (fourth argument). One of tag or related must be undef. The context for the calls can be given via arg. If arg is undefined, then a new context is created (and destroyed) automatically.

Example
relations r0, r1, r2, undef

replace

replace REG, REG

Replaces a value in a register with the value of another register. The source register is unchanged.

Example
replace r0, r2

return

return
return REG
return undef 

Returns from the current function. If a register is given the value is copied into out before return. If no value (or undef) is given, out is unref-ed.

Examples
return
return user4

rewind

rewind REG

Rewinds (seeks to the beginning) the handle in the given register.

Example
rewind r0

seek

seek REG, REG

Seeks the value in the left register to the position in the right register.

Example
seek r0, r1

sub

sub out, REG, REG 
sub out, REG, INT 

Subtracts the third value from the second, storing the result in out.

Examples
sub out, user0, user1
sub out, user0, 4

substr

substr REG, REG, INT, INT

Opens a substring of the input register in the output register. The offset and end are given. If the end is given as a relative integer it is understood as the length.

Example
substr rodata, program_text, 0x100, +13

tell

tell REG, REG

Stores the current position of the input register in the output register.

Example
tell r0, r3

transfer

transfer REG, REG

Transfers the subject/object from the source register to the target register.

Example
transfer r0, r2

unref

unref REG

Un-references (undefines) the given register.

Example
unref r3

VM v0 assembler best practice

This defines a few best practice rules to follow when writing vmv0 assembler code.

Common rules

  • Use automatic register mapping as much as possible. See .regmap_auto. It is rarely necessary to map any registers manually. In some cases it might be good to provide a hint using .force_mapped. Some synthetic directives require user registers or arg (or other registers) to be handled by the assembler. Those need to be set to .yours for that. Therefore, it is sometimes best to just mark all registers as .yours and then only re-claim those actually needed via .mine.
  • As the VM holds handles, not mere numbers, in registers, it is important to unref your values once you are done. However, do not do this right before returning or dying for registers that are restored by such an action. See Common registers.
  • Blocks between any .XXX and .endXXX should be indented by four spaces. This means that a function body (normally inside a .section and .function) is indented by eight spaces most of the time.

Function calls

Function calls generally happen by creating a context first, adding required data to the context, performing the actual call, and then reading the results from the context and error before the context is finally destroyed. Some of those steps might be skipped or hidden by synthetic directives. Function calls are also used to catch exceptions (see die).

Function calls can be near (to local code) or far (to code with a large relative offset or outside the image). It is best to assume that all calls are far. Calls might be encoded as near calls by the assembler if it considers them a performance gain without side effects. So from a developer perspective one can consider all calls to be far with all rules for them applying. This also means that calls to outside the image are generally no more or only slightly more expensive than calls within the same image.

Data is passed to the function and back via the provided context. Some synthetic calls will automatically allocate a context and destroy it afterwards. This is very useful if no direct access is needed. Access to in, out, and error is still possible.

On call the value from in is transparently copied into the filter-input port of the context. On return the value from the filter-output is transparently copied into out.

Inside the function the value from filter-input is transparently copied to in. On return the value from out is copied into filter-output. return allows setting out without mapping it first. Functions that return no value can use the void return. .endfunction automatically inserts a void return if the last directive was not a form of return, so the return can be omitted for functions that return no value.

If more or other values besides the filter ports are required, additional ports must be declared on the context. It is important that those ports match their usage. If needed, new tags need to be defined. Data that is available to the current context is also available to the newly created context, as the current context is used as the parent context of the new context. See open_context.

VM v0 binary format

The VM v0 binary format defines the outline for native code that can be executed on the SIRTX VM.

File structure and headers

The file structure is very flexible to allow for images to be very compact in size. Therefore, many features and structures are optional or may appear in different orders. However, unless otherwise defined, each file begins with a magic (see magic) or a header section with the corresponding magic (see .section). Zero or more headers may follow. In most cases the headers are followed by the main part of the program text (code). Optionally non-executable parts are present at the end such as a rodata section. The file may end with a trailer section containing additional metadata not relevant to execution.

The magic for VM v0 files contains 8 bytes in all cases (if a magic is present). For this and all future versions it is guaranteed that the magic is never valid UTF-8 (and therefore also never valid ASCII). This is so that it is easy for tools to detect the files as binary. The magic is 0x00 0b00###111 0x56 0x4d 0x0d 0x0a 0xc0 0x0a with ### being any three bit combination.
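
As an illustration of the magic layout just described, the following Python sketch checks whether a buffer starts with a bare VM v0 magic. The function names is_vmv0_magic and magic_variant_bits are hypothetical. The sketch only covers a magic at offset 0 and does not handle files that begin with a header section carrying the magic.

```python
# Fixed bytes 2..7 of the magic, as listed above.
_MAGIC_TAIL = bytes([0x56, 0x4D, 0x0D, 0x0A, 0xC0, 0x0A])
MAGIC_LENGTH = 8

def is_vmv0_magic(data):
    """Return True if data starts with a bare 8 byte VM v0 magic."""
    if len(data) < MAGIC_LENGTH:
        return False
    # Second byte is 0b00###111: mask out the three free bits before comparing.
    return (
        data[0] == 0x00
        and (data[1] & 0b11000111) == 0b00000111
        and data[2:MAGIC_LENGTH] == _MAGIC_TAIL
    )

def magic_variant_bits(data):
    """Return the three free bits (###) of the second magic byte."""
    if not is_vmv0_magic(data):
        raise ValueError("no VM v0 magic at offset 0")
    return (data[1] >> 3) & 0b111


example = bytes([0x00, 0b00101111, 0x56, 0x4D, 0x0D, 0x0A, 0xC0, 0x0A])
found = is_vmv0_magic(example)
```

The byte 0xc0 never occurs in valid UTF-8, which is consistent with the guarantee that the magic cannot be decoded as UTF-8.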

Opcodes (see VM v0 assembler syntax) always have a size that is a multiple of two bytes and must also be two byte aligned. Data can be at any offset and alignment. Still, a well-formed file should be a multiple of 2 bytes long. If all data is contained within a section (see .section) this is automatically the case. An image file is considered valid if only a magic is present, so the minimum filesize (unless otherwise defined) is 8 bytes. VM v0 implementations are required to handle image sizes of at least up to 32512 bytes (2^15 - 2^8) if storage permits.

All values are stored in network byte order (most significant bits first), and all string values as UTF-8 unless otherwise noted.

Opcodes

All opcodes are 16 bits long, optionally being extended with additional 16 bit values called extra. The general format is as follows.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
code P codeX S T extra

code represents the general group for the opcode. codeX is used as a subgroup. It often encodes the number of registers used in the opcode. P, S, T are used to store the register number or additional bits depending on the opcode.

The following rules are used to decode the number of registers. If one register is used it is encoded in P. If two registers are used they are encoded in P and T. If three registers are used they are encoded in P, T, and S.

  1. For 0 ≤ code ≤ 3: The number of registers used is in codeX
  2. For code == 4: The number of registers is 3
  3. Die
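
The register rules above can be written down as a small Python sketch. It takes the already extracted code and codeX fields as arguments because the exact bit positions of the fields are not restated here. The function names are hypothetical, and mapping "Die" to a ValueError is an assumption made for the example.

```python
def register_count(code, code_x):
    """Number of registers used by an opcode, following the rules in order."""
    if 0 <= code <= 3:
        return code_x
    if code == 4:
        return 3
    # Rule 3 ("Die"): modelled here as an exception (assumption).
    raise ValueError(f"cannot decode register count for code {code}")

def register_fields(code, code_x):
    """Names of the opcode fields that hold the registers, in order."""
    # One register: P. Two registers: P and T. Three registers: P, T, and S.
    return ("P", "T", "S")[:register_count(code, code_x)]


two_regs = register_fields(1, 2)
three_regs = register_fields(4, 0)
```

With these inputs two_regs names the fields P and T, and three_regs names P, T, and S.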

The size of extra is calculated using the following list. The values are always in units of 2 bytes.

  1. For 0 ≤ code ≤ 1 and 0 ≤ codeX ≤ 1: The number is in the lower two bits of T
  2. For 0 ≤ code ≤ 1 and codeX == 3: The number is 0
  3. For code == 1 and codeX == 2 and (S & 4) == 4: The number is 1
  4. For code == 1 and codeX == 2 and S == (0 + 3): The number is 2
  5. For code == 1 and codeX == 2 and S == (0 + 1): The number is 0
  6. For code == 0 and codeX == 2: The number is 0
  7. For code == 3: The number is 1
  8. For code == 4: The number is 0
  9. Die
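
The list above is evaluated top to bottom, with the first matching rule winning. A Python sketch of that lookup, taking the already extracted fields as arguments, could look as follows. The function names are hypothetical, and raising ValueError for the final "Die" rule is an assumption.

```python
def extra_words(code, code_x, s, t):
    """Size of extra in units of 2 bytes, using the first matching rule."""
    if 0 <= code <= 1 and 0 <= code_x <= 1:
        return t & 0b11              # rule 1: lower two bits of T
    if 0 <= code <= 1 and code_x == 3:
        return 0                     # rule 2
    if code == 1 and code_x == 2 and (s & 4) == 4:
        return 1                     # rule 3
    if code == 1 and code_x == 2 and s == 3:
        return 2                     # rule 4
    if code == 1 and code_x == 2 and s == 1:
        return 0                     # rule 5
    if code == 0 and code_x == 2:
        return 0                     # rule 6
    if code == 3:
        return 1                     # rule 7
    if code == 4:
        return 0                     # rule 8
    # Rule 9 ("Die"): modelled here as an exception (assumption).
    raise ValueError("cannot decode size of extra")

def opcode_size_bytes(code, code_x, s, t):
    """Total size: the 16 bit opcode itself plus its extra words."""
    return 2 + 2 * extra_words(code, code_x, s, t)


size = opcode_size_bytes(3, 0, 0, 0)
```

Here size is 4, since an opcode with code == 3 always carries one extra word. Combinations that no rule covers, such as code == 2, end up in the final rule.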