Matej's V810 assembler
Version 1
Readme file

*** Introduction ***

  MV810ASM is an assembler that produces machine code for the NEC V810
  architecture. It is easy to use and does not make any assumptions about the
  machine or environment the assembled code will run on. It also optionally
  supports Nintendo Virtual Boy-specific instructions and has a ROM-hacking
  mode that makes it easy to develop and test modifications to existing
  machine code files.

*** Usage ***

  The assembler is a command line program. It is run with a command line of
  the form

    mv810asm inFile outFile [options]

  where "mv810asm" is the filename of the assembler, "inFile" is the source
  code file to assemble, "outFile" is the filename of the machine code file
  to produce, and "options", if present, are one or more of the following:

    E name        Creates an identifier with the value of -1 (see
                  "Identifiers" later in this document).

    H rom         Activates ROM-hacking mode. "rom" is the file used as the
                  template for the output file.

    I name value  Creates an identifier with the specified value (see
                  "Identifiers" later in this document). The value must be a
                  number or an already defined identifier.

    L             Lists identifiers to standard output after the assembly
                  is completed. The order in which identifiers are listed is
                  undefined.

    V             Allows usage of Nintendo Virtual Boy-specific instructions.

  The letter of each option must be preceded by the platform-specific option
  character. On DOS, this is the user-configured switch character, or a slash
  if the DOS version in use does not support configuring it. On Win32, it is a
  slash. On OS X and Haiku, it is a dash. All options are case insensitive.

  If the assembly completes successfully, the assembler will silently quit
  with an error code of 0. Otherwise, error messages will be output to the
  standard error stream and the assembler will quit with a nonzero error code.
  In this case, the contents of the output file are invalid and should not be
  used.

*** Source code syntax ***

  The assembler reads source code line by line. Comments begin with a
  semicolon and last until the end of the line. Apart from comments, a
  non-empty line may specify a label, directive, instruction, or
  pseudoinstruction. All processing is case insensitive. Whitespace is ignored
  except where necessary to separate directives, instructions, and
  pseudoinstructions from their operands. Source code files do not need to end
  with an empty line.

  If the line ends with a colon, it is a label (see "Identifiers" later in
  this document). Otherwise, a word (terminated by whitespace) is read from
  the line and, depending on it, zero or more comma-separated operands may be
  read.

  Register operands may be referred to as "$" or "r" followed by the number of
  the register. Registers 3, 4, and 31 may be referred to as "SP", "GP",
  and "LP" respectively, optionally preceded with a dollar sign.

  System registers (used in the LDSR and STSR instructions) are referred to
  exclusively by their names and may be preceded with a dollar sign.

  Numbers may be decimal or hexadecimal, the latter in "C" (prefixed by "0x")
  or "Intel" (suffixed by "h") format. Hexadecimal numbers may be negative.
  For example, "-0x10" is the same as "-16".
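
  As a minimal sketch of the register and number notations described above,
  the lines in each group below are expected to assemble to the same machine
  code. The register numbers and constants are arbitrary and chosen only for
  illustration.

    mov $3, $10      ; Copy register 3 to register 10
    mov r3, r10      ; The same, using the "r" prefix
    mov sp, $10      ; The same, using the "SP" name for register 3
    mov $sp, r10     ; The same, with the optional dollar sign

    ?mov 16, $6      ; Decimal
    ?mov 0x10, $6    ; Hexadecimal, "C" format
    ?mov 10h, $6     ; Hexadecimal, "Intel" format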

  Strings begin and end with quotes. They are left as they are. No escaping
  mechanism is currently supported.

  Instructions use the usual V810 mnemonics. The SETF instructions have the
  condition specified as part of the mnemonic, not as an operand. The operand
  of the JMP instruction may be specified as "register" or "[register]".
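
  For instance, the two JMP spellings below should be interchangeable. The
  register number is arbitrary.

    jmp $31          ; Operand written as "register"
    jmp [$31]        ; Operand written as "[register]"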

  The source code must be encoded in an 8-bit encoding. If you need Unicode,
  use UTF-8 without a byte order mark.

*** Identifiers ***

  An identifier is a name with an associated 32-bit value. Identifiers may be
  created by labels, directives, or command line options, but they are all
  treated the same way during assembly.

  Identifiers may be global or local. Global identifiers may be created at
  any time and referred to anywhere. Local identifiers begin with a dot and
  are relative to the global label they were created after. They may only be
  referred to within the scope of the same global label or by explicitly
  specifying the global label before their name. Local identifiers may be
  created even outside their usual scope if their global label is prefixed to
  their name.

  The value of a label is the value of the program counter when the label is
  reached. It is influenced by the !ORG directive (see "Directives" later in
  this document). It is not related to the position in the output file. Label
  values are automatically aligned if necessary, in which case padding bytes
  are inserted into the output file. The values of the padding bytes are
  undefined.

  Here is an example of global and local labels:

    !CONST A, 100   ; Global identifier "A", value 100
  Fn1:              ; Global identifier "Fn1", value of current PC
    ?mov A, $6      ; $6 is set to 100
    jal Fn2
    ?mov .data, $6  ; $6 is set to ".data"/"Fn1.data", value of the PC there
    ?br Fn3
    ; (Padding bytes may be inserted here to align the following label)
  .data:            ; Local identifier ".data"/"Fn1.data", value of current PC
    ?dw 0x12345678
    ; ...
    ?db 0xFE        ; PC may be unaligned after this line
    ; (A padding byte may be inserted here to align the following label)
  Fn2:              ; Global identifier "Fn2", value of current PC
    !CONST .x, 123  ; Local identifier ".x"/"Fn2.x", value 123
    ?mov .x, $10    ; $10 is set to 123
    ?mov A, $11     ; $11 is set to 100
    ; ...
    jmp $LP
  Fn3:              ; Global identifier "Fn3", value of current PC
    !CONST .x, 456  ; Local identifier ".x"/"Fn3.x", value 456
    ?mov .x, $10    ; $10 is set to 456
    ?mov Fn2.x, $11 ; $11 is set to 123
    ; ...
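
  The example above does not show a local identifier being created outside
  its usual scope. A minimal sketch of that case follows. The names "Fn4",
  "Fn5", and ".size" are hypothetical, and the sketch assumes that !CONST
  accepts the prefixed form in the same way it accepts the plain local names
  shown above.

  Fn4:                    ; Global identifier "Fn4"
    ; ...
    jmp $LP
  Fn5:                    ; Global identifier "Fn5"
    !CONST Fn4.size, 8    ; Local identifier "Fn4.size", created within Fn5
    ?mov Fn4.size, $10    ; $10 is set to 8
    ; ...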

*** Expressions ***

  Wherever the assembler expects a constant value, an expression may be used.
  As in most other programming languages, expressions are written using infix
  notation.

  The following operators are supported, in order of priority (most to least):

    1. -   Negation

    2. <<  Left shift

    3. *   Multiplication
       /   Division

    4. +   Addition
       -   Subtraction

    5. =   Equal
       <>  Not equal
       <   Less than
       <=  Less than or equal
       >   Greater than
       >=  Greater than or equal

    6. &   Bitwise "and"
       |   Bitwise "or"
       ^   Bitwise exclusive "or"

  Negation may only be used for literals and identifiers. For example, "-A"
  and "-1" are valid, but "-(A + B)" is not. Use "0 - (A + B)" in that case.

  The comparison operators return 0 for false and -1 for true. This lets
  bitwise operators be used for logical operations.

  All calculations are performed on 32-bit signed integers.
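
  The following sketch illustrates these rules with hypothetical identifiers.
  The values in the comments follow from the priorities and the true/false
  convention listed above. They were worked out by hand, not taken from
  assembler output.

    !CONST X, 3
    !CONST Y, 5
    !CONST Shifted, 1 << 4 + X       ; (1 << 4) + 3 = 19, as << precedes +
    !CONST Sum, X + Y * 2            ; 3 + (5 * 2) = 13
    !CONST IsLess, X < Y             ; -1 (true)
    !CONST IsSame, X = Y             ; 0 (false)
    !CONST Both, (X < Y) & (Y <> 5)  ; -1 & 0 = 0
    !CONST Neg, 0 - (X + Y)          ; -8, as "-(X + Y)" is not allowed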

*** Directives ***

  Directives influence the code assembled after them. With the potential
  exceptions of !INCLUDE (whose included file may produce output) and !SEEK
  (which may insert padding bytes), they do not insert any bytes into the
  output file themselves.

  !CONST name, value           Creates an identifier with the specified value.

  !ENDIF                       Ends the matching !IF directive.

  !IF condition                Conditionally assembles lines until the
                               matching !ENDIF. If the condition evaluates to
                               0, lines will be ignored until the matching
                               !ENDIF is encountered. Any undefined
                               identifiers will be treated as 0. !IF blocks
                               may be nested.

  !INCLUDE filename            Includes another source file. The file is
                               processed as if its contents were present in
                               the file it is being included from. The
                               filename is a string in quotes. !INCLUDE
                               directives may be nested.

  !ORG address                 Sets the assumed value of the program counter.
                               This is unrelated to the position of the code
                               in the output file. It is used to calculate
                               offsets for branch and jump instructions. The
                               initial program counter value is undefined, so
                               this directive should be used before any labels
                               or instructions.

  !RBASE address               Sets the base address for all following !RB,
                               !RH, and !RW directives.

  !R{B|H|W} name [, quantity]  Creates an identifier with the specified name
                               at the current !RBASE base address, which is
                               then incremented by the data size (byte,
                               halfword, or word) multiplied by the quantity
                               (assumed to be 1, if omitted). This is useful
                               for reserving locations in RAM for global
                               variables or defining structures.

  !SEEK position               Sets the position of the code in the output
                               file. The position is unrelated to the assumed
                               value of the program counter, so it does not
                               have to be halfword-aligned. Padding bytes will
                               be inserted until the current position matches
                               the one specified. The values of the padding
                               bytes are undefined. In ROM-hacking mode, no
                               padding bytes will be inserted. The initial
                               position is 0.
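
  As a minimal sketch of how these directives combine, consider the fragment
  below. All addresses and names are hypothetical, and the reservations are
  chosen so that each one is naturally aligned. The identifier values in the
  comments are derived by hand from the rules above. DEBUG could be defined
  with the E command line option. If it is left undefined, the !IF treats it
  as 0 and the block is skipped.

    !ORG 0x07000000       ; Assumed program counter of the first instruction

    !RBASE 0x05000000     ; Base address for the reservations below
    !RW Score             ; Score  = 0x05000000, base advances by 4
    !RH Lives             ; Lives  = 0x05000004, base advances by 2
    !RB Flags, 2          ; Flags  = 0x05000006, base advances by 1 * 2
    !RW Buffer, 16        ; Buffer = 0x05000008, base advances by 4 * 16

  Start:
    ?mov Score, $10       ; $10 is set to 0x05000000
    !IF DEBUG
      ?mov 1, $11         ; Assembled only if DEBUG is nonzero
    !ENDIF
    jmp $LP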

*** Pseudoinstructions ***

  Pseudoinstructions translate into one or more instructions or insert data
  into the output file.

  ?ADD immediate, register              Produces an immediate ADD instruction
                                        if possible, otherwise an ADDI.

  ?BR destination                       Produces a BR instruction if possible,
                                        otherwise a JR.

  ?CSTRING string                       Inserts a "C string" (a string
                                        followed by a NUL byte) into the
                                        output file.

  ?D{B|H|W} value [, value, ...]        Inserts bytes, halfwords, or words
                                        into the output file. Padding bytes
                                        will be inserted automatically before
                                        and after the pseudoinstruction if
                                        needed. The values of the padding
                                        bytes are undefined. Any number of
                                        values may be specified. For ?DW, the
                                        values may also be identifiers that
                                        are not defined yet. This may be used
                                        to create lists of pointers.

  ?MOV immediate, register              Produces an immediate MOV, MOVEA, or
                                        MOVHI instruction, or a MOVHI/MOVEA
                                        pair, to set a register to a constant.

  ?MOVEA immediate, register, register  Produces a MOVHI or MOVEA instruction
                                        or both to set a register to another
                                        register plus a constant.

  ?PSTRING string                       Inserts a "Pascal string" (length byte
                                        followed by the string) into the
                                        output file. The string may not be
                                        longer than 255 bytes.

  ?STRING string                        Inserts a string as it is into the
                                        output file.
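
  The sketch below shows the pseudoinstructions in context. The labels and
  values are hypothetical. The instruction named in each comment is what the
  V810 immediate sizes suggest (5-bit signed for MOV and ADD, 16-bit for
  MOVEA, MOVHI, and ADDI). The assembler's actual choice may differ. The byte
  values assume an ASCII-compatible source encoding.

  Table:
    ?dw First, Second       ; Pointer list; both labels are defined later
  First:
    ?mov 5, $10             ; Small enough for an immediate MOV
    ?mov 0x12345678, $11    ; Needs a MOVHI/MOVEA pair
    ?add 1000, $10          ; Too large for an immediate ADD, so ADDI
    ?br Second              ; BR if Second is in range, otherwise JR
  Second:
    jmp $LP
  Text:
    ?cstring "Hi"           ; Bytes 48 69 00 (hexadecimal)
    ?pstring "Hi"           ; Bytes 02 48 69
    ?string "Hi"            ; Bytes 48 69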

*** ROM-hacking mode ***

  The assembler has a mode intended for modifying existing machine code files.
  It is activated with the H option. In ROM-hacking mode, the output file is a
  copy of the template file with modifications applied at the positions
  specified by !SEEK directives. This allows fast development of patches:
  every run applies the modifications to a fresh copy of the template file, so
  no additional manual work has to be done.
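
  A patch source for this mode might look like the sketch below. The file
  positions, addresses, and values are hypothetical. On Win32, such a file
  could be assembled with a command line like "mv810asm patch.asm patched.rom
  /H original.rom", where all three filenames are placeholders.

    !SEEK 0x1234          ; Position in the output file to overwrite
    !ORG 0x07001234       ; Assumed program counter at that position
    ?mov 99, $10          ; Replacement code written over the template
    jmp $LP

    !SEEK 0x2000          ; A second, independent patch location
    !ORG 0x07002000       ; Assumed program counter at that position
    ?dw 0x12345678        ; Overwrites four bytes of data

  Since no padding is inserted in this mode, the bytes between the two
  patches should remain as they are in the template file.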

*** Credits ***

  MV810ASM was created by Matej Horvat.

  Web site:         http://matejhorvat.si/en/software/mv810asm/
  Electronic mail:  matej.horvat@guest.arnes.si