AMOS:Sourcecode file format

From Amiga Coding

AMOS source code is normally stored in a file with the extension ".AMOS". It begins with 16 bytes of ASCII text from the following list:


Text                       Tested?  Saved from which AMOS?
"AMOS Pro101V\0\0\0\0"     Yes      AMOS Professional
"AMOS Pro101v\0\0\0\0"     No       AMOS Professional
"AMOS Basic V134 "         Yes      AMOS Pro, but AMOS 1.3 compatible
"AMOS Basic v134 "         No       AMOS Pro, but AMOS 1.3 compatible
"AMOS Basic V1.3 "         Yes      AMOS The Creator v1.3
"AMOS Basic v1.3 "         No       AMOS The Creator v1.3
"AMOS Basic V1.00"         Yes      AMOS The Creator v1.0 - v1.2
"AMOS Basic v1.00"         No       AMOS The Creator v1.0 - v1.2


As can be seen from the table, the 12th character in the text is either "V", which means "tested", or "v", which means "not tested". "Tested" in this case refers to whether the AMOS interpreter has performed a syntax check on all lines of code, and found no syntax errors. While you can save AMOS source code to disk at any time, you can only run it or compile it if it has been tested first.
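
The check described above can be sketched in C as follows. This is only an illustration: the function name `amos_header_tested` and its return values are hypothetical, and it assumes the caller has already read the first 16 bytes of the file.

```c
#include <string.h>

/* Hypothetical helper: classify a 16-byte AMOS header.
 * Returns 1 if the source is tested ('V'), 0 if it is not tested ('v'),
 * and -1 if the bytes do not look like an AMOS header at all. */
int amos_header_tested(const unsigned char *header) {
  /* every header in the table starts with "AMOS " */
  if (memcmp(header, "AMOS ", 5) != 0) return -1;
  /* the 12th character lives at index 11 */
  if (header[11] == 'V') return 1;
  if (header[11] == 'v') return 0;
  return -1;
}
```

A loader would typically refuse to run or compile a file for which this returns 0, mirroring what AMOS itself does.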


The 16-byte header is followed by a 4-byte (32-bit) unsigned integer, stored big-endian, giving the number of bytes of tokenised BASIC code. This is immediately followed by the BASIC code itself, for the length given.


Finally, after the BASIC code comes the 4-byte ASCII identifier "AmBs", followed by a 2-byte (16-bit) unsigned integer giving the number of memory banks that follow. The banks themselves come next, each individually sized. Each bank is either a sprite bank, an icon bank or a regular memory bank. There is no more data in the source code file after the last bank. A sprite bank, if present, always occupies bank 1, so there must not be another sprite bank or a regular memory bank numbered 1. An icon bank, if present, always occupies bank 2, so there must not be another icon bank or a regular memory bank numbered 2.
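
The overall layout (header, code length, code, "AmBs", bank count, banks) can be located with a few bounds checks. The sketch below is not taken from AMOS: `struct amos_layout` and `amos_locate` are hypothetical names, the whole file is assumed to be in memory already, and integers are assumed to be big-endian, as in the decryption code at the end of this page.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result structure for this sketch */
struct amos_layout {
  unsigned long code_len;   /* bytes of tokenised BASIC code */
  size_t code_offset;       /* offset of the code within the file */
  unsigned int num_banks;   /* number of banks announced after "AmBs" */
  size_t banks_offset;      /* offset of the first bank */
};

/* Returns 0 on success, -1 if the buffer is too short or "AmBs" is missing */
int amos_locate(const unsigned char *buf, size_t len, struct amos_layout *out) {
  unsigned long code_len;
  size_t pos;

  if (len < 20) return -1;  /* 16-byte header + 4-byte code length */
  code_len = ((unsigned long)buf[16] << 24) | ((unsigned long)buf[17] << 16)
           | ((unsigned long)buf[18] << 8)  |  (unsigned long)buf[19];

  /* the code must fit, and leave room for "AmBs" plus the bank count */
  if (code_len > len - 20 || len - 20 - code_len < 6) return -1;
  pos = 20 + (size_t)code_len;
  if (memcmp(&buf[pos], "AmBs", 4) != 0) return -1;

  out->code_len = code_len;
  out->code_offset = 20;
  out->num_banks = ((unsigned int)buf[pos + 4] << 8) | buf[pos + 5];
  out->banks_offset = pos + 6;
  return 0;
}
```

The individual banks are not parsed here, since their internal formats are outside the scope of this page.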


Tokenised BASIC code format

The tokenised BASIC code is a stream of tokenised lines. Each tokenised line has the following format:

  • 1 byte: The length of this line in words (2 bytes), including this byte. To get the length of the line in bytes, double this value.
  • 1 byte: The indent level of this line. AMOS automatically indents lines to show program structure. If printing this line as ASCII text, you should print {indent level + 1} space characters at the beginning of the line, or no spaces if the value is less than 2.
  • many bytes: a sequence of tokens. Each token is at least two bytes long, and every token is padded to a multiple of two bytes. Each token is individually sized. The tokens always end with a compulsory null token.

AMOS treats each token as a signed 16-bit number. Token values between 0x0000 and 0x004E are specially printed and have differing sizes; all others are simply a signed offset into AMOS's internal token table, and the text found there is what should be printed. Some of these table tokens have special size rules; all others are 2 bytes in size.
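
Because every line starts with its own length, the stream of lines can be walked without decoding any tokens. The following is a minimal sketch under that assumption; `amos_count_lines` and `amos_indent_spaces` are hypothetical helper names, and the indent rule is copied from the list above as written.

```c
#include <stddef.h>

/* Hypothetical helper: count the tokenised lines in a block of code.
 * Returns the number of lines, or -1 if a line has zero length or
 * runs past the end of the block. */
long amos_count_lines(const unsigned char *code, size_t code_len) {
  size_t pos = 0;
  long lines = 0;

  while (pos < code_len) {
    size_t line_bytes = (size_t)code[pos] * 2;  /* length is stored in words */
    if (line_bytes == 0 || line_bytes > code_len - pos) return -1;
    /* code[pos + 1] is the indent level; tokens start at code[pos + 2] */
    pos += line_bytes;
    lines++;
  }
  return lines;
}

/* Spaces to print before a line, following the rule given above */
int amos_indent_spaces(unsigned char indent) {
  return (indent < 2) ? 0 : indent + 1;
}
```

Rejecting a zero line length matters in practice: without that check, a corrupt file would make the loop spin forever on the same position.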


Specially printed tokens

Token Type Interpretation
0x0000 null token Marks the end of line. Always 2 bytes long.
0x0006 Variable reference, e.g. Print XYZ
  • 2 bytes: token (0x0006, 0x000C, 0x0012 or 0x0018)
  • 2 bytes: unknown purpose
  • 1 byte: length of the ASCII string for the variable or label name
  • 1 byte: flags, for tokens 0x0006, 0x0012 and 0x0018:
    • bit 1 set: this is a floating point reference, e.g. "XYZ#"
    • bit 2 set: this is a string reference, e.g. "XYZ$"
  • many bytes: the ASCII string, with the above-given length.

The ASCII string is null terminated and its length is rounded up to a multiple of two.

0x000C Label, e.g. XYZ: or 190 at the start of a line
0x0012 Procedure call reference, e.g. XYZ["hello"]
0x0018 Label reference, e.g. Goto XYZ
0x0026 String with double quotes, e.g. "XYZ"
  • 2 bytes: token (0x0026 or 0x002E)
  • 2 bytes: the length of the string
  • many bytes: the ASCII string, with the above given length


The ASCII string is null terminated and its length is rounded up to a multiple of two.

0x002E String with single quotes, e.g. 'XYZ'
0x001E Binary integer value, e.g. %100101
  • 2 bytes: token (0x001E, 0x0036 or 0x003E)
  • 4 bytes: the integer value
0x0036 Hexadecimal integer value, e.g. $80FAA010
0x003E Decimal integer value, e.g. 1234567890
0x0046 Floating point value, e.g. 3.1452
  • 2 bytes: token (0x0046)
  • 4 bytes: the single-precision floating point value.
    • bits 31-8: mantissa (24 bits)
    • bit 7: sign bit. Positive if 0, negative if 1
    • bits 6-0: exponent


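An exponent of 0 means the value is 0.0, regardless of the mantissa. Otherwise, numbering the mantissa bits from 23 (MSB) down to 0 (LSB), each set bit contributes 2^(mantissa_bit + exponent - 88) to the value, so the magnitude is mantissa × 2^(exponent - 88). For example, a mantissa of 0x800000 with an exponent of 0x41 (65) gives 2^(23 + 65 - 88) = 1.0.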

0x004E Extension command
  • 2 bytes: token (0x004E)
  • 1 byte: extension number (1 to 26)
  • 1 byte: unused
  • 2 bytes: signed 16-bit offset into extension's token table
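
The floating point layout given for token 0x0046 can be decoded with plain integer and floating point arithmetic. The function below is a sketch rather than AMOS's own routine: the name `amos_float_value` is hypothetical, and it assumes the 4 bytes are stored big-endian, so that the first three bytes hold the mantissa and the last byte holds the sign and exponent.

```c
/* Hypothetical helper: convert the 4 value bytes that follow a 0x0046
 * token into a double, using the bit layout described above. */
double amos_float_value(const unsigned char *p) {
  unsigned long mantissa = ((unsigned long)p[0] << 16)
                         | ((unsigned long)p[1] << 8)
                         |  (unsigned long)p[2];      /* bits 31-8 */
  int negative = (p[3] & 0x80) != 0;                  /* bit 7 */
  int exponent = p[3] & 0x7F;                         /* bits 6-0 */
  int shift;
  double value;

  if (exponent == 0) return 0.0;  /* exponent 0 means 0.0 */

  /* summing 2^(bit + exponent - 88) over the set bits is the same as
   * scaling the whole mantissa by 2^(exponent - 88) */
  value = (double)mantissa;
  for (shift = exponent - 88; shift > 0; shift--) value *= 2.0;
  for (; shift < 0; shift++) value /= 2.0;

  return negative ? -value : value;
}
```

The scaling is done with repeated multiplication and division by two so that the sketch needs no maths library; every intermediate result is exactly representable in a double.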


Specially sized tokens

Token Type Interpretation
0x064A Rem

Print the remark string in addition to the remark token.

  • 2 bytes: token (0x064A or 0x0652)
  • 1 byte: unused
  • 1 byte: length of remark string
  • many bytes: the ASCII remark string, with the above-given length.

The ASCII string is null terminated and its length is rounded up to a multiple of two.

0x0652 Rem type 2
0x023C For
  • 2 bytes: token (0x023C, 0x0250, 0x0268, 0x027E, 0x02BE, 0x02D0 or 0x0404)
  • 2 bytes: unknown purpose
0x0250 Repeat
0x0268 While
0x027E Do
0x02BE If
0x02D0 Else
0x0404 Data
0x0290 Exit If
  • 2 bytes: token (0x0290, 0x029E or 0x0316)
  • 4 bytes: unknown purpose
0x029E Exit
0x0316 On
0x0376 Procedure
  • 2 bytes: token (0x0376)
  • 4 bytes: number of bytes to corresponding End Proc line
(start of line + 8 + above = start of End Proc line)
(start of line + 8 + 6 + above = line after End Proc line)
  • 2 bytes: part of seed for encryption
  • 1 byte: flags
    • bit 7: if set, procedure is folded
    • bit 6: if set, procedure is locked and shouldn't be unfolded
    • bit 5: if set, procedure is currently encrypted
    • bit 4: if set, procedure contains compiled code and not tokens
  • 1 byte: part of seed for encryption
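
To step from one token to the next within a line, a reader needs the size of each token. The sketch below collects the size rules from both tables into one function. It is an illustration only: `amos_token_size` is a hypothetical name, odd string lengths are assumed to be padded by one byte, and On (0x0316) is assumed to share the layout of Exit If and Exit because it is listed alongside them.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of the token starting at p,
 * including the 2-byte token value itself. */
size_t amos_token_size(const unsigned char *p) {
  unsigned int token = ((unsigned int)p[0] << 8) | p[1];
  size_t n;

  switch (token) {
  case 0x0006: case 0x000C: case 0x0012: case 0x0018:
    n = p[4];                        /* length of the name */
    return 6 + n + (n & 1);          /* token, unknown, length, flags, name */
  case 0x0026: case 0x002E:
    n = ((size_t)p[2] << 8) | p[3];  /* length of the string */
    return 4 + n + (n & 1);          /* token, length, string */
  case 0x001E: case 0x0036: case 0x003E: case 0x0046:
    return 6;                        /* token + 4-byte value */
  case 0x004E:
    return 6;                        /* token, extension, unused, offset */
  case 0x064A: case 0x0652:
    n = p[3];                        /* length of the remark */
    return 4 + n + (n & 1);          /* token, unused, length, remark */
  case 0x023C: case 0x0250: case 0x0268: case 0x027E:
  case 0x02BE: case 0x02D0: case 0x0404:
    return 4;                        /* token + 2 bytes of unknown purpose */
  case 0x0290: case 0x029E: case 0x0316:
    return 6;                        /* token + 4 bytes of unknown purpose */
  case 0x0376:
    return 10;                       /* token, distance, seed, flags, seed */
  default:
    return 2;                        /* null token and plain table tokens */
  }
}
```

Starting at byte 2 of a line and repeatedly adding the returned size should land exactly on the null token that ends the line; if it does not, one of the assumptions above does not hold for that file.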


Procedure decryption source code

If you should find a procedure (0x0376) token with the "is encrypted" bit set, run this C function on the code and it will decrypt the contents of the procedure.

<code><pre>
#include <stdint.h>

/* fetches a 4-byte integer in big-endian format */
#define EndGetM32(a)  ((((uint32_t)(a)[0])<<24)|(((uint32_t)(a)[1])<<16)|(((uint32_t)(a)[2])<<8)|((uint32_t)(a)[3]))
/* fetches a 2-byte integer in big-endian format */
#define EndGetM16(a)  ((((a)[0])<<8)|((a)[1]))

void decrypt_procedure(unsigned char *src) {
  unsigned char *line, *next, *endline;
  /* the key rotation below relies on these being exactly 32 bits wide */
  uint32_t key, key2, key3, size;

  /* ensure src is a pointer to a line with the PROCEDURE token on it */
  if (EndGetM16(&src[2]) != 0x0376) return;

  /* do not operate on compiled procedures */
  if (src[10] & 0x10) return;

  /* size+8+6 is the start of the line after ENDPROC */
  size = EndGetM32(&src[4]);
  endline = &src[size+8+6];
  line = next = &src[src[0] * 2];

  /* initialise encryption keys */
  key = (size << 8) | src[11];
  key2 = 1;
  key3 = EndGetM16(&src[8]);

  while (line < endline) {
    line = next;
    next = &line[line[0] * 2];

    /* decrypt one line */
    for (line += 4; line < next;) {
      *line++ ^= (key >> 8) & 0xFF;
      *line++ ^=  key       & 0xFF;
      key  += key2;
      key2 += key3;
      key = (key >> 1) | (key << 31);
    }
  }
  src[10] ^= 0x20; /* toggle "is encrypted" bit */
}
</pre></code>