The IEEE 754 standard for binary floating point arithmetic defines what is commonly referred to as “IEEE floating point”. MIMOSA utilizes the 32-bit IEEE floating point format:
N = 1.F × 2E-127
where N = floating point number, F = fractional part in binary notation, E = exponent in bias 127 representation
In the 32 bit IEEE format, 1 bit is allocated as the sign bit, the next 8 bits are allocated as the exponent field, and the last 23 bits are the fractional parts of the normalized number.
Sign Exponent Fraction 0 00000000 00000000000000000000000 Bit 31 [30 - 23] [22 - 0]
A sign bit of 0 indicates a positive number, and a 1 is negative. The exponent field is represented by “excess 127 notation”. The 23 fraction bits actually represent 24 bits of precision, as a leading 1 in front of the decimal point is implied.
There are some exceptions:
E = 255; F = 0; => +/- infinity E = 255; F != 0; => NaN, Not a number. Overflow, error. E = 0; F = 0; => 0 E = 0; F != 0; => Denormalized, tiny number, smaller than smallest allowed.
With exponent field 00000000 and 11111111 now reserved, the range is restricted to 2-126 to 2127.
Example
Suppose that we want to convert 9 97/128 into a IEEE 32 bit format. The process is:
1. Convert to base 2.
1001.1100001
2. Shift number to the form of 1.FFFFFF × 2E
:
1.0011100001 × 23
3. Add 127 (excess 127 code) to exponent field and convert to binary:
3 + 127 = 130 = 10000010
4. Determine the sign bit. If a negative number, set to 1. Otherwise set to 0.
5. Now put the numbers together, using only the fractional part of the number represented by step 2 above (remove the 1.
preceding the fractional part):
0 10000010 00111000010000000000000
in hex representation, this is
0100 0001 0001 1100 0010 0000 0000 0000
or in Hex format
411C2000
6. Finally, MIMOSA requires the low-order byte first.
00201C41