IEEE Floating Point Format

The IEEE 754 standard for binary floating-point arithmetic defines what is commonly called “IEEE floating point”. MIMOSA uses the 32-bit (single-precision) IEEE format, in which a normalized number is:

$$N = (-1)^S \times 1.F \times 2^{E-127}$$

where N = floating point number, S = sign bit, F = fraction bits (the binary digits after the binary point), E = 8-bit exponent field in bias 127 (excess 127) representation
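The formula can be evaluated exactly for a normalized number, that is, one whose exponent field E is neither 0 nor 255. The sketch below is illustrative and not part of MIMOSA; the function name `ieee_value` is hypothetical. It applies the sign bit S as the factor (-1)^S and uses Python's `fractions.Fraction` so that no rounding creeps in:

```python
from fractions import Fraction

def ieee_value(s: int, e: int, f: int) -> Fraction:
    """Evaluate N = (-1)^S * 1.F * 2^(E-127) exactly for a normalized single.

    s: sign bit (0 or 1)
    e: 8-bit biased exponent field (1..254 for normalized numbers)
    f: 23-bit fraction field, read as an unsigned integer
    """
    if not 1 <= e <= 254:
        raise ValueError("E = 0 and E = 255 are reserved for special values")
    significand = 1 + Fraction(f, 2**23)  # implied leading 1 plus the 23 fraction bits
    sign = -1 if s else 1
    return sign * significand * Fraction(2) ** (e - 127)  # remove the bias of 127

# The fields of the worked example (9 97/128) give back 1249/128.
print(ieee_value(0, 130, 0b00111000010000000000000))
```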

In the 32-bit IEEE format, bit 31 is the sign bit, the next 8 bits (30 to 23) hold the exponent field, and the remaining 23 bits (22 to 0) hold the fraction of the normalized number.

     Sign  Exponent   Fraction 
     0     00000000   00000000000000000000000
Bit  31    [30 - 23]  [22        -         0]

A sign bit of 0 indicates a positive number and a 1 indicates a negative number. The exponent field uses “excess 127 notation”: it stores the true exponent plus 127. The 23 fraction bits give 24 bits of precision, because a leading 1 in front of the binary point is implied and not stored.
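The field layout can be checked by pulling a value apart with shifts and masks. This is a minimal sketch, assuming Python's standard `struct` module is used to obtain the raw 32-bit pattern (format `">f"` packs a single-precision float, `">I"` reads it back as an unsigned 32-bit integer); the helper name `fields` is hypothetical:

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, excess 127
    fraction = bits & 0x7FFFFF       # bits 22-0
    return sign, exponent, fraction

# -6.5 = -110.1 (binary) = -1.101 x 2^2, so S = 1, E = 2 + 127 = 129, F = 101000...0
s, e, f = fields(-6.5)
significand = (1 << 23) | f  # restore the implied leading 1: a 24-bit significand
print(s, e, f"{f:023b}", significand / 2**23)
```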

There are some exceptions:

E = 255; F = 0;
=> +/- infinity

E = 255; F != 0;
=> NaN, Not a Number: the result of an invalid operation such as 0/0 or infinity - infinity. (Overflow produces +/- infinity, not NaN.)

E = 0; F = 0;
=> +/- 0 (the sign bit distinguishes +0 from -0)

E = 0; F != 0;
=> Denormalized (subnormal) number, N = (-1)^S × 0.F × 2^-126: a tiny number smaller in magnitude than the smallest normalized number, 2^-126.
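These cases depend only on the E and F fields, so they can be classified directly from the bit pattern. A minimal sketch, assuming Python's `struct` module to reinterpret a pattern as a single-precision value (widened losslessly to a Python float); `classify` and `to_float` are hypothetical helper names. For example, the smallest denormalized pattern, 00000001, decodes to 0.000...01 (binary) × 2^-126 = 2^-23 × 2^-126 = 2^-149:

```python
import struct

def classify(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its exponent (E) and fraction (F) fields."""
    e = (bits >> 23) & 0xFF   # bits 30-23
    f = bits & 0x7FFFFF       # bits 22-0
    if e == 255:
        return "infinity" if f == 0 else "NaN"
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    return "normalized"

def to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

for bits in (0x7F800000, 0xFF800000, 0x7FC00000, 0x80000000, 0x00000001, 0x411C2000):
    print(f"{bits:08X}  {classify(bits):13}  {to_float(bits)!r}")
```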

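With exponent fields 00000000 and 11111111 reserved, the true exponent of a normalized number runs from -126 to +127, so normalized magnitudes range from 2^-126 (about 1.18 × 10^-38) to (2 - 2^-23) × 2^127 (about 3.40 × 10^38).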
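Both limits follow from the extreme legal exponent fields: E = 1 with F = 0 gives the smallest normalized value, and E = 254 with all fraction bits set gives the largest. A minimal sketch that decodes those two bit patterns, again assuming Python's `struct` module; the helper names are hypothetical:

```python
import struct

def to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

def normalized_range() -> tuple[float, float]:
    smallest = to_float(0x00800000)  # S = 0, E = 1,   F = 0:        1.0 x 2^-126
    largest = to_float(0x7F7FFFFF)   # S = 0, E = 254, F = all ones: (2 - 2^-23) x 2^127
    return smallest, largest

lo, hi = normalized_range()
print(lo, hi)
```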

Example

Suppose that we want to convert 9 97/128 (9.7578125) into the 32-bit IEEE format. The process is:

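1. Convert to base 2. The integer part 9 is 1001. The fraction 97/128 is 97 (binary 1100001) shifted 7 places right, because 128 = 2^7, giving .1100001: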

1001.1100001

2. Shift the binary point left (here by 3 places) until the number has the form 1.FFFFFF × 2^e, where e is the true exponent:

1.0011100001 × 2³

3. Add the bias of 127 (excess 127 code) to the true exponent and convert the result to 8-bit binary to get the exponent field:

3 + 127 = 130 = 10000010

4. Determine the sign bit: 1 for a negative number, otherwise 0. Here the number is positive, so the sign bit is 0.

5. Now put the fields together: sign bit, exponent field, then only the fraction bits from step 2 (drop the implied “1.” in front of them), padded with zeros on the right to 23 bits:

0 10000010 00111000010000000000000

Grouped into 4-bit nibbles, this is

0100 0001 0001 1100 0010 0000 0000 0000

or, in hexadecimal,

411C2000

6. Finally, MIMOSA requires the bytes in little-endian order (low-order byte first), so the bytes 41 1C 20 00 are stored in reverse:

00201C41
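The six steps can be sketched in code for values that are exactly representable. This is an illustration only, not MIMOSA's own routine: `encode_single` and `mimosa_bytes` are hypothetical names, and the sketch assumes the input is an exact rational value in the normalized range whose fraction fits in 23 bits. It deliberately rejects anything that would need rounding (such as 0.1) rather than implementing IEEE rounding. Working on a `Fraction` stands in for the explicit base-2 conversion of step 1.

```python
from fractions import Fraction

def encode_single(x: Fraction) -> int:
    """Return the 32-bit IEEE single pattern for x, following steps 1-5 (no rounding)."""
    if x == 0:
        raise ValueError("zero is a special case (E = 0, F = 0)")
    sign = 1 if x < 0 else 0          # step 4: sign bit
    x = abs(x)
    exp = 0                           # step 2: normalize to 1.FFF... x 2^exp
    while x >= 2:
        x /= 2
        exp += 1
    while x < 1:
        x *= 2
        exp -= 1
    e = exp + 127                     # step 3: add the bias of 127
    if not 1 <= e <= 254:
        raise ValueError("outside the normalized range")
    f = (x - 1) * 2**23               # drop the implied 1, keep 23 fraction bits
    if f.denominator != 1:
        raise ValueError("needs more than 23 fraction bits; rounding is not handled here")
    return (sign << 31) | (e << 23) | int(f)   # step 5: assemble the fields

def mimosa_bytes(bits: int) -> bytes:
    """Step 6: low-order byte first (little-endian)."""
    return bits.to_bytes(4, "little")

bits = encode_single(Fraction(9) + Fraction(97, 128))
print(f"{bits:08X}", mimosa_bytes(bits).hex().upper())
```

As a cross-check, Python's `struct.pack("<f", 9.7578125)` also produces the little-endian bytes 00 20 1C 41.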