Convention to represent floating point numbers
Contains:
- sign
- 0 for positive, 1 for negative
- exponent
- If the exponent is 2, do the following (32-bit format):
- Add the bias: 2 + 127 = 129
- Store 129 in binary (10000001) in the 8-bit excess-127 exponent field
- mantissa
- is normalised, with an implicit leading bit 1
- e.g. 110.1₂ becomes 1.101₂ × 2²
- In the above example, only 101₂ is stored in the mantissa field (since the leading 1 is implicit!)
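
The three fields above can be assembled by hand. The sketch below assumes the example value is +1.101₂ × 2² (6.5 in decimal), i.e. sign 0, exponent 2, and stored mantissa bits 101. The helper name `encode_single` is made up for illustration, and it only handles normalised numbers.

```python
import struct

def encode_single(sign: int, exponent: int, fraction_bits: str) -> str:
    """Build a 32-bit pattern from a sign bit, an actual (unbiased) exponent,
    and the bits that follow the implicit leading 1 (hypothetical helper)."""
    biased = exponent + 127                         # excess-127: 2 + 127 = 129
    exponent_field = format(biased, "08b")          # 8-bit exponent field
    mantissa_field = fraction_bits.ljust(23, "0")   # pad on the right to 23 bits
    return f"{sign}{exponent_field}{mantissa_field}"

bits = encode_single(0, 2, "101")
print(bits)  # 01000000110100000000000000000000

# Cross-check against Python's own single-precision encoding of 6.5
machine_bits = format(struct.unpack(">I", struct.pack(">f", 6.5))[0], "032b")
print(bits == machine_bits)  # True
```

Note that the mantissa is padded with zeros on the right, not the left, because the stored bits are the fractional part after the binary point.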
Two Formats
- Single Precision (32 bits)
- 1-bit sign
- 8-bit exponent (excess-127)
- 23-bit mantissa
- Double Precision (64 bits)
- 1-bit sign
- 11-bit exponent (excess-1023)
- 52-bit mantissa
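
To see how the two layouts differ only in field widths and bias, here is a rough sketch that splits a value into its three fields for either format. The `FORMATS` table and `split_fields` are illustrative names, not part of any standard API, and the decoded exponent is only meaningful for normalised numbers (exponent field neither all 0s nor all 1s).

```python
import struct

# precision -> (float pack code, same-size unsigned int code,
#               exponent bits, mantissa bits, bias)
FORMATS = {
    "single": (">f", ">I", 8, 23, 127),
    "double": (">d", ">Q", 11, 52, 1023),
}

def split_fields(value: float, precision: str = "single"):
    """Return (sign bit, actual exponent, mantissa field as a bit string)."""
    float_code, int_code, exp_bits, man_bits, bias = FORMATS[precision]
    # Reinterpret the float's bytes as an unsigned integer of the same size
    raw = struct.unpack(int_code, struct.pack(float_code, value))[0]
    sign = raw >> (exp_bits + man_bits)
    exponent_field = (raw >> man_bits) & ((1 << exp_bits) - 1)
    mantissa_field = raw & ((1 << man_bits) - 1)
    # Subtract the bias to recover the actual exponent
    return sign, exponent_field - bias, format(mantissa_field, f"0{man_bits}b")

print(split_fields(-6.5, "single"))
print(split_fields(-6.5, "double"))
```

For -6.5, both calls give sign 1 and exponent 2; the mantissa field starts with 101 in both cases and differs only in how many zeros pad it out (23 bits versus 52 bits).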
After getting the 32/64-bit representation, we may convert to hexadecimal to improve readability.
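
A minimal sketch of the hexadecimal step: group the bit string into 4-bit chunks and map each chunk to one hex digit. The function name `bits_to_hex` is an assumption for illustration; the input used here is the single-precision pattern for 6.5.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a bit string (length a multiple of 4) to uppercase hex digits."""
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    # Each group of 4 bits becomes exactly one hexadecimal digit
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(bits_to_hex("01000000110100000000000000000000"))  # 40D00000
```

The 32-bit pattern shrinks to 8 hex digits, and a 64-bit pattern would shrink to 16.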