How to Calculate a Binary Floating Point

Written by Carlos Mano

Floating point is the way computers represent real numbers, that is, numbers with a fractional part. A floating point format has two parts. The longer part is called the mantissa (or significand) and contains the actual digits of the number. The shorter part is called the exponent and indicates where the binary point goes. One or two bits of the format are reserved as sign bits. Historically, each computer manufacturer set up its floating point formats slightly differently; most modern machines follow the IEEE 754 standard.
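To see the split into mantissa and exponent on a real machine, here is a minimal sketch using Python's standard `math.frexp`. It assumes the convention that function uses, where the mantissa is a fraction between 0.5 and 1; other formats place the point differently.

```python
import math

# math.frexp splits x into (m, e) such that x == m * 2**e and 0.5 <= |m| < 1.
mantissa, exponent = math.frexp(14.5625)

print(mantissa)  # 0.91015625 -- the part that carries the digits
print(exponent)  # 4          -- the power of two that places the binary point

# Scaling the mantissa by 2**8 exposes its first eight binary digits.
print(format(int(mantissa * 2**8), "b"))  # 11101001
```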

  1. Calculate the whole number part (the part to the left of the decimal point) with a series of divisions. Divide the number by 2 and note the remainder. Continue dividing the quotients by 2, noting the remainders, until the quotient is zero. The remainders, read in the reverse of the order in which they appeared, form the binary representation of the number. For example, to calculate the binary floating point representation of 14.5625, start by calculating the binary representation of 14. Divide 14 by 2 to get 7 with remainder 0. Divide 7 by 2 to get 3 with remainder 1. Divide 3 by 2 to get 1 with remainder 1. Divide 1 by 2 to get 0 with remainder 1. The remainders in order are 0, 1, 1, 1, so 14 decimal equals 1110 binary.

  2. Calculate the fraction part (the part to the right of the decimal point) with a series of multiplications. Multiply the fraction by 2 and note the whole part of the answer, which will be either 0 or 1. Record the whole part and continue multiplying the remaining fractional parts by 2 until the fractional part is gone. Some fractions, such as 0.1, never reach zero in binary; in that case, stop when you have as many bits as the mantissa can hold. The recorded whole parts, in the order they appeared, form the binary fraction. To calculate the fraction part of 14.5625, first multiply 0.5625 by 2 to get 1.125. Record the 1 and multiply 0.125 by 2 to get 0.25. Record the 0 and multiply 0.25 by 2 to get 0.5. Record the 0 and multiply 0.5 by 2 to get 1.0. Record the 1 and stop. This means that 0.5625 decimal equals 0.1001 binary.

  3. Put the whole number part and the fraction part together. 14.5625 decimal is 1110 + 0.1001 = 1110.1001 binary. In floating point notation, the mantissa is 11101001 and the exponent is 4 (the binary point sits four digits from the left of the mantissa), which is 100 in binary. If this is a 16-bit machine and floating point numbers are set up with 11 bits for the mantissa followed by 5 bits for the exponent, the floating point representation would be 0001110100100100: the mantissa padded to 00011101001, followed by the exponent padded to 00100.
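The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the input is positive, the layout is the toy 16-bit format from step 3 (11 mantissa bits followed by 5 exponent bits, with no sign handling), and the function names are made up for this example.

```python
def whole_to_binary(n: int) -> str:
    """Step 1: divide by 2 repeatedly; the remainders read in reverse."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def fraction_to_binary(frac: float, max_bits: int = 16) -> str:
    """Step 2: multiply by 2 repeatedly; the whole parts read in order.

    max_bits caps the loop, because fractions such as 0.1 never terminate.
    """
    digits = []
    while frac > 0 and len(digits) < max_bits:
        frac *= 2
        whole = int(frac)          # always 0 or 1
        digits.append(str(whole))
        frac -= whole
    return "".join(digits)


def pack_toy_float(value: float, mantissa_bits: int = 11,
                   exponent_bits: int = 5) -> str:
    """Step 3: join the parts into the toy mantissa-then-exponent layout."""
    whole = int(value)
    whole_bits = whole_to_binary(whole)
    frac_bits = fraction_to_binary(value - whole)
    mantissa = whole_bits + frac_bits
    exponent = len(whole_bits)     # digits to the left of the binary point
    if len(mantissa) > mantissa_bits or exponent >= 2 ** exponent_bits:
        raise ValueError("value does not fit in this toy format")
    return mantissa.zfill(mantissa_bits) + format(exponent, f"0{exponent_bits}b")


print(whole_to_binary(14))         # 1110
print(fraction_to_binary(0.5625))  # 1001
print(pack_toy_float(14.5625))     # 0001110100100100
```

Padding the mantissa on the left mirrors the layout described in step 3; real formats such as IEEE 754 normalize the mantissa and bias the exponent instead.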

Tips and warnings

  • One or two of the bits will be sign bits. In some formats, all exponents are positive and only the mantissa has a sign.
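  • Whether you can change the sign of a floating point number by simply changing the sign bit depends on the format. In IEEE 754, which most modern machines use, the mantissa is stored in sign-magnitude form, so flipping the sign bit alone negates the number. "Two's complement" format, in which you change the sign by flipping all of the bits and adding one (this automatically changes the sign bit), is how most machines store integers; only some older or specialized floating point formats use it for the mantissa.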
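Whether flipping the sign bit alone negates a value depends on the format, and it is easy to check for one particular format. The sketch below assumes IEEE 754 single precision, which is what Python's standard `struct` module uses for the `f` code when a byte order such as `>` is given; `flip_sign_bit` is a name made up for this example.

```python
import struct


def flip_sign_bit(x: float) -> float:
    """Toggle only the top bit of the 32-bit IEEE 754 encoding of x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (flipped,) = struct.unpack(">f", struct.pack(">I", bits ^ 0x80000000))
    return flipped


print(flip_sign_bit(14.5625))   # -14.5625
print(flip_sign_bit(-14.5625))  # 14.5625
```

In this format no other bit changes when the sign does, which is the sign-magnitude behavior rather than two's complement.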
