Qfplib-M0-tiny: a free ARM Cortex-M0 floating-point library in 1 kbyte
Qfplib-M0-tiny is a library of IEEE 754 single-precision floating-point arithmetic routines for microcontrollers based on the ARM Cortex-M0 core (ARMv6-M architecture). It should also run on Cortex-M3 and Cortex-M4 microcontrollers and will give reasonable performance, but it is not optimised for these devices.
Many Cortex-M0 microcontrollers have very little program memory available, and so the primary design goal was to minimise the code size of the library without sacrificing too much in speed or in usefulness. To that end it provides correctly rounded (to nearest, even-on-tie) addition, subtraction, multiplication and division operations, and sine, cosine, tangent, arctangent, logarithm, exponential and square root functions that give a high degree of accuracy. There are also conversion functions between floating-point values and signed or unsigned integer or fixed-point values. The library fits in 1 kbyte of program memory.
If you can afford the luxury of an additional 200 bytes in code size, fast divide and square root functions (that do not guarantee correctly rounded results) are also available.
Qfplib-M0-tiny can be built in different ways depending on which functions you need. The functions included are controlled by the symbols include_faster, include_conversions and include_scientific in the source code.
The build always includes the four basic arithmetic operations qfp_fadd, qfp_fsub, qfp_fmul and qfp_fdiv, plus the comparison function qfp_fcmp.
If the symbol include_conversions is set to 1 then conversion routines between floating-point values and integers or fixed-point values are included. These are qfp_float2int, qfp_float2fix, qfp_int2float, qfp_fix2float, qfp_float2uint, qfp_float2ufix, qfp_uint2float and qfp_ufix2float.
If the symbol include_scientific is set to 1 (which implies setting include_conversions to 1), then the following functions are included: qfp_fcos, qfp_fsin, qfp_ftan, qfp_fatan2, qfp_fexp, qfp_fln and qfp_fsqrt.
If the symbol include_faster is set to 1 then a faster but less accurate floating-point division routine qfp_fdiv_fast is also included, as is a fast square root routine qfp_fsqrt_fast.
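As an illustration of how the basic routines might be called from C, here is a minimal sketch that evaluates a polynomial by Horner's rule using explicit calls to qfp_fadd and qfp_fmul. The prototypes shown are assumptions inferred from the function names and should be checked against the header supplied with the library. The macro WITH_QFPLIB and the function poly_eval are hypothetical names introduced only for this example, and the host-side stand-ins exist solely so that the sketch can be compiled off-target.

```c
/* Assumed prototypes: two float arguments, float result. Check the
   header shipped with the library before relying on them. */
float qfp_fadd(float x, float y);
float qfp_fmul(float x, float y);

#ifndef WITH_QFPLIB   /* hypothetical macro: define it when linking the real library */
/* Host-side stand-ins using native arithmetic, for off-target builds only. */
float qfp_fadd(float x, float y) { return x + y; }
float qfp_fmul(float x, float y) { return x * y; }
#endif

/* Evaluate c[0] + c[1]*x + ... + c[n-1]*x^(n-1) by Horner's rule,
   using one multiply and one add per coefficient. */
float poly_eval(const float *c, int n, float x)
{
    float acc = 0.0f;
    for (int i = n - 1; i >= 0; i--)
        acc = qfp_fadd(qfp_fmul(acc, x), c[i]);
    return acc;
}
```

Only the four basic operations are needed here, so a routine like this would work with the smallest build of the library.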
Qfplib-M0-tiny does not use any static storage. Stack use is parsimonious and statically analysable; recursion is not used.
Code size comparison against other embedded libraries
The standard floating-point library routines that come with the GCC cross-compiler for the Cortex-M0 core occupy about 2700 bytes for the four basic arithmetic functions alone; a trivial program that does nothing but call cosf compiles to 7.5 kbyte of code.
Texas Instruments has released information giving the code size for a number of simple benchmarks compiled for a range of microcontrollers in Table A-4 of document SLAA205C. It appears that the sizes given do not include start-up code (look for example at the compiled size of the ‘8-bit 2-dim matrix.c’ benchmark, which includes a 64 byte constant array).
The ‘floating-point math.c’ benchmark exercises floating-point addition, multiplication and division with very little overhead, and this allows us to make a meaningful comparison with the ‘Basic’ version of Qfplib-M0-tiny.
GCC compiles the body of this benchmark to a rather profligate 54 bytes and the total code size when linked with the ‘Basic’ version of Qfplib-M0-tiny is 382+54=436 bytes.
On page 26 of a presentation on LPC1100 series microcontrollers, NXP claims that, using the ARM ‘microlib’ library, the same benchmark compiles to approximately 620 bytes. So even though microlib makes no attempt at IEEE 754 compliance, it is nevertheless about 40% larger than the 436 bytes achieved with Qfplib-M0-tiny.
Information from the above sources is summarised in the following table.
The cross-platform fixed-point arithmetic library libfixmath can run on the Cortex-M0 core. According to this page its implementation of the atan2 function is about four times larger than the whole of Qfplib-M0-tiny.
ARM provides a range of floating-point arithmetic routines as part of its CMSIS library. Unfortunately, at least based on an inspection of the part that has been released under a non-proprietary licence (see here), the implementations are poor and do not appear to have been tested thoroughly.
ARM's floating-point cosine routine includes a table of constants, not shared with any other functions, that is already larger than the whole of Qfplib-M0-tiny. The routine produces results about half as accurate as those of Qfplib-M0-tiny.
The following table compares cycle counts for Qfplib-M0-tiny against other libraries. Qfplib-M0-tiny and GCC library results are average values for non-exceptional arguments to the functions, include calling overhead, and are approximate. They were measured using an LPC11U68 microcontroller with single-cycle flash memory. Results for the Micro Digital ‘GoFast’ library (presumably optimised for speed rather than size, judging by its name) are inferred from the timings given on this page for an ARM7TDMI-based processor. The comparison here may not be strictly fair to Qfplib-M0-tiny as it is not clear from their description whether Micro Digital’s library exploits features available on that processor but not on the Cortex-M0: for example, ARM mode is considerably faster and more flexible than Thumb mode, and the long multiply instructions can be used to advantage in several of the routines. Micro Digital do not appear to provide public information on the code size of their library. Its implementation of the basic functions does not appear to be IEEE 754 compliant with regard to rounding.
The ARM CMSIS implementations of the scientific functions, despite their name ‘FastMath’, appear to be many times slower than Qfplib-M0-tiny. For example, the average execution time for ARM's cosine function (compiled using GCC) is about 3880 cycles, virtually independent of the optimisation flags used.
The libfixmath implementations of the basic arithmetic functions are much faster than the floating-point implementations in Qfplib-M0-tiny; the implementation of square root is of comparable speed; and the scientific functions appear to be much slower.
Limitations and deviations from the IEEE 754 standard
Except as noted below, on input and output, NaNs are converted to infinities, denormals are flushed to zero, and negative zero is converted to positive zero. The result of the square root function is not always correctly rounded according to IEEE 754; see the next section for more on function accuracy.
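The general rule above can be pictured as a filter applied to each value. The following is a minimal host-side model of that filter, not code from the library. The name model_canonicalise is hypothetical, and two details are assumptions not stated in the text: that a NaN keeps its sign when it becomes an infinity, and that a negative denormal becomes positive zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model of the stated deviations; it is not
   part of Qfplib-M0-tiny. */
float model_canonicalise(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t sign     = bits & 0x80000000u;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0xFFu && mantissa != 0u) {
        /* NaN becomes infinity; keeping the sign is an assumption. */
        bits = sign | 0x7F800000u;
    } else if (exponent == 0u) {
        /* Denormals and both zeros become positive zero (the sign of
           the result for negative denormals is an assumption). */
        bits = 0u;
    }

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Normal finite values and infinities pass through unchanged.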
Function ranges and accuracy
Subject to the limitations and deviations mentioned above, the functions qfp_fadd, qfp_fsub, qfp_fmul and qfp_fdiv all produce correctly rounded (to nearest, even-on-tie) results. This has been verified on real hardware against the default library supplied with the GCC cross-compiler using the Berkeley TestFloat suite, plus a further billion or so test cases, both random and contrived.
In the following table, ‘ulp’ means ‘unit in last place’. Where a relative accuracy is quoted (‘(R)’), this means the error in units of the least significant bit of the mantissa of the result. Where an absolute accuracy is quoted (‘(A)’), it means the error in units of 2^-24.
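To make the two error measures concrete, here is a small host-side sketch that computes both for a single-precision result against a double-precision reference value. The function names are hypothetical, and the relative measure assumes the result is a normal, finite value.

```c
#include <stdint.h>
#include <string.h>

/* Weight of the least significant mantissa bit of a normal, finite float. */
double model_lsb_weight(float result)
{
    uint32_t bits;
    memcpy(&bits, &result, sizeof bits);
    /* Biased exponent minus the bias (127) minus the 23 mantissa bits. */
    int n = (int)((bits >> 23) & 0xFFu) - 150;
    double w = 1.0;
    for (; n > 0; n--) w *= 2.0;
    for (; n < 0; n++) w *= 0.5;
    return w;
}

/* Relative measure: error in units of the last place of the result. */
double model_error_relative(float result, double reference)
{
    double d = (double)result - reference;
    if (d < 0.0) d = -d;
    return d / model_lsb_weight(result);
}

/* Absolute measure: error in units of 2^-24. */
double model_error_absolute(float result, double reference)
{
    double d = (double)result - reference;
    if (d < 0.0) d = -d;
    return d * 16777216.0;   /* multiply by 2^24 */
}
```

For a result of 1.0f against a reference of 1 + 2^-24, the relative error is 0.5 ulp and the absolute error is 1 unit. The two measures diverge as the magnitude of the result moves away from 1.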
qfp_fcmp returns zero if its arguments are equal (negative zero is equal to positive zero), or plus or minus one if its first argument is respectively greater than or less than its second. Input denormals are not flushed to zero, and NaNs are compared respecting their signs, being treated as values beyond ±infinity.
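The ordering described for qfp_fcmp is the one obtained by treating the bit pattern of each argument as a sign-and-magnitude integer. The sketch below is a host-side model of that ordering under this interpretation. It is not the library's implementation, and the names model_order_key and model_fcmp are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to a signed key that sorts in value order.
   Both zeros map to 0, and NaN patterns fall beyond the infinities. */
int32_t model_order_key(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int32_t magnitude = (int32_t)(bits & 0x7FFFFFFFu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* Returns 0 if equal, +1 if a is greater than b, -1 if a is less than b. */
int model_fcmp(float a, float b)
{
    int32_t ka = model_order_key(a);
    int32_t kb = model_order_key(b);
    return (ka > kb) - (ka < kb);
}
```

Because the key is built from the raw bits, denormals compare by value rather than being treated as zero, and a positive NaN compares greater than +infinity.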
qfp_float2int(x) is equivalent to qfp_float2fix(x,0).
qfp_float2fix(x,y) converts a floating-point value x to a signed fixed point value, with y bits after the binary point. The result is rounded towards –infinity. y can be from –256 to +256. The result is clamped to the available (signed) output range.
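The rounding and clamping behaviour described for qfp_float2fix can be modelled on a host as follows. This is a reference sketch of the stated behaviour only, assuming a 32-bit signed result. The names model_pow2 and model_float2fix are hypothetical, and the treatment of a NaN input is a placeholder rather than a statement about the real routine.

```c
#include <stdint.h>

/* Exact power of two as a double, for exponents of modest size. */
double model_pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Model: x * 2^y rounded towards -infinity, clamped to the int32 range. */
int32_t model_float2fix(float x, int y)
{
    double scaled = (double)x * model_pow2(y);   /* exact for y in -256..256 */

    if (scaled != scaled) return INT32_MAX;      /* NaN: not modelled, placeholder */
    if (scaled >= 2147483648.0) return INT32_MAX;
    if (scaled < -2147483648.0) return INT32_MIN;

    int32_t t = (int32_t)scaled;                 /* truncates towards zero */
    if ((double)t > scaled) t--;                 /* step down for negative non-integers */
    return t;
}
```

For example, with 8 fractional bits the value 1.5 maps to 384, while -0.1 with no fractional bits maps to -1 rather than 0 because of the rounding direction.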
qfp_int2float(x) is equivalent to qfp_fix2float(x,0).
qfp_fix2float(x,y) converts a signed fixed point value x with y bits after the binary point to a floating-point value, correctly rounded (to nearest, even-on-tie). y can be from –256 to +256. If the result is outside the representable range, ±infinity is returned as appropriate.
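A host-side model of the conversion described for qfp_fix2float might look like the sketch below. It assumes a 32-bit signed input, an IEEE 754 host with the default rounding mode, and that tiny results are flushed to zero in line with the general rule on denormals. The names model_pow2 and model_fix2float are hypothetical.

```c
#include <float.h>
#include <stdint.h>

/* Exact power of two as a double, for exponents of modest size. */
double model_pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Model: x * 2^-y rounded once to single precision. */
float model_fix2float(int32_t x, int y)
{
    /* The product is exact in double for y in -256..256. */
    double exact = (double)x * model_pow2(-y);

    /* One rounding, to nearest with ties to even; on IEEE 754 hosts an
       out-of-range value becomes +infinity or -infinity. */
    float r = (float)exact;

    /* Assumption: results below the normal range are flushed to zero. */
    if (r > -FLT_MIN && r < FLT_MIN)
        r = 0.0f;
    return r;
}
```

The tie-breaking rule shows up with integers just above 2^24: 16777217 converts to 16777216.0, while 16777219 converts to 16777220.0.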
qfp_float2uint, qfp_float2ufix, qfp_uint2float and qfp_ufix2float are the same as qfp_float2int, qfp_float2fix, qfp_int2float and qfp_fix2float, but work with unsigned fixed-point and integer values.
Qfpio: string conversion functions
Qfpio, part of the Qfplib-M0-tiny download from release 20151029, includes two functions for converting between floating-point values and ASCII strings. The functions are qfp_float2str(float f,char*s,unsigned int fmt), which converts a float to a string with flexible control of formatting, and qfp_str2float(float*f,char*p,char**endptr), which performs the reverse conversion. Again, the emphasis is on compactness, without compromising too much in speed or accuracy: the total code size for the two functions is just over 800 bytes. Qfpio does not call any of the other functions in Qfplib-M0-tiny and so can be compiled independently.
Release 20200617. This release fixes a rare bug whereby qfp_fatan2() can give inaccurate results in some cases, and includes some small optimisations for speed and code size.
This page most recently updated Sat 22 May 10:34:36 BST 2021
All trademarks used are hereby acknowledged.