Article

Fast and Accurate Approximation Methods for Trigonometric and Arctangent Calculations for Low-Performance Computers

by
Takashi Kusaka
1,*,† and
Takayuki Tanaka
2,†
1
Independent Researcher, Sapporo 063-0867, Japan
2
Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
*
Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2022, 11(15), 2285; https://doi.org/10.3390/electronics11152285
Submission received: 14 June 2022 / Revised: 11 July 2022 / Accepted: 19 July 2022 / Published: 22 July 2022
(This article belongs to the Section Computer Science & Engineering)

Abstract

In modern computers, complicated signal processing is highly optimized through compilers and fast floating-point units (FPUs), so programmers rarely need to consider the cost of each operation. However, a highly accurate approximation that runs in a small number of computation cycles is still valuable when embedded in a field-programmable gate array (FPGA) or microcontroller unit (MCU), or when performing many large-scale operations on a graphics processing unit (GPU). In such cases, algorithms must be devised to obtain the desired values without the help of an accelerator or compiler. The residual correction method (RCM) developed here produces simple and accurate approximations of certain nonlinear functions with minimal multiply–add operations. In this study, we designed algorithms for the approximate computation of trigonometric and inverse trigonometric functions, which are nonlinear elementary functions, to achieve fast and accurate computation. Using the RCM, a fast first approximation and a more accurate second approximation of each function were created, with errors of less than 0.001, using multiply–add operations only. This result is particularly useful for MCUs, which have low power consumption but limited computational power, and the proposed approximations are candidate algorithms for stabilizing the attitude control of robots and drones, which require real-time processing.

1. Introduction

The approximation of elementary functions is a critical issue in science and engineering, including computer graphics, scientific computing, and signal processing [1,2,3,4]. Central processing units (CPUs) have evolved to include instructions for efficient processing, but electronic circuits with limited resources, such as field-programmable gate arrays (FPGAs) and microcontroller units (MCUs), may not have accelerators such as a floating-point unit (FPU) or digital signal processor (DSP). Therefore, techniques that avoid division and other nonlinear operations requiring many cycles are important for large-scale calculations and embedded systems [5]. Division and square root are frequently used, even though they are up to ten times slower than addition and multiplication, and this trend has not changed in the latest architectures [6,7].
In particular, MCUs have low computational power and an instruction set limited to basic arithmetic and logic operations, but they are small, consume little power, and are used in a wide variety of products, including smartphones, audio accessories, video game peripherals, and advanced medical equipment [8,9,10,11,12,13]. Furthermore, small size and low power consumption are required in areas such as IoT devices [14,15] and drone control [16,17,18,19].
Even in the rapidly developing field of machine learning, fast approximate computation is critical to solve problems such as short learning times and the real-time processing of inferences [20,21,22,23]. GPUs can process a large number of multiply–add operations in parallel, and by replacing nonlinear calculations used in machine learning with approximate calculations using multiply–add operations, it may be possible to achieve higher speeds and smaller models.
When compiler optimization was weak, implementation techniques that avoid division were very useful in the field of embedded electronic circuits [24]. Even today, many MCUs do not have an extended instruction set, so such techniques remain very important. For example, replacing division by multiplication with a reciprocal constant and using bit shifts to multiply and divide by powers of two are well-known techniques [25]. Computational tricks exploiting the IEEE 754 floating-point structure are also effective, and useful techniques are known for obtaining fast approximations of exponential and logarithmic functions [21,26] and inverse square roots (reciprocal sqrt) [25,27,28,29,30,31,32]. The latter in particular is the well-known fast inverse square root (FISR) algorithm, which can be used to find both the reciprocal and the square root.
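As an illustration, the FISR referenced above can be modeled in a few lines; the following Python sketch (function name is ours) reproduces the well-known single-precision bit trick with one Newton–Raphson refinement step:

```python
import math
import struct

def fast_inverse_sqrt(x):
    """Classic FISR: reinterpret the float bits, apply the magic constant,
    then refine with one Newton-Raphson step (single precision)."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # float bits as uint32
    i = 0x5F3759DF - (i >> 1)                          # magic-constant initial guess
    y = struct.unpack('<f', struct.pack('<I', i))[0]   # bits back to float
    return y * (1.5 - 0.5 * x * y * y)                 # one Newton iteration

# Worst relative error over a broad input range stays below about 0.2%.
worst = max(abs(fast_inverse_sqrt(v) * math.sqrt(v) - 1.0)
            for v in [k / 100.0 for k in range(1, 10001)])
print(worst)
```

Only multiply–add, shift, and logic operations are involved, which is why the trick is attractive on MCUs without an FPU.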
As mentioned, it is important to use such techniques to obtain elementary functions at high speed and low cost when considering integration into electronic circuits or use in machine learning on GPUs. Among the elementary operations, addition and multiplication can generally be processed by CPUs at high speed, and division can be obtained via the fast inverse square root described above, since 1/x = (1/√x)². Moreover, exponential and logarithmic functions, which are nonlinear elementary functions, can be obtained using IEEE 754 calculation tricks. The remaining elementary functions include the trigonometric and inverse trigonometric functions, for which several methods have been proposed.
In general, approximations of trigonometric functions are often obtained using Taylor expansions, which converge slowly and cannot be computed over a wide domain without terms beyond the seventh order. Inverse trigonometric functions are similarly computed using polynomial approximation [33], rational approximation [34,35,36,37], look-up tables (LUTs), and coordinate rotation digital computer (CORDIC) algorithms [38,39,40], but each of these has disadvantages, such as requiring a large amount of memory or iterative calculations.
To solve this problem, we proposed an algorithm to approximate trigonometric functions (sin and cos) and inverse trigonometric functions (atan2) using only multiply–add operations [41]. In the proposed method, the approximation is achieved using one addition and two multiplications. The proposed residual correction method (RCM) minimizes the error over the entire domain, rather than providing a locally precise approximation such as the Taylor expansion. The atan2 function created by the proposed method is faster than the previously used DSP-trick [42], and the approximation is obtained with smaller errors. The DSP-trick is a suitable algorithm for MCUs, but it must always perform division; it is thus tightly coupled to division. In contrast, our proposed method can be executed with only multiply–add operations if the norm is known, so it does not necessarily require division; it is only loosely coupled to division. Since the norm of the measured values is known for most posture measurement sensors, the proposed method, which can skip normalization, is suitable for embedded devices. If normalization is required, the algorithm can be combined with any normalization method, such as the fast inverse square root for MCUs or single instruction, multiple data (SIMD) instructions for advanced computing devices.
In this study, we extend our proposed method to compute even more accurate approximations of trigonometric and inverse trigonometric functions at a similar computational cost.

2. Concept of Approximation by Residual Correction Method

We have been developing wearable posture and fatigue measurement devices [43,44] and assistive suits [45,46,47] using small, low-power MCUs. For this purpose, it is necessary to measure human posture in real time from sensor data using only simple arithmetic and logical operations that are possible with MCUs. Therefore, in our previous research, we developed approximate formulas based on multiply–add operations of trigonometric and inverse trigonometric functions to realize posture calculation on MCUs [41].
We proposed the concept of designing approximate expressions utilizing the residual correction method using simple algebraic expressions with only multiply–add operations, without division operations and iterative convergence algorithms, to achieve efficient calculations for embedded systems.
In a previous study, as an example of approximation using the residual correction method, an approximation formula based on the multiply–add operation of trigonometric functions (sin and cos) and inverse trigonometric functions (atan2) was created.
The following procedure was used to design the approximate formula.
(1). Linear approximation with fixed origin and domain endpoints: f(x) ≈ ax + b;
(2). Evaluation of the residual against the target function: e_res = ax + b − f(x);
(3). Design of a residual correction function (RCF): r(x) ≈ e_res;
(4). Approximate formula with residual correction: f_approx(x) = ax + b − r(x).
The concept of the residual correction method is to express the RCF using only multiply–add operations, which must be explored heuristically. Due to this restriction, the candidate functions for the RCF are polynomials of degree three or less. The RCF was evaluated in terms of both computational cost and approximation error, and it was shown that, for the nonlinear functions mentioned above, a parabolic function provides an efficient approximation formula that expresses each function using only multiply–add operations.
Although the above method can easily obtain a quadratic approximation formula for a nonlinear function, some applications require an even smaller error. In the next section, a method to reduce the approximated error while maintaining the computational cost by extending the approximation formula with residual correction is proposed.

3. Second Approximation by the Residual Correction Method for Trigonometric Functions

In a previous study, the approximations made by the residual correction method for trigonometric functions were as follows [41]:
sin(θ) ≈ s(θ) = (4/π²) θ (π − |θ|)
cos(θ) ≈ c(θ) = s(π/2 − |θ|)
These take the same form as the famous quadratic approximation of the sine function [48] on [−π, π], but the error is larger around θ = ±π/4, as shown in Figure 1.
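As a numerical check, the first approximation can be evaluated directly; the sketch below (our own naming) assumes the quadratic form s(θ) = (4/π²)θ(π − |θ|) reconstructed above:

```python
import math

PI = math.pi

def s(theta):
    """First approximation of sin on [-pi, pi]: a single parabola."""
    return (4.0 / PI ** 2) * theta * (PI - abs(theta))

def c(theta):
    """cos obtained by shifting the sine approximation."""
    return s(PI / 2.0 - abs(theta))

grid = [-PI + k * (2.0 * PI / 10000) for k in range(10001)]
err_sin = max(abs(s(t) - math.sin(t)) for t in grid)
err_cos = max(abs(c(t) - math.cos(t)) for t in grid)
print(err_sin, err_cos)   # both roughly 0.056
```

The maximum error of roughly 0.056 is what the residual correction in the next subsection is designed to remove.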
Here, the above formula is taken as the first approximation, and a method to further reduce the error by applying the residual correction method again, under the constraint of using only multiply–add operations, is examined.

3.1. Exploring RCF Candidates

Here, we consider the sine function and evaluate the residuals of the first approximation. These residuals have the shape of an odd function, as shown in Figure 2a. The following RCF candidates, each computable by multiply–add operations, can be considered for this residual:
(1). RCF with signed quadratic function: r_sq(θ) = α_sq s(θ)(1 − |s(θ)|);
(2). RCF with cubic function: r_cubic(θ) = α_cubic s(θ)(1 − s²(θ));
(3). RCF with co-function: r_cofunc(θ) = α_cofunc (s(θ) − sign(s(θ)) √(1 − c²(θ))).
Although a square root appears in the co-function RCF, an embedded implementation would use the fast inverse square root.
Figure 2b shows the evaluation of the RCF candidate functions. Here, the coefficients are designed as α_sq = 0.224, α_cubic = 0.145, and α_cofunc = 0.5 so that the residuals are well corrected. As the graph shows, the signed quadratic function is closest to the residuals of the first approximation. In addition, the sign operation is faster than multiplication and the fast inverse square root, indicating that the signed quadratic function is the most suitable RCF.
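The comparison in Figure 2b can be reproduced numerically. The sketch below (our reconstruction of the three candidate formulas) measures the worst mismatch between the residual s(θ) − sin(θ) and each candidate RCF:

```python
import math

PI = math.pi
def s(t): return (4.0 / PI ** 2) * t * (PI - abs(t))
def c(t): return s(PI / 2.0 - abs(t))
def sign(v): return (v > 0) - (v < 0)

a_sq, a_cubic, a_cofunc = 0.224, 0.145, 0.5
def r_sq(t):     return a_sq * s(t) * (1.0 - abs(s(t)))
def r_cubic(t):  return a_cubic * s(t) * (1.0 - s(t) ** 2)
def r_cofunc(t): return a_cofunc * (s(t) - sign(s(t))
                                    * math.sqrt(max(0.0, 1.0 - c(t) ** 2)))

grid = [-PI + k * (2.0 * PI / 4000) for k in range(4001)]
def worst_gap(r):   # worst mismatch between residual and candidate RCF
    return max(abs((s(t) - math.sin(t)) - r(t)) for t in grid)

gaps = {f.__name__: worst_gap(f) for f in (r_sq, r_cubic, r_cofunc)}
print(gaps)   # the signed quadratic tracks the residual best
```

The signed quadratic candidate leaves the smallest uncorrected residual, consistent with the choice made in the text.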

3.2. Second Approximation of Trigonometric Functions

Using r_sq(θ) as the RCF, the second approximation of the sine function is
s2(θ) = s(θ) − r_sq(θ) = s(θ) ((1 − α_sq) + α_sq |s(θ)|).
A similar argument gives the second approximation of the cosine function:
c2(θ) = c(θ) ((1 − α_sq) + α_sq |c(θ)|)
The approximate shape of the final second approximation function is shown in Figure 3.
To evaluate the performance of the second approximation, the final residuals and computation time are shown in Figure 4 and Table 1. The final residual is, at most, approximately 0.00092, which is sufficiently small compared with the first approximation. The computation time was evaluated on an Intel Xeon E3 (2500 MHz) CPU: 1.4 ns for the sine approximation and 2.04 ns for the cosine approximation, approximately twice as long as the first approximation but more than 30 times faster than the built-in functions in math.h.
Comparing the first approximation, Equation (3), with the second approximation, Equation (4), we observe that they share the same parabolic form. The second approximation first keeps the result of the first approximation in memory and then feeds that result into a parabola of the same shape. The first approximation therefore costs one parabola evaluation and the second approximation costs two. The computational cost is approximately doubled, but the approximation error is reduced to approximately 1/60.
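The factored second approximation can be checked directly; this sketch (our own naming, using the reconstructed formulas above) verifies the sub-0.001 error claim:

```python
import math

PI = math.pi
ALPHA = 0.224   # alpha_sq

def s(t):
    return (4.0 / PI ** 2) * t * (PI - abs(t))

def s2(t):
    v = s(t)                                      # first approximation, kept in memory
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))   # same parabola applied to v

def c2(t):
    v = s(PI / 2.0 - abs(t))
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))

grid = [-PI + k * (2.0 * PI / 20000) for k in range(20001)]
err_s2 = max(abs(s2(t) - math.sin(t)) for t in grid)
err_c2 = max(abs(c2(t) - math.cos(t)) for t in grid)
print(err_s2, err_c2)   # both below 0.001
```

Each call costs two additions and four multiplications, matching the "two parabolas" accounting in the text.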

4. Second Approximation by the Residual Correction Method for Inverse Trigonometric Functions

Approximate formulas based on the residual correction method for inverse trigonometric functions were also proposed in our previous study [41].
atan(y/x) ≈ at(y, x) = (π/2 − (2/3) x) y
Note that this equation is designed assuming that the norm is known, x² + y² = 1, and that the domain is [−π/2, π/2]. In many embedded systems, the norm is known for sensors that measure direction cosines; when the norm varies, the inputs are assumed to be normalized in advance using the fast inverse square root. The computational cost of this normalization is discussed later. Next, we consider achieving higher accuracy through a second approximation for inverse trigonometric functions.

4.1. Domain Expansion

Since the domain of Equation (5) is [−π/2, π/2], it is necessary to extend the domain in order to treat it the same as atan2(y, x). Two methods are evaluated in this paper: one based on a conditional branch and the other based on an extension using the half-angle identity.
The method using the conditional branch switches the formula according to the signs of the variables:
atan2(y, x) ≈ at2_if(y, x) =
  at(y, x),        if x ≥ 0;
  π − at(y, −x),   if x < 0 and y ≥ 0;
  −π − at(y, −x),  otherwise.
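The branch-based extension can be sketched directly (our reconstruction; `at` assumes unit-norm inputs):

```python
import math

PI = math.pi

def at(y, x):
    """First approximation of atan2 for x >= 0, assuming x*x + y*y == 1."""
    return (PI / 2.0 - (2.0 / 3.0) * x) * y

def at2_if(y, x):
    """Extend `at` to the full circle by switching on the signs of x and y."""
    if x >= 0.0:
        return at(y, x)
    if y >= 0.0:
        return PI - at(y, -x)
    return -PI - at(y, -x)

grid = [-PI + k * (2.0 * PI / 10000) for k in range(10001)]
worst = max(abs(at2_if(math.sin(t), math.cos(t)) - t) for t in grid)
print(worst)   # first-approximation error, a few hundredths of a radian
```

The branches cost only comparisons and sign flips, so the whole function remains within the multiply–add budget.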
Another way to double the domain is to use the half-angle identity, which can be expressed as a single algebraic expression, eliminating the need to switch equations. The half-angle identity of atan,
atan2(y, x) = 2 atan( y / (√(x² + y²) + x) ),
together with the constraint x² + y² = 1, can be organized into
atan2(y, x) ≈ at2_ex(y, x) = ((π/√2) · (1/√(1 + x)) − 2/3) y
                           = ((π/√2) G(1 + x) − 2/3) y.
Here, 1/√(·) appears in the equation, and G(·) denotes the fast inverse square root. This extends the domain from [−π/2, π/2] to [−π, π], so that atan2 can be approximated with a single algebraic expression.
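The half-angle extension can be sketched as follows; for clarity, G(·) is written here with the exact reciprocal square root, which would be replaced by the FISR on an MCU:

```python
import math

PI = math.pi

def G(v):
    """Reciprocal square root; on an MCU, substitute the fast inverse sqrt."""
    return 1.0 / math.sqrt(v)

def at2_ex(y, x):
    """Single-expression atan2 approximation over (-pi, pi),
    assuming x*x + y*y == 1."""
    return (PI / math.sqrt(2.0) * G(1.0 + x) - 2.0 / 3.0) * y

# sample slightly inside (-pi, pi) to avoid the singular point x == -1
grid = [-3.1 + k * (6.2 / 10000) for k in range(10001)]
worst = max(abs(at2_ex(math.sin(t), math.cos(t)) - t) for t in grid)
print(worst)   # first-approximation residual, under 0.1 rad
```

Note the trade-off visible in the residuals: the single-expression form avoids branches but carries a somewhat larger first-approximation error than at2_if, which the RCF in the next subsection is designed to absorb.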
The results of each domain extension are shown in Figure 5a, and the residuals are shown in Figure 5b.

4.2. Exploring RCF Candidates

Next, we consider candidates for RCF, but, as the atan2 function is a two-variable function, it is difficult to search for functions by simple multiply–add operations, such as parabolic functions. Therefore, we consider using Newton’s method by referring to the steps for improving the accuracy of the fast inverse square root.
From the constraints on the trigonometric functions, the evaluation function f(θ) and its derivative f′(θ), expressed in terms of x, y, and θ, are
f(θ) = (1/2) ((x − cos θ)² + (y − sin θ)²) = 1 − x cos θ − y sin θ,
f′(θ) = x sin θ − y cos θ.
Using these, the error converges to zero by repeating
θ_{n+1} = θ_n − f(θ_n) / f′(θ_n).
With reference to this, the second term on the right-hand side, which is the correction term, can be used as a candidate for RCF.
Thus, it was found that the sin and cos functions are necessary for RCF to cancel the approximation error of atan2. From the evaluation in the previous chapter, the trigonometric functions in the math library are computationally expensive when implemented in programs. Therefore, we replace them with s 2 ( θ ) and c 2 ( θ ) , which are highly accurate and fast approximate calculations.
Therefore, let the candidate RCF be as follows:
r_atan2(y, x) = (1 − x c2(θ) − y s2(θ)) / (x s2(θ) − y c2(θ))
There are two problems with this RCF candidate. One is that it violates the design concept of the residual correction method because it involves division. The other is that the denominator may become zero due to approximation errors in the trigonometric functions. Therefore, r_atan2(y, x) is not suitable as the RCF. However, by analyzing its behavior, a new RCF can be found. When there is no approximation error in the trigonometric functions, the zero-denominator problem does not occur, because the numerator also becomes zero at that point and the effect of the correction term disappears. From this, the requirement for the RCF is that it vanishes wherever the numerator f(θ) vanishes, and the denominator f′(θ) satisfies this condition. Therefore, f′(θ) itself is taken as the RCF candidate here.
r̃_atan2(y, x) = α_atan2 (x s2(θ) − y c2(θ))
Although the coefficient leaves a degree of design freedom, the result of the following optimization is approximately 1.0, which eliminates one multiplication:
α_atan2 = argmin_α ∫_{−π}^{π} | at2(y, x) − r̃_atan2(y, x) − θ | dθ ≈ 1.0
The same result is obtained whether at2_if(y, x) or at2_ex(y, x) is used as at2(y, x). A plot of r̃_atan2(y, x) is shown in Figure 6a; here, the domain extension at2_ex(y, x) based on the half-angle identity is used. It can be seen that r̃_atan2(y, x), which can be computed by multiply–add operations, agrees very well with the residual of at2_ex(y, x) and is appropriate as the RCF.
Figure 6b shows the case where at2_if(y, x) is evaluated with the same RCF; this RCF can be applied whenever a good first approximation is available. Therefore, other atan2 approximation methods, such as the DSP-trick, can also be made highly accurate in the same way.

4.3. Second Approximation of Inverse Trigonometric Functions

Having obtained the RCF, the second approximation of atan2 can be computed using the second approximations of the trigonometric functions as follows:
θ1 = at2(y, x),
at2_2(y, x) = θ1 − r̃_atan2(y, x) = θ1 − x s2(θ1) + y c2(θ1).
Figure 7 shows the residuals of at2_if(y, x) and at2_ex(y, x). Both can serve as at2(y, x) in terms of error, but at2_if(y, x) is preferable in terms of computational cost.
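Putting the pieces together, the second approximation of atan2 can be sketched as a self-contained program (all names and the formulas are our reading of the equations above):

```python
import math

PI = math.pi
ALPHA = 0.224

def s(t):  return (4.0 / PI ** 2) * t * (PI - abs(t))
def s2(t):
    v = s(t)
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))
def c2(t):
    v = s(PI / 2.0 - abs(t))
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))

def at(y, x):  return (PI / 2.0 - (2.0 / 3.0) * x) * y
def at2_if(y, x):
    if x >= 0.0:
        return at(y, x)
    return (PI if y >= 0.0 else -PI) - at(y, -x)

def at2_2(y, x):
    """Second approximation: Newton-style residual correction of at2_if."""
    t1 = at2_if(y, x)                       # first approximation
    return t1 - x * s2(t1) + y * c2(t1)     # subtract f'(t1) as the RCF

grid = [-PI + k * (2.0 * PI / 10000) for k in range(10001)]
worst = max(abs(at2_2(math.sin(t), math.cos(t)) - t) for t in grid)
print(worst)   # on the order of 1e-3 rad
```

Everything after the branch is pure multiply–add, since the Newton denominator was dropped in favor of the divisionless RCF.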
In addition, the constraint x² + y² = 1 was used in the approximate formula design up to this point, but normalization may be necessary in some applications. Our proposed atan2 approximation can be combined with any normalization algorithm. Recent CPUs can compute it quickly with SIMD instruction sets; however, for MCUs with only a simple instruction set, normalization is a major problem [49]. Therefore, the fast inverse square root, which can be executed using only multiply–add and logic operations, is a useful technique for MCUs. Any normalization algorithm or SSE instruction can be used in general; here, we assume an MCU and verify the use of the fast inverse square root. The normalized second approximation, in which the fast inverse square root is applied to the variables, is denoted at2_2n(y, x) and can be calculated using the fast inverse square root G(·) as follows:
at2_2n(y, x) = at2_2(n y, n x),
n = G(x² + y²) ≈ 1/√(x² + y²).
The error between the calculation results and the math library is shown in Figure 8. The increase in computational cost due to normalization is shown as at2_2n in Table 2. The cost is about twice that of at2_2, which indicates that the highly accurate approximation at2_2 can be performed with roughly the same computational cost as the fast inverse square root itself.
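Normalization composes directly with the second approximation; the following self-contained sketch (our own naming, with the exact reciprocal square root standing in for G(·)) accepts inputs of arbitrary norm:

```python
import math

PI = math.pi
ALPHA = 0.224

def s(t):  return (4.0 / PI ** 2) * t * (PI - abs(t))
def s2(t):
    v = s(t)
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))
def c2(t):
    v = s(PI / 2.0 - abs(t))
    return v * ((1.0 - ALPHA) + ALPHA * abs(v))
def at(y, x):  return (PI / 2.0 - (2.0 / 3.0) * x) * y
def at2_if(y, x):
    if x >= 0.0:
        return at(y, x)
    return (PI if y >= 0.0 else -PI) - at(y, -x)
def at2_2(y, x):
    t1 = at2_if(y, x)
    return t1 - x * s2(t1) + y * c2(t1)

def at2_2n(y, x):
    """Second approximation for inputs of arbitrary norm."""
    n = 1.0 / math.sqrt(x * x + y * y)   # G(x*x + y*y) on an MCU
    return at2_2(n * y, n * x)

print(at2_2n(3.0, 4.0), math.atan2(3.0, 4.0))
```

On an MCU the single line computing n would be the FISR, keeping the whole routine within multiply–add and logic operations.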
In this study, the original fast inverse square root [32] is used for the evaluation implementation, but faster implementations have been studied, such as the algorithm in [50], which is faster on FPGAs.

5. Results

Second approximations of the trigonometric and inverse trigonometric functions have been developed, and the results are summarized here. The maximum error indicates the maximum value of the final residuals of the second approximation, and the computational cost indicates the number of multiply–add operations used. Although an example of the computation time has already been shown, only the number of operations is given here, because computation time varies with CPU performance and architecture.
For the second approximation of the trigonometric functions, Equations (3) and (4) are considered good approximations; the error and computational cost are shown in Table 3. The final error is less than 0.001 over the entire domain, and the computational cost is very low.
Equation (17) is considered a good second approximation of the inverse trigonometric function. The error and computational cost are shown in Table 4. The approximation error is less than 0.001 rad over the entire domain. The computational cost is somewhat higher because two second approximations of trigonometric functions are required, but the formula still consists of only a finite number of multiply–add operations.

6. Discussion

In this study, we have designed a second approximation of RCF to achieve a higher accuracy in approximating trigonometric and inverse trigonometric functions.

6.1. Approximation of Trigonometric Function

As mentioned in the Introduction, the Taylor series converges slowly but can compute mathematically exact values of trigonometric functions. We compare the Taylor series and the proposed method on the domain [−π, π]. Taylor series are evaluated from the first to the ninth order. In addition, the computational cost is evaluated both for the defining formula and for the following expansion, which reduces the number of multiplications for embedded use.
sin(x) ≈ Σ_{n=0}^{k} (−1)ⁿ x^(2n+1)/(2n+1)! = x (1 − x² (1/3! − x² (1/5! − x² (1/7! − ⋯))))
The left-hand side of Equation (20) is the definition of the kth-order Taylor expansion, and the right-hand side reduces the number of multiplications by storing x² in memory. The error is evaluated numerically by Romberg integration, and the computational cost is compared via the number of multiplications.
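The comparison can be reproduced with a short sketch; `sin_taylor` below (our own naming) evaluates the kth-order polynomial in the memory-saving Horner form:

```python
import math

PI = math.pi

def sin_taylor(x, order):
    """Taylor polynomial of sin up to the given odd order,
    evaluated in Horner form with x*x stored once."""
    m = (order - 1) // 2        # highest n with 2n + 1 <= order
    x2 = x * x
    acc = 0.0
    for n in range(m, 0, -1):   # fold terms from the innermost outward
        acc = 1.0 / math.factorial(2 * n + 1) - x2 * acc
    return x * (1.0 - x2 * acc)

grid = [-PI + k * (2.0 * PI / 2000) for k in range(2001)]
err7 = max(abs(sin_taylor(t, 7) - math.sin(t)) for t in grid)
err9 = max(abs(sin_taylor(t, 9) - math.sin(t)) for t in grid)
print(err7, err9)   # the error near the domain ends remains large
```

Even the ninth-order polynomial leaves an error above 0.001 at the domain ends, while costing more multiplications than the proposed second approximation.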
Table 5 and Figure 9 show the comparison results. As mentioned above, the Taylor series is mathematically exact, but its convergence is slow, so a large error remains when a wide domain is used. Although the proposed method cannot reduce the error to zero, it obtains results with fewer calculations and a smaller error.

6.2. Approximation of Inverse Trigonometric Function

Next, we discuss the approximation of the inverse trigonometric function. Depending on the application, the desired accuracy can be selected: whether the first approximation is sufficient, whether the second approximation is necessary, or whether normalization is needed. Table 6 shows a comparison with other methods. All are essentially trade-offs between accuracy and computational cost and should be chosen according to the application. The approximate formula produced by the residual correction method is an algebraic formula and is extremely easy to implement as a program, so it is a practical option for approximate calculation. While exact calculation methods are suitable for obtaining precise values, such as in numerical simulations, multiply–add operations are very effective in compact systems such as MCUs and FPGAs and in large-scale machine learning operations. In such fields, the proposed method can be used to obtain highly accurate approximations with a finite number of multiply–add operations and no division.
The following is an example of compilation and execution on an actual MCU. The program memory occupancy of the proposed algorithm for each MCU grade is shown in Table 7. The widely used Microchip PIC family and the XC8 compiler are considered in these examples. In the lowest-grade model, the library atan2 function alone consumes the entire program memory, so the program cannot be compiled; in contrast, the proposed method can be implemented successfully. Other grades of MCUs without an FPU or DSP occupy the same amount of program memory. Since embedded systems usually require not only function calculations but also various processes, such as measurement and equipment control, the program memory of the MCU must be conserved as much as possible to accommodate them. The second approximation of the proposed method also uses s2 and c2 in its calculation process, so the trigonometric functions become available at the same time. Including them from the math library would require a total of 7.7 KB, so the approximation is advantageous when performing complicated calculations.
Next, the calculation speed was measured using an actual MCU (PIC18F2580). The experimental results are summarized in Table 8. The first approximation of atan2 is fast enough, while the speed of the second approximation approaches that of the math library as the calculation volume increases. Therefore, if there is sufficient program memory, using the math library is the most accurate way to calculate values. However, with limited program memory, such as in low-grade MCUs, the first approximation can be used for speed and the second approximation when accuracy is required. This increases the degree of freedom in implementation. As mentioned above, MCUs perform not only function calculations but also various other tasks, such as measurement and equipment control, so function calculations should be as fast as possible.
In this study, we developed highly accurate approximation formulas for sin and cos as trigonometric functions and atan2 as an inverse trigonometric function using the residual correction method. The tan, asin, and acos functions can be calculated from the fast inverse square root and the approximations proposed here using trigonometric identities. Furthermore, the log and exp functions can be computed by exploiting the IEEE 754 structure, as in the fast inverse square root. Thus, besides the four arithmetic operations, the logarithmic, exponential, trigonometric, and inverse trigonometric functions can all be computed quickly. For embedded systems, this enables highly accurate approximations of all elementary functions using a finite number of multiply–add operations and bit shifts.

7. Conclusions

In this paper, we proposed a second approximation based on the residual correction method that can approximate trigonometric and inverse trigonometric functions with high accuracy using a small number of calculations. Using the proposed algorithm, nonlinear calculations that were previously impossible to implement can now be realized even on small, low-power computers such as MCUs. It cannot reduce the error completely to zero; however, it computes accurate approximations at a low computational cost. This can contribute to machine downsizing and cost reduction, and the method is a candidate algorithm for MCU-based devices that require real-time performance, such as drones and IoT devices.
To approximate trigonometric and inverse trigonometric functions with a high accuracy using finite multiply–add operations, a second approximation using the residual correction method was proposed. Trigonometric functions can be computed with a maximum error of less than 0.001 by two additions and four multiplications, and atan2 can be computed with a maximum error of less than 0.001 rad by seven additions and fourteen multiplications. For the formulation of the second approximation, we designed and evaluated a formula using floating-point arithmetic. In future research, we aim to optimize the formula for embedded systems by converting it to fixed-point arithmetic. Since the residual correction method is formulated using only the multiply–add operation without division, it is well suited to the fixed-point arithmetic, and further acceleration can be expected when considering implementation in embedded systems.

Author Contributions

Conceptualization, T.K.; investigation, T.K.; methodology, T.K.; supervision, T.T.; validation, T.K. and T.T.; writing—original draft, T.K.; writing—review and editing, T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Moroz, L.; Samotyy, V. Efficient Floating-Point Division for Digital Signal Processing Application [Tips & Tricks]. IEEE Signal Process. Mag. 2019, 36, 159–163.
  2. Liu, W.; Nannarelli, A. Power Efficient Division and Square Root Unit. IEEE Trans. Comput. 2012, 61, 1059–1070.
  3. Viitanen, T.; Jääskeläinen, P.; Esko, O.; Takala, J. Simplified floating-point division and square root. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 2707–2711.
  4. Marius, C. Intel® AVX-512 Instructions and Their Use in the Implementation of Math Functions. Available online: http://www.fit.vutbr.cz/~iklubal/IPA/AVX-512_Cornea.pdf (accessed on 24 April 2022).
  5. Saber, M.; Jitsumatsu, Y.; Kohda, T. A low-power implementation of arctangent function for communication applications using FPGA. In Proceedings of the 2009 Fourth International Workshop on Signal Design and its Applications in Communications, Fukuoka, Japan, 19–23 October 2009; pp. 60–63.
  6. Oberman, S.; Flynn, M. Design issues in division and other floating-point operations. IEEE Trans. Comput. 1997, 46, 154–161.
  7. Instruction Tables: Lists of Instruction Latencies, Throughputs and Micro-Operation Breakdowns for Intel, AMD and VIA CPUs. Available online: https://www.agner.org/optimize/ (accessed on 17 May 2022).
  8. PIC® MCUs | Microchip Technology. Available online: https://www.microchip.com/en-us/products/microcontrollers-and-microprocessors/8-bit-mcus/pic-mcus (accessed on 4 July 2022).
  9. 8-Bit Microcontroller Based SOC Chip for Drone, Robot, and Many More Applications. Available online: https://www.eeherald.com/section/new-products/owns20170312003-8-bit-mcu-soc-np.html (accessed on 4 July 2022).
  10. The 8-Bit Microcontroller and Beyond: An Interview with Tam Hanna. Available online: https://www.elektormagazine.com/news/8-bit-microcontroller-interview-tam-hanna (accessed on 4 July 2022).
  11. Bolanakis, D.E. A Survey of Research in Microcontroller Education. IEEE Rev. Iberoam. Tecnol. Del Aprendiz. 2019, 14, 50–57.
  12. Buturugă, A.; Constantinescu, R.C.; Stoichescu, D.A. Current consumption analysis for 8-bit microcontrollers. In Proceedings of the 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Pitesti, Romania, 27–29 June 2019; pp. 1–6.
  13. Moyer, B. MPU Vs. MCU. 2020. Available online: https://semiengineering.com/mpu-vs-mcu/ (accessed on 4 July 2022).
  14. Chen, C.H.; Lin, M.Y.; Liu, C.C. Edge Computing Gateway of the Industrial Internet of Things Using Multiple Collaborative Microcontrollers. IEEE Netw. 2018, 32, 24–32.
  15. Joo, S.; An, Y.J.; Oh, T.W.; Jung, S.O. Comparative analysis of MCU memory for IoT application. In Proceedings of the 2018 International Conference on Electronics, Information, and Communication (ICEIC), Honolulu, HI, USA, 24–27 January 2018; pp. 1–3.
  16. Altitude Data for Drone Navigation: A Compact Barometric Pressure Sensor and Thermometer from Bosch—News. Available online: https://www.allaboutcircuits.com/news/drone-altitude-bmp388-barometric-pressure-sensor-thermometer-bosch/ (accessed on 4 July 2022).
  17. Alakananda, B.G.; Venugopal, N. Development of a Programmable System on Chip(Psoc) based Quadcopter. In Proceedings of the 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184), Tirunelveli, India, 15–17 June 2020; pp. 93–98. [Google Scholar] [CrossRef]
  18. An IIR Filter in an 8-bit Microcontroller. Available online: http://e-rokodelnica.si/A003/A003_EN.html (accessed on 4 July 2022).
  19. Leong, B.T.M.; Low, S.M.; Ooi, M.P.L. Low-Cost Microcontroller-based Hover Control Design of a Quadcopter. Procedia Eng. 2012, 41, 458–464. [Google Scholar] [CrossRef] [Green Version]
  20. Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S. Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic. Sensors 2020, 20, 1515. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Romeric. Fastapprox: Approximate and Vectorized Versions of Common Mathematical Functions. Available online: https://github.com/romeric/fastapprox (accessed on 25 April 2022).
  22. Bouguezzi, S.; Faiedh, H.; Souani, C. Hardware Implementation of Tanh Exponential Activation Function using FPGA. In Proceedings of the 2021 18th International Multi-Conference on Systems, Signals Devices (SSD), Monastir, Tunisia, 22–25 March 2021; pp. 1020–1025. [Google Scholar] [CrossRef]
  23. Tsmots, I.; Skorokhoda, O.; Rabyk, V. Hardware Implementation of Sigmoid Activation Functions using FPGA. In Proceedings of the 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), Polyana, Ukraine, 26 February–2 March 2019; pp. 34–38. [Google Scholar] [CrossRef]
  24. Chang, C.H.; Chen, S.H.; Chen, B.W.; Wang, J.C.; Wang, J.F. A division-free algorithm for fixed-point power exponential function in embedded system. In Proceedings of the 2013 1st International Conference on Orange Technologies (ICOT), Tainan, Taiwan, 12–16 March 2013; pp. 223–226. [Google Scholar] [CrossRef]
  25. Warren, H.S. Hacker’s Delight, 2nd ed.; Pearson Education: London, UK, 2013. [Google Scholar]
  26. Blinn, J. Floating-point tricks. IEEE Comput. Graph. Appl. 1997, 17, 80–84. [Google Scholar] [CrossRef]
  27. Moroz, L.V.; Samotyy, V.V.; Horyachyy, O.Y. Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants. Computation 2021, 9, 21. [Google Scholar] [CrossRef]
  28. Walczyk, C.J.; Moroz, L.V.; Cieśliński, J.L. Improving the Accuracy of the Fast Inverse Square Root by Modifying Newton–Raphson Corrections. Entropy 2021, 23, 86. [Google Scholar] [CrossRef] [PubMed]
  29. Eberly, D.H. GPGPU Programming for Games and Science; A K Peters/CRC Press: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
  30. id-Software/Quake-III-Arena. 2022. Available online: https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb10315479fc00086f08e25d968b4b43c49/code/game/q_math.c (accessed on 24 April 2022).
  31. Lomont, C. Fast Inverse Square Root; Technical Report; 2003. Available online: http://www.matrix67.com/data/InvSqrt.pdf (accessed on 5 July 2022).
  32. Robertson, M. A Brief History of InvSqrt. Bachelor’s Thesis, University of New Brunswick, Fredericton, NB, Canada, 2012. [Google Scholar]
  33. Lyons, R.G. Efficient Approximations for the Arctangent Function. In Streamlining Digital Signal Processing: A Tricks of the Trade Guidebook; IEEE: Piscataway, NJ, USA, 2012; pp. 265–276. [Google Scholar] [CrossRef]
  34. Abrarov, S.M.; Quine, B.M. A rational approximation of the arctangent function and a new approach in computing pi. arXiv 2016, arXiv:1603.03310. [Google Scholar]
  35. Ukil, A.; Shah, V.H.; Deck, B. Fast computation of arctangent functions for embedded applications: A comparative analysis. In Proceedings of the 2011 IEEE International Symposium on Industrial Electronics, Gdansk, Poland, 27–30 June 2011; pp. 1206–1211. [Google Scholar] [CrossRef]
  36. Girones, X.; Julia, C.; Puig, D. Full Quadrant Approximations for the Arctangent Function [Tips and Tricks]. IEEE Signal Process. Mag. 2013, 30, 130–135. [Google Scholar] [CrossRef]
  37. Benammar, M.; Alassi, A.; Gastli, A.; Ben-Brahim, L.; Touati, F. New Fast Arctangent Approximation Algorithm for Generic Real-Time Embedded Applications. Sensors 2019, 19, 5148. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Haviland, G.; Tuszynski, A. A CORDIC Arithmetic Processor Chip. IEEE J. Solid-State Circuits 1980, 15, 4–15. [Google Scholar] [CrossRef] [Green Version]
  39. Zhu, H.; Ge, Y.; Jiang, B. Modified CORDIC algorithm for computation of arctangent with variable iterations. In Proceedings of the 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, China, 6–10 November 2016; pp. 261–264. [Google Scholar] [CrossRef]
  40. Pilato, L.; Fanucci, L.; Saponara, S. Real-Time and High-Accuracy Arctangent Computation Using CORDIC and Fast Magnitude Estimation. Electronics 2017, 6, 22. [Google Scholar] [CrossRef] [Green Version]
  41. Kusaka, T.; Tanaka, T.; Kajiwara, H. Residual Correction Method for Fast Calculation of Arctangent in Embedded Systems. In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Busan, Korea, 7–11 July 2015; p. 6. [Google Scholar]
  42. DSP Trick: Fixed-Point Atan2 With Self Normalization. Available online: https://dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normalization/ (accessed on 25 April 2022).
  43. Tsuchiya, Y.; Imamura, Y.; Tanaka, T.; Kusaka, T. Estimating Lumbar Load During Motion with an Unknown External Load Based on Back Muscle Activity Measured with a Muscle Stiffness Sensor. J. Robot. Mechatronics 2018, 30, 696–705. [Google Scholar] [CrossRef]
  44. Tsuchiya, Y.; Kusaka, T.; Tanaka, T.; Matsuo, Y. Wearable Sensor System for Lumbosacral Load Estimation by Considering the Effect of External Load. In Advances in Human Factors in Wearable Technologies and Game Design; Ahram, T., Falcão, C., Eds.; Advances in Intelligent Systems and Computing; Springer International Publishing: Cham, Switzerland, 2018; pp. 160–168. [Google Scholar] [CrossRef]
  45. Kusaka, T.; Tanaka, T.; Kaneko, S.; Suzuki, Y.; Saito, M.; Kajiwara, H. Assist Force Control of Smart Suit for Horse Trainers Considering Motion Synchronization. Int. J. Autom. Technol. 2009, 3, 723–730. [Google Scholar] [CrossRef]
  46. Yoshida, M.; Tanaka, T.; Tsuchiya, Y.; Kusaka, T. Reducing Lumbar Load with Active Corset. J. Robot. Mechatronics 2018, 30, 740–751. [Google Scholar] [CrossRef]
  47. Hashimoto, K.; Tanaka, T.; Kusaka, T. Walking Assistance and Resistance of Walking Motion by Trunk and Pelvis Motion Assist. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 8597–8602. [Google Scholar] [CrossRef]
  48. Berry, N. Approximating the Sine Function. 2019. Available online: https://datagenetics.com/blog/july12019/index.html (accessed on 6 July 2022).
  49. Cordes, P. Answer to “Is It Still Worth Using the Quake Fast Inverse Square Root Algorithm Nowadays on x86-64?”. 2022. Available online: https://stackoverflow.com/a/71608406 (accessed on 9 July 2022).
  50. Hasnat, A.; Bhattacharyya, T.; Dey, A.; Halder, S.; Bhattacharjee, D. A fast FPGA based architecture for computation of square root and Inverse Square Root. In Proceedings of the 2017 Devices for Integrated Circuit (DevIC), Kalyani, India, 23–24 March 2017; pp. 383–387. [Google Scholar] [CrossRef]
Figure 1. Summary of 1st approximations of sin-function and cos-function. (a) First approximation of sin and cos functions by RCM. (b) Comparison with unit circle.
Figure 2. Residual error of 1st approximation of sin-function and candidates of its RCF. (a) Residual error of 1st approximation of sin function. (b) Comparison among designed RCFs.
Figure 3. Summary of 2nd approximations of sin-function and cos-function. (a) Second approximation of sin and cos functions by RCM. (b) Comparison with unit circle.
Figure 4. Residual error of 2nd approximation of sin-function.
Figure 5. First approximation of atan2 and its domain expansion (plot using x = cos θ, y = sin θ). (a) First approximation of atan2 and domain expansion. (b) Residual error in each expansion.
Figure 6. Residual error of atan2's 1st approximation and consideration of its RCF candidates (plot using x = cos θ, y = sin θ). (a) Evaluation of Newton's method (NM) for RCF. (b) Applying designed RCF for each expansion.
Figure 7. Residual error of 2nd approximation of atan2 (plot using x = cos θ, y = sin θ).
Figure 8. Atan2's 2nd approximation with normalization using fast inverse square root (FISR) and its residual error. (a) Atan2 approx. over entire domain using FISR. (b) Error with atan2 in math library.
Figure 9. Comparison of maximum error and number of multiplications of sin function approximation between Taylor series and proposed method (log scale on vertical axis to compare error order).
Table 1. Calculation time evaluation of 2nd approximations of sin-function and cos-function (Average ± S.D.).

| Function | 1st Approx. | 2nd Approx. | Math Library (math.h) |
|---|---|---|---|
| Sine function | 0.54 ± 0.16 ns | 1.4 ± 0.2 ns | 68.28 ± 0.52 ns |
| Cosine function | 1.22 ± 0.13 ns | 2.04 ± 0.39 ns | 69.1 ± 0.90 ns |
Table 2. Calculation time evaluation of 2nd approximations of atan2 (Average ± S.D.).

| | 1st Approx. | 2nd Approx. (at2if / at2ex) | 2nd Approx. with FISR (at2nif / at2nex) | Math Library (math.h) |
|---|---|---|---|---|
| Calculation time | 1.43 ± 0.26 ns | 2.96 ± 0.27 ns / 4.76 ± 0.26 ns | 7.3 ± 0.19 ns / 9.14 ± 0.25 ns | 21.96 ± 0.83 ns |
Table 3. Error and computational cost of approximating trigonometric functions.

| Sine Function | Maximum Error | Computational Cost |
|---|---|---|
| RCM 1st approximation | 5.6 × 10⁻² | 1 addition, 2 multiplications, 1 absolute value |
| RCM 2nd approximation | 9.2 × 10⁻⁴ | 2 additions, 4 multiplications, 2 absolute values |
Table 4. Error and computational cost of inverse trigonometric functions.

| Atan2 Function | Maximum Error | Computational Cost |
|---|---|---|
| RCM 1st approximation | 4.2 × 10⁻² rad | 1 addition, 2 multiplications |
| RCM 2nd approximation | 9.2 × 10⁻⁴ rad | 7 additions, 14 multiplications, 4 absolute values |
Table 5. Comparison summary of sin function approximation between Taylor series and proposed method.

| | Taylor 1st Order | Taylor 3rd Order | Taylor 5th Order | Taylor 7th Order | Taylor 9th Order | Proposed 1st Approx. | Proposed 2nd Approx. |
|---|---|---|---|---|---|---|---|
| Maximum absolute error | 3.14 | 2.03 | 5.24 × 10⁻¹ | 7.52 × 10⁻² | 6.93 × 10⁻³ | 5.6 × 10⁻² | 9.2 × 10⁻⁴ |
| Mean absolute error | 5.84 | 2.25 | 4.23 × 10⁻¹ | 4.80 × 10⁻² | 3.66 × 10⁻³ | 1.9 × 10⁻¹ | 3.3 × 10⁻³ |
| Number of multiplications (original/optimized for embedding) | 0/0 | 3/3 | 8/4 | 15/5 | 24/6 | 2 | 4 |
Table 6. Comparison with other methods.

| Atan2 Approx. Methods | Accuracy | Division | Iteration | Memory |
|---|---|---|---|---|
| Rational approx. [34] | High | Necessary | – | – |
| Taylor expansion | Depends on order | – | – | – |
| LUT | Depends on memory | – | – | Large |
| CORDIC [38] | Depends on iteration | Necessary | Necessary | – |
| DSP trick [42] | Middle (<0.01) | Necessary | – | – |
| RCM 1st approx. | Middle (<0.01) | – | – | – |
| RCM 2nd approx. * | High (<0.001) | – | – | – |

*: The proposed methods in this study.
Table 7. Comparison of occupied program memory compiled with actual MCUs.

| Grade | Example of MCU | Available Program Memory | Proposed: at2if (1 KB required) | Proposed: at2_2 w/ s2 and c2 (1.8 KB required) | math.h: atan2 only (3 KB required) | math.h: atan2, sin, cos (7.7 KB required) |
|---|---|---|---|---|---|---|
| Low | PIC12F1572 | 2 KB | A | AR | CC | CC |
| Middle | PIC16F1827 | 4 KB | A | A | AR | CC |
| Middle | PIC16F1769 | 8 KB | A | A | A | AR |
| High | PIC18F2580 | 16 KB | A | A | A | A |

A: available (less than 50% occupancy); AR: available with restriction; CC: cannot be compiled.
Table 8. Experimental result on actual MCU (PIC18F2580 @ 4 MHz internal clock).

| | 1st Approx. of atan2 | 2nd Approx. of atan2 | atan2 (math.h) |
|---|---|---|---|
| Average time of 100 iterations | 1.5 ms | 6.5 ms | 8.0 ms |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

