A Comparison of Two Schemes, Based upon Multi-Level LUTs and Second-Order Recursion, for Parallel Computation of FFT Twiddle Factors

K. Jones*

Retired Consultant Mathematician, Weymouth, Dorset, UK

Submitted on 24 May 2025; Accepted on 12 July 2025; Published on 23 July 2025

To cite this article: K. Jones, “A Comparison of Two Schemes, Based upon Multi-Level LUTs and Second-Order Recursion, for Parallel Computation of FFT Twiddle Factors,” Trans. Appl. Sci. Eng. Technol., vol. 1, no. 1, pp. 1-11, 2025.

Abstract

This paper describes two schemes, together with supporting architectures, for the resource-efficient parallel computation of twiddle factors for the fixed-radix version of the fast Fourier transform (FFT) algorithm. Assuming a silicon-based hardware implementation with suitably chosen parallel computing equipment, the two schemes allow the arithmetic component of the resource requirements, expressed as the numbers of multipliers and adders, to be traded off against the memory component, expressed as the amount of memory required for the look-up tables (LUTs) used for twiddle-factor storage. With a separate processing element (PE) assigned to the computation of each twiddle factor, the first scheme adopts the single instruction multiple data (SIMD) technique, applied in the ‘spatial’ domain, whereby the PEs operate independently upon their own individual LUTs and may thus execute simultaneously. The second scheme adopts the pipelining technique, applied in the ‘temporal’ domain, whereby every LUT-based PE but the first operates by second-order recursion upon previously computed PE outputs. Although the FFT radix and the number of LUT levels (the LUT may be of either single-level or multi-level type) may each take arbitrary integer values, attention is focused here on the radix-4 FFT combined with the two-level LUT, as these choices ease illustration and offer the potential for flexible, computationally efficient FFT designs. A brief comparison of the resource requirements of the two schemes is provided for various parameter sets, catering in particular for big-data, memory-intensive applications involving long (of order one million points) to ultra-long (of order one billion points) FFTs.

Keywords: butterfly; FFT; LUT; parallel; recursion; twiddle factor

Abbreviations: FFT: fast Fourier transform; LUTs: look-up tables; PE: processing element; SIMD: single instruction multiple data; DFT: discrete Fourier transform; DIT: decimation-in-time; DR: digit-reverse; NAT: naturally; DIF: decimation-in-frequency; FPGA: field-programmable gate array; RAM: random access memory; FHT: fast Hartley transform
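To make the two techniques named in the abstract concrete, the following Python sketch (illustrative only; the function names and the perfect-square assumption on N are ours, not the paper's) contrasts a two-level LUT decomposition of the twiddle factor W_N^k = e^{-j2πk/N} with a second-order (Chebyshev-style) recursion seeded from two stored values:

```python
import cmath
import math

def twiddle_two_level_lut(N):
    """Two-level LUT: write the index k = k1*M + k2 with M = sqrt(N), so that
    W_N^k = W_N^(k1*M) * W_N^(k2). Two tables of M entries each replace a
    single table of N entries, at the cost of one complex multiply per access.
    This sketch assumes N is a perfect square."""
    M = math.isqrt(N)
    assert M * M == N, "sketch assumes N is a perfect square"
    coarse = [cmath.exp(-2j * math.pi * k1 * M / N) for k1 in range(M)]  # W_N^(k1*M)
    fine = [cmath.exp(-2j * math.pi * k2 / N) for k2 in range(M)]        # W_N^(k2)

    def w(k):
        k1, k2 = divmod(k % N, M)
        return coarse[k1] * fine[k2]

    return w

def twiddle_recursive(N, count):
    """Second-order recursion: since w[n] = e^{-j*2*pi*n/N} satisfies
    w[n] = 2*cos(2*pi/N)*w[n-1] - w[n-2], all later twiddle factors follow
    from two seed values (here W_N^0 and W_N^1) with one real-by-complex
    multiply and one complex subtract per factor."""
    c = 2.0 * math.cos(2.0 * math.pi / N)
    w_prev2 = complex(1.0, 0.0)              # seed: W_N^0
    w_prev1 = cmath.exp(-2j * math.pi / N)   # seed: W_N^1
    out = [w_prev2, w_prev1]
    for _ in range(count - 2):
        w_next = c * w_prev1 - w_prev2
        out.append(w_next)
        w_prev2, w_prev1 = w_prev1, w_next
    return out[:count]
```

The two-level LUT trades one extra complex multiply per access for a memory reduction from N entries to 2·sqrt(N); the recursion trades memory for a serial dependence and a gradual growth of rounding error, which in a hardware design would typically be bounded by periodic re-seeding from a LUT.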
