7 March 2013
I work on the design and implementation of digital signal processing algorithms, primarily deterministic algorithms such as digital filters and transforms. By combining knowledge of algorithms, arithmetic and architectures I aim at low-complexity and low-power implementations. This applies to both ASIC and FPGA implementations, although, obviously, the aims differ slightly depending on the technology.
We have proposed a processing element for short-length discrete Fourier transform (DFT) computation, with each length based on Winograd’s approach to computing the DFT. This may be suitable for memory-based FFT implementations of non-power-of-two-length transforms. Previous work in this area has been limited, and some of it has used more complex algorithms than Winograd’s.
First, deriving the signal flow graphs of the Winograd algorithms may be beneficial in itself: as far as we have seen, this has not been reported before, which may have been obstructing the use of these efficient algorithms. Second, the efficient merging of these algorithms onto the same basic structure may turn out to be an efficient way to build FFT processors. We know that the merging is highly efficient; the open question is how the memory and control structures will behave.
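As a concrete illustration of the kind of short-length algorithm involved, here is a textbook sketch of Winograd's 3-point DFT in Python. This is standard material, not the signal flow graphs from our paper, and the function name and sample values are my own. After the initial additions, only two real multiplications are needed, against four complex multiplications in the direct form:

```python
import cmath
import math

def winograd_dft3(x0, x1, x2):
    """Textbook Winograd 3-point DFT: two real multiplications
    after the pre-additions, versus four complex multiplications
    in the direct computation."""
    s1 = x1 + x2                       # pre-additions
    s2 = x1 - x2
    X0 = x0 + s1                       # DC output
    t = x0 - 0.5 * s1                  # multiplication by cos(2*pi/3) = -1/2
    m = -1j * (math.sqrt(3) / 2) * s2  # multiplication by sin(2*pi/3)
    return X0, t + m, t - m
```

Checking the result against a direct 3-point DFT of the same samples reproduces it to within rounding error.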
The primary goal is for the processing element to be used for the 3780-point DFT/IDFT in the Chinese DTMB digital-TV standard. However, there may be other applications where the DFT length can be divided into the factors 2, 3, 4, 5 and 7; for example, such lengths see some use in the LTE mobile phone standard. The main benefit of these seemingly inconvenient lengths is that the intermediate twiddle factor multiplications can be avoided by using prime factor FFT algorithms. Hence, there may be cases where one could prefer combinations of these lengths over the traditional power-of-two lengths.
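To illustrate why coprime factors remove the twiddle factors, here is a sketch of the Good–Thomas prime factor DFT in Python. The index maps are standard textbook material, not our implementation, and the function names are my own. The length-N DFT splits into independent short DFTs along each coprime dimension, connected purely by index permutations, with no multiplications between the stages:

```python
import cmath
from math import gcd

def dft(x):
    """Direct DFT, standing in here for a short-length building block."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def pfa_dft(x, N1, N2):
    """Prime factor (Good-Thomas) DFT of length N = N1*N2 with gcd(N1, N2) = 1.
    Only short DFTs along each dimension; no intermediate twiddle factors."""
    N = N1 * N2
    assert len(x) == N and gcd(N1, N2) == 1
    # Input index map: n = (n1*N2 + n2*N1) mod N
    grid = [[x[(n1 * N2 + n2 * N1) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Length-N2 DFTs along the rows, then length-N1 DFTs along the columns
    rows = [dft(row) for row in grid]
    cols = [dft([rows[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]
    # Output index map via the Chinese remainder theorem:
    # k1 = k mod N1, k2 = k mod N2
    return [cols[k % N2][k % N1] for k in range(N)]
```

The same principle scales to more than two coprime factors; for instance, 3780 = 2^2 * 3^3 * 5 * 7 factors into the pairwise coprime lengths 4, 27, 5 and 7.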
As the first author obtained his PhD degree about a year ago and has now returned to his home country Pakistan, as required by his scholarship contract, we need to find someone else to finish the implementation of the complete FFT processor. Once that is done, one can compare with other existing implementations and see whether the whole processor architecture is as efficient as the proposed processing element or whether, for some reason, the memory and control structures void the benefits of the processing element. There is a trade-off here between the number of twiddle factor multiplications required when using a standard FFT decomposition and the more complex memory management required by a prime factor FFT algorithm.
I work on a rather wide range of projects where the combined knowledge of algorithms, arithmetic and architectures may be of benefit. These range from the design of semi-analogue FIR filters and non-linear digital-to-analogue converters, where the sizes of the analogue current sources are optimised as if they were digital coefficients, to the implementation of particle filters, which are well defined at the mathematical level but can be further approximated to better fit dedicated hardware (ASIC and/or FPGA).
While the implementation of FFT processors is a rather old field, there are still advances to be made in connecting algorithms, arithmetic and architectures. Ideally, one should select the FFT algorithm based on the architecture, which in turn should be selected based on the processing requirements. With more knowledge of the operating conditions, significant optimisations can be made in the algorithm and the processing elements; in most cases this potential has not yet been fully exploited.
A PDF version of this interview is available.