# MODERN DESIGN OF 16 POINT FFT ALGORITHM USING GRAPH TECHNIQUE FOR OFDM RELEVANCE'S Avuluri Anjaneyulu<sup>1</sup>, S.V.V.N. Chanukya<sup>2</sup>, L.Srinivas Reddy<sup>3</sup> <sup>1</sup> M. Tech Scholar (DECS), Nalanda Institute of Engg & Tech. (NIET), Siddharth Nagar, Guntur, A.P, (India) <sup>2</sup> Asst. Professor (ECE), Nalanda Institute of Engg & Tech. (NIET), Siddharth Nagar, Guntur, A.P, (India) <sup>3</sup> Assistant. Professor (ECE), Nalanda Institute of Engg & Tech. (NIET), Siddharth Nagar, Guntur, A.P. (India) #### **ABSTRACT** DFT and FFT are widely using transforms in the applications of DSP. These are useful where filtering and correlation is needed. In this article, we are going to present 16 point FFT (Fast Fourier Transform) and the time to complete the total computation. Main goal of this paper is to implement 16 point FFT with Radix 4 with less computational time. In DSP, FFT is a fundamental building block. Here the FFT algorithm is of Decimation in Frequency (DIF). This algorithm is widely applied in digital communication systems. This is designed by using Verilog HDL and synthesized on XILINX tool. Keywords— Decimation In Time And Frequency, Discrete Fourier Transform And Fast Fourier Transform ### I. INTRODUCTION The method of changing from time representation to frequency representation is called the Fourier Transformation. In Fourier analysis, present most and preferred transform is discrete fourier transform (DFT). The DFT equation for a sequence is defined as follows: $$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{kn} \quad , \qquad 0 \le k \le N-1$$ $$W_N = e^{-j2\pi/N}$$ Where $W_N$ is twiddle factor. Discrete Fourier Transform is a method used in signal processing. One special and efficient method used to compute the DFT process. FFT is a faster version DFT. This FFT method includes butterfly operation. In these butterfly operations of FFT, twiddle factors have referred as the root-of-unity complex multiplicative constants. These twiddle factors have used to combine smaller discrete fourier transforms as recursively. Practically, there is only real values will present at the input because of the time domain but we are assuming imaginary values including real values at the input sequence for our convenience. The direct computation of X(k) for each value of k performs N complex multiplications (4 times of N - real multiplications) and complex additions are amount of N-1 (4 times of N-2 - real additions). So, to compute all N values, the operation of DFT need to require complex multiplications of $N^2$ and complex additions are an amount of $N^2 - N$ . A standard three loop structure can able to use in the computation of FFT. Radix will differentiate the FFT process. Mainly there are two different radix are present in the process i.e., radix-2 and radix-4. Here 2 and 4 indicates the size of the butterfly. Representation of the number of data points in DFT makes a difference in both radix-2 and radix-4. In radix-2, the representation is in the power of 2 i.e., $N = 2^{\nu}$ and in radix-4, the representation is in the power of 4 i.e., $N = 4^{\nu}$ . Radix-4 can able to perform the operation four times faster than the radix-2. For radix-4, 4 banks of memory needed to store the data. Number of stages for radix-2 is $log_2 N$ and for radix-4, it is $log_4 N$ . Number of multiplications per stage is N/2 thus total multiplications $N/2 log_2 N$ in radix-2 and coming to radix-4 it is 3N/4 and total of $3N/8 log_2 N$ . Coming to number of additions per stage is N and total of $N log_2 N$ in radi-2 and for radix-4 it is 3N and total of $3N/2 log_2 N$ . We need to minimize the memory storage. For that, butterfly results must be stored in the same locations they were obtained. From this, the N- point FFT requires N-complex valued locations. ### II. ALGORITHM AND ANALYSIS OF FFT FFT processor is a first designed chip. In OFDM transmitter and receiver it is positioned at the centre. Butterfly operations will have to do for the result. These butterfly operations are the building blocks of FFT. In FFT algorithm, several techniques have developed. Here we are going to consider two techniques i.e., DIT and DIF. The difference between two butterfly operations of these two techniques has shown in below. Fig 1: (A) Radix 2 DIF Butterfly (B) Radix 2 DIT Butterfly As shown in above figure, the difference in between these two types is position of twiddle factor multiplication. It is performed either after or before the addition and subtraction. The hardware architecture of FFT has divided into two classes i.e., memory based architecture and pipeline based architecture. ### 2.1 Memory Based Architecture This architecture is shown in below fig 2. The memory based architecture increases utilization rate of the elements of butterfly processing and reduces the multiple butterfly's redundancy. This architecture has one or two large memory blocks and all the data is accessed from the block of centralized memory where as in pipelined, it uses small separated memory blocks, those cooperate with the local arithmetic units. Depending on the butterfly processing element and the buffering strategies utilized during the process of FFT, memory based architecture has divided into several independent types. This architecture have several subunits, these are: - a) Controller - b) A set of buffer - c) ROM - d) Butterfly processing unit - e) Main memory. Fig 2: Architecture of the Memory Based ## 2.2 Pipeline Based Architecture This pipelined architecture is non-stopping process. It has smaller latency with low power consumption. It is characterized by real-time. For processor which has using pipeline architecture need to replace a series of smaller memories with a N-word memory block. In this type of architecture input data is read in series only. This architecture has classified into two types. These are: - I. Single path architectures - II. Multi path architectures. There are two kinds of buffering strategies were present in this architecture. These are: - I. Delay-Commutator Architecture - II. Delay-Feedback Architecture The FFT processor has a modular design and it consists of the following modules: - 1. Butterfly processor - 2. Address generation unit and - 3. Micro coded state machine Fig 2: Architecture of the Overall FFT Processor #### 2.3 Butterfly Processor To carry out the complex butterfly operations, this processor is present. Fig 3: Butterfly Operation We used CSDs (canonical signed digit multiplication) instead of complete digital multiplier to do the multiplication process with the twiddle factors. In CSDs, the multiplication has done with shifts and additions. #### 2.4 Address Generation Unit (AGU) The address bus, which is going to the memory has controlled by this address generation unit. The FFT processor works concurrently on the 8 dual port memory banks for read and write. The address mapping scheme has used here to ensure that read and write operations hasn't worked on same memory location at the same time. It controls 8 read address buses and 8 write address buses. #### 2.5 Micro Coded State Machine As the FFT processor's process is going on there is need to store or to generate control signals at every level of processing. For that this machine has needed. It stores or generates all the control signals, and its progression is controlled by the clock. It is necessary to reset the state machine counter and signals before starting a new 16 point FFT operation, a reset signal is used to reset i.e., "en\_fft". Another signal "done\_fft" is asserted by the processor after the completion of the FFT calculation. We need to swap the real and imaginary parts if the inverse FFT process is to do. ## III. PROPOSED METHOD Fig 4: Butterfly Structure of 16 Point Radix-4 DIF FFT The proposed 16 point radix 4 DIF FFT butterfly diagram has shown in below fig 4. For hardware realization of FFT, multi-bank memory and "in place" addressing strategy are often used for the realization of FFT hardware. These are holds a key operation to speed-up the memory access time and minimize the hardware consumption i.e., requires less area and low power consumption. Already have to know that r banks of memory are needed to store data for radix-r FFT. There is the each memory bank could be two-port memory. The outputs are replaced in the same memory location where the inputs have stored. This is the process called as "in-place" strategy. An efficient addressing scheme is needed to realize the FFT process in parallel and pipelined. This scheme is used to avoid the conflict of data. ## IV. SIMULATION RESULTS Here we have designed the FFT using the Verilog HDL. The same design is synthesised in Xilinx ISE 13.2. This is design of 16 bit. The resultant wave forms are shown here. | Y13r[12:0] | 111111111 | 11111111111000 | |-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------------------| | ▶ 🌃 Y14r[12:0] | 111111111 | 11111111111000 | | ¥ 15r[12:0] | 111111111 | 11111111111000 | | ▶ 🌃 Y0i[12:0] | 000000000 | 0000000000 | | ▶ 🌃 Y1i[12:0] | 000000010 | 0000000100111 | | ▶ 🌃 Y2i[12:0] | 00000000: | 0000000010010 | | ▶ 🌃 Y3i[12:0] | 000000000 | 0000000001011 | | ► 🌃 Y4i[12:0] | 000000000 | 0000000000000 | | ¥ 15i[12:0] | 000000000 | 0000000000110 | | ► 🌃 Y6i[12:0] | 000000000 | 000000000011 | | ▶ 🌃 Y7i[12:0] | 000000000 | 00000000000 | | ► 🌃 Y8i[12:0] | 000000000 | 000000000 | | Y9i[12:0] | 111111111 | 111111111 | | ¥ 10i[12:0] | 111111111 | 1111111111110 | | <b>13 (0.10)</b> (1.10) (1.10) (1.10) (1.10) (1.10) | 111111111 | 11111111111111000 | | | | l: 18.902200000 ms | | 127[2210] | | 1111111111000 | | Y14r[12:0] | 111111111 | 11111111111000 | | | | | | Y14r[12:0] | 11111111 | 111111111111000 | | Y14r[12:0] | 111111111 | 1111111111000<br>11111111111000 | | 714r[12:0] 715r[12:0] 70i[12:0] | 11111111:<br>11111111:<br>000000000 | 1111111111000<br>11111111111000<br>0000000000 | | 714r[12:0] 715r[12:0] 70:[12:0] 71:[12:0] | 11111111111111111111111111111111111111 | 1111111111000<br>11111111111000<br>0000000000 | | ₩ Y14r[12:0] ₩ Y15r[12:0] ₩ Y0i[12:0] ₩ Y1i[12:0] ₩ Y2i[12:0] | 11111111111111111111111111111111111111 | 1111111111000<br>11111111111000<br>0000000000 | | | 11111111111111111111111111111111111111 | 1111111111000<br>11111111111000<br>000000000 | | ₩ Y14r[12:0] ₩ Y15r[12:0] ₩ Y0i[12:0] ₩ Y1i[12:0] ₩ Y2i[12:0] ₩ Y3i[12:0] ₩ Y4i[12:0] | 11111111111111111111111111111111111111 | 11111111111000<br>11111111111000<br>00000000 | | Y14r[12:0]<br> Y15r[12:0]<br> Y0i[12:0]<br> Y1i[12:0]<br> Y2i[12:0]<br> Y3i[12:0]<br> Y4i[12:0]<br> Y5i[12:0] | 11111111111111111111111111111111111111 | 1111111111000 11111111111000 00000000 | | Y14r[12:0]<br> Y15r[12:0]<br> Y0i[12:0]<br> Y1i[12:0]<br> Y2i[12:0]<br> Y3i[12:0]<br> Y4i[12:0]<br> Y5i[12:0]<br> Y6i[12:0] | 11111111111111111111111111111111111111 | 1111111111000 11111111111000 00000000 | | Y14r[12:0] Y15r[12:0] Y15r[12:0] Y11[12:0] Y2i[12:0] Y3i[12:0] Y4i[12:0] Y5i[12:0] Y6i[12:0] Y6i[12:0] | 11111111111111111111111111111111111111 | 1111111111000 11111111111000 00000000 | | Name | Value | 0 ms 5 ms 10 ms 15 ms | 54 | |---------------------------------------------------------------------------------------------|----------------------------------------------------------------------|----------------------------------------------------------------------|-------| | Y10i[12:0] | 111111111 | 111111111111 | | | Y11i[12:0] | 111111111 | 1111111111011 | | | Y12i[12:0] | 111111111 | 1111111111000 | | | Y13i[12:0] | 111111111 | 111111111111111111111111111111111111111 | | | Y14i[12:0] | 11111111 | 1111111101101 | | | Y15i[12:0] | 11111110: | 1111111011000 | 00000 | | la clk | 1 | 2000000 | *** | | X0r[7:0] | 00000000 | 000000¢0<br>000000¢1 | | | <ul><li>X1r[7:0]</li><li>X2r[7:0]</li></ul> | 00000010 | 00000010 | | | X3r[7:0] | 00000011 | 0000011 | | | X4r[7:0] | 00000100 | 00000100 | | | ▶ <b>■</b> X5r[7:0] | 00000101 | 00000101 | | | v Verifinal | 00000010 | 20000010 | | | VEI[1.0] | 00000010 | | | | V2-17-01 | 0000000000 | | | | ▶ 🚮 X3r[7:0] | 00000011 | 00000011 | | | > <b>5</b> X3r[7:0]<br>> <b>6</b> X4r[7:0] | 00000011 | 0000011 | | | S - 2 SV 3 | 1.3240.30.1.20x125.x3x111.23x | | | | > <b>₩</b> X4r[7:0] | 00000100 | 00000100 | | | > ■ X4r[7:0]<br>> ■ X5r[7:0] | 00000100<br>00000101 | 00000100<br>00000101 | | | | 00000100<br>00000101<br>00000110 | 00000100<br>00000101<br>00000110 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] | 00000100<br>00000101<br>00000110<br>00000111 | 00000100<br>00000101<br>00000110<br>00000111 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] X8r[7:0] | 00000100<br>00000101<br>00000110<br>00000111<br>00001000 | 00000100<br>00000101<br>00000110<br>00000111 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] X8r[7:0] X9r[7:0] X9r[7:0] | 00000100<br>00000101<br>00000110<br>00000111<br>00001000<br>00001001 | 00000100<br>00000101<br>00000110<br>00000111<br>00001000 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] X8r[7:0] X8r[7:0] X9r[7:0] X10r[7:0] | 00000100<br>00000110<br>00000111<br>00000100<br>00001001 | 00000100<br>00000110<br>00000111<br>00001000<br>00001001 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] X8r[7:0] X9r[7:0] X10r[7:0] X11r[7:0] | 00000100<br>00000110<br>00000111<br>000001000<br>00001001 | 00000100<br>00000110<br>00000111<br>00001000<br>00001001<br>00001010 | | | X4r[7:0] X5r[7:0] X6r[7:0] X7r[7:0] X8r[7:0] X9r[7:0] X10r[7:0] X11r[7:0] X12r[7:0] | 00000100<br>00000110<br>00000111<br>00001000<br>00001001 | 00000100 00000101 00000110 00000111 000001000 00001001 | | # V. CONCLUSION This paper presented the efficient FFT architecture of 16 point radix 4 FFT. It is with the technique of Decimation in time. The design is developed by using Verilog HDL and synthesized on XILINX tool. This paper has concluded that the reduction of operations and complexity of the system over existing systems. The reduction of operations makes the reduction in area on chip. By this, we have proved that the technique consumes less power and area. Future research has to base on the 64, 4K, 16K and more number of points and with highly efficient architecture. #### REFERENCES - [1] Saad Bouguezel, M. Omair Ahmad, "IMPROVED RADIX-4 AND RADIX-8 FFT ALGORITHMS" IEEE. Department of Electrical and Computer Engineering from Concordia University 1455 de Maisonneuve Blvd. West Montreal, P.Q., Canada. - [2] Ali Saidi, "DECIMATION-IN-TIME-FREQUENCY FFT ALGORITHM" Motorola Applied Research, Paging and Wireless Data Group Boynton Beach. - [3] J.G.Proakis and D.G.Manolakis, "Digital Signal processing, Principles, algorithms and applications" Prentice Hall India Publication. - [4] L. Xiaojin, L. Zongsheng Lai, and C. Jianmin Cui, "A low power and small area FFT processor for OFDM demodulator", IEEE Transactions on Consumer Electronics, 53(2):274-277, May 2007. - [5] M. C. Pease, "Organization of large scale Fourier processors", J. Assoc. Comput. Mach., 16:474–482, July 1969. - [6] E. H. Wold and A. M. Despain, "Pipeline and parallel-pipeline fft processors for vlsi implementations," IEEE Trans. Comput., vol. 33, no. 5, pp. 414–426, May 1984. - [7] M. Hasan and T. Arslan "Implementation of low power FFT processor cores using a novel order-base processing scheme," in Proc. IEEE Circuits Devices Syst., June 2003, vol. 150, pp. 149–154. - [8] D. Cohen, "Simplified control of FFT hardware", IEEE Trans. Acoust., Speech, Signal Processing, 24:577-579, December 1976. - [9] Alan V. Oppenheim, Ronald W. Schafer, "Digital Signal Processing", 3rd edition, Pearson Education, 2009. - [10] C.Gonzalez, V.Rodellar, A.Alvarez, E.Martinez and P.Gomez, "A Portable Hardware Design of a FFT Algorithm", in Proc. Latin American Applied Research, 37:79-82(2007), pp 79-82. - [11] S. He and M. Torkelson, "A new approach to pipeline FFT processor",10th International Parallel Processing Symposium (IPPS 96), April 1996, pp 766-770. - [12] Xilinx Co. "Xcell journel vol 58-59", 2007 spring. - [13] Xilinx, Inc., <a href="http://www.xilinx.com/">http://www.xilinx.com/</a> #### **AUTHOR DETAILS** **AVULURI ANJANEYULU**, pursuing M. Tech (DECS) from Nalanda institute of Engineering and Technology (NIET), Siddharth Nagar, Kantepudi village, Satenepalli mandal, Guntur Dist., A.P, INDIA. His interest in digital concepts and has speciality to develop algorithms. **S.V.V.N.** Chanukya, has received his M.tech degree and currently working as an Assistant professor (DIP) from Nalanda institute of Engineering and Technology (NIET), Siddharth Nagar, Kantepudi village, Satenepalli mandal, Guntur Dist., A.P, INDIA. He has specialized in image processing and has ability to design filter techniques. **L.SRINIVAS REDDY**, He completed his post graduation in DECS. His area of interest includes digital electronics, digital communication, digital system design and VLSI technology and design. His research areas are optimal communication technology. He is currently working as Asst.professor (ECE) from Nalanda institute of Engineering and Technology (NIET), Siddharth Nagar, Kantepudi village, Satenepalli Mandal. Guntur Dist., A.P,