Chapter 3       DSP ALGORITHMS AND NEURAL NETWORK SYSTEM

3.0       Introduction

3.1       DSP algorithms

3.1.1 Fourier Transform

3.1.2 Fast Fourier Transform

3.1.3 Applications of FFT

3.2       Neural Network

3.2.1 Multilayer Feedforward Networks

3.2.2 Single-Layer Feedback Networks

3.3       Summary

CHAPTER 3 – DSP ALGORITHMS AND NEURAL NETWORK SYSTEM

3.0       Introduction

This chapter focuses on the study of the various algorithms applied to the project, as well as the programming tools suitable for it. There are many algorithms for manipulating digital signals; however, we are going to discuss two well-known ones: the Fourier transform and the fast Fourier transform.

As for programming tools, the author has learnt several of them. The author will discuss the features of these tools and make a comparison between them.

3.1       DSP algorithms

Basically we are going to cover fast algorithms for computing the discrete Fourier transform (DFT). In 1965, J.W. Cooley and J.W. Tukey published a milestone paper defining what is now called the fast Fourier transform, or FFT. If a continuous-time signal can be sampled to produce a time series x[n], an estimate of its Fourier transform can be produced using equations which we will discuss later. Obviously, this has enormous importance, since it makes spectral analysis practical. [Fred J., 1994]

3.1.1 Fourier Transform

The Fourier transform is a mathematical tool used for determining the frequencies, and their amplitudes, of any data which can be represented by a wave, including image data. The main principle behind this concept is that any wave is made up of N frequencies, each with a certain amplitude (which may be zero), all added together. [Fred J., 1994; Rodger E., William H. and D. Ronald, 1998; Sanjit K., 2001; John G. and Dimitris G., 1996]

If the product of two waves of frequencies k and m is taken, the result is a new wave of frequency k-m. If we then take the sum of all the points in this new wave, we get zero, because every positive value in the wave has a corresponding negative value that cancels it out.

However, if we take the product of two waves of the same frequency, we get a wave of frequency zero, which is equal to one everywhere. Summing all N values of this wave, we get a value of N.

In order to calculate the amplitude of a frequency m, the wave is multiplied by a wave of frequency m. This has the effect of setting all frequencies to zero except that of m; that is, all other frequencies are eliminated.
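The orthogonality argument above can be checked numerically. The sketch below uses complex-exponential waves (an assumption made here for simplicity; the chapter does not specify the wave form) and shows that the point-by-point product of two different frequencies sums to zero, while the product of a frequency with itself sums to N.

```python
import cmath

N = 8  # total number of sample points

def wave(freq, n):
    # complex-exponential "wave" of integer frequency freq at sample index n
    return cmath.exp(2j * cmath.pi * freq * n / N)

def correlate(k, m):
    # sum over all N points of the product of a frequency-k wave
    # and the conjugate of a frequency-m wave (a wave of frequency k-m)
    return sum(wave(k, n) * wave(m, n).conjugate() for n in range(N))

print(abs(correlate(3, 5)))  # different frequencies: sums to (nearly) zero
print(abs(correlate(3, 3)))  # same frequency: sums to N
```

Running this shows the first sum vanishing (up to rounding error) and the second equal to N = 8, which is exactly the cancellation property the Fourier transform relies on.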

The equation for the Fourier transform is:

X[m] = Σ (n = 0 to N-1) x[n] · e^(-j·2π·m·n/N)

(source: http://www.icaen.uiowa.edu/~dip/LECTURE/LinTransforms.html)

x[n] = the value of the wave at sample point n.

N = total number of samples.

n = index of the current sample.

X[m] = amplitude of the frequency m.

In further detail, the Fourier transform is based on the discovery that it is possible to take any periodic function of time x(t) and resolve it into an equivalent infinite summation of sine waves and cosine waves with frequencies that start at 0 and increase in integer multiples of a base frequency f0 = 1/T, where T is the period of x(t). Here is what the expansion looks like:

x(t) = a0 + Σ (k = 1 to ∞) [ak · cos(2π·k·f0·t) + bk · sin(2π·k·f0·t)]

An expression of the form of the right-hand side of this equation is called a Fourier series. The job of a Fourier transform is to figure out all the ak and bk values to produce a Fourier series, given the base frequency and the function x(t). The a0 term outside the summation can be thought of as the cosine coefficient for k = 0. There is no corresponding zero-frequency sine coefficient b0, because the sine of zero is zero and such a coefficient would therefore have no effect.

Of course, we cannot do an infinite summation of any kind on a real computer, so we have to settle for a finite set of sines and cosines. It turns out that this is easy to do for a digitally sampled input, when we stipulate that there will be the same number of frequency output samples as there are time input samples. Also, we are fortunate that all digital audio recordings have a finite length. We can pretend that the function x(t) is periodic, and that the period is the same as the length of the recording. In other words, imagine the recording repeating forever, and call this repeating function x(t). The duration of the repeated section defines the base frequency f0 in the equations above. In other words, f0 = samplingRate / N, where N is the number of samples in the recording.

For example, take a sampling rate of 44100 samples/second and a recording 1024 samples long. The amount of time represented by the recording is 1024 / 44100 = 0.02322 seconds, so the base frequency f0 will be 1 / 0.02322 = 43.07 Hz. If we process these 1024 samples with the FFT, the output will be the sine and cosine coefficients ak and bk for the frequencies 43.07 Hz, 2*43.07 Hz, 3*43.07 Hz, etc. To verify that the transform is functioning correctly, generate all the sines and cosines at these frequencies, multiply them by their respective ak and bk coefficients, and add them all together; this will give the original recording back. [Don Cross, 2001]
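The round-trip check described above (analyse a recording into ak and bk coefficients, then resynthesise it from sines and cosines) can be sketched as follows. The coefficient formulas used here are the standard real Fourier series formulas, stated as an assumption since the chapter does not give them explicitly; a short 16-sample signal stands in for a real recording.

```python
import math

def analyze(x):
    """Compute cosine (a_k) and sine (b_k) coefficients of a real signal x.
    Standard real Fourier series formulas (assumed, not from the chapter)."""
    N = len(x)
    a = [2 / N * sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
         for k in range(N // 2 + 1)]
    b = [2 / N * sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
         for k in range(N // 2 + 1)]
    a[0] /= 2          # the k = 0 and k = N/2 cosine terms are not doubled
    a[N // 2] /= 2
    return a, b

def synthesize(a, b, N):
    """Rebuild the signal by summing the weighted sines and cosines."""
    return [sum(a[k] * math.cos(2 * math.pi * k * n / N) +
                b[k] * math.sin(2 * math.pi * k * n / N)
                for k in range(len(a)))
            for n in range(N)]

# round trip on an arbitrary real test signal
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
a, b = analyze(x)
y = synthesize(a, b, len(x))
print(max(abs(xi - yi) for xi, yi in zip(x, y)))  # maximum error: ~0
```

The maximum reconstruction error is on the order of floating-point rounding, confirming the verification procedure the text describes.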

3.1.2 Fast Fourier Transform (FFT)

The Fourier transform is very useful in many applications; however, it is very computationally expensive. To illustrate this, a Fourier transform of a wave with 1024 sample points would require a loop of 1024 iterations to sum the sample values of the wave, and this would have to be done 1024 times, once for each frequency component, leading to over one million loop iterations. It is obvious, then, that a more feasible method is needed. This comes with the fast Fourier transform.

The main idea behind the fast Fourier transform is decomposition. This is where the original signal is broken down into smaller and smaller interleaved subsequences. The process is done by taking all the even-indexed samples and evaluating them, and then taking all the odd-indexed samples and evaluating them. The odd-indexed values must be multiplied by a factor to take account of the fact that they are shifted one place to the right relative to the even-indexed values; this factor will later be referred to as the twiddle factor. The odd and even results are then added to get the original Fourier transform value. The process of decomposing the values can be repeated on each subsequence until only two values remain in each subsequence. [Richard G. Lyons, 1997]

If all the decomposed equations are written out for a small four-element example transform, it can easily be seen that there are a lot of redundant calculations, that is, the same calculations are performed repeatedly. The fast Fourier transform eliminates this by calculating a value once but using it more than once.

The basic computational element of the FFT is the butterfly. As discussed, the transform has been decomposed into its two-element parts; the butterfly provides the way for them to be recombined to obtain the amplitude of each frequency. Each butterfly not only contributes to the solution for one frequency, it also provides data for all the other frequency amplitudes being calculated at the same time. The butterfly consists of one addition, one subtraction and one multiplication by a twiddle factor. A Fourier transform which has 2^n input values will have n stages, and each stage will have (2^n)/2 butterflies. This is far better than the (2^n)*(2^n) loop iterations needed for the original transform.

The following FFT algorithm basically consists of two for loops, inside of which a butterfly, as mentioned earlier, is performed. This consists of taking two of the inputs, adding them to produce one of the outputs, and subtracting them to produce another, which is then multiplied by a twiddle factor.

There are various twiddle factor values, which depend on the butterfly's position in the sequence and also on its stage. In order to improve speed, all the possible twiddle factor values are stored in an array. This is efficient because some values are used over and over again, and it saves us from having to calculate those values repeatedly. Other values which need to be calculated on numerous occasions are also placed into variables which are used in their place.
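A sketch of such an FFT is given below. It assumes an iterative radix-2 decimation-in-time structure (the chapter describes the loops and butterflies but gives no code, so the details here are one possible realisation): the input is first reordered into bit-reversed order, the twiddle factors are precomputed into an array as the text suggests, and two nested loops then perform the butterflies for each of the n stages.

```python
import cmath

def fft(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    x = list(x)
    # reorder the input into bit-reversed order
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # precompute all twiddle factors once, as described above
    twiddle = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]
    size = 2
    while size <= N:                     # one pass per stage
        half = size // 2
        step = N // size                 # stride into the twiddle table
        for start in range(0, N, size):  # each butterfly group in the stage
            for k in range(half):
                a = x[start + k]
                b = twiddle[k * step] * x[start + k + half]
                # the butterfly: one twiddle multiplication (above),
                # one addition and one subtraction
                x[start + k] = a + b
                x[start + k + half] = a - b
        size *= 2
    return x

# the FFT of a unit impulse has a flat spectrum: every output is 1
print(fft([1, 0, 0, 0, 0, 0, 0, 0]))
```

For 2^n inputs this performs n stages of (2^n)/2 butterflies each, matching the operation count given above.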

Diagram 3.1 butterfly calculation

The above diagram illustrates a butterfly, where X[a] and X[b] are two of the inputs. Both are used twice to work out the outputs X+1[a] and X+1[b]. Tw represents the twiddle factor, which multiplies X[a] - X[b]. The outputs are used as the input values to the next stage.

3.1.3 Applications of FFT

The FFT algorithm tends to be better suited to analyzing digital audio recordings than to filtering or synthesizing sounds. A common example is the software equivalent of a spectrum analyzer, a device which electrical engineers use to display a graph of the frequency content of an electrical signal. One can use the FFT to determine the frequency of a note played in recorded music, to try to recognize different kinds of birds or insects, and so on. The FFT is also useful for things which have nothing to do with audio, such as image processing, scientific and statistical applications, analyzing seismographic information to take "sonograms" of the inside of the Earth, or even analyzing DNA sequences.

The main problem with using the FFT for processing sounds is that the digital recording must be broken up into chunks of n samples, where n always has to be an integer power of 2. One would first take the FFT of a block, process the FFT output array, then perform the inverse FFT to get a filtered time-domain signal back. When the audio is broken up into chunks like this and processed with the FFT, the filtered result will have discontinuities which cause a clicking sound in the output at each chunk boundary. For example, if the recording has a sampling rate of 44,100 Hz and the blocks have a size n = 1024, then there will be an audible click every 1024 / (44,100 Hz) = 0.0232 seconds, which is extremely annoying to say the least. [Don Cross, 2001]

3.2       Neural Network

A neural network is basically an attempt to model the workings of the brain and how it is believed to learn. Although this is the underlying principle, neural networks are implemented in many different ways, some of which are not biologically sound. The neural network concept was discussed in an earlier chapter. The following are the types of neural network studied.

3.2.1 Multilayer Feedforward Networks

Multilayer networks are often called layered networks. They can implement arbitrarily complex input/output mappings or decision surfaces separating pattern classes.

Each layer of a multilayer network is composed of a linear network and is based on the original concept of the linear discriminant function. Multilayer networks are of the feedforward type and are functionally similar to networks such as single-layer continuous perceptron networks and multicategory single-layer perceptron networks.

Although multilayer learning machines have existed for more than 25 years, the lack of appropriate training algorithms long prevented their successful application to practical tasks. The most important attribute of a multilayer feedforward network is that it can learn a mapping of any complexity. Network learning is based on repeated presentations of the training samples. [Jarek M., 1992]

3.2.2 Single-Layer Feedback Networks

Single-layer neural networks are inherently feedback-type nonlinear networks. Neurons with either a hard-limiting activation function or with a continuous activation function can be used in such systems.

In the case of discrete-time operation, also called recursive operation, single-layer feedback networks can be termed recurrent. A recurrent network is a discrete-time dynamical system which, at any given instant of time, is characterized by a binary output vector.

Fully coupled single-layer neural networks represent nonlinear feedback systems. Such systems are known to possess rather complex dynamics. Fortunately, single-layer feedback networks represent a class of networks that allows for a great reduction of this complexity. As a result, their properties can be controlled and their solutions utilized by neural network designers. This opens up possibilities for solving optimization problems and applying such networks to the modeling of technological and economic systems. [Jarek M., 1992]

3.3 Summary

The basic ideas behind the fast Fourier transform (FFT), used for faster computation of the DFT samples, have been explained, and several commonly used FFT algorithms derived. From the findings, it is very obvious that the fast Fourier transform is better than the direct Fourier transform; hence, the author is going to implement the FFT algorithm in the coding of the project. After comparing the two types of neural networks, the author decided to use a multilayer network with three layers: an input layer with six inputs, one for each of the six harmonics used to determine the instrument type; a hidden layer with four elements; and an output layer with two outputs that state whether the harmonics belong to a guitar or not. The learning algorithm used is error back-propagation.
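The chosen architecture can be sketched as below: a 6-4-2 feedforward network with sigmoid units, trained by error back-propagation. The two harmonic profiles used here are made-up placeholder values, not measurements from the project, and the initial weights and learning rate are illustrative assumptions.

```python
import math, random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class MLP:
    """Minimal 6-4-2 feedforward network with error back-propagation."""
    def __init__(self, n_in=6, n_hid=4, n_out=2, rate=0.5):
        self.rate = rate
        # one extra weight per unit for the bias input
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hid)]
        self.w2 = [[random.uniform(-1, 1) for _ in range(n_hid + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        self.x = list(x) + [1.0]                     # append bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(row, self.x)))
                  for row in self.w1] + [1.0]        # hidden layer + bias
        self.o = [sigmoid(sum(w * v for w, v in zip(row, self.h)))
                  for row in self.w2]
        return self.o

    def train(self, x, target):
        o = self.forward(x)
        # output-layer error terms: (t - o) * o * (1 - o)
        d_out = [(t - oi) * oi * (1 - oi) for t, oi in zip(target, o)]
        # hidden-layer error terms, back-propagated through w2
        d_hid = [self.h[j] * (1 - self.h[j]) *
                 sum(d_out[i] * self.w2[i][j] for i in range(len(d_out)))
                 for j in range(len(self.h) - 1)]
        for i, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.rate * d_out[i] * self.h[j]
        for j, row in enumerate(self.w1):
            for k in range(len(row)):
                row[k] += self.rate * d_hid[j] * self.x[k]

# two made-up six-harmonic profiles: "guitar" -> (1, 0), "other" -> (0, 1)
guitar = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
other  = [0.2, 0.8, 0.1, 0.7, 0.3, 0.9]
net = MLP()
for _ in range(2000):
    net.train(guitar, [1.0, 0.0])
    net.train(other,  [0.0, 1.0])
print(net.forward(guitar), net.forward(other))
```

After training, the first output is high for the guitar profile and the second output is high for the other profile, illustrating how the two-output layer encodes the guitar/non-guitar decision.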