
CHAPTER 1 – SOUND BASICS AND SIGNAL CONCEPTS

1.0       Introduction

 

This chapter presents some basic principles of sound so that its physical aspects are no longer an unknown. It addresses the first objective set out earlier in the project specification. Producing this statement of understanding from the literature review took a great deal of time and study. The author cannot yet determine whether the information is sufficient to apply directly in the project, but it at least acts as a foundation for it.

 

1.1       Sound Basics

 

According to David J. MacKenzie (1996), Craig A. (2000) and Gilbert (2000), sound consists of vibrations of the air that our eardrums perceive, convert to nerve impulses and send to the brain. It is our brains that interpret the nerve energy and allow us to hear. Vibrating objects such as guitar strings, guitar soundboards and speakers create periodic changes in air pressure: the pressure is alternately raised above its normal value (the value when there is no sound) as the object moves towards the listener, and lowered below that value as the object moves away.

 

The number of times the air pressure changes in a given period of time determines the pitch we hear, that is, how high or low the tone sounds. Each movement back and forth is called a cycle. Pitch is usually measured in cycles per second, also known as Hertz (named after the 19th-century physicist Heinrich Hertz). The pitch of a tone is also called its frequency, because it is determined by how frequently the air pressure changes, which in turn is determined by the frequency at which the part of the instrument that created the sound was vibrating.

The middle A key of a piano corresponds to a string vibrating at 440 Hertz; that is the standard pitch to which orchestras tune their instruments, so tunings based on it are said to be in concert pitch. When a tone has a frequency twice that of another tone, we hear it as "the same note", only an octave higher. The next A above the middle A on a piano vibrates at 880 Hertz, and the next A below the middle A vibrates at 220 Hertz.

If one vibrating object (such as a guitar string) is half as long as another of the same thickness tightened to the same tension, the shorter object will vibrate at twice the speed of the longer one, one octave higher. Therefore the 12th fret of a guitar, which is one octave higher in pitch than the open string, is located at the middle of the string.
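To make this arithmetic concrete, the short Python sketch below (illustrative only, not part of the project code) computes the frequency of a string stopped at a fraction of its open length, assuming the same thickness and tension as described above; the open-string frequency used is an invented value.

    # Illustrative sketch: frequency of a string stopped at a fraction of its
    # open length, assuming the same thickness and tension. Halving the length
    # doubles the frequency, i.e. raises the pitch by one octave.
    def stopped_frequency(open_frequency_hz, length_fraction):
        return open_frequency_hz / length_fraction

    open_a = 110.0                                # open A string (illustrative value)
    print(stopped_frequency(open_a, 0.5))         # 220.0 Hz: the 12th fret, one octave up
    print(stopped_frequency(open_a, 0.25))        # 440.0 Hz: two octaves up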

The size of the change in air pressure determines how loud we hear a sound as being. If the pressure changes are large, we hear a loud sound. So if the air molecules (and thus our eardrums) are vibrating rapidly but only over a short distance, we hear a high-pitched but quiet sound.

 

It was Jean-Baptiste Fourier (1768-1830) who first proposed that all sounds could be represented by the summation of a series of simpler sine (and cosine) waves. This gives rise to the concept of a frequency spectrum, which describes the set of sine waves that make up a given sound. A sound has a fundamental frequency, which is the frequency of the sine wave with the greatest amplitude of all those that make up the sound. The simplest sounds tend to be periodic, whereas more complex sounds may have non-periodic tendencies. Overtones or harmonics are frequencies at or near integer multiples of the fundamental frequency, and the harmonic content of a sound determines its complexity. The Fourier transform is a mathematical device used to extract the frequency content of a complex sound.
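As a minimal illustration of these ideas (not code from the project itself), the Python sketch below uses NumPy's FFT, assumed to be available, to recover the spectrum of a tone built from a 220 Hz fundamental and two weaker harmonics; the strongest component found in the spectrum is the fundamental. The sampling rate and frequencies are illustrative choices.

    # Illustrative sketch of Fourier analysis with NumPy. A complex tone is
    # built as a sum of sine waves; the discrete Fourier transform then
    # recovers the frequencies present and their relative amplitudes.
    import numpy as np

    sample_rate = 8000                                  # samples per second (illustrative)
    t = np.arange(0, 1.0, 1.0 / sample_rate)

    # Fundamental at 220 Hz plus weaker harmonics at 440 Hz and 660 Hz.
    tone = (1.00 * np.sin(2 * np.pi * 220 * t)
            + 0.50 * np.sin(2 * np.pi * 440 * t)
            + 0.25 * np.sin(2 * np.pi * 660 * t))

    spectrum = np.abs(np.fft.rfft(tone))                # magnitude spectrum
    freqs = np.fft.rfftfreq(len(tone), 1.0 / sample_rate)

    print("Strongest component: %.1f Hz" % freqs[np.argmax(spectrum)])   # about 220 Hz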

 

1.1.1 Sampled Sound

 

Most sound processing occurs in the digital domain (via digital circuitry and microprocessors) instead of in the analog domain (using capacitors and inductors). The reasons for this include:

 

·        Digital circuitry is more stable than the equivalent analog circuitry.

·        Digital signal processing does not introduce noise into the processed sound as analog signal processing tends to do.

·        Today's high-speed processors can process sound in software (in real time), instead of requiring custom hardware to be built for the purpose.

·        Digital circuitry is cheaper to produce and maintain.

 

An analog-to-digital converter converts the analog signal at its input into a series of samples that represent the signal in the digital domain. Once a sample is acquired, it must be stored in a known format for processing. Many sampling systems use Pulse Code Modulation (PCM) as the standard for storage.

 

PCM simply implies the quantization and digitization of an analog signal. The range of values the signal can take (the quantization range) is divided into segments, and each segment is assigned a unique code word (a sequence of bits). The value that the signal takes at a certain point in time is called a sample. Pulse Code Modulation is what compact discs and most WAV files use. For example, in a compact disc audio recording there are exactly 44,100 samples taken every second, and each sampled voltage is converted into a 16-bit integer.
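The Python sketch below is a minimal illustration of PCM storage under the CD-style assumptions mentioned above (44100 samples per second, 16-bit signed integers); the amplitude values used are invented for the example.

    # Illustrative sketch of PCM encoding: amplitudes in the range [-1.0, 1.0]
    # are quantized to 16-bit signed integer code words, as on a compact disc,
    # where 44100 such samples are taken every second.
    def pcm16_encode(amplitude):
        """Map an amplitude in [-1.0, 1.0] to a 16-bit signed code word."""
        amplitude = max(-1.0, min(1.0, amplitude))     # clip out-of-range values
        return int(round(amplitude * 32767))

    samples = [0.0, 0.5, -0.25, 1.0]                   # invented amplitude values
    print([pcm16_encode(s) for s in samples])          # [0, 16384, -8192, 32767]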

 

According to Harry Nyquist (1889-1976), the sampling rate determines the maximum frequency information that is preserved in the sampled signal. Nyquist established that, in order to recreate an analog waveform accurately from digital samples, the waveform must have been sampled at a rate at least twice the frequency of its highest frequency component. This number is referred to as the "Nyquist rate". Sampling at a rate lower than twice the highest frequency will cause aliasing to occur, so a good rule is to sample at something above twice the highest frequency.
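As a small worked illustration of the Nyquist criterion (with figures chosen only for the example, not taken from the project), the sketch below computes the minimum sampling rate for a given frequency and the apparent frequency to which an undersampled sine wave aliases.

    # Illustrative sketch of the Nyquist rate and of aliasing.
    def nyquist_rate(highest_frequency_hz):
        """Minimum sampling rate needed to preserve the given frequency."""
        return 2 * highest_frequency_hz

    def aliased_frequency(signal_hz, sample_rate_hz):
        """Apparent frequency of a sampled sine wave (frequency folding)."""
        folded = signal_hz % sample_rate_hz
        return min(folded, sample_rate_hz - folded)

    print(nyquist_rate(20000))               # 40000 Hz for the limit of human hearing
    print(aliased_frequency(3000, 44100))    # 3000 Hz: below the Nyquist limit, preserved
    print(aliased_frequency(30000, 44100))   # 14100 Hz: above the limit, aliased downwards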

 

1.1.2 Structure of Wave File

The project focuses on how to analyze the wave (WAV) type of file. Hence, in order to read data from a WAV file, the structure of WAV files needs to be known. The structure of WAV files is based on a file format known as the Resource Interchange File Format (RIFF), defined by Microsoft.

This format was designed so that data in a file is broken up into self-described, independent "chunks". Each chunk has a prefix which describes the data in that chunk. The prefix is a four-character chunk ID which defines the type of data in the chunk, followed by a 4-byte integer which is the size of the rest of the chunk in bytes. The size does not include the 8 bytes in the prefix. The chunks can be nested. In fact, a RIFF file contains a single chunk of type "RIFF", with other chunks nested inside it. Therefore, the first four bytes of a WAV file are "RIFF", and the four bytes after that contain the size of the whole file minus 8 bytes. [Don Cross, 2000]

After the RIFF header is the WAV data, consisting of the string "WAVE" and two important chunks: the format header and the audio data itself. There may also be other chunks in a WAV file that contain text comments, copyrights, etc., but they are not needed to play the recorded sound.
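A minimal Python sketch of checking this RIFF prefix is shown below; it reads the first twelve bytes of a file (the "RIFF" tag, the 4-byte little-endian size, and the "WAVE" form type). The file name is hypothetical and the sketch is not taken from the project code.

    # Illustrative sketch: read and check the 12-byte RIFF/WAVE prefix.
    import struct

    def read_riff_header(path):
        with open(path, "rb") as f:
            chunk_id, chunk_size, form_type = struct.unpack("<4sI4s", f.read(12))
        if chunk_id != b"RIFF" or form_type != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        return chunk_size                    # size of the whole file minus 8 bytes

    print(read_riff_header("example.wav"))   # hypothetical file name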

1.1.3 The Wave File Format Header

 

The format header describes how the audio data is formatted in the file. Each field of the WAV format header is listed below by name and size in bytes, together with a description:

 

ckID (4 bytes)
The ASCII string "fmt ". Note the single trailing space character: all chunk IDs have to be 4 characters, so trailing spaces are used to pad shorter strings.

nChunkSize (4 bytes)
A 32-bit unsigned integer which holds the length of the entire 'fmt ' chunk in bytes. Note that this and all other multi-byte integer data in a WAV file are stored with the least significant byte first. For example, if a WAV file's 'fmt ' chunk size is 16, a hex dump of nChunkSize would print 10 00 00 00.

wFormatTag (2 bytes)
Defines how the audio data is encoded in the WAV file. This value will almost always be 1, which means Pulse Code Modulation (PCM).

nChannels (2 bytes)
The number of channels of audio present in the WAV file. For monaural sounds there is 1 channel; for stereo sounds there are 2 channels. It is possible to have more than 2 channels, but this is rare. The number of channels should never be less than 1.

nSamplesPerSec (4 bytes)
The sampling rate expressed in samples per second, or Hz. The reciprocal of this number is the time between samples, expressed in seconds. Typical values are 11025 (telephone quality), 22050 (radio quality) and 44100 (CD quality). Sampling rates lower than 8000 Hz or higher than 48000 Hz are not used.

nAvgBytesPerSec (4 bytes)
The average number of bytes per second that a player program would have to process to play this audio in real time. For PCM audio this is redundant, because it can be calculated by multiplying together the sampling rate, the number of channels and the number of bytes per sample.

nBlockAlign (2 bytes)
The number of bytes to output at a single time. For PCM, this is the number of bytes per sample multiplied by the number of audio channels.

nBitsPerSample (2 bytes)
Present only in PCM recordings. It defines the number of bits per sampled audio amplitude and will usually be either 8 or 16. Eight-bit audio files have only 256 possible amplitude levels, so they are low quality and contain an inherent "hiss" known as quantization distortion. Sixteen-bit audio files sound much better but are twice as large (assuming the same sampling rate and number of channels).

Table 1.1 Fields of the WAV format header

[Adapted from Don Cross, www.intersrv.com]

WAV files support sampling rates of up to 44100 samples per second, which is equivalent to that of a CD; that is to say, they can give high-quality sound.
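To tie the fields of Table 1.1 together, the sketch below decodes an example 'fmt ' chunk body (the 16 bytes that follow the chunk prefix). The byte values are constructed for illustration and describe 16-bit mono PCM at 44100 Hz; the sketch also confirms the redundancy of nAvgBytesPerSec noted in the table.

    # Illustrative sketch: decode the 'fmt ' chunk fields listed in Table 1.1.
    import struct

    # Example body of a 'fmt ' chunk: 16-bit mono PCM at 44100 samples per second.
    fmt_bytes = struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16)

    (wFormatTag, nChannels, nSamplesPerSec,
     nAvgBytesPerSec, nBlockAlign, nBitsPerSample) = struct.unpack("<HHIIHH", fmt_bytes)

    bytes_per_sample = nBitsPerSample // 8
    # For PCM, nAvgBytesPerSec equals sampling rate * channels * bytes per sample.
    assert nAvgBytesPerSec == nSamplesPerSec * nChannels * bytes_per_sample
    print(wFormatTag, nChannels, nSamplesPerSec, nBitsPerSample)   # 1 1 44100 16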

 

 

 

1.2       Signal Concept

 

Signals play an important role in our daily life. Examples of signals that we encounter frequently are speech, music, picture, and video signals. A signal is a function of independent variables such as time, distance, position, temperature, and pressure. For example, speech and music signals represent air pressure as a function of time at a point in space.  [Sanjit K., 2001; John G. and Dimitris G., 1996]

 

A signal contains loudness information, frequency information and, in any real instrument, harmonics and some noise. The simplified example does not do justice to a real guitar note, which is much less of a pure sine wave, even in the simple case where only one note has been struck. If two or more strings have been played, both fundamental frequencies and both sets of harmonics are present, and the supposed regularity becomes very rough indeed. Because useful information is very hard to extract from the frequency and harmonic content, almost all note processing works on the amplitude envelope instead. [R.G. Keen, 2001]
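A minimal sketch of one common way to obtain such an amplitude envelope is given below, assuming NumPy is available: the samples are rectified and then smoothed with a short moving average. The window length and the decaying test tone are illustrative choices, not values used by the project.

    # Illustrative sketch: amplitude envelope by rectification and smoothing.
    import numpy as np

    def amplitude_envelope(samples, window=512):
        rectified = np.abs(samples)                    # full-wave rectification
        kernel = np.ones(window) / window              # moving-average smoothing
        return np.convolve(rectified, kernel, mode="same")

    sr = 8000
    t = np.arange(0, 1.0, 1.0 / sr)
    note = np.exp(-3 * t) * np.sin(2 * np.pi * 220 * t)   # decaying 220 Hz test tone
    env = amplitude_envelope(note)
    print(env[100], env[4000])                             # envelope decays over time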

 

The types of signals can be defined depending on the nature of the independent variables and the values of the function. For example, the independent variables can be continuous or discrete. Here the author discusses only continuous-time signals, discrete-time signals and digital signals.

     

1.2.1 Continuous-Time Signals

        

Continuous-time signals are also referred to as analog signals. A signal is said to be continuous if its derivative is defined everywhere, and discontinuous if it is not. It is important to note that "continuous time" does not imply that a signal is a mathematically continuous function, but rather that it is a function of a continuous-time variable. A speech signal is an example of an analog signal. [Fred J., 1994; Rodger E., William H. and D. Ronald, 1998; Sanjit K., 2001]

 

1.2.2 Discrete-Time Signals

 

A discrete-time signal is a signal defined by specifying its value only at discrete times, called sampling instants. If the sample values are then quantized and encoded, a digital signal results. A digital signal is formed from a continuous-time signal through the process of analog-to-digital conversion. In this project, the author will analyze discrete-time signals stored in WAV files. [Fred J., 1994; Rodger E., William H. and D. Ronald, 1998; Sanjit K., 2001; John G. and Dimitris G., 1996]
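The sketch below illustrates this definition with an invented continuous-time signal: the discrete-time signal consists only of the values taken at the sampling instants nT.

    # Illustrative sketch: a discrete-time signal x[n] = x(nT) obtained by
    # evaluating a continuous-time signal only at the sampling instants.
    import math

    def x_continuous(t):
        """Continuous-time signal, defined for every instant t (illustrative)."""
        return math.cos(2 * math.pi * 5 * t)

    sample_rate = 100                 # samples per second
    T = 1.0 / sample_rate             # sampling interval in seconds

    x_discrete = [x_continuous(n * T) for n in range(20)]   # first 20 samples
    print(x_discrete[:5])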

 

 

 

 

1.2.3 Digital Signals          

 

Digital signals are discrete-time signals that are also quantized along the dependent axis. One way to produce digital signals is to pass a discrete-time sampled signal through an analog-to-digital converter (ADC); the ADC quantizes each sample value into one of 2^n finite values, where n is the number of bits. [Fred J., 1994] Examples of these three classes of signals can be found in Figure 1.1.

 

 

Figure 1.1 Signal hierarchy consisting of analog, discrete-time (sampled), and digital (quantized) processes [Fred J., 1994]
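As a small illustration of this quantization step (with an invented 3-bit resolution, not a value from the project), the sketch below maps sample amplitudes in the range [-1.0, 1.0] onto one of 2^n uniform levels.

    # Illustrative sketch: uniform n-bit quantization of a sample value.
    def quantize(sample, n_bits):
        levels = 2 ** n_bits                     # number of available levels
        step = 2.0 / levels                      # quantization step size
        index = int((sample + 1.0) / step)       # level the sample falls into
        index = min(index, levels - 1)           # keep +1.0 inside the top level
        return -1.0 + (index + 0.5) * step       # mid-point of that level

    samples = [-0.8, -0.1, 0.0, 0.3, 0.95]
    print([quantize(s, 3) for s in samples])     # each value snapped to one of 8 levels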

 

1.3 Summary

 

After studying these sound basics and signal concepts, the author gained a better understanding of the background knowledge underlying the digital field. This is essential in order to develop a digital signal processing system. In the next chapter, the author moves on to discuss digital signal concepts, and neural networks are also introduced.
 
