Friday 26 October 2012

Week 5: Dynamic Range

In the lab for week 5 we were asked to open the 'englishwords' WAV file in Soundbooth and work out its dynamic range. Unfortunately, though, Soundbooth does not display plain decibels; it uses dBFS (decibels relative to full scale), which results in negative values. As far as I'm aware it is not possible to take the log of a negative number, so I could not work out the dynamic range.

The dynamic range of an audio signal is the ratio between the largest and smallest possible values. 
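As a rough sketch of the idea (my own helper function, not anything Soundbooth provides), a dynamic range in dB can be computed from the ratio of the largest to smallest amplitude using 20·log10:

```python
import math

def dynamic_range_db(max_amplitude, min_amplitude):
    """Dynamic range in dB: 20 * log10 of the amplitude ratio."""
    return 20 * math.log10(max_amplitude / min_amplitude)

# A 16-bit signal spans integer amplitudes 1 to 32767:
print(round(dynamic_range_db(32767, 1), 1))  # 90.3
```

Incidentally, dBFS values are already logarithmic (measured downwards from the loudest representable level), so a dynamic range can also be found as the difference between the highest and lowest dBFS readings, with no further log needed.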

Sunday 21 October 2012

Week 4: Lab (audio wave processing)

As I found the lab a bit too loud (with earplugs) and couldn't download the audio files we were supposed to be working on (due to Google's download limit) I decided to do the lab activity at home. The only difference being that I would use Adobe Soundbooth CS5 instead of CS4.

The first task, after downloading the two audio files ('sopranoascenddescend.wav' and 'english words2.wav'), was to play the soprano file in Windows Media Player and note how long it was. Of course, in Windows 7 you can just click on the file in Explorer and read the length from there, but I played it anyway. It was 7 seconds long.

I have never used Adobe Soundbooth before so that was a new experience. I have however used Adobe Audition - which has more features but a very similar workspace. Therefore I already knew how to use all the basic tools.



I applied a Reverb effect to the file to see what it would sound like. It added an echo to the words in the file which gave the impression that it had been recorded in a very large echoey room.

I then undid that effect and added a Special: Sci-Fi Sounds effect instead. This resulted in the file sounding like the words had been spoken underwater.

Lastly I tried using a Voice: Telephone effect. This made the voice on the file sound much quieter and slightly electronic and distorted, giving the impression that the voice had been recorded from a telephone conversation.  

Saturday 20 October 2012

Week 4: Hearing


How human hearing works

The human ear is responsible for converting variations in air pressure – from speech, music, or other sources – into the neural activity that our brains can perceive and interpret. The ear can be divided into three sections: the outer ear, the middle ear and the inner ear. Each of these parts performs a specific function in processing sound information.



Sound waves are first collected by the outer ear, which is made up of the external ear (also called the pinna) and a canal that leads to the eardrum. The external ear amplifies sound, particularly in the frequency range of 2,000 to 5,000 Hz – a range that is important for speech perception. The shape of the external ear is also important for sound localisation – working out where a sound is coming from.



From the ear canal, the sound waves vibrate the eardrum, which in turn vibrates three tiny bones in the middle ear: the malleus, incus and stapes. The stapes vibrates a small membrane at the base of the cochlea called the oval window, which transmits amplified vibrational energy into the fluid-filled cochlea. (A second membrane, the round window, separates the tympanic canal from the middle ear.)



The inner ear converts sound into neural activity. The auditory portion of the inner ear is a coiled structure called the cochlea. The region nearest the oval-window membrane is the base of the spiral; the other end, or top, is referred to as the apex.



Inside the length of the cochlea are three parallel canals; the tympanic canal, the vestibular canal, and the middle canal. The main elements for converting sounds into neural activity are found on the basilar membrane, a flexible structure that separates the tympanic canal from the middle canal.



This diagram shows the cochlea ‘unrolled’ so that we can see the basilar membrane more clearly.



The basilar membrane is about five times wider at the apex (top) of the cochlea than at the base, even though the cochlea itself gets narrower towards its apex. It vibrates in response to sound transmitted to the cochlea from the middle ear.

High frequency sounds displace the narrow, stiff base of the basilar membrane more than they displace the wider, more flexible apex. Mid-frequency sounds maximally displace the middle of the basilar membrane. Lower frequency sounds maximally displace the apex.

Within the middle canal and on top of the basilar membrane is the organ of Corti. The organ of Corti is the collective term for all the elements involved in the transduction of sounds. It includes three main structures: the sensory cells (hair cells), a complicated framework of supporting cells, and the end of the auditory nerve fibres.




On the top of the organ of Corti is the tectorial membrane. The stereocilia of the outer hair cells extend into indentations in the bottom of the tectorial membrane.



The movement of fluid in the cochlea produces vibrations of the basilar membrane. These vibrations bend the stereocilia inserted into the tectorial membrane. Depending on the direction of the bend, the hair cells will either increase or decrease the firing rate of auditory nerve fibres.




Sunday 14 October 2012

Week 3: Digital Signal Processing

This week we covered quite a lot of topics. Ironically I couldn't listen to most of the lecture due to having my fingers in my ears to block out the painful sound effects included with the slides. Turns out learning about audio processing is tricky when you have hypersensitive hearing. 

Missing most of the lecture made the class test afterwards quite difficult. I could not process what we were meant to be doing in the lab either due to the lack of written instructions and too much background noise so I have just read through the lecture notes, picked out the topics I hadn't yet covered in my blog and researched them.


Harmonics

Harmonics are the specific frequencies created by standing waves. The 'fundamental' is the lowest, and usually loudest, tone you can hear. A harmonic is an integer multiple of the fundamental's frequency (e.g. if the fundamental is 'f' you can get 2f, 3f, 4f, etc.).
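A quick illustrative sketch (the function name is mine):

```python
def harmonics(fundamental_hz, count):
    """The first `count` harmonics: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]

print(harmonics(110, 4))  # [110, 220, 330, 440]
```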




Sound intensity and level

Sound intensity measurements are extremely difficult to make, so the intensity of sound is generally expressed as an equivalent sound level. This is done by comparing a sound to a standard reference intensity (the quietest 1 kHz tone the average human can hear).
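That comparison is a logarithm of the intensity ratio. A minimal sketch, assuming the standard reference intensity of 10⁻¹² W/m² (the function name is mine):

```python
import math

I0 = 1e-12  # threshold-of-hearing reference intensity, W/m^2

def sound_level_db(intensity_w_per_m2):
    """Sound level in dB relative to the threshold of hearing."""
    return 10 * math.log10(intensity_w_per_m2 / I0)

print(sound_level_db(1e-12))        # 0.0 - the reference itself
print(round(sound_level_db(1e-6)))  # 60 - roughly normal conversation
```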


Echoes and Reverberation

Reverberation is made up of reflections of sound waves bouncing off various surfaces (e.g. walls, desks, ceiling, floor) and reaching your ear. Because they arrive less than 0.1 seconds after the original sound, the ear processes them all together, so you may simply hear the sound for a slightly prolonged time.




Echoes, on the other hand, are reflections of sound that arrive with a delay of more than 0.1 seconds. Because of this delay there is a gap between the original sound and the reflection. The second sound heard is called an echo.


The Inverse-Square Law

The Inverse-Square Law states that the intensity of the sound heard varies inversely with the square of the distance 'R' from the source of the sound.

In open air, sound will be roughly nine times less intense at a distance of 3 m from its source than at a distance of 1 m.
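That factor of nine falls straight out of the law; a one-line sketch (the function name is mine):

```python
def relative_intensity(near_m, far_m):
    """How many times more intense a sound is at near_m than at far_m."""
    return (far_m / near_m) ** 2

print(relative_intensity(1, 3))  # 9.0 - nine times more intense at 1 m
```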


Spectrum

A spectrum is a graph of sound level (amplitude) against frequency over a short period of time. Since many sound waves contain different frequencies this graph is often useful.
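As a toy illustration of what a spectrum is (real software uses a fast Fourier transform; this is my own slow, pure-Python discrete Fourier transform), the magnitude spectrum of a sampled sine peaks at the bin matching the sine's frequency:

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT frequency bin of a sampled signal."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# 64 samples of a sine that completes exactly 5 cycles
samples = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
mags = dft_magnitudes(samples)
print(max(range(32), key=lambda k: mags[k]))  # 5 - the peak bin
```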



Spectrogram

Instead of a spectrum, the variation of sound intensity with time and frequency can be displayed by representing intensity by colour or brightness on a frequency vs time axis. This is called a spectrogram. 



Digital Signal Processing Systems




Steps in a Digital Signal Processing System:

1. The signal is inputted via a microphone or other recording equipment.
2. The recording is then converted from analogue to digital (into binary numbers).
3. Editing is then done to the digital copy (e.g. filtering, pitch warp, echo, reverb, etc.).
4. The signal is then changed from digital back into analogue.
5. Then the signal is smoothed out.
6. The edited recording is outputted.

Computers cannot understand analogue signals, which is why the signal must be converted into digital first and then converted back again so that we can listen to the result.
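The steps above can be sketched in miniature. This is my own toy version (the 3-bit word length and the volume-halving "edit" are illustrative assumptions, not anything from the lecture):

```python
import math

def adc(signal, bits=3):
    """Step 2: quantise analogue samples (floats in -1..1) to integer codes."""
    levels = 2 ** (bits - 1)
    return [max(-levels, min(levels - 1, round(x * levels))) for x in signal]

def dac(codes, bits=3):
    """Step 4: convert integer codes back to analogue-style floats."""
    levels = 2 ** (bits - 1)
    return [c / levels for c in codes]

analogue = [math.sin(2 * math.pi * t / 8) for t in range(8)]  # step 1: input
digital = adc(analogue)                                       # step 2
edited = [c // 2 for c in digital]                            # step 3: crudely halve volume
output = dac(edited)                                          # steps 4-6
```

In reality step 5 (smoothing) is an analogue reconstruction filter after the DAC, which a list of numbers cannot really show.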


Why use digital processing?

1. Precision

In theory, the precision of a DSP system is limited only by the conversion processes at input and output (analogue to digital and back). In practice, the sampling rate and word length (number of bits) also restrict precision.

2. Robustness

Digital systems are less susceptible to component tolerances and electrical noise (pick-up), thanks to logic noise margins.

An important factor for complex systems is that adjustments for electrical drift and component ageing are essentially removed.

3. Flexibility

Flexibility of the DSP is due to its programmability, which allows it to be upgraded and for its processing operations to be expanded easily without necessarily incurring large scale hardware changes.



Sound card architecture





Sampling a signal

Sampling a signal means the system measures the signal's value at specific times, nT seconds. It then continues sampling the signal every T seconds.




A signal is usually sampled at at least double the highest frequency it contains (the Nyquist rate). For example, a signal containing frequencies up to 10 Hz would be sampled at 20 Hz or more.
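A sketch of sampling at t = nT (the helper name is mine). It also shows why exactly double the frequency is a borderline case: a 10 Hz sine sampled at exactly 20 Hz can land on its zero crossings, which is why real systems sample a little above double:

```python
import math

def sample(signal, sample_rate_hz, n_samples):
    """Sample a continuous signal at times t = n * T, where T = 1 / rate."""
    T = 1 / sample_rate_hz
    return [signal(n * T) for n in range(n_samples)]

sine_10hz = lambda t: math.sin(2 * math.pi * 10 * t)
print(sample(sine_10hz, 20, 4))  # every sample lands on a zero crossing
```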





References:

Echoes and reverberation: http://www.physicsclassroom.com/mmedia/waves/er.cfm 
Spectrum graphic: http://www.tablix.org/~avian/blog/archives/2008/11/the_sound_of_hot_tea/
Sampling: http://cnx.org/content/m15655/latest/

Friday 5 October 2012

Week 2: Getting familiar with waves


1. In a recording room an acoustic wave was measured to have a frequency of 1 KHz. What would its wavelength in cm be?

Velocity (v) = Frequency (f) x Wavelength (λ)

The velocity of sound in air is 340 m/s. Since we already know that the frequency of the wave is 1 KHz (or 1000 Hz) it is easy to work out the wavelength using the above equation. 

Wavelength (λ) = Velocity (v) / Frequency (f)
                     = 340 / 1000
                     = 0.34 m

Then we convert the value into cm. The wavelength is 34 cm.
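The same calculation as a small helper (the names are mine), reusable for the later questions:

```python
V_AIR = 340  # speed of sound in air, m/s

def wavelength_m(frequency_hz, velocity=V_AIR):
    """Wavelength = velocity / frequency."""
    return velocity / frequency_hz

print(round(wavelength_m(1000) * 100, 2))  # 34.0 cm for a 1 kHz tone in air
```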


2. If a violinist is tuning to concert pitch in the usual manner to a tuning fork what is the likely wavelength of the sound from the violinist if she is playing an A note along with sound from the tuning fork?

The frequency of an 'A' on the violin is 440 Hz and its wavelength is about 0.77 m (λ = v/f). This is the same as the frequency and wavelength of a concert-pitch A tuning fork. Therefore the two sine waves will add together (constructive interference) to produce a louder sound at the same frequency of 440 Hz and wavelength of 0.77 m. (This is of course assuming that the violinist is actually in tune when she plays her A note.)


3. If an acoustic wave that is traveling along a work bench has a wavelength of 3.33 m what will its frequency be? 

Firstly we need to know what material the work bench is made from. For the purposes of this question let's presume it is made from wood. The velocity of sound in wood varies with the type of wood, from about 3300 to 3600 m/s. A work bench is probably quite a hard wood, so about 3600 m/s should be reasonable.

Frequency (f) = Velocity (v) / Wavelength (λ)
                    = 3600 / 3.33
                    = 1081 Hz

Why do you suppose it is easier for this type of wave to travel through solid materials?

Sound travels faster through solids than through air because of the nature of gases and solids. In gases like air the particles have quite a lot of space around them, so when a sound wave sets one particle moving it takes longer to reach the next. In solids the particles are packed tightly together: almost as soon as the wave moves one particle it moves the next, because they are so close. This makes the velocity of sound in solids much higher than in gases.


4. Sketch a sine wave accurately of amplitude 10, frequency 20Hz. Your sketch should show two complete cycles of wave. What is the duration of one cycle? What is the relationship between the frequency and the duration of one cycle?

















One cycle of the 20 Hz wave lasts 1/20 = 0.05 seconds. The relationship between the frequency and the duration of one cycle is Time (sec) = 1 / Frequency (Hz)
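A one-line check of that relationship (the helper name is mine):

```python
def period_s(frequency_hz):
    """Duration of one cycle: T = 1 / f."""
    return 1 / frequency_hz

print(period_s(20))  # 0.05 - one cycle of the 20 Hz wave lasts 0.05 s
```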


5. Research the topic “Standing Waves”. Write a detailed note explaining the term and give an example of this that occurs in real life. 

A standing wave (or stationary wave) is a wave that remains in the same position and does not move. Most sound waves that reach our ears are not standing waves. Normally waves travel outwards, slowly spreading out and losing strength. A standing wave is created when the wave is 'trapped' between two or more surfaces. Musical instruments work by using trapped sound waves to produce different pitches and tones (sounds with a particular pitch). To be a tone a group of sound waves has to be very regular and all exactly the same distance apart.


When sound waves are produced in, or bounce back off the end of, a container that is exactly the right length for their wavelength, then instead of interfering with each other and cancelling each other out, the waves reinforce each other. If you could watch the reinforced waves they would look as if they were standing still - which is why they are called standing waves.

All standing waves have nodes, where there is no wave motion, and antinodes, where the wave is largest. The nodes determine which wavelengths will fit into a certain container/musical instrument.
 



6. What is meant by terms constructive and destructive interference?

Interference is caused when two waves meet while travelling along the same medium. 

Constructive interference is when the two waves have a displacement in the same direction. The displacement of the waves adds together to create a larger displacement.


Destructive interference is when the two waves have displacement in opposite directions. This causes them to cancel each other out or 'destroy' each other's displacement. 

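Superposition is just point-by-point addition of displacements, which makes both cases easy to demonstrate (my own sketch):

```python
import math

def superpose(wave_a, wave_b):
    """Add two waves sampled at the same instants, point by point."""
    return [a + b for a, b in zip(wave_a, wave_b)]

wave = [math.sin(2 * math.pi * t / 16) for t in range(16)]
inverted = [-x for x in wave]  # the same wave displaced in the opposite direction

constructive = superpose(wave, wave)     # displacements double
destructive = superpose(wave, inverted)  # displacements cancel to zero
```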


7. What aspect of an acoustic wave determines its loudness?

The amplitude of an acoustic wave determines how loud it is. The volume of sound waves is measured in decibels (dB). 


8. Why are decibels used in the measurement of relative loudness of acoustics waves?

Decibels (dB) are used to measure the intensity or loudness of a sound. The decibel scale has to cover a vast range of sound intensities because the human ear is so incredibly sensitive. We can hear everything from whispers to jet engines. A jet engine is about 1,000,000,000,000 times louder than the quietest audible sound. 

On the decibel scale, 0 dB is the smallest audible sound. It is almost total silence but not quite. A sound 10 times more powerful than 0 dB is 10 dB, 100 times more powerful is 20 dB and 1,000 times more powerful is 30 dB. 
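Those steps of 10 are exactly what the logarithmic scale means; a quick sketch (the function name is mine):

```python
import math

def db_from_power_ratio(ratio):
    """Decibels for a given power ratio relative to the 0 dB reference."""
    return 10 * math.log10(ratio)

print(db_from_power_ratio(10))           # 10.0
print(round(db_from_power_ratio(1e12)))  # 120 - the jet engine figure
```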

Some common sounds and their decibel ratings:
Almost total silence - 0 dB
A very quiet whisper - 15 dB
Normal conversation - 60 dB
A lawnmower - 90 dB
A car horn - 110 dB
A rock concert or a jet engine - 120 dB
A gunshot or firecracker - 140 dB

All these dB measurements were taken while standing near the sound. The further away you get from the source of the sound, the less powerful and intense it becomes.

Any sound above 85 dB can cause hearing loss. This loss is related both to the length of time you are exposed to the sound and to its power or intensity. Eight hours of 90 dB sound can cause damage to your ears, and any exposure to 140 dB sound causes immediate damage as well as pain. 

Younger people can hear a greater range of sounds than older people. This is because your hearing becomes damaged as you age and you are less able to pick up the higher frequencies. Some people can be hypersensitive to sound, which means that a normal level (such as conversation at 60 dB) will sound louder to them and may actually cause pain. 


9. How long does it take a short 1 KHz pulse of sound to travel 20 m versus a 10 Hz pulse?

The frequency of a sound wave does not affect the time it takes to travel. Therefore the 1 KHz wave and the 10 Hz wave will take the same amount of time to travel 20 m.

We can work out the time using the equation Time (t) = Distance (d) / Velocity (v)

So the time taken = 20 / 340 ≈ 0.06 seconds.
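As a quick check with a helper (the names are mine): time is distance divided by velocity, and frequency plays no part.

```python
V_AIR = 340  # speed of sound in air, m/s

def travel_time_s(distance_m, velocity=V_AIR):
    """Travel time = distance / velocity; frequency plays no part."""
    return distance_m / velocity

print(round(travel_time_s(20), 3))  # 0.059 seconds for either pulse
```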


10. Does sound travel under water? If so what effect does the water have?

Yes - sound travels through water at about 1500 m/s, considerably faster than through air, because in a liquid the particles are closer together than in a gas.


Summary of key learning points:

  • The velocity of sound changes depending on what medium it is travelling through - the more solid the material the faster the velocity. 
  • Interference refers to how the displacement of two waves interacts together, either by adding together or destroying each other.
  • The frequency of a sound wave does not affect the time it takes to travel a certain distance. 
  • Amplitude determines how loud a sound wave will be. The volume is measured in decibels (dB). 


