Audio, Image and Video Processing: November 2012

Friday, 30 November 2012

Week 10: Lab (video and audio practice)

Today's lecture was about video processing, so during the lab our task was to take raw footage shot by a digital video camera man and edit it into a short film that lasted 1 minute. The footage was of bird life around a small Scottish loch.

The criteria of the task were:
- You should apply appropriate backing music (and occasionally tastefully mix this with original audio in the original footage). Music has been made available to you by a sound engineer and is downloadable.

- The final film should attempt to emote certain subjective qualities such as "drama and cuteness" as its main selling points.

- The film should be exactly 1 minute long.

- The film should include at least three sub-scenes editing together smoothly to create a natural flow to the final movie.

- You may use any of the features/effects/transitions of the video editing and audio editing software that you employ to make your final product attractive.

Week 10: Lecture (moving image and video)

The Moving Image

John Logie Baird is often credited with inventing the television, but in fact despite designing and making a working mechanical TV system, his system was not adopted due to it being so unreliable.

Persistence of Vision

'Persistence of Vision' is an old theory that now is not believed to be true. It is the phenomenon of the human eye were an after image was thought to persist for approximately one twenty-fifth of a second on the retina. This is now regarded as the 'myth of persistence' and it is no longer accepted that human perception of motion (in the brain) is the result of persistence of vision (in the eyes). A more modern and more plausible theory to explain motion perception are two distinct perceptual illusions : phi phenomenon and beta movement.

Bitrate

The bitrate is the number of bits that are processed/conveyed per unit of time. It describes the rate at which bits are transferred from one location to another (i.e. it measures how much data is transmitted in a given amount of time).

Interlaced and Progressive video

Interlaced video is a technique of doubling the perceived frame rate without consuming extra bandwidth. The TV tricks your eyes by first drawing the odd number lines on the screen 25 times per second. Then the even lines of the next frame and so on. Progressive video does not interlace and appears sharper.

Display resolution

The display resolution of a digital television, computer monitor or display device is the number of distinct pixels in each dimension that can be displayed. Modern HD televisions can display 1080i or 1080p - the 'i' and 'p' stand for interlaced or progressive video.

Video file formats

I decided to research more about video file formats and how they are used.

Platform: PC

File formats used:

WMV (Windows Media Video): Microsoft’s family of proprietary video codecs, including WMV 7, WMV 8 and WMV 9. This format uses the VC-1 codec.

MPEG-4: a standard developed by the MPEG (Moving Picture Experts Group)

MPEG-2: an older standard developed by the MPEG (Moving Picture Experts Group)

AVI (Audio Video Interleave): a multimedia container format introduced by Microsoft in 1992.

FLV (Flash Video): Flash video files usually contain material encoded with codecs following the Sorenson Spark or VP6 video compression formats. It is developed by Adobe and used on the popular video streaming site YouTube.

MOV (Apple QuickTime Movie): common multimedia format often used for saving movies- compatible with both Macintosh and Windows. QuickTime format uses the H.264 codec.

PC: WMV (Windows Media Video)

Windows Media Video 9 (WMV9) is Microsoft’s implementation of the VC-1 codec.

WMV9 supports three profiles: Simple, Main and Advanced. The Simple and Main profiles support a wide range of bit-rates. These include bit-rates appropriate for high-definition content as well as video on the internet at dial-up connection speeds. The Advanced profile supports higher bit-rates.

VC-1 can be used to compress lots of different streaming and downloadable video content, from podcasts to HD movies on demand. It is designed to achieve state-of-the-art compressed video quality at bit rates that can vary from very low to very high. It can easily cope with 1920 x 1080 pixel video at 6-30 Mbps for high-definition video. It is capable of handling up to a maximum bit rate of 135 Mbps. VC-1 works using a block-based motion compensation and spatial transform scheme similar to that used in other video compression standards such as H.264 and MPEG-4.

Platform: Apple Macintosh

File formats used:

MOV (Apple QuickTime Movie): the standard format that QuickTime uses, along with MPEG-4.

MPEG-4: a standard developed by the MPEG (Moving Picture Experts Group)

MPEG-2: an older standard developed by the MPEG (Moving Picture Experts Group). Only supported by the most recent version of Apple’s operating system, OS X Lion.

AVI (Audio Video Interleave): a multimedia container format introduced by Microsoft in 1992.

Apple Macintosh: MOV (QuickTime Movie)

The MOV video file format is a QuickTime movie. It is used for saving movies and other video files and uses a proprietary compression algorithm developed by Apple, which is compatible with both Macs and PCs. The format is a multimedia container file that contains one or more ‘tracks’- which each store a particular type of data including audio, video, effects or text (e.g. for subtitles). Each of the tracks either contains a digitally-encoded media stream or a data reference to the media stream located in another file.

Both the MOV and MP4 formats can use the same MPEG-4 codec (also known as H.264), so they are mostly interchangeable in a QuickTime-only environment. The codec uses lossy compression, which means that an algorithm is used to remove any image data that is unlikely to be noticed by the viewer. Lossy compression does not retain the original data meaning that the data that is removed is lost. The amount of data lost depends on the degree of the compression.

MP4/MPEG-4 - based on MOV (QuickTime Movie)

MP4 is a video file format that uses MPEG-4 compression, which is a standard developed by the Moving Picture Experts Group (MPEG). It uses separate compression for both the audio and video tracks. The video is compressed using AAC compression, which is the type of compression used in .aac files.

MPEG-4 uses lossy compression, which uses an algorithm to remove image information that is unlikely to be noticed by the viewer. Lossy compression does not retain the original data meaning that the data that is removed is lost. The amount of data lost depends on the degree of the compression.

The Apple iTunes store uses the M4V file extension for their MP4 video files, including TV episodes, movies and music videos. These files use MPEG-4 compression but also may be copy-protected using Apple’s FairPlay digital rights management (DRM) copy protection. This means to play the file the computer must be authorised to access the account it was purchased with.

Comparison of file formats/codecs

MPEG-4/H.264 and VC-1 are both industry standards. However, just because a platform or mobile device supports one of these “standards,” it doesn’t guarantee compatibility, because H.264 and VC-1 consist of many different “Profiles” and “Levels.”

A Profile outlines a specific set of features that are required in order to support a certain delivery platform. A Level gives the limits of the Profile, such as the maximum resolution or bit-rate. H.264 Profiles that are suitable for web video applications are: Baseline Profile (BP) for limited computing power and Main Profile (MP) for typical user’s computers. VC-1 uses different options- these are Low and Main Profiles, both at low, medium or high levels depending on the user’s target platform.

At high bit-rates, the differences between MPEG-4/H.264 and VC-1 are small:

Under high bandwidth stress, H.264 tends to degrade to softness instead of blockiness, while VC-1 tends to retain more detail, but can show more artefacts due to its stronger in-loop deblocking filter. At web/download bit-rates, both codecs offer great quality and there is no difference between them.

The MPEG-4/H.264 codec tends to take more CPU power to decode at maximum resolution. Comparing VC-1 Advanced Profile and H.264 High Profile, each with all of its options turned on, VC-1 can decode about twice as many pixels per MIPS (million instructions per second). This is more important when considering HD-resolution content than lower resolution content and doesn’t matter at all when there’s a hardware decoder. When constrained by software decoder performance, VC-1 can handle Advanced Profile, while H.264 is limited to Baseline, where VC-1 outperforms H.264 in quality/efficiency. This has become an important consideration in recent years, because consumer broadband access has outpaced the hardware upgrade cycle of many households and businesses.

While there do not seem to be many differences between the VC-1 and MPEG-4/H.264 in terms of desktop computing (except for HD quality video), the Windows Media Video format has considerably less support on the Mac platform so H.264 has the advantage there. There is not the same difference in support on the Windows PC platform, however since the Windows Media Video format (and therefore VC-1 codec) is proprietary to that platform it would make sense to say it has a slight advantage.

Friday, 23 November 2012

Week 9: Lab (Adobe Premiere Pro)

During today's lecture we continued looking at image processing. Next week we start looking at video processing so the task for today's lab was to research Adobe Premiere Pro and watch several tutorials on how to use it.

This is the basic interface of Adobe Premiere Pro CS5.5:

Since I studied Interactive Media at college for two years I have used Adobe Premiere Pro before in my video classes. I used both Adobe Premiere Pro 2.0 (at college) and Premiere Pro CS4/CS5.5 (at home) so I am quite familiar with how to perform basic editing techniques on video clips, using simple effects and exporting into different file formats optimised for different platforms - including PC, Mac, iOS, Android and other mobile devices.

Week 9: Lecture (digital image processing)

What is digital image processing?

The lecture divided DIP into four common categories: analysis, manipulation, enhancement and transformation.

Analysis
These operations provide information on the photo-metric features of an image, such as a histogram or colour count.

Manipulation
Operations such as flood fill and crop are classed as manipulating the image.

Enhancement
To try and enhance an image operations such as heightening the contrast, enhancing the edges or anti-aliasing are used.

Transformation
Images can be transformed using operations such as skew, rotate and scale.

Image processing
Images take up a lot more memory storage than audio files. Compression algorithms are used to reduce the file size without losing quality or data. This allows more image files to be stored in the same amount of space.

The optical nerves work like wires that connect the eyes to the brain. Digital image processing allows technology in lenses to be implemented in the same way. This means that a good knowledge of optics is very important in order to print good photos. The lenses in a camera are directed to bend the light so that it lands onto sensor arrays - which are grids of thousands of microscopic photocells. This grid creates picture elements (pixels) by sensing the light intensity of the image. If the sensor array has too low a resolution then the picture will look pixelated or blocky.

Friday, 16 November 2012

Week 8: Lab (manipulating images using arithmetic)

Today's lecture was on light and images. The task for the lab is to open an image in Photoshop CS4 and manipulate it using filters and arithmetic.

First I opened the file 'ch_tor.jpg' in Photoshop and rotated it so that it was the right way up. I then applied a custom filter to it.

As you can see, the custom filter made the image much sharper and very grainy.

The high pass filter made the image look quite monotone, with a glow. I thought it was quite a nice effect.

The maximum present filter mask made the image look brighter while the minimum filter made the image look darker. They both made the image look slightly blurred.

I experimented with several other filters to see the effects. Two of the filters that I tried were the Stylised: Find Edges filter and the Pixelate: Colour Halftone filter.

Find Edges:

Colour Halftone:

I then created a new filter that was all zeros except for the centre value, which I made 1. After applying this filter I did not see any difference to the image.

Next I created a filter with a two by two matrix of 1s at its centre. This filter made the image significantly whiter.

Lastly I created another filter with a two by two matrix of 1s on one diagonal and -1s on the other. This made the image almost entirely black with only a very faintly visible outline.

Week 8: Lecture (Light)

Today's lecture was on light. Light is a form of energy that is detected by the human eye. It can seem to behave like a wave in some conditions and like a stream of particles in other situations. Light is not like sound - it does not need a medium to travel through. This is why light can reach us from the sun despite space being a vacuum.

Like water, light is a transverse wave. The vibrations are at right angles to the direction of motion from the source of the light.

The visible light spectrum ranges from red to violet. Just outside of this spectrum is infrared and ultraviolet. If you go lower in frequency than infrared you get microwaves and radio waves, while going higher in frequency than ultraviolet there are x-rays and gamma rays.

'White' light is made up of the entire visible spectrum. If you shine light into a glass prism it will refract and bend to show all the different colours.

The velocity of light depends on the medium it is travelling through. In a vacuum it travels at 3*10⁸m/s (or 300,000 km/s). It slows down slightly in air (but is still about 1 million times the speed of sound) and by about 2/3 in glass.

Friday, 9 November 2012

Week 7: Lab (audio processing)

The task for today's lab was to download a wav file called 'speechtone.wav' from the supplementary materials blog (here) and examine the file audibly, in time and frequency. Since I decided to carry out this task on my laptop at home (due to not being able to concentrate in the lab) I used Adobe Audition CS 5.5 to view and edit the file.

Here is what the sound file looked like when I first opened it:

I found the file incredibly difficult to listen to or work with due to the tone in the background. To be perfectly honest it made me feel like putting my fingers in my ears! Despite my discomfort from my hypersensitive hearing I still attempted the rest of the task.

The next part of the task was to devise a strategy to improve the quality of the speech contained in the wav file. The first thing I tried was to add a special effect of 'speech enhancement'. This did not seem to improve the actual quality of the speech though.

I then tried another effect (after undoing the first) called 'speech volume leveller'. This particular effect only seemed to reduce the volume level of the entire file, not specifically the speech.

The last effect I tried was to 'normalise' the wav file. This did improve the quality of the speech, but also made the background tone even more unbearable to my ears!

I will come back to the last three parts of the task once I have worked out a strategy to make the background tone less painful to listen to.

-----------------------

I could not work with the file without hurting my ears, so after doing some research I decided to write down what I think would have worked instead.

The speech in the file is hard to make out because there is a continuous tone in the background. This tone has a consistent sound level. Therefore to remove the high-frequency tone you could go into the Spectral Frequency Display, select the high frequency area and remove it. To improve the speech left behind you could apply a vocal enhancer effect.

To make the voice sound angry you could increase the volume and also make the start of each word louder.

Lastly, to make the file sound as if it was recorded in a church hall a convolution reverb could be added to it.

Friday, 2 November 2012

Week 6: Lab (analysing and modifying signals)

Musical and Non Musical Audio

Applying effects using Soundbooth - Compression:

First I opened Soundbooth CS4 and the 'sopranoascenddescend' WAV file. I applied a compression effect to the waveform. While this compressed the waveform and changed how it looked it didn't seem to affect how it sounded.

Compressors:

Compressors are used to limit or decrease the dynamic range between the quietest and loudest parts of an audio signal. They achieve this by boosting the quieter parts of the signal and reducing the louder parts.

Compression is used to make a sound more consistent and even. Instead of some parts being very quiet and some parts very loud, the overall volume is evened out.

Benefits of using a compressor are:

- increasing the overall gain of a wave by compressing the peaks
- useful for making vocal performance stand out more from the backing instruments

- used in broadcasting to boost the volume of audio signals and reducing the dynamic range so that they can be broadcasted by narrow-range broadcast signals.

Looking at Spectrograms:

Then I selected the spectral frequency display view. This is what was displayed on the screen in Soundbooth:

The program is showing the wave's frequency over time as a spectrogram.

This is the spectral frequency display for the 'englishwords' WAV file. As you can see it is quite dramatically different. Instead of a continuous pattern there are gaps with no colour, showing the gaps between the words. Most of the speech energy appears to be in the lower frequencies - particularly up to 1KHz.

Applying effects using Soundbooth - Reverbs:

The next task was to apply a convolution reverb with a 'clean-room-aggressive' preset to the 'englishwords' WAV file. This effect made the file sound like it was recorded in a large room since there was a slight delay to the words.

After that, I applied the same reverb effect, but with a 'roller disco aggressive' preset instead. This made the file sound very far away from the listener.

About Reverbs:

Reverberations in a room are the reflections of sound waves hitting off various different surfaces (e.g. walls, desks, ceiling, floor) and reaching your ear. Since they take less than 0.1 secs of time difference to reach your ear from the original sound source they are processed all together and may just result in hearing the sound for a slightly prolonged time.

Speech transcript:
Next I reopened the original 'englishwords' WAV file and selected Edit/SpeechTranscript Transcribe.

The result was:
[Speaker 0] welcome the police allow for sure I hope compassion
[Speaker 1] in mystery the unity he quality creativity free and help steer it inspiration really is

Since the actual transcript is completely different it seems like the computer is attempting to make sense of the random words by trying to put the sounds into sentences.

Computer Speech Transcription:
Computer Speech Transcription works by detecting the words that are said in digital audio files and converting them into written words. This process is very complicated and in terms of performance it is based on speed and accuracy.