Image and Video Compression
© Mercury Communications Ltd - July 1992
Technology advances in different areas often develop together in a symbiotic manner: none more so than the combination of IC technology, compression algorithms, and multimedia. Multimedia, once firmly ensconced in the camp of technologies with little practical use, is now blossoming in telecommunications and computer products. In telecommunications, video-conferencing services are now available at justifiable cost. The environment for running multimedia applications on personal computers is now standardised in the Multimedia PC (MPC) standard and is embedded in Microsoft Windows 3.1. Up-and-coming PC standards include Audio Video Interleaved (AVI), which will provide a software-only audio/video playback system for PCs.
All video-based applications rely on one underlying technology - compression algorithms. Any article on multimedia seems to exude compression-related acronyms, such as H.320, H.261, G.728, DVI, CD-I, JPEG and MPEG. This issue looks at video compression and tries to make sense of all the activity.
The Need for Compression
There are two closely interlinked reasons for compression. Firstly, to reduce stored or transmitted files to a manageable size, and with it the time taken to transmit those files to another computer. Secondly, to reduce the real-time bandwidth required to transmit time-sensitive video data across a communication link.
Let's take file compression first. Figure 1 shows how the number of bits required to display a full colour PC screen has increased since 1982. Not only has the resolution increased from a monochrome 720x350 pixel display to the now common 1,024x768 SVGA standard, but the number of bits per pixel has increased from 1 to 24. The 24-bit TrueColour standard allows true photographic quality images to be displayed. The price to be paid for this high resolution is that 2.3Mbyte of raw data needs to be stored to refresh just a single screen.
Figure 1 - Growing PC File Sizes
To demonstrate the amount of storage required for colour images, consider a 35mm colour slide scanned at 12µm to give a 2,048x3,072 pixel image (a pixel is the smallest area of a display that can be addressed) at 24 bits/pixel: it requires 18 Mbytes of data. This means that a standard 600 Mbyte CD-ROM could hold only around 33 such images. A lower resolution example is an 8" by 11" colour photo, scanned at 300 dots/inch and 24 bits/pixel for a photographic quality image, which would require about 25 Mbytes of memory without compression. Consequently, a typical 40 Mbyte PC disk could store only one image. The data transfer rate of a typical hard disk is about 1 Mbyte/s, so it would take around 25 seconds to capture or transfer the image to disk. Transmitting the file over a typical Ethernet LAN would take about 2 minutes in practice, and over a 2,400 baud modem would take more than a day.
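The arithmetic behind these storage figures can be sketched quickly. This is an illustrative calculation only; real file formats add headers and padding on top of the raw pixel data:

```python
def image_bytes(width_px, height_px, bits_per_pixel=24):
    """Raw storage needed for an uncompressed image."""
    return width_px * height_px * bits_per_pixel // 8

slide = image_bytes(2048, 3072)          # scanned 35mm slide
photo = image_bytes(8 * 300, 11 * 300)   # 8" x 11" print at 300 dots/inch

print(slide // 2**20, "Mbyte")           # 18 Mbyte per slide
print(round(photo / 1e6), "Mbyte")       # 24 Mbyte per photo, roughly
print(photo // 10**6, "seconds at 1 Mbyte/s disk transfer")
```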
For full-motion video the numbers get much more dramatic. One second of uncompressed colour video at 25 frames per second, at a resolution of 512x480 pixels and 24 bits/pixel, requires about 18 Mbyte (around 150 Mbit). Thus a 600 Mbyte CD-ROM would hold barely half a minute of full-motion uncompressed video, and at the standard CD-ROM transfer rate of 150 kbyte/s it would take over an hour to replay those 30 seconds! Without compression, around 90 Mbit/s of bandwidth is required to support video.
These numbers make it clear why size and bandwidth compression is key to making multimedia available to everyone. They also explain why multimedia market growth is only just starting - PCs are only now achieving the processing power and storage capability, at an affordable price, to support multimedia applications (although it should be remembered that workstations have had the capability for many years). This fact, in combination with developments in compression algorithms, is making multimedia a reality for all of us.
There is a tremendous amount of activity across the marketplace in compression technology. Some of it is proprietary, for historical reasons or through inventiveness, but many areas are now covered by standards, necessitated by the need for users and software to interoperate.
File Compression
Utilities for compressing individual files on a PC or Mac have been around for many years in the form of shareware packages. The two most common are ZIP and ARC; files compressed by these programs can easily be identified on a PC by their file extensions, .ZIP and .ARC. They are based on popular compression algorithms such as Lempel-Ziv-Welch (LZW) and run-length encoding (RLE). Both algorithms work by substituting long common patterns in the data with shorter codes representing them. As all the original data can be recovered, they are known as lossless compression algorithms, and depending on the form of the data they can attain a compression ratio of up to 4-to-1. Since the calculations for this type of compression are very simple, compression and decompression are quick and do not require the algorithms to be executed in hardware to obtain higher speeds.
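Run-length encoding is simple enough to sketch in a few lines. This is an illustrative minimal version, not the exact scheme used by the ZIP or ARC utilities:

```python
def rle_encode(data):
    """Replace runs of repeated symbols with (count, value) pairs."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the run
        out.append((j - i, data[i]))
        i = j
    return out

def rle_decode(pairs):
    """Rebuild the original data exactly - the scheme is lossless."""
    return "".join(value * count for count, value in pairs)

packed = rle_encode("AAAABBBCCD")
print(packed)               # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print(rle_decode(packed))   # AAAABBBCCD
```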
When compressing data files, it is vital that nothing is lost in the compression and decompression cycle. When compressing still images or video streams, however, some loss is acceptable if it buys compression well beyond 4-to-1. To go beyond this ratio, data must be discarded, which shows up as lowered resolution. Such algorithms are called lossy.
File compression utilities are also symmetric, meaning that about the same time is taken to compress a file as to decompress it. An asymmetric algorithm takes more time to compress than to decompress. If a particular algorithm achieves high compression ratios but is severely asymmetric, its applicability is constrained.
Hard Disk Compaction
Hard disk compaction is a variation of file compression and has been driven by a short-term problem: the voracious appetite of Windows 3.1 for disk space. This, combined with the fact that the average hard disk is still only 40 Mbyte, means that users upgrading to Windows instantly run out of disk space. Two US vendors have marketed disk compaction utilities, in the form of SuperStor and Stacker. Both of these utilities are transparent to the user and effectively double the size of the hard disk: whenever a file is saved to disk it is compressed, and whenever it is read it is de-compressed. This is an application that needs a symmetric algorithm. Both algorithms are proprietary, but are based on similar concepts to those used for file compaction. It is interesting to note that when the software is used on a 486-based PC, disk access times are degraded by only 15%, but on a slower machine, such as a 286, a hardware version of the compressor should be used.
Name Description
Fractal Transforms Proprietary algorithm developed by ISI
Wavelet Transforms Proprietary algorithm
Photo-CD Proprietary algorithm developed by Kodak & Philips
JPEG Still-frame compression standard for multimedia
Table 1 - Still Image Algorithms
Still Image Compression
Still image compression is a key technology for multimedia applications on computers. As described earlier, uncompressed files are usually just too unwieldy for storage and transmission. There are many proprietary algorithms, but the four main approaches are Fractal, Wavelet, Photo-CD and Joint Photographic Experts Group (JPEG), as shown in Table 1 above.
Until recently, Fractal-based image compression was met with a good deal of scepticism, as it seemed very avant garde and unrealisable. However, in February of this year the main commercial protagonist of the methodology, Iterated Systems Inc. (ISI), licensed its technology to Microsoft and, by doing so, gained overnight credibility.
Fractals are a particularly elegant branch of pure mathematics that enables the generation of extremely complex structures using only very simple equations. The image detail generated by these equations increases on each iteration of the algorithm. In application to compression, an image is split into small blocks and a Fractal equation representation of that area is developed. Decompression then only involves running these Fractal equations to regenerate the image. The nearer the starting image is to the final image, the fewer the number of iterations required to regenerate the picture. For example, for motion video, the preceding frame is used as the starting point. Unlike JPEG, where a compression session is started by specifying the required quality of the compressed image, you start Fractal compression by specifying the required file size.
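The decode side of this idea can be illustrated with a toy one-dimensional sketch: each small "range" block is repeatedly rebuilt from a scaled-down copy of a larger "domain" block, so iterating the equations from any starting signal converges on the image. This is purely illustrative and far simpler than ISI's actual (secret) algorithm, which works on 2-D blocks with many more transform choices:

```python
# Toy 1-D fractal decoder. The "code" for each 4-sample range block is
# (domain_start, scale, offset); decoding just iterates the contractive maps.
def decode(codes, length, iterations=20):
    signal = [0.0] * length              # start from anything - it converges
    for _ in range(iterations):
        new = signal[:]
        for r, (d, scale, offset) in enumerate(codes):
            for k in range(4):           # 8-wide domain block, averaged in pairs
                avg = (signal[d + 2 * k] + signal[d + 2 * k + 1]) / 2
                new[r * 4 + k] = scale * avg + offset
        signal = new
    return signal

# Two range blocks, both mapped from the domain block starting at sample 0.
codes = [(0, 0.5, 10.0), (0, 0.5, 40.0)]
print([round(x, 2) for x in decode(codes, 8)])   # same values from any start
```

More iterations sharpen the result, which is the time-versus-resolution trade-off described above.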
The first description of the technique was by Arnaud Jacquin, now at Bell Labs. The exact details of the algorithm used by ISI are a commercial secret, but ISI has indicated that they correspond to those detailed in Jacquin's paper. The supporters of Fractal compression claim three distinct performance advantages over the discrete cosine transform (DCT) based techniques found in JPEG: high compression efficiency, resolution independence, and software decompression. In terms of compression efficiency, ISI claims ratios of up to 75-to-1, reducing a 768 kbyte colour image to a mere 10 kbytes, although this can also be achieved by DCT techniques. The big advantage is that resolution increases with each iteration of the decompression software, so it is possible to trade off decompression time against display resolution. A Fractal representation of an image is therefore totally independent of the resolution of the display, allowing images to be easily scaled.
The principal limitation of Fractal compression is that it is inherently asymmetric: encoding is computationally intensive. ISI supplies a PC plug-in card for compression which takes 12 minutes to compress a 640x400 pixel, 24-bit colour image. On the other hand, software decompression is quick enough to support 30 frames/s video applications.
In the UK, Racal Radio is using ISI's Fractal compression in its Pictor system for transmitting images over HF radio.
Photo CD is a new concept recently launched by Kodak. Selected High Street photo finishers provide a set of 35mm prints, negatives and a Photo CD disk for around £18. Photo CDs are not locked to local TV standards and may be played back on all Photo CD machines, Compact Disc-Interactive (CD-I) decks, and CD-ROM-XA drives on PCs.
Kodak's system relies on a proprietary compression system developed jointly by Kodak and Philips called Photo YCC. Photo YCC preserves picture quality but reduces data considerably. Each image is stored as a hierarchy of components, extending from low resolution images with 128x192 pixels, through to high resolution images of 2048x3072 pixels. This image hierarchy is held as an image pack, varying in size from 3 to 6 Mbytes, with average size being around 4.5 Mbyte. An uncompressed image occupies 18 Mbytes of storage. A compact disk stores 600 Mbyte of data, so a Photo CD system can hold up to 100 photos.
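The hierarchy can be tallied to show why compression is needed to squeeze an image pack into 3 to 6 Mbytes. Assuming each level doubles the previous one's width and height, from 128x192 up to 2048x3072 (the end points given above), the raw 24-bit data alone comes to around 24 Mbytes:

```python
# Five resolution levels, each side doubling from 128x192 to 2048x3072.
levels = [(128 * 2**i, 192 * 2**i) for i in range(5)]
raw = sum(w * h * 3 for w, h in levels)        # 3 bytes per 24-bit pixel

print([f"{w}x{h}" for w, h in levels])
print(round(raw / 2**20), "Mbyte raw across the hierarchy")   # about 24
```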
The Photo YCC algorithm is asymmetric and requires a high-powered workstation to undertake the scanning and compression. Software is available to run the system under Windows 3.1, enabling the transfer of high resolution colour images into desktop published documents. Without any stretch of the imagination, this system will have a tremendous impact on the multimedia market, which has been begging for such facilities for many years. It is likely that this technology will prematurely terminate the electronic camera approach that also addressed these needs.
The CCITT/ISO Joint Photographic Experts Group (JPEG) standard proposes compression and decompression algorithms aimed primarily at grey-scale and colour still images. The standard proposes a lossy, symmetrical encoding technique based on the DCT and a uniform quantiser, and is expected to be approved soon.
DCT compression involves first dividing the image into 8x8 pixel blocks. A block may contain only low-frequency elements, a uniform background for example, or many high-frequency elements if it holds a lot of detail. Each block's information is transformed to the frequency domain. The transformed data is then filtered and quantised, a fancy way of saying that the high-frequency elements are thrown away, as they are of less significance than the low-frequency elements. The removal of these high-frequency components forms the compression action: the more of them that are discarded, the smaller the resulting file and the lower the resolution of the reconstituted image. The last step is to compress the quantised data using either Huffman or run-length encoding (RLE), both of which are lossless algorithms. Although JPEG represents a great step forward, it is not perfect. It suffers from two shortcomings: blocking, the undesired visibility of colour block boundaries, and undesired colour shift.
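The transform and quantise steps for one block can be sketched as follows. This is a simplified illustration: real JPEG uses fast DCT implementations, per-coefficient quantisation tables and zig-zag ordering, none of which are shown here:

```python
import math

N = 8

def dct_2d(block):
    """Naive 2-D DCT-II of an NxN block (real codecs use fast variants)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantise(coeffs, step=16):
    """Uniform quantiser: small high-frequency coefficients become zero."""
    return [[round(c / step) for c in row] for row in coeffs]

flat = [[100] * N for _ in range(N)]     # a uniform background block
q = quantise(dct_2d(flat))
print(q[0][0])                           # 50 - only the DC term survives
print(sum(abs(c) for row in q for c in row) - abs(q[0][0]))   # 0 - AC all zero
```

A detailed block would leave many non-zero AC coefficients behind, which is why busy images compress less well.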
Depending on the computer running the JPEG software, image expansion can range from 1 second on a 486-based machine to 10 seconds on a 286-based machine. The reduction of a 150 kbyte file to 5,318 bytes, a compression ratio of 26-to-1, took only 6 seconds on a 33 MHz 386-based machine. If the algorithm is executed in hardware, such as the chips marketed by Oak Technology or C-Cube, an 8" by 11" colour photograph can be expanded in 100 ms.
In 1991, a group of companies including Sun, C-Cube, Radius, and NeXT got together and defined the JPEG File Interchange Format (JFIF). Until then, the formal standard did not include enough information to actually convert a JPEG-compressed image stored in a file back into a displayable image.
Wavelet Image Compression (WIC) is a response to some of the limitations of the DCT algorithm. The DCT converts a waveform into sums of cosine waves, but cosines are regular, repeating functions, so the DCT tends to mask sudden changes, such as a black pixel next to a white pixel. This causes blurring around what should be distinct edges, as well as perceived colour shifts.
WIC essentially follows the same principles as a DCT, but decomposes an image into two sequences, one for high-frequency data, the other for low-frequency data. These Wavelet functions are theoretically better at preserving edges, because they capture, rather than discard, high-frequency data. The software performs about four or five transforms in succession, iterating the image down to its most basic elements.
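The simplest wavelet, the Haar transform, shows the principle: one pass splits a signal into half-length averages (the low-frequency band) and differences (the high-frequency band), and repeating the pass on the averages gives the succession of transforms described above. A minimal sketch, not a production filter:

```python
def haar_step(signal):
    """One level: pairwise averages (low band) and differences (high band)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_inverse(low, high):
    """Perfect reconstruction from the two bands."""
    out = []
    for a, d in zip(low, high):
        out += [a + d, a - d]
    return out

sig = [10, 10, 10, 10, 10, 90, 90, 90]       # a sharp edge mid-signal
low, high = haar_step(sig)
print(low)     # [10.0, 10.0, 50.0, 90.0] - the coarse version
print(high)    # [0.0, 0.0, -40.0, 0.0]  - energy only at the edge
print(haar_inverse(low, high))               # the original signal back
```

Note how the high band is zero everywhere except at the edge: the transform captures, rather than smears, the discontinuity.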
WIC is still too new to have made much headway in the market or to allow any real conclusions to be drawn about its quality. But, it certainly shows a lot of promise.
Motion Video Compression
For moving video on PCs, two algorithms are getting attention at the moment, as shown in Table 2.
Name Description
MPEG Motion-video compression standard for multimedia
DVI Proprietary motion-video algorithm for multimedia from Intel Corp.
Table 2 - PC Motion Video Algorithms
The ISO/CCITT Moving Picture Experts Group (MPEG), formed in 1988, focuses on the compression of motion video for use with multimedia applications on a PC. The original intent of the MPEG was to support VHS quality at a resolution of 352 by 240 pixels with data rates of around 1.2 - 1.5 Mbit/s, using a CD-ROM for storage and retrieval. But to improve image quality, the MPEG committee is considering raising the speed limit of MPEG-I to 5 Mbit/s, roughly a four-fold increase. An MPEG-II is also being proposed for data rates of up to 10 Mbit/s for use with HDTV.
Like JPEG, MPEG uses the DCT encoding technique and provides good quality images at compression ratios of up to 100-to-1. The MPEG algorithm also compares adjacent frames to detect how objects have moved between them, which allows a substantial reduction in the amount of data needed to represent a sequence of video frames. To achieve the decompression rate needed to support 30 frames/s, MPEG requires the algorithm to run in hardware in the form of a chipset. The standard also addresses audio compression and the synchronisation of the audio signal with the video image. MPEG standards are currently evolving very rapidly and are far from finalised.
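The frame-comparison idea can be sketched as follows: only blocks that differ from the previous frame need to be coded and sent. This toy uses a plain difference test; MPEG itself combines motion estimation, DCT coding and more:

```python
def changed_blocks(prev, curr, block=8, threshold=0):
    """Return coordinates of blocks that differ between two frames.

    Frames are 2-D lists of pixel values. Only the blocks listed here
    (plus their difference data) would need to be transmitted.
    """
    h, w = len(curr), len(curr[0])
    changed = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
            if diff > threshold:
                changed.append((by, bx))
    return changed

prev = [[0] * 16 for _ in range(16)]
curr = [row[:] for row in prev]
curr[3][3] = 255                       # one pixel changed
print(changed_blocks(prev, curr))      # [(0, 0)] - one block of four to send
```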
In 1989 Intel announced the low-cost Digital Video Interactive (DVI) chipset. DVI was originally developed at RCA's David Sarnoff Research Centre; when the work was taken over by Intel it was refocused towards obtaining full-motion video from a CD-ROM. CD-ROM has a maximum data rate of 1.5 Mbit/s, so a compression ratio of around 144-to-1 is required. With this level of compression, a CD-ROM can store more than an hour of video.
DVI is closely associated with two RCA algorithms, production level video (PLV) and real time video (RTV). PLV is efficient in storage density, but is severely asymmetric. Compression is currently performed on a system based on eight Intel i860 RISC processors, and even with this horsepower it takes 50 minutes to compress 1 minute of video. This limits the technology to the distribution of precompressed video on CD-ROM; compression by users is not feasible with DVI. Although Intel hoped that DVI would become the de facto compression standard for PC moving video, it is likely to be superseded by MPEG in coming years.
In September 1992, Intel announced that Microsoft's Audio Video Interleaved (AVI) software-only playback system running under Windows 3.1 would be delayed so that the DVI algorithm could be supported. AVI is a simple compression technique that principally relies on the removal of redundant frames, which can lead to jerkiness. The first product that Microsoft will launch for the Christmas 1992 market will be 'Cinemania', a CD-ROM encyclopaedia of films. The CD will store 19,000 film reviews from 1914 to 1991, and stills can be cross-referenced by actor, award, and title.
Video Conferencing
Video conferencing is now a major market area, brought about by recent advances in desktop video technology, chipsets, and compression standards, as shown in Table 3 below.
Name Description
CTX, CTX+ Proprietary algorithms developed by Compression Labs.
SG3 Proprietary algorithm developed by PictureTel.
H.130 CCITT standard for working at 2.048 Mbit/s.
H.261 CCITT video coding, also known as Px64.
H.320 CCITT umbrella standard for narrow-bandwidth audio-visual systems.
G.728 CCITT audio compression at 16 kbit/s.
Table 3 - Video Conferencing Standards
In 1984, when video conferencing was very expensive, Europe adopted the CCITT's H.120 and H.130 standards that worked at 2.048 Mbit/s. US companies at that time decided that it was possible to achieve usable systems at far lower bandwidths and ploughed ahead to develop proprietary systems that did just that. An example is PictureTel's system working at 384 kbit/s, based on a proprietary algorithm known as SG3. Other companies followed the same path, the most prominent of which was Compression Labs.
If the user base was to expand, standards were needed to promote interoperability. These arrived in the form of the H.320 series, which includes H.261 - video coding and compression, H.221 - framing information, H.230 - control and indication signalling, and H.242 - call set-up and disconnect. A typical video encoder/decoder, otherwise known as a codec, is shown in Figure 2. A codec is the video equivalent of a modem.
Figure 2 - A typical Video Codec
Like JPEG and MPEG, the H.261 algorithm is based on DCT. Without compression at least 90 Mbit/s of bandwidth is required to handle a full-motion video conference.
The basic building block of a video picture is a pixel. The highest quality codecs support around 210,000 pixels per frame, while videophone products have fewer than 15,000. This does not mean that the picture produced by a high-quality codec is ten times better than a videophone's: even a low number of pixels looks good from far away.
The quality of the picture is determined not only by the number of pixels, but also by the colour rendition, contrast and smoothness of movement. Picture smoothness is determined by how often the screen is refreshed. Standard European PAL television refreshes the screen 50 times per second in an interlaced manner, effectively creating a refresh rate of 25 frames per second. To compress large amounts of data, video codecs often reduce frame rates to 10 or 15 per second, although state-of-the-art algorithms operate at the full 25 frames/sec rate.
Most effort is going into optimising video quality at 112 and 128 kbit/s bandwidths. This involves several procedures applied before the video undergoes DCT compression. In effect, these techniques reduce the amount of information that must be sent via the codec, thereby making efficient use of the bandwidth. There are two types of filtering: temporal filtering and spatial filtering. If a picture element on a particular frame is not significantly different from that on the previous frame, temporal filtering averages the mathematical representation of the element, reducing the amount of data that needs to be sent to the codec. Spatial filtering is used to handle the virtually still parts of the frame. Take, for example, a black and white pattern on a wall behind videoconference participants. A lot of information would be required to recreate this pattern within a single frame time. But if black and white are represented by shades of grey and sent over a number of frames, this cuts down the amount of data sent within a single frame, and thus reduces the required bandwidth. Over time, all the data is transmitted to recreate the pattern with good resolution.
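The temporal side can be sketched as a toy (illustrative only; the thresholds and averaging details vary from codec to codec): pixels that have barely changed since the previous frame are averaged with their old values, shrinking the differences that must be coded:

```python
def temporal_filter(prev, curr, threshold=4):
    """Average a pixel with its previous value if it barely changed.

    Small frame-to-frame noise is smoothed away, so the difference the
    codec must transmit shrinks toward zero; real changes pass through.
    """
    out = []
    for p, c in zip(prev, curr):
        out.append((p + c) / 2 if abs(c - p) <= threshold else c)
    return out

prev = [100, 100, 100, 50]
curr = [102, 99, 100, 200]            # small noise plus one real change
print(temporal_filter(prev, curr))    # [101.0, 99.5, 100.0, 200]
```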
Motion compensation, an interframe activity, also cuts the amount of data that needs to be transmitted. In fact it is the main technology that allows today's codecs to work efficiently at 128 kbit/s bandwidth. When a participant in a video conference moves slightly, the image essentially stays the same, but is shifted slightly. Motion compensation looks at a block on the screen and compares that block to the previous frame. If the block pattern has changed, the algorithm searches for the new location. When it is found, the codec just transmits the address of the block together with its new location. It has no need to retransmit the pixel information itself.
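A minimal sketch of the block search follows. It uses an exhaustive search over a small window; real codecs use faster heuristic searches and code the small residual difference as well as the vector:

```python
def best_match(prev, block, by, bx, size=4, search=4):
    """Find where an image block sat in the previous frame.

    Searches a +/- search window around (by, bx) and returns the motion
    vector with the smallest sum of absolute differences (SAD), so the
    codec can send just the vector instead of the pixels.
    """
    h, w = len(prev), len(prev[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue                 # window falls off the frame
            sad = sum(
                abs(prev[y0 + j][x0 + i] - block[j][i])
                for j in range(size) for i in range(size)
            )
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# A bright 4x4 square at (2, 2) in the previous frame...
prev = [[0] * 12 for _ in range(12)]
for j in range(4):
    for i in range(4):
        prev[2 + j][2 + i] = 200

# ...appears at (4, 5) in the current frame.
block = [[200] * 4 for _ in range(4)]
vector, sad = best_match(prev, block, 4, 5)
print(vector, sad)    # (-2, -3) 0 - a perfect match two up, three left
```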
The CCITT H.261 standard, completed in November 1990, defines the procedures to communicate over multiples of 64 kbit/s, which explains its alternative name of Px64. H.261 provides all the information needed for codecs to decode the video part of a video conference. It also specifies display formats with particular resolutions, known as the quarter common intermediate format (QCIF) and the common intermediate format (CIF). The use of CIF is optional, while all codecs are expected to support QCIF. QCIF specifies a resolution of 176 pixels by 144 lines, while CIF specifies a resolution of 352 pixels by 288 lines. QCIF is adequate for desktop applications but leaves much to be desired for videoconferencing on large monitors. With state-of-the-art video-conferencing systems it is now possible to achieve frame refresh rates of 30 frames per second over a bandwidth of 128 kbit/s. For example, Compression Labs uses a proprietary technique known as cosine transform extended (CTX+) to transmit video at 368 by 576 pixel resolution over a digital link bandwidth of 384 kbit/s.
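The arithmetic shows the scale of the task these formats imply. Assuming 8-bit samples with the colour components subsampled to a quarter of the luminance resolution (roughly 12 bits per pixel on average, as H.261-style sampling gives), even QCIF at 30 frames/s needs around a 70-to-1 reduction to fit into 128 kbit/s:

```python
def raw_kbits_per_sec(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate; 12 bits/pixel reflects subsampled colour."""
    return width * height * bits_per_pixel * fps / 1000

qcif = raw_kbits_per_sec(176, 144, 30)
cif = raw_kbits_per_sec(352, 288, 30)

print(round(qcif), "kbit/s raw for QCIF at 30 frames/s")   # 9124
print(round(qcif / 128), "to-1 compression for a 128 kbit/s channel")
print(round(cif / qcif), "x more data for CIF")            # 4
```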
In 1993 the CCITT is expected to ratify a standard known as H.233, which describes how video conferencing equipment will exchange encryption keys at the beginning of a session. Further work is being done to allow the inclusion of high-resolution still images in a videoconference, and is likely to be included in the H.261 standard. Also very interesting is the possibility of extending the standard to allow codecs to transmit non-video data, such as spreadsheet or word processor files, during a conference call. The catch in all this standards work is that there is currently no body that provides a seal of approval for codecs that interoperate; manufacturers must run interoperability tests themselves.
It is clear from all the activities discussed that the state of compression technology, and the products that use it, is one of extreme flux. Most areas seem to be adequately covered by standards that are capable of being run either in software or are sufficiently stable to allow semiconductor manufacturers to develop chipsets. Because of the potential size of the markets, and the open nature of the PC, standards need to be in place prior to market expansion. This has now been achieved. Over the next few years we will see many new products based on multimedia applications incorporating high resolution still images and moving video stored on CD-ROM. This will be paralleled in the consumer market with the growth of CD-I technology using TV sets as the display medium.
On the other hand, videoconferencing is moving away from proprietary solutions, which arose from the need to market products in the absence of standards, or where standards could not be supported with the silicon technology of the day. Full support for H.320 will enable videoconferencing manufacturers to remove one of the major blockages limiting market growth: interoperability.
Whatever happens in the near future, a few things are a sure bet: image quality will improve, file sizes will shrink, and bandwidth requirements will drop.