Video coding
Academic contact: Prof Ebroul Izquierdo
Involved people:
Summary
The aim of the multimedia compression research is to develop cutting-edge video compression techniques. The coding scheme proposed by the group, MMV-SVC, was ranked among the best two in compression performance in a worldwide competition organised by MPEG in 2003 to gather evidence on advanced scalable video coding technology. The development of the complete framework for full-granularity scalable video coding (MMV-SVC), the only one of its kind worldwide, attracted £500k of industry and EU funding. The research also addresses multimedia communication over lossy channels that can exhibit wide variability in throughput, delay and packet loss. Providing acceptable video quality in such environments is a demanding task for the video encoder and decoder as well as for the communication and networking infrastructure. In addition, the research covers surveillance-centric coding and multiple description coding.
Sub-topics:
Scalable video coding
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 1: Scalability functionalities
The recent convergence of multimedia technology and telecommunications, together with the emergence of the Web as a strong competitor to conventional distribution networks, has generated an acute need for richer modalities and capabilities in the delivery of digital media. Within this trend, a main challenge is the production of easily adaptable content that fits optimally into evolving, heterogeneous networks as well as iterative delivery platforms with specific content requirements. Network-supported multimedia applications involve many different transmission capabilities, including Web-based applications, narrowcasting, conventional terrestrial channels for interactive broadcasting, wireless channels, and high-definition television for sensitive remote applications such as remote medical diagnosis. These applications deliver content to a wide range of terminals and to users in different environments and under very different circumstances. Conventional video coding systems encode video content at a fixed bit rate tailored to a specific application. As a consequence, conventional video coding does not fulfil the basic requirements of new, flexible digital media applications. In contrast, scalable video coding (SVC) has emerged as a technology able to satisfy these requirements. SVC targets seamless delivery of and access to digital content, enabling optimal, user-centred, multi-channel and cross-platform media services and providing a straightforward solution for universal video delivery to a broad range of applications.
Figure 2: Spatio-temporal decomposition of video
Current digital video applications require at least three types of scalability: quality scalability, spatial resolution scalability and temporal (frame rate) scalability. These can be used individually or in combination, according to the application. The main tools for achieving a scalable video coding architecture are signal transforms that provide an intrinsically hierarchical, multi-resolution organisation of the input signal. Wavelet transforms are well suited to this requirement. Since video is a 3D signal, it contains both temporal and spatial redundancies. Wavelet transforms in lifting form are used in the motion compensated temporal filtering (MCTF) process for temporal decorrelation, and 2D wavelet transforms are used for spatial decorrelation. This 3D transform provides the hierarchical multi-resolution organisation of video data required for scalable video coding (Figure 2). The process is performed on groups of frames (GOPs) of the video sequence. The bit-stream is organised according to GOP number, spatio-temporal subbands and significant bit planes. It can be truncated by temporal subband, spatial subband or significant bit plane to achieve temporal, spatial resolution and quality scalability, respectively, in the decoded video.
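The temporal decomposition and bit-plane truncation described above can be illustrated with a minimal sketch. It is not the MMV-SVC implementation: it assumes a Haar filter, omits motion compensation entirely, and represents each frame as a flat list of pixel values. The function names are hypothetical.

```python
def haar_lift_forward(frames):
    """One temporal Haar lifting level over a GOP (list of frames).

    Assumption: no motion compensation; frames are flat lists of numbers
    and the GOP holds an even number of frames.
    """
    low, high = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        # Predict step: the high band is the odd frame minus the even frame.
        h = [o - e for o, e in zip(odd, even)]
        # Update step: the low band is the even frame plus half the residual.
        l = [e + hv / 2 for e, hv in zip(even, h)]
        high.append(h)
        low.append(l)
    return low, high


def haar_lift_inverse(low, high):
    """Undo haar_lift_forward by reversing the update and predict steps."""
    frames = []
    for l, h in zip(low, high):
        even = [lv - hv / 2 for lv, hv in zip(l, h)]
        odd = [hv + ev for hv, ev in zip(h, even)]
        frames.extend([even, odd])
    return frames


def drop_bitplanes(coeff, n):
    """Zero the n least significant bit planes of an integer coefficient."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) >> n) << n)


gop = [[10, 20], [12, 18], [30, 40], [30, 44]]
low, high = haar_lift_forward(gop)
restored = haar_lift_inverse(low, high)
```

Decoding only `low` yields a sequence at half the frame rate (temporal scalability), while applying `drop_bitplanes` to quantised coefficients gives a coarser reconstruction at a lower bit rate (quality scalability).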
Surveillance-centric coding
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 3: Process diagram
Surveillance Centric Coding (SCC) aims to exploit specific properties of surveillance video within a comprehensive application framework that includes adaptation of coding to surveillance, rate-distortion optimisation driven by Video Content Analysis (VCA), and other related concepts. The architecture of a generic SCC system is outlined in Figure 3.
Figure 4: Decoded videos
The SCC encoder communicates with the VCA modules and performs rate-optimised encoding according to the events reported by the VCA. The VCA can also be used at the decoder side for off-line processing, e.g. car plate recognition or face detection. The question in SCC is how to exploit the information produced by the VCA to optimise coding and transmission or streaming for surveillance scenarios. A possible approach is to encode video segments containing an event at high quality and spatio-temporal resolution, and to encode all other segments at low quality and spatio-temporal resolution. This approach is depicted in Figure 4.
Figure 5: Efficient event-based encoding
Such functionality can easily be achieved with a scalable video encoder, where the output bit-stream is adapted according to the significance of each video segment as indicated by the VCA. However, such event-based adaptation exploits the sparsity of events only in the temporal domain. Events are usually sparse in the spatial domain as well. In other words, an event may occupy only a small portion of a video frame. This observation leads to another possible approach: object-based coding. Such an approach is illustrated in Figure 5. Here, only foreground objects are encoded, while the background is either not coded at all or coded at a very low frequency.
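The event-based adaptation of a scalable bit-stream can be sketched as follows. This is only an illustration under stated assumptions: the segment dictionaries, the `event` flag standing in for the VCA output, and the layer counts are all hypothetical and are not taken from the SCC system itself.

```python
def adapt_segments(segments, event_layers=3, idle_layers=1):
    """Keep more quality layers for segments flagged as events.

    segments: list of dicts with
      'event'      - bool, assumed to come from a VCA module
      'layer_bits' - bits per quality layer, base layer first
    Returns the retained layers for each segment.
    """
    adapted = []
    for seg in segments:
        keep = event_layers if seg["event"] else idle_layers
        # A scalable stream is adapted by truncation: drop the upper layers.
        adapted.append(seg["layer_bits"][:keep])
    return adapted


segments = [
    {"event": False, "layer_bits": [100, 200, 400]},
    {"event": True, "layer_bits": [100, 200, 400]},
]
adapted = adapt_segments(segments)
total_bits = sum(sum(layers) for layers in adapted)
```

Because adaptation is pure truncation, no re-encoding is needed when the VCA decision changes; only the number of retained layers per segment does.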
Transcoding
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 6: Transcoding diagram
Scalable video coding (SVC) provides low-complexity adaptation of video to transmission requirements. The wavelet transform is often used as the main tool to achieve scalability because of its multi-resolution representation of content. In MMV-SVC it is used together with motion compensated temporal filtering (MCTF) to obtain a fully scalable video codec. To better integrate MMV-SVC with other systems that use hybrid DPCM/DCT video codecs, we are developing efficient transcoders from H.264/AVC to MMV-SVC and vice versa. In the H.264/AVC to MMV-SVC transcoder, complexity is greatly reduced by re-using the motion information available from the decoded H.264/AVC bit-stream. First, motion vectors are approximated from the H.264/AVC motion vectors; then the approximated motion vectors are refined; finally, the block sizes that are best in the rate-distortion sense are chosen for motion compensation.
Error Resilience and Concealment in Video Streams
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 7: Example of video streaming over heterogeneous networks using error-robust techniques
When compressed video is streamed over unreliable channels that introduce errors and losses, the robustness of the compressed stream is an important issue, since errors can render the decoded video useless. Robustness is achieved through two different but complementary techniques: error resilience and error concealment. Error resilience techniques enable the compressed bit-stream to better resist channel errors, so that the impact on the reconstructed image quality is minimal. However, some of the information is inevitably lost, and the goal of error concealment at the receiving side is to estimate the losses and conceal them in the displayed video. The first part of the research considers error-resilient scalable video coding using joint source-channel coding. Its goal is to allocate the available bit-rate budget between the compressed video bit-stream and the protection bit-stream so as to maximise the expected decoded image quality.
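The bit-budget allocation problem can be illustrated with a toy exhaustive search. Everything in this sketch is an assumption made for illustration: the logarithmic quality model, the geometric protection model, the step size and the function names are hypothetical and do not describe the group's actual rate-distortion or channel models.

```python
import math


def expected_quality(source_bits, parity_bits, loss_rate):
    """Expected decoded quality for one split of the budget (toy model).

    Assumptions: quality grows with the log of the source bits, and the
    probability of an unrecoverable loss shrinks geometrically as parity
    bits are added. A failed decode contributes zero quality.
    """
    quality_if_ok = 10 * math.log10(1 + source_bits)
    fail_prob = loss_rate ** (1 + parity_bits / 100)
    return (1 - fail_prob) * quality_if_ok


def best_split(budget, loss_rate, step=100):
    """Return the parity-bit count that maximises expected quality."""
    candidates = range(0, budget + 1, step)
    return max(
        candidates,
        key=lambda parity: expected_quality(budget - parity, parity, loss_rate),
    )


parity_clean = best_split(1000, 0.0)   # error-free channel
parity_lossy = best_split(1000, 0.3)   # lossy channel
```

On an error-free channel the search spends the whole budget on source bits; as the loss rate grows, some of the budget shifts to protection, which is the trade-off that joint source-channel coding optimises.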
The second part of the research aims to conceal the effects of mobile channel errors within a compressed video bit-stream. Highly compressed video streams are sensitive to errors, and mobile channels have high error rates compared with wired networks. Since the demand for live broadcasting over mobile networks has increased rapidly over the years, the effects of losses in the 3G mobile network (W-CDMA) on the MPEG-4 bit-stream are studied. The corrupted video stream is categorised according to the position of the errors. Current research focuses on error concealment using motion vectors.
Concealment Of Errors Introduced Into MPEG-4 Video Stream By Wireless Channel
Figure 11: Concealment of errors with the proposed technique
The spatial error concealment technique reconstructs a corrupted macroblock based on its correlation with neighbouring blocks in the spatial domain. The DC coefficient is estimated from the surrounding DC coefficients in combination with dynamic weight allocation, while high-rank AC coefficients are replaced using a fixed relationship with the DC coefficients of neighbouring blocks.
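A minimal sketch of DC estimation from neighbouring blocks is given below. The text does not specify the weighting rule, so this sketch makes an assumption: "dynamic" is interpreted simply as renormalising a set of base weights over the neighbours that were received intact. The function name and the neighbour labels are hypothetical.

```python
def estimate_dc(neighbours, base_weights=None):
    """Estimate a lost DC coefficient as a weighted mean of its neighbours.

    neighbours:   dict mapping a position label to the neighbour's DC value,
                  or None if that neighbour is itself corrupted.
    base_weights: optional dict of per-position weights (assumed, not from
                  the original technique); defaults to equal weights.
    Returns None when no usable neighbour is available.
    """
    available = {pos: dc for pos, dc in neighbours.items() if dc is not None}
    if not available:
        return None
    if base_weights is None:
        base_weights = {pos: 1.0 for pos in available}
    # Renormalise over the neighbours that are actually usable.
    total = sum(base_weights[pos] for pos in available)
    if total == 0:
        return None
    return sum(base_weights[pos] * dc for pos, dc in available.items()) / total


neighbours = {"top": 100, "bottom": 120, "left": None, "right": 110}
dc_equal = estimate_dc(neighbours)
dc_weighted = estimate_dc(
    neighbours, {"top": 2.0, "bottom": 1.0, "left": 1.0, "right": 1.0}
)
```

Here the corrupted left neighbour is ignored and the remaining weights are rescaled, so the estimate stays a proper weighted average however many neighbours survive.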
Figure 12: Concealment of errors on the foreman sequence
Following spatial concealment, we propose a motion vector concealment technique that combines temporal and spatial information. Our initial studies show that spatial motion vector information alone is not sufficient to produce a good prediction. Therefore, confidence measures covering both the spatial and the temporal domain are introduced.
Multiple description coding of videos transmitted over noisy channels
Academic contacts: Prof Ebroul Izquierdo
Involved people:
Figure 13: Multi-network data transmission
Storage is not the only problem linked to the explosion of digital video. Transmission over many different networks without loss of data quality is a huge challenge. Today’s networks achieve high transmission speeds and support data rates sufficient for video applications, even in mobile communications.
Figure 14: Multiple description coding
Video transmission over noisy channels is thus a difficult problem in digital communications. Efficient video transmission requires both good compression rates and robustness in the presence of channel failures. Joint source-channel coding (JSCC) has received increasing interest in the research community. In particular, multiple description coding (MDC) has already shown good results as an error-resilient joint source-channel coding technique. Two main families of MD coding schemes exist, depending on whether the redundancy is introduced during or before quantization. In the first class of approaches, the quantizers are designed to produce two redundant descriptions of the same signal. In the second class, the redundancy is introduced during signal transformation. The original signal can be reconstructed as soon as one of the descriptions is available. The availability of the second description increases the quality of the reconstructed signal. MD coding has recently been applied to video coding.
Figure 15: Results of multiple description coding
Figure 15 shows results for multiple description video coding. The MDC scheme used here is balanced and wavelet-based. It belongs to the second class of approaches: redundancy is introduced before quantization of the signal. The main goal is optimal decoding of the signal after transmission over noisy channels. The challenge at the decoder side is to reconstruct the signal with as little distortion as possible. Maximum a posteriori (MAP) estimates of the original source are computed from the side descriptions corrupted by transmission errors. The a priori information is represented by a model describing the distribution of the wavelet coefficients of the frames. Simulation results show good robustness of the proposed decoding scheme against transmission errors, with an improvement of 2 dB in PSNR compared to a maximum likelihood technique.
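The difference between maximum likelihood and MAP decoding can be illustrated for a single coefficient. This sketch rests on assumptions that are not stated in the text: the prior on the wavelet coefficient is taken to be a zero-mean Laplacian with scale parameter `scale`, and the channel is modelled as additive Gaussian noise with variance `noise_var`. Under those assumptions the MAP estimate is the well-known soft-thresholding rule. The function names are hypothetical.

```python
def ml_estimate(y):
    """Maximum likelihood estimate under additive Gaussian noise.

    With no prior, the most likely source value is the observation itself.
    """
    return y


def map_estimate(y, noise_var, scale):
    """MAP estimate with a Laplacian prior (assumed model).

    Minimises (y - x)**2 / (2 * noise_var) + abs(x) / scale, whose solution
    shrinks the observation towards zero by noise_var / scale.
    """
    threshold = noise_var / scale
    magnitude = max(abs(y) - threshold, 0.0)
    return magnitude if y >= 0 else -magnitude


large = map_estimate(5.0, noise_var=1.0, scale=2.0)
small = map_estimate(-0.3, noise_var=1.0, scale=2.0)
```

Small observations, which under this prior are most likely noise, are set to zero, while large ones are only slightly shrunk. Using prior knowledge in this way is what lets a MAP decoder improve on a maximum likelihood one.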