Hardware implementation of H.264/AVC and H.265/HEVC video codecs

Research description

This research aims at designing and prototyping a multi-view low-delay video coding system. Its practical outcome will be an application, running on a hardware platform, that demonstrates real-time multi-view video coding. The digital part of the system is integrated in a single FPGA programmable circuit. Hardware acceleration and new coder architectures will improve video compression compared to existing solutions.
The hardware implementations designed in this research should ultimately have the following features:
  1. Low delay.
  2. High throughput: real-time processing of high-resolution and multi-view video demands massively parallel architectures.
  3. A compression system implemented on a single FPGA chip according to SoC methodology.
  4. Advanced choice of coding mode based on human visual system quality measures.
  5. Multi-view bitrate control: designed architectures should adapt to variable video content and channel capacity using statistical multiplexing.
  6. Adaptive motion and disparity estimation allowing for random generation of search points.
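
Feature 5 can be illustrated with a minimal sketch of statistical multiplexing: a fixed channel capacity is shared among the views so that more demanding views receive more bits. The sketch assumes a per-view complexity measure (hypothetical here; it could be, for example, the number of bits spent on the previous frame of that view) and a simple proportional rule. The function name and parameters are illustrative only and do not describe the rate control actually used in the designed architecture.

```python
def allocate_bitrate(complexities, channel_kbps, min_kbps=0.0):
    """Split channel_kbps across views in proportion to their complexity.

    complexities: one non-negative number per view (assumed measure).
    min_kbps:     bitrate guaranteed to every view before sharing the rest.
    """
    n = len(complexities)
    if n == 0:
        return []
    spare = channel_kbps - n * min_kbps
    if spare < 0:
        raise ValueError("channel capacity is below the guaranteed minimum")
    total = sum(complexities)
    if total == 0:
        # No view reports any activity: fall back to an equal split.
        return [channel_kbps / n] * n
    return [min_kbps + spare * c / total for c in complexities]


# Three views share a 10 Mbit/s channel; the third view is twice as complex.
shares = allocate_bitrate([1.0, 1.0, 2.0], channel_kbps=10000, min_kbps=1000)
print(shares)
```

Re-running the allocation whenever the complexity measures or the channel capacity change is what lets the per-view bitrates follow the video content while the total stays within the channel.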

HDL Design:
The H.264/AVC standard achieves high compression efficiency at the cost of computational complexity. To make the efficiency as high as possible, the designed architecture selects coding modes using rate-distortion optimization. In particular, the dataflow sustains an average throughput of 32 samples/coefficients per clock cycle in the main (reconstruction) loop, which allows many compression options to be checked. Moreover, the architecture supports all transform sizes specified for the High Profile using the same hardware resources. The binary coder conforms to the H.264/AVC High Profile and supports two binary coding modes: Context Adaptive Binary Arithmetic Coding (CABAC) and Context Adaptive Variable Length Coding (CAVLC). Frame or field input video can be compressed in the 4:2:2 or 4:2:0 format. The architecture saves a considerable amount of hardware resources because the two coding modes share the same logic and storage elements. Five versions of the arithmetic coding path were developed to study the area/performance trade-off related to parallel symbol encoding. The implementation results show that parallel symbol encoding gives higher efficiency. Synthesis results show that the whole design can run at 100 MHz on Stratix II and Arria II FPGA devices. This clock frequency is sufficient to encode 1080p video at 30 fps. Additionally, a hardware AAC codec and a TS (de)multiplexer were developed to support low-delay audiovisual communication.
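
The relation between the clock frequency, the throughput of 32 samples per cycle and the number of compression options that can be checked can be made concrete with a short calculation. The sketch below is a rough budget under stated assumptions: picture dimensions are padded to whole 16x16 macroblocks, every clock cycle is available to the main loop, and pipeline stalls and other overheads are ignored. The function names are illustrative, and the result is an upper bound rather than a measured figure.

```python
def samples_per_second(width, height, fps, chroma_format):
    """Sample rate of a video with dimensions padded to whole 16x16 macroblocks."""
    mb_cols = -(-width // 16)    # ceiling division
    mb_rows = -(-height // 16)
    luma = mb_cols * 16 * mb_rows * 16
    # Chroma samples per luma sample: 0.5 for 4:2:0, 1.0 for 4:2:2.
    chroma_ratio = {"4:2:0": 0.5, "4:2:2": 1.0}[chroma_format]
    return luma * (1 + chroma_ratio) * fps


def passes_per_sample(clock_hz, samples_per_cycle, video_rate):
    """How many times each input sample can pass through the main loop."""
    return clock_hz * samples_per_cycle / video_rate


CLOCK_HZ = 100e6          # synthesis result quoted in the text
SAMPLES_PER_CYCLE = 32    # average main-loop throughput quoted in the text

for fmt in ("4:2:0", "4:2:2"):
    rate = samples_per_second(1920, 1080, 30, fmt)
    budget = passes_per_sample(CLOCK_HZ, SAMPLES_PER_CYCLE, rate)
    print(f"{fmt}: {rate / 1e6:.1f} Msamples/s, about {budget:.0f} passes per sample")
```

Under these assumptions, 1080p at 30 fps leaves room for roughly 34 passes per sample in 4:2:0 and roughly 26 in 4:2:2, which is the headroom that rate-distortion optimized mode selection can spend on evaluating alternative coding modes.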

Development Platform:
The video/audio (de)coding and streaming system is built as a set of PCBs stacked on one another as a "sandwich". The boards communicate via EPI, I2C and I2S buses. The coder/decoder modules for MPEG-4 AAC audio and H.264/AVC video are implemented in the FPGA chip, together with interfaces for the above-mentioned external buses and BT.656 (alternatively, other standards for HD). Fig. 8 shows the schematic of the development kit. It consists of three boards: P2 - FPGA, P1 - ARM uC and converters, P0 - I/O connectors. The audio codec uses the ADAU1361 chip, the video coder uses the ADV7403, and the video decoder uses the ADV7393. All these chips are controlled via I2C from the LM3S9B95 microcontroller. The FPGA board is connected to the others with 100-pin connectors. It also contains external memory: 8 DDR2 chips, 2 SRAMs and 1 FLASH.