FACTA UNIVERSITATIS (NIŠ) Series: Electronics and Energetics vol.13, No.1 August 2000, 245–256

# AN APPROACH IN FAST IC DEVELOPMENT FOR DIGITAL VIDEO PROCESSING BASED ON FPGAs

## Nikola Teslić, Vladimir Kovačević and Miodrag Temerinac

**Abstract.** This paper presents an environment for the fast development of integrated circuits for digital video processing based on FPGAs (Field Programmable Gate Arrays). The described environment provides fully digital, real-time signal processing of a television picture. The principles are illustrated through the example of the design of a scan rate conversion integrated circuit.

Key words: FPGA, Digital video processing, Real time, IC design

## 1. Introduction

A new approach to integrated circuit design for digital video processing is introduced. In the past, IC design started after a C simulation, without any real-time test. The new approach is based on new large FPGAs that can implement a complex algorithm at a high clock frequency. The hardware description languages Verilog and VHDL [20], [21], [22] are used as design tools. The output of this approach is an HDL description of the circuit, which in earlier design methodologies belonged to the IC design stage. The basic principles of both design methodologies are presented in Fig. 1. As Fig. 1 shows, in the first approach a real-time test can be done only after the whole IC design is finished; only then does the real-time test give feedback to all parts of the design. A C simulation can test many situations from the real environment, but it is time-consuming and, unfortunately, it is very unlikely that all potential test combinations can be predicted. With modern FPGAs and the appropriate tools, the complex IC can be designed and all real-time tests performed on a model in the FPGA.

Manuscript received December 21, 1999. A version of this paper was presented at the Fourth IEEE Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, TELSIKS'99, October 1999, Niš, Serbia.

N. Teslić and V. Kovačević are with the Department of Computer Engineering, Faculty of Technical Sciences, Fruskogorska 11, 21000 Novi Sad, Yugoslavia (e-mails: teslic@krt.tmd.ns.ac.yu and rt\_kovac@uns.ns.ac.yu). M. Temerinac is with Micronas Intermetall, Hans-Bunte-Strasse 19, 79108 Freiburg, Germany (e-mail: temerina@intermetall.de).



Fig. 1. Difference in design methodologies

From this stage of the IC design there is feedback to the design specification. This phase also indicates which additional tests have to be done in C. The output of this phase is an HDL description of the IC, which is the input to gate-level IC design. This approach enables fast error detection: the HDL code can quickly be changed, recompiled and downloaded into the FPGA as a new design version. It speeds up IC design and lowers its cost, because the number of IC redesigns is reduced. Tools that translate HDL code directly into IC layer masks exist on the market, which can also be very interesting for customers.

Most commercially available FPGA devices feature architectures well suited for implementing the logic functions needed for bus interfaces and similar control logic, such as state machines, decoders, counters and multiplexers. The latest generations of high-density FPGAs, such as the Xilinx XC4000 [25] or Altera FLEX 10K series, allow memory as well as logic functions to be integrated within an FPGA device. This on-chip memory capability facilitates the implementation of common bus interface functions such as configuration register files and FIFO buffers. Video systems typically employ multiple heterogeneous processing subsystems that are interconnected and controlled by a custom data path and control logic, and this custom logic is efficiently implemented in FPGAs.

A similar approach to testing a video algorithm in an FPGA was proposed by Zuloaga Izaguirre et al. [27]; the basic differences are that their processing is split across two or more FPGAs and that their main objective was to check the algorithm in real time. Anderson et al. [26] use FPGA arrays to test a real-time image segmentation algorithm. The basic difference between these works and the proposed approach is that both are restricted to proving the algorithm, whereas in the new approach the FPGA implementation is one step on the way to the final IC.



Fig. 2. 10K100ARC240-2 target device

## 2. Function of the Developed IC

The main purpose of this project is the development of an integrated circuit for high-quality TV video signals, based on an FPGA (Field Programmable Gate Array). This integrated circuit should provide the following capabilities: improvement of the dynamic character of the video signal (SRC, Scan Rate Conversion) [1], [7], [8], noise reduction, and vertical expansion (ZOOM). Another important capability is frame rate doubling from 50 Hz to 100 Hz [12], [13]. The development must use Verilog HDL (Hardware Description Language) [18], [19], and the target device must be the ALTERA FLEX 10K100ARC240-2 (Fig. 2). ALTERA MAX+plus II version 8.3 is used as the project development environment.

All functions of this integrated circuit are performed digitally, so the developed IC is a circuit for real-time digital video processing. The input digital video signal is in YUV 4:1:1 format at a rate of 50 fields (half-pictures) per second. The output signal has improved quality and a rate of 100 fields (half-pictures) per second. The quality improvement is achieved by the following functional blocks. *Noise reduction (NR)*: a filter based on the elimination of picture singularities. It relies on motion detection: if fast motion is detected in the picture, the previous picture content is forwarded to the output. *Missing picture interpolation*: the major function of this integrated circuit is frame rate doubling for display on a monitor, so twice as many pictures as are received must be produced for display. Interpolation follows directly from this problem definition. Analysing the mathematical model of the system shows that when the picture is transferred as two fields, one with the odd lines and the other with the even lines, the Nyquist criterion is not satisfied; in other words, the sampling theorem is not fulfilled. With interlacing, the vertical bandwidth of the video signal is greater than half the vertical sampling frequency of a single field.
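The aliasing argument can be made concrete with a tiny C++ sketch (C++ being the language of the authors' reference models). It treats one vertical column of a frame as a list of line values and keeps every second line, as interlaced transmission does for each field; `Column` and `extractField` are illustrative names, not part of the described design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One vertical column of a frame: the value of each line, top to bottom.
using Column = std::vector<int>;

// Keeps every second line starting at `first` (0 = field A, even lines;
// 1 = field B, odd lines). This is vertical subsampling by a factor of two,
// which is what interlaced transmission does to each field.
Column extractField(const Column& frame, std::size_t first) {
    assert(first < 2);
    Column field;
    for (std::size_t line = first; line < frame.size(); line += 2) {
        field.push_back(frame[line]);
    }
    return field;
}
```

Splitting a column of alternating white and black lines, the highest vertical frequency a frame can carry, gives a constant field A that is identical to the field A of a flat white column, and a constant black field B: within one field the stripes have aliased to zero frequency and cannot be recovered.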



Fig. 3. Development test bench

From the above conclusion, it is obvious that an ideal inverse transformation cannot be created; put simply, it is not possible to generate the missing pixels and reconstruct the whole picture exactly. Several interpolation techniques have been proposed so far, and they can be partitioned into two classes. The first class comprises techniques based on spatial, temporal or temporal-spatial interpolation without motion detection (these solutions require very complex hardware structures). The algorithm used in this solution is based on a combination of a three-tap median filter and a lowpass filter. The median filter is a spatial-temporal filter that uses two pixels from one picture and one pixel from the previous picture. The lowpass filter uses two neighbouring pixels from a single picture. The two filters are not combined as a nonlinear system, i.e. no multiplexer switches between them; instead, a linear combination is applied, which is also used to eliminate singular errors.

*Vertical expansion* of the output picture. One of the requirements is the capability of vertical expansion: when the video signal is in movie mode (large black areas above and below the useful picture content), it must be possible to expand the output picture to fill the whole screen.
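The interpolator can be sketched per missing pixel. This is a minimal C++ sketch, assuming that the two same-picture taps are the pixels directly above and below the missing one and that the previous-picture tap is the co-sited pixel; the equal weighting of the median and lowpass outputs is also an assumption, since the text only states that they are combined linearly rather than switched. `median3` and `interpolateMissingPixel` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Three-tap median: returns the middle value of a, b and c.
int median3(int a, int b, int c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Interpolates one missing pixel of the current field.
//   above, below : vertical neighbours in the current field (assumed taps)
//   previous     : co-sited pixel from the previous field (assumed tap)
// The median (spatial-temporal) and lowpass (spatial) results are mixed
// linearly instead of being switched by a multiplexer. The 1/2 : 1/2
// weights are an assumption; the text does not give the coefficients.
std::uint8_t interpolateMissingPixel(std::uint8_t above, std::uint8_t below,
                                     std::uint8_t previous) {
    const int median = median3(above, below, previous);
    const int lowpass = (above + below + 1) / 2;   // rounded vertical average
    const int mixed = (median + lowpass + 1) / 2;  // linear combination
    assert(mixed >= 0 && mixed <= 255);
    return static_cast<std::uint8_t>(mixed);
}
```

The median of (above, below, previous) equals the previous-field pixel clamped to the range spanned by its two vertical neighbours, so in static areas the temporal sample passes through, while an outlier (for example caused by motion) is limited to the neighbours' range and further smoothed by the lowpass term.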

## 3. The Basic Principles and Test Environment

The basic principles of this project methodology are illustrated in Fig. 3. Digital signal processing must be preceded by an IC that digitizes the video signal; it provides the data and clock signals to the FPGA, where the whole processing is realized. At the end of the chain, an IC converts the digital signal back to the analog domain, and its output goes directly to the picture tube. The development test bench consists of three parts: a board with the VPC 3215C chip (front end), a board with the ALTERA FLEX 10K100ARC240-2 chip, and a board with the DDP 3310B chip (back end). Together, the boards form a system that receives the picture, performs A/D conversion, digital picture processing and D/A conversion, and transmits the picture to the TV receiver. Each integrated circuit can be configured through the I2C interface.

#### 3.1 Digital video processing requirements

The following features characterize digital video processing:

- High data rate (27 MHz)
- Constant data flow, with no possibility of storing data for delayed processing
- Large amounts of data to be stored (for frame or field buffering)

The format of a digital video signal is given in Fig. 4. The vertical synchronization (VS) signal marks the beginning of a field, so its frequency is the field frequency (for a 50 Hz television signal its period is 20 ms, and for a 100 Hz signal it is 10 ms). The active video (AV) signal indicates valid data on the data lines (Y and C) and is organized by lines. Video data are transmitted from left to right and from top to bottom. The INTERLC signal indicates which field is present (field A for even lines, field B for odd lines).

Fig. 4. Timing of typical signals used in digital video processing
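The signal conventions above can be modelled as a small position tracker, which is convenient when C++ reference models must agree with the hardware on where each sample lies. In this hypothetical sketch, a rising edge of VS starts a field, AV is high during the active part of each line, and INTERLC = false is assumed to denote field A; these polarities, and the names `RasterTracker`, `VideoSample` and `PixelPosition`, are assumptions rather than details taken from Fig. 4.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// One clock of the digital video interface (cf. Fig. 4).
struct VideoSample {
    bool vs;         // vertical sync: a rising edge starts a new field
    bool av;         // active video: the data lines carry a valid sample
    bool interlc;    // field identifier (assumed: false = field A, true = field B)
    std::uint8_t y;  // luminance sample (not used by the tracker)
};

struct PixelPosition {
    bool fieldB;  // which field the sample belongs to
    int line;     // active line within the field, counted from the top
    int pixel;    // sample within the line, counted from the left
};

// Recovers the raster position of every valid sample from VS, AV and INTERLC.
class RasterTracker {
public:
    std::optional<PixelPosition> clock(const VideoSample& s) {
        // Assumption: VS is only asserted during blanking.
        assert(!(s.vs && s.av));
        if (!s.av && prevAv_) {  // falling edge of AV: the line is finished
            ++line_;
            pixel_ = 0;
        }
        if (s.vs && !prevVs_) {  // rising edge of VS: a new field starts
            line_ = 0;
            pixel_ = 0;
            fieldB_ = s.interlc;
        }
        prevVs_ = s.vs;
        prevAv_ = s.av;
        if (!s.av) return std::nullopt;  // blanking: no valid data
        return PixelPosition{fieldB_, line_, pixel_++};
    }

private:
    bool prevVs_ = false;
    bool prevAv_ = false;
    bool fieldB_ = false;
    int line_ = 0;
    int pixel_ = 0;
};
```

The AV edge is handled before the VS edge so that the end of the last line of a field cannot disturb the reset performed at the start of the next field.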

#### 3.2 Typical digital video processing elements and the FPGA implementation

Typical digital video processing applications involve various kinds of filtering. There are three typical kinds of filtering [2], [3], [4], [5], [6]: spatial, temporal and spatial-temporal.

**Spatial video signal filtering** takes the pixels around the currently processed position and passes them through a mathematical function, as illustrated in Fig. 5a.

Implementing this processing requires the following resources inside the FPGA: line memories for the line delays and registers for the taps, as illustrated in Fig. 5. If the window height is H, H-1 line memories are required. One of the basic requirements for the line memory is that it must provide simultaneous read and write cycles for individual memory locations, as illustrated in Fig. 6.
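The structure of Fig. 5 can be modelled behaviourally to check the resource counts. This is a minimal C++ sketch, not the authors' HDL: `SpatialWindow` is a hypothetical class in which each line memory is a circular buffer one line long and each row of the window is a short shift register. After every input pixel, `at(r, c)` returns the pixel r lines above and c pixels to the left of the newest one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Behavioural sketch of an H x W spatial filter window (cf. Fig. 5):
// H-1 line memories provide the vertical taps, and each of the H rows feeds
// a shift register that provides the horizontal taps.
class SpatialWindow {
public:
    SpatialWindow(std::size_t lineLength, std::size_t h, std::size_t w)
        : lineLength_(lineLength), h_(h), w_(w) {
        assert(lineLength > 0 && h > 0 && w > 0);
        lineMem_.assign(h - 1, std::vector<int>(lineLength, 0));
        taps_.assign(h, std::vector<int>(w, 0));
    }

    // One clock: accept the next pixel in raster order.
    void clock(int in) {
        int rowInput = in;
        for (std::size_t r = 0; r < h_; ++r) {
            // Horizontal taps: shift row r by one position.
            for (std::size_t c = w_ - 1; c > 0; --c) taps_[r][c] = taps_[r][c - 1];
            taps_[r][0] = rowInput;
            if (r + 1 < h_) {
                // Line memory r: read the value stored one line ago, then
                // write the new value into the same location.
                const int delayed = lineMem_[r][pos_];
                lineMem_[r][pos_] = rowInput;
                rowInput = delayed;
            }
        }
        pos_ = (pos_ + 1) % lineLength_;
    }

    // Pixel r lines above and c pixels left of the newest input.
    int at(std::size_t r, std::size_t c) const { return taps_[r][c]; }

    // Column 0 of every row is the line-memory output itself (a wire in
    // hardware), so only W-1 registers per row are needed.
    std::size_t registerCount() const { return h_ * (w_ - 1); }
    std::size_t lineMemoryCount() const { return lineMem_.size(); }

private:
    std::size_t lineLength_, h_, w_;
    std::size_t pos_ = 0;
    std::vector<std::vector<int>> lineMem_;
    std::vector<std::vector<int>> taps_;
};
```

A 3×3 filter would then combine the nine `at(r, c)` values each clock; the sketch ignores the wrap-around at the start of each line, which a real design handles with edge logic.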



Fig. 6. Line memory block diagram and input/output

Having only one clock, CLK, makes this problem harder, because the FPGA cannot operate at 2×CLK (in the VFBX and SPRG applications this would be 54 MHz). This is resolved by using a memory whose word is twice as wide as a sample (for filtering the 8-bit luminance signal Y[7:0], the memory is 16 bits wide). In one cycle two 8-bit locations are read, and in the next cycle the same two locations are written.
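The alternating read/write scheme can be modelled cycle by cycle. In this hypothetical C++ sketch (`DoubleWidthLineMemory` is an illustrative name, not the authors' Verilog module), a single-port memory of 16-bit words is accessed exactly once per CLK: on even cycles a word holding two stored pixels is read, and on odd cycles the same word is overwritten with the two new pixels. An even line length is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Line delay built from a single-port memory that is twice as wide as the
// 8-bit samples, so one memory access per CLK is enough.
class DoubleWidthLineMemory {
public:
    explicit DoubleWidthLineMemory(std::size_t lineLength)
        : words_(lineLength / 2, 0), lineLength_(lineLength) {
        assert(lineLength > 0 && lineLength % 2 == 0);
    }

    // Called once per CLK with the incoming pixel; returns the pixel that
    // arrived exactly one line (lineLength clocks) earlier.
    std::uint8_t clock(std::uint8_t in) {
        const std::size_t addr = pos_ / 2;
        std::uint8_t out;
        if (pos_ % 2 == 0) {
            // Even cycle: READ only. Both stored pixels of the word are fetched.
            const std::uint16_t word = words_[addr];
            out = static_cast<std::uint8_t>(word & 0xFF);
            heldOut_ = static_cast<std::uint8_t>(word >> 8);
            heldIn_ = in;  // keep the first new pixel until the write cycle
        } else {
            // Odd cycle: WRITE only. The two new pixels overwrite the same word.
            words_[addr] = static_cast<std::uint16_t>(heldIn_ | (in << 8));
            out = heldOut_;
        }
        pos_ = (pos_ + 1) % lineLength_;
        return out;
    }

private:
    std::vector<std::uint16_t> words_;
    std::size_t lineLength_;
    std::size_t pos_ = 0;
    std::uint8_t heldIn_ = 0;
    std::uint8_t heldOut_ = 0;
};
```

Because each word is read one cycle before it is overwritten, the old line is never lost, and the two holding registers are the only extra logic the scheme needs.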

Output data from the line memories LM0 and LM1 have to be aligned so that they can be passed through the registers, as shown in Fig. 5b. From this point, all data necessary for filtering are available.

**Temporal filtering** is more complex, because not only data from the current field but also data from temporally preceding fields enter the filter function. Temporal filtering is illustrated in Fig. 7; the filtering window spans several fields.



Fig. 7. Temporal filter structure

The implementation requires memories that provide field delays. Since storing one field requires 2,488,320 bits, far more than today's commercial FPGAs offer, these memories are implemented externally. The field memory implementation is shown in Fig. 10.
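The 2,488,320-bit figure can be checked with a short compile-time calculation. The sketch assumes a 720 × 288 active field (625-line, 50 Hz system) and 12 bits per pixel for YUV 4:1:1, which together reproduce the quoted number exactly; it also assumes 24,576 bits of embedded RAM for the EPF10K100, taken here from the FLEX 10K family data sheet. None of these three parameters is stated in the text itself.

```cpp
#include <cassert>
#include <cstdint>

// Assumed field geometry: 720 active pixels per line, 288 active lines per field.
constexpr std::uint32_t kPixelsPerLine = 720;
constexpr std::uint32_t kLinesPerField = 288;

// YUV 4:1:1: 8 bits of Y per pixel plus one 8-bit U and one 8-bit V sample
// shared by every 4 pixels, i.e. 8 + 2 * 8 / 4 = 12 bits per pixel on average.
constexpr std::uint32_t kBitsPerPixel = 8 + 2 * 8 / 4;

constexpr std::uint32_t kFieldBits = kPixelsPerLine * kLinesPerField * kBitsPerPixel;
static_assert(kFieldBits == 2488320, "matches the field size quoted in the text");

// Assumed embedded RAM of an EPF10K100: 12 embedded array blocks of 2,048 bits.
constexpr std::uint32_t kOnChipRamBits = 12 * 2048;

// How many times the whole on-chip RAM one field would occupy.
constexpr double kFieldToOnChipRatio =
    static_cast<double>(kFieldBits) / static_cast<double>(kOnChipRamBits);
```

On these assumptions one field needs 101.25 times the device's entire on-chip RAM, which is consistent with the decision to place the field memories off-chip.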





**Spatial-Temporal filters** combine temporal and spatial filters; their filtering window is three-dimensional, as illustrated in Fig. 8. The resources required for synthesizing a spatial-temporal filter are determined by the parameters D, W and H. Parameter D defines the number of required field buffers. The number of line memories is determined by the number of fields (D) and the number of lines (H) used by the filter. The number of registers for a spatial-temporal filter is $\mathit{NoReg} = D \times H \times (W-1)$. The resources necessary for the spatial-temporal filter implementation are illustrated in Fig. 10.
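The resource formulas can be collected in a small helper. The register count follows the formula in the text; the line-memory count D × (H − 1) is an assumption made by repeating the spatial case for every field, since the text only says that it depends on D and H. `FilterResources` and `spatialTemporalResources` are hypothetical names.

```cpp
#include <cassert>

// Resources for a spatial-temporal window spanning D fields, H lines, W pixels.
struct FilterResources {
    int lineMemories;
    int registers;
};

FilterResources spatialTemporalResources(int d, int h, int w) {
    assert(d >= 1 && h >= 1 && w >= 1);
    FilterResources r{};
    // Assumption: each of the D fields needs its own H-1 line memories,
    // as in the purely spatial case.
    r.lineMemories = d * (h - 1);
    // Register count as given in the text: NoReg = D x H x (W - 1).
    r.registers = d * h * (w - 1);
    return r;
}
```

With D = 1 the helper reduces to the spatial filter: H − 1 line memories and H × (W − 1) registers.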

## 4. Test Procedures

The input to the design phase is the algorithm, described either in a high-level programming language (in our case C++) [23], [24] or mathematically. Once the HDL description of the integrated circuit is completed, the test procedures are launched.

Functional simulation (the first phase) checks whether the mathematical model implemented in HDL is equivalent to the target model (in C++ or in equation form). Dedicated procedures for creating test files were developed in this phase. The test vector files define the input and output signals of the blocks implemented inside the IC. One of the major problems in these procedures is the delay caused by the pipelined structure, which is not present in the software description of the algorithm. This phase eliminates errors in the hardware implementation of arithmetic and in the data types used, as well as synchronization errors due to the pipelined structure. Because the simulator is slow, only the first few lines can be simulated in a reasonable time.
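The pipeline-delay problem amounts to aligning two sample streams before comparing them. A minimal sketch, assuming the golden C++ model and the HDL simulation each produce one output sample per clock: `firstMismatch` compares them at a given latency, and `findLatency` searches for the smallest latency at which they agree. Both are hypothetical helpers, not the authors' test-file procedures.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Compares the HDL output with the golden model, skipping the first `latency`
// HDL samples produced while the pipeline fills. Returns the index (in golden
// samples) of the first mismatch, or nothing if the streams agree.
std::optional<std::size_t> firstMismatch(const std::vector<int>& golden,
                                         const std::vector<int>& hdl,
                                         std::size_t latency) {
    for (std::size_t i = 0; i < golden.size(); ++i) {
        const std::size_t j = i + latency;
        if (j >= hdl.size() || hdl[j] != golden[i]) return i;
    }
    return std::nullopt;
}

// Smallest pipeline latency (up to maxLatency) for which the streams agree.
std::optional<std::size_t> findLatency(const std::vector<int>& golden,
                                       const std::vector<int>& hdl,
                                       std::size_t maxLatency) {
    assert(!golden.empty());  // an empty reference would match any latency
    for (std::size_t latency = 0; latency <= maxLatency; ++latency) {
        if (!firstMismatch(golden, hdl, latency)) return latency;
    }
    return std::nullopt;
}
```

In practice the latency is fixed by the design, so it is better to state it explicitly and call `firstMismatch` directly; the search is useful mainly while the pipeline depth is still changing.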



Fig. 9. The percentage of error detection in each phase


The second phase is timing simulation. From information about the implementation inside the array (placement and routing), the FPGA software derives the delays between internal nodes, so that the real circuit (its FPGA implementation) can be simulated. This phase tests the delays between critical nodes, i.e. whether the pipeline is correctly dimensioned: it checks whether the processing in each pipeline stage satisfies the setup and hold times at the predefined operating frequency.
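The setup and hold checks of the timing simulation reduce to two inequalities per register-to-register path. The sketch below is a simplified model (no clock skew, hypothetical delay values) of what the timing analyser verifies: a positive setup slack at the working clock period and a positive hold slack. `PathTiming`, `setupSlack` and `holdSlack` are illustrative names.

```cpp
#include <cassert>

// Simplified register-to-register path model; all values in nanoseconds.
struct PathTiming {
    double clockToQ;  // clock-to-output delay of the launching register
    double logicMax;  // slowest combinational + routing delay of the stage
    double logicMin;  // fastest combinational + routing delay of the stage
    double setup;     // setup time of the capturing register
    double hold;      // hold time of the capturing register
};

// Setup slack: time left in one clock period after the slowest path settles.
// Negative slack means the stage is too long and must be split.
double setupSlack(const PathTiming& p, double periodNs) {
    assert(periodNs > 0.0);
    return periodNs - (p.clockToQ + p.logicMax + p.setup);
}

// Hold slack: margin by which the fastest path changes after the hold window
// closes (clock skew is ignored in this sketch).
double holdSlack(const PathTiming& p) {
    return (p.clockToQ + p.logicMin) - p.hold;
}
```

With a 27 MHz data clock the period is about 37 ns; a stage whose slow path exceeds it needs an extra pipeline register, which in turn changes the latency the functional-simulation comparison has to account for.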

The third phase is emulation. Because the simulator is so slow, simulating the processing of even one whole field (let alone a sequence of fields) was out of the question. To resolve this, an emulator of the designed IC's environment was built. One side of the emulator is connected through the LPT port to a personal computer, which acts as the control block, and the other side is connected to all input and output lines of the circuit under test. Controlled from the PC, the emulator makes it easy to create the control and data signals that drive the FPGA. Errors are found by comparing the FPGA output with the output of the software model. This test environment allows fast checking of long test sequences (several fields). During these tests, edge errors, i.e. errors at the transition between two fields, were discovered on the emulator.
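A stimulus generator of the kind the PC-controlled emulator needs can be sketched as follows. It is hypothetical, since the paper does not describe the emulator's data format: `makeFieldSequence` builds a multi-field sequence in the style of Fig. 4, with a one-clock VS pulse, AV high during active pixels and INTERLC alternating from field to field. The test pattern encodes field, line and pixel, so data leaking across a field boundary (an edge error) shows up when the FPGA output is compared with the software output.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One clock of stimulus for the FPGA inputs.
struct Stimulus {
    bool vs, av, interlc;
    std::uint8_t y;
};

// Builds `fields` consecutive fields of `lines` x `pixels` active samples,
// each line followed by `hblank` blanking clocks.
std::vector<Stimulus> makeFieldSequence(int fields, int lines, int pixels, int hblank) {
    assert(fields > 0 && lines > 0 && pixels > 0 && hblank > 0);
    std::vector<Stimulus> seq;
    for (int f = 0; f < fields; ++f) {
        const bool fieldB = (f % 2) == 1;            // INTERLC alternates per field
        seq.push_back({true, false, fieldB, 0});     // VS pulse marks the field start
        for (int l = 0; l < lines; ++l) {
            for (int x = 0; x < pixels; ++x) {
                // Pattern value depends on field, line and pixel (wraps at 256).
                const auto value = static_cast<std::uint8_t>(f * 64 + l * 8 + x);
                seq.push_back({false, true, fieldB, value});
            }
            for (int b = 0; b < hblank; ++b) seq.push_back({false, false, fieldB, 0});
        }
    }
    return seq;
}
```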

The fourth phase is real-time testing. For tests in a real-time environment, MICRONAS INTERMETALL has developed the IMAS platform.

At its input is a video digitization board based on the VPC IC [15], followed by the ALTERA FLEX 10K100 FPGA [17]; at its end is a board based on the DDP 3310B IC [16] that converts the digital video stream into a VGA signal. The IMAS platform is shown in Fig. 3.

This phase tests the operation of the integrated circuit under real-time conditions. Only interface and electrical adaptation errors are detected here. The result is a real-time test environment in which the algorithm can be tested on real input sequences. The statistics of error detection in each phase are shown in Fig. 9.

## 5. The Advantages of the New IC Design Approach

The traditional design strategy for integrated circuits for digital video processing is based on off-line software tests. If the tests satisfy the criteria set for them, IC design starts, and real-time testing becomes possible only after the IC has been manufactured (Fig. 1, upper branch).

With the FPGA concept, a real-time test environment is available in a much shorter time and a number of algorithmic errors can be detected (Fig. 1, lower branch).

This approach has the following advantages:

**Time to real-time tests** - HDL code generation, compilation and downloading are much faster than IC design, so the time to real-time tests is much shorter in the FPGA approach. Changes inside the FPGA are also much faster than an IC redesign.

**Fast feedback to algorithm definition** - system designers receive feedback from the real-time test sooner than in the earlier concept.

**The job is not doubled** - the output of this phase is an HDL description of the integrated circuit with all synchronization problems solved. This code is the direct input to gate-level design.

**Price performance** - the FPGA-based solution can be realized by a smaller expert team (designing a complex IC of 100,000 gates in the FPGA requires a team of 5 engineers, whereas an IC design needs a team of 100 engineers plus other staff).

**Time to market** - FPGA manufacturers now offer a service of writing the RAM mask into ROM, providing fast and inexpensive ASIC design. This concept reduces the time and cost from the idea to the moment the integrated circuit is available on the market.

#### REFERENCES

- 1. DEYU QIAN MICRONAS INTERMETALL FREIBURG (GERMANY): European Patent for scan rate conversion. 1997.
- 2. MILAN TOPALOVIC, BRANISLAV NASTIC: Televizija prva knjiga Televizijski sistemi i osnovi višedimenzionalne digitalne obrade signala. Beograd 1992.
- 3. MILAN TOPALOVIC: Televizija druga knjiga, Višedimenzionalna digitalna obrada video signala. Beograd 1993.
- 4. G. DE HAAN: Motion estimation and compensation, an integrated approach to consumer display field rate conversion. Philips Electronics N.V., Eindhoven (Holland), 1992.
- 5. RAFAEL C. GONZALEZ, RICHARD E. WOODS: Digital Image Processing. Addison-Wesley Publishing Company Inc., September 1993.
- 6. MARTYN J. RILEY, IAIN E. G. RICHARDSON: Digital Video Communications. Artech House Inc., Boston (USA), 1996.
- 7. FLICKER-FREE TELEVISION: Featurebox 88. Siemens, Germany.

- 8. CHRISTIAN HENTSCHEL: Fernsehen mit erhöhter Bildqualität, Flimmerreduktion durch erhöhte Vertikalfrequenz im Empfänger. Berlin (Germany) 1989.
- 9. H. BLUME, M. LUECK: Bildformatkonversion für multimedia-Displays-Anwendungen, Displayeigenschaften, Konversionsverfahren. Universität Dortmund.
- 10. M. LUECK, H. BLUME: Konversionstechniken für die zeitsequentielle stereoskopische Bildwiedergabe. Universität Dortmund.
- 11. M. LUECK: Zwischenbildinterpolation für die Echtzeit-Stereobildverarbeitung. Universität Dortmund.
- 12. WWW PHILIPS SITE: http://www.sv.philips.com/newtech-dnrtech\_right.html.
- 13. WWW PHILIPS SITE: http://www.sv.philips.com/newtech/100hztech\_right.html.
- 14. VLADIMIR KOVACEVIC: Logicko projektovanje racunarskih sistema. Novi Sad 1993.
- 15. PRELIMINARY DATA SHEET, VPC 3205C, VPC 3215C VIDEO PROCESSOR FAMILY: Micronas Intermetall. Freiburg (Germany), August 1997.
- 16. PRELIMINARY DATA SHEET, DDP 3310B DISPLAY AND DEFLECTION PROCESSOR: Micronas Intermetall. Freiburg (Germany), May 1998.
- 17. DATA SHEET, ALTERA FLEX 10K EMBEDDED PROGRAMMABLE LOGIC FAMILY: Altera Corporation. San Jose (California, USA), May 1998.
- 18. ALTERA MAX+PLUS II VERILOG HDL VERSION 8.2: Altera Corporation. San Jose (California, USA), January 1998.
- 19. ALTERA MAX+PLUS II VHDL VERSION 7.1: Altera Corporation. San Jose (California, USA), December 1996.
- 20. KEVIN SKAHILL: VHDL for Programmable Logic. Addison-Wesley Publishing Company Inc., Menlo Park (California, USA) 1996.
- 21. BEN COHEN: VHDL Answers to Frequently Asked Questions. Kluwer Academic Publishers 1997.
- 22. DOUGLAS L. PERRY: VHDL Second Edition. McGraw-Hill, Inc., 1993.
- 23. BRIAN W. KERNIGHAN, DENNIS M. RITCHIE: Programski jezik C. Savremena administracija, Beograd 1992.
- 24. CHRIS H. PAPPAS, WILLIAM H. MURRAY: C/C++ Vodic za programere. Mikro Knjiga, Beograd.
- 25. BRADLY K. FAWCETT: FPGA Applications in Digital Video Systems. www.xilinx.com.
- 26. SCOTT B. ANDERSON, PHILIP P. DANG, PAUL M. CHAU: Configurable Hardware for Image Segmentation. Department of Electrical & Computer Engineering, University of California, San Diego.
- 27. AITZOL ZULOAGA IZAGUIRRE, JOSE LUIS MARTIN GONZALEZ, LUIS ANTONIO LOPEZ NOZAL: Hardware Architectures for Motion Determining from Image Sequences. Escuela Tecnica Superior de Ingenieros.