# DESIGN OF A $256 \times 16$ NON-VOLATILE RAM MODULE FOR VECTORIZATION WITH CHIP AREA & WIRE LENGTH MINIMIZATION

<sup>1</sup>Raj Kumar Mistri, <sup>2</sup>B. S. Munda

<sup>1,2</sup> Electronics & Communication, National Institute of Technology, Jamshedpur

**ABSTRACT:** This paper describes a design methodology for a $256 \times 16$ RAM using VHDL to ease description, verification, simulation and hardware realization. The $256 \times 16$ RAM has a 16-bit data width and can read and write 16-bit data. Vectorization involves parallel access to data elements in a random access memory (RAM). However, a single memory module of conventional design can access no more than one word during each cycle of the memory clock. In this paper, a new memory organization is proposed in which words can be formed row-wise, column-wise or diagonally under the control of an external input. Both the behavioral and the structural representations of this design are defined.

Index Terms: VHDL, XILINX, RAM, C-RAM, M-RAM, TBW, SBO

## 1. INTRODUCTION

The continuing research and development (R&D) effort directed toward VLSI memory technology has led to memory LSIs with lower cost, smaller size, higher speed, and greater ease of use, giving system designers invaluable benefits. In the design of memory systems for specific applications, it is important to be able to analyze in advance which parallel data access capabilities are necessary to complete the computations within a cycle budget. This information can be used to determine the minimum bandwidth required of the memory architecture. This problem is referred to as the Storage Bandwidth Optimization (SBO) problem. Methods to handle the SBO problem have been addressed in [1][2]. In [1], a conflict graph is derived, and the effect of changes in the conflict graph on the memory configuration (number of modules, number of ports) is described. A memory architecture that satisfies all the constraints in this graph then has to be determined. The method in [2] uses a search procedure based on area and power costs to achieve this. Recently, a procedure for memory bank customization and assignment that considers area and delay costs has been presented in [3].

## 2. BEHAVIOURAL DESCRIPTION OF RAM

Conventional memories of size $N \times N$ consist of $N$ words, each consisting of $N$ bits. To address such a memory, $\log_2 N$ bits must be fed to the address decoder. This type of memory requires $N$ address lines and $N$ data lines. Our vector memory of the same size consists of $N$ horizontal words, $N$ vertical words, and 2 diagonal words. To address it, $\log_2 N$ address bits plus 2 tag bits must be fed to the address decoder. It requires $2N + 2$ address lines and $2N$ data lines [4].
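The counts stated above can be tabulated with a short sketch. The function names are hypothetical and chosen only for illustration; the formulas are those given in the text, and $N$ is assumed to be a power of two.

```python
def _log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, rejecting any other value."""
    if n < 2 or n & (n - 1):
        raise ValueError("N is assumed to be a power of two >= 2")
    return n.bit_length() - 1


def conventional_requirements(n: int) -> dict:
    """Bits and lines for a conventional N x N memory, as stated in the text."""
    return {
        "address_bits": _log2_exact(n),
        "address_lines": n,
        "data_lines": n,
    }


def vector_requirements(n: int) -> dict:
    """Bits and lines for the N x N vector memory, as stated in the text."""
    return {
        "address_bits": _log2_exact(n),
        "tag_bits": 2,
        "address_lines": 2 * n + 2,  # N rows + N columns + 2 diagonals
        "data_lines": 2 * n,
    }


if __name__ == "__main__":
    for n in (4, 16):
        print(n, conventional_requirements(n), vector_requirements(n))
```

For $N = 16$, for example, the vector memory needs 4 address bits and 2 tag bits to drive 34 address lines, against 16 address lines for the conventional organization.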

## 3. STRUCTURAL DESCRIPTION OF RAM

In this section, we present a $256 \times 16$ memory system supporting the proposed row, column, and diagonal access. As concluded in the previous section, $\log_2 N$ address bits plus 2 tag bits are required to drive the $2N + 2$ address lines in our memory system for vectorization. The $\log_2 N$ bits indicate the word number, while the two tag bits indicate the way this word is formed.
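As an illustration of how the tag bits could steer word formation, the following behavioral sketch reads a row, a column, or a diagonal from an $N \times N$ bit array. The tag encoding (00 row, 01 column, 10 main diagonal, 11 anti-diagonal) and all names are assumptions made for this example; the text does not specify the encoding.

```python
# Hypothetical tag encoding; the paper does not define the bit assignment.
ROW, COLUMN, DIAGONAL, ANTI_DIAGONAL = 0b00, 0b01, 0b10, 0b11


def read_word(cells, tag, word=0):
    """Return the N-bit word selected by (tag, word) from an N x N bit array.

    `word` is the word number carried by the log2(N) address bits; it is
    ignored for the two diagonal words, since there is only one of each.
    """
    n = len(cells)
    if tag == ROW:
        return [cells[word][j] for j in range(n)]
    if tag == COLUMN:
        return [cells[i][word] for i in range(n)]
    if tag == DIAGONAL:
        return [cells[i][i] for i in range(n)]
    if tag == ANTI_DIAGONAL:
        return [cells[i][n - 1 - i] for i in range(n)]
    raise ValueError("tag must be a 2-bit value")


if __name__ == "__main__":
    cells = [
        [1, 0, 0, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    print(read_word(cells, ROW, 1))
    print(read_word(cells, COLUMN, 2))
    print(read_word(cells, DIAGONAL))
    print(read_word(cells, ANTI_DIAGONAL))
```

Each call returns a full word in a single access, which is the property the vector memory is meant to provide in hardware.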

## 4. RAM DESIGN METHODOLOGY

We are already familiar with the concept of a one-bit memory. A single D-type flip-flop is a one-bit memory with which we can associate a unique address by using a decoder. If the decoder detects the unique binary address of its one-bit memory cell on the address lines, it enables the cell. Two AND gates determine whether data is read or written. If the Read input is 1, the clock pulse is suppressed and the Q value is placed on the output data line. If the Read input is 0, the flip-flop receives a clock pulse and is loaded from the data-in line. Notice the asymmetry in the circuit: for reading it is merely a combinational circuit, but for writing the address and data must be present and correct when the clock pulse sets the flip-flop. RAM circuits conforming to this pattern are called static RAMs, and are used in special applications [5].
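The read/write behaviour of the addressed one-bit cell can be modelled with a minimal behavioral sketch. The class and method names are hypothetical; the model follows only the rules stated above and is not a gate-level description.

```python
class OneBitCell:
    """Behavioral model of a D flip-flop cell enabled by an address decoder."""

    def __init__(self, address: int):
        self.address = address  # the unique address the decoder detects
        self.q = 0              # stored bit (flip-flop output Q)

    def selected(self, address_lines: int) -> bool:
        """Decoder output: true only when the cell's own address is present."""
        return address_lines == self.address

    def data_out(self, address_lines: int, read: int):
        """Combinational read path: Q is driven out when selected and Read = 1."""
        if self.selected(address_lines) and read == 1:
            return self.q
        return None  # the cell drives nothing onto the output line

    def clock_pulse(self, address_lines: int, read: int, data_in: int) -> None:
        """Write path: the pulse reaches the flip-flop only when Read = 0."""
        if self.selected(address_lines) and read == 0:
            self.q = data_in & 1


if __name__ == "__main__":
    cell = OneBitCell(address=5)
    cell.clock_pulse(5, read=0, data_in=1)   # write 1
    cell.clock_pulse(5, read=1, data_in=0)   # suppressed: Read = 1
    cell.clock_pulse(3, read=0, data_in=0)   # ignored: wrong address
    print(cell.data_out(5, read=1))
```

Keeping `data_out` separate from `clock_pulse` mirrors the asymmetry noted in the text: reading needs no clock, while writing takes effect only on a clock pulse.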

#### 4.1 Conventional RAM

A conventional RAM is simply a 1-D RAM structure. The block diagram of the conventional RAM is given below.



Figure 1: Block diagram of conventional 256x16 RAM

#### 4.2 Modified RAM

In modified 2D RAM systems, each address line passes through a row of cells in the memory array, so that only cells of the same row can be accessed simultaneously. That is, memory cells are grouped together to form rows. However, since we are interested in simultaneously accessing not only the cells of a row but also those of a column or a diagonal, we need a way to group together the cells of columns and diagonals as well. This can easily be achieved by adding extra address lines [4]. The block diagram of the modified RAM is given below.



Figure 2: Block diagram of modified 256x16 RAM

#### 4.3 Conventional 256x16 RAM (C-RAM) Vs Modified 256x16 RAM (M-RAM)

The basic difference between the conventional non-volatile 256x16 RAM and the modified non-volatile 256x16 RAM is that the modified RAM uses the concept of row memory select and column memory select, i.e. 2-D memory selection occurs, whereas in the conventional RAM 1-D memory selection occurs.

Table.1: Logic components used in C-RAM vs M-RAM

| Logic component | Conventional 256x16 RAM | Count | Modified 256x16 RAM | Count |
|-----------------|--------------------------|-------|---------------------|-------|
| Memory cell     | Conventional memory cell | 4096  | Modified memory cell | 4096 |
| Decoder         | 8x256 decoder            | 1     | 4x16 decoder        | 2     |
| OR gate         | 256-input OR gate        | 1     | -                   | -     |
| Register        | -                        | -     | 4-bit register      | 2     |
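The difference between 1-D and 2-D selection can be sketched as follows, using one 8-to-256 decoder for the conventional case and two 4-to-16 decoders for the modified case. Splitting the 8-bit address into an upper nibble for the row and a lower nibble for the column is an assumption made for this example, as are the function names.

```python
def decode(value: int, width: int) -> list:
    """One-hot decoder: `width` input bits drive 2**width select lines."""
    value &= (1 << width) - 1  # keep only the bits the decoder sees
    return [1 if i == value else 0 for i in range(1 << width)]


def select_1d(address: int) -> list:
    """Conventional RAM: a single 8x256 decoder selects one of 256 words."""
    return decode(address, 8)


def select_2d(address: int) -> tuple:
    """Modified RAM: two 4x16 decoders give a row select and a column select.

    Assumed split: upper 4 bits choose the row, lower 4 bits the column.
    """
    return decode(address >> 4, 4), decode(address, 4)


def word_enabled_2d(address: int, row: int, col: int) -> int:
    """A word at (row, col) is enabled when both of its selects are high."""
    row_sel, col_sel = select_2d(address)
    return row_sel[row] & col_sel[col]


if __name__ == "__main__":
    rows, cols = select_2d(0xA7)
    print(len(select_1d(0xA7)), len(rows) + len(cols))  # select lines: 256 vs 32
    print(rows.index(1), cols.index(1))                 # row 10, column 7
```

Under this assumption, both schemes enable exactly one word per address, but the 2-D scheme routes 32 select lines instead of 256.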

## 5. SIMULATION RESULT

#### 5.1 VHDL Test Bench Waveform of Conventional 256x16 RAM



Figure 3: TBW of conventional 256x16 RAM from 0ns to 90ns



Figure 4: TBW of conventional 256x16 RAM from 90ns to 180ns



Figure 5: TBW of conventional 256x16 RAM from 180ns to 270ns

#### 5.2 Test Bench Waveform of Modified 256x16 RAM



**Figure** 6: test bench waveform of modified RAM from 0ns to 700ns

**Figure** 7: test bench waveform of modified RAM from 700ns to 1800ns



**Figure** 8: test bench waveform of modified RAM from 1800ns to 2600ns

## 6. EXPERIMENTAL RESULT

#### 6.1 Cell Usage

Cell usage indicates the transistor count in the design. The cell usage for both RAMs is given below.

Table.2: Cell usage

| Cell       | Conventional RAM | Modified RAM |
|------------|------------------|--------------|
| BELS       | 58218            | 47702        |
| AND2       | 28780            | 12658        |
| AND3       | 16               | 18           |
| AND4       | 4236             | 372          |
| AND7       | 128              | 80           |
| AND8       | 400              | 406          |
| INV        | 20514            | 21440        |
| OR2        | 4128             | 4128         |
| OR4        | 16               | 16           |
| FF/LATCHES | 4104             | 4096         |
| LD         | 4104             | 4096         |
| IBUF       | 24               | 26           |
| OBUF       | 16               | 16           |
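To make the comparison easier to read, the following sketch classifies each cell type in Table 2 as decreased, increased, or unchanged in the modified RAM, and computes the relative change in the BELS total. The figures are copied from the table; the variable and function names are hypothetical.

```python
# (conventional, modified) counts copied from Table 2.
CELL_USAGE = {
    "BELS": (58218, 47702),
    "AND2": (28780, 12658),
    "AND3": (16, 18),
    "AND4": (4236, 372),
    "AND7": (128, 80),
    "AND8": (400, 406),
    "INV": (20514, 21440),
    "OR2": (4128, 4128),
    "OR4": (16, 16),
    "FF/LATCHES": (4104, 4096),  # flip-flops/latches row of Table 2
    "LD": (4104, 4096),
    "IBUF": (24, 26),
    "OBUF": (16, 16),
}


def compare(usage: dict) -> tuple:
    """Split cell names into (decreased, increased, unchanged) lists."""
    decreased = [k for k, (c, m) in usage.items() if m < c]
    increased = [k for k, (c, m) in usage.items() if m > c]
    unchanged = [k for k, (c, m) in usage.items() if m == c]
    return decreased, increased, unchanged


def percent_reduction(conventional: float, modified: float) -> float:
    """Relative reduction of the modified figure, in percent."""
    return 100.0 * (conventional - modified) / conventional


if __name__ == "__main__":
    decreased, increased, unchanged = compare(CELL_USAGE)
    print("decreased:", decreased)
    print("increased:", increased)
    print("unchanged:", unchanged)
    print("BELS reduction: %.2f%%" % percent_reduction(*CELL_USAGE["BELS"]))
```

The BELS total falls by about 18%, driven mainly by the AND2 and AND4 counts, while a few cell types (AND3, AND8, INV, IBUF) increase slightly.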

#### 6.2 Delay Table

This is the actual pad-to-pad delay during a read/write operation. Pad-to-pad delay is one of the important factors in memory design. It indicates the time to propagate from (i) input to memory, (ii) memory to output and finally (iii) input to output. Minimizing the pad-to-pad delay confirms the wire length minimization.

The pad-to-pad delays for both RAMs are given below.

Table.3: Pad to pad delay

| RAM              | Source pad | Destination pad | Delay (ns) | Operation |
|------------------|------------|-----------------|------------|-----------|
| Conventional RAM | input      | memory          | 74.26      | write     |
| Conventional RAM | memory     | output          | 74.26      | read      |
| Modified RAM     | input      | memory          | 68.66      | write     |
| Modified RAM     | memory     | output          | 68.66      | read      |
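The saving implied by Table 3 can be worked out with a short sketch. The delay values are copied from the table and taken to be in nanoseconds, as stated in the conclusion; the names are hypothetical.

```python
# Pad-to-pad delays in ns, copied from Table 3 (same value for read and write).
DELAY_NS = {"conventional": 74.26, "modified": 68.66}


def delay_saving(conventional_ns: float, modified_ns: float) -> tuple:
    """Return (absolute saving in ns, relative saving in percent)."""
    saved = conventional_ns - modified_ns
    return saved, 100.0 * saved / conventional_ns


if __name__ == "__main__":
    saved, percent = delay_saving(DELAY_NS["conventional"], DELAY_NS["modified"])
    print("saved %.2f ns (%.2f%%)" % (saved, percent))
```

This gives a saving of 5.60 ns per read or write, roughly 7.5% of the conventional delay.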

## 7. CONCLUSION

The cell usage of the conventional and modified RAMs is compared in the bar graph presented below.



This bar graph shows the components used in the conventional and modified RAMs. For almost every component category, the modified RAM uses fewer components than the conventional RAM, which indicates an overall reduction in chip area for the modified RAM.

From the delay table of the experimental results, the pad-to-pad delay during a read/write operation is 74.26ns in the conventional RAM but 68.66ns in the modified RAM, which indicates wire length minimization in the modified RAM.

## 8. REFERENCES

[1] S. Wuytack, F. Catthoor, G. de Jong, and H. De Man, "Minimizing the Required Memory Bandwidth in VLSI System Realizations", IEEE Transactions on VLSI Systems, Vol. 7, No. 4, Dec. 1999.

[2] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, "Custom Memory Management Methodology - Exploration of Memory Organization for Embedded Multimedia System Design", Norwell, MA: Kluwer, 1998.

[3] P. R. Panda, "Memory Bank Customization and Assignment in Behavioral Synthesis", IEEE/ACM International Conference on Computer Aided Design, 1999.

[4] N. K. Sharma and A. Iconomidou, "Memory Design for Vectorization", ICECS'96, pages 812-814.

[5] www.doc.ic.ac.uk/~dfg/hardware/HardwareLecture16.pdf