# A VLSI RECEIVE BEAMFORMER FOR DIGITAL ULTRASOUND IMAGING

Mustafa Karaman, Ertuğrul Kolağasıoğlu, and Abdullah Atalar

Electrical and Electronics Eng. Dep., Bilkent University, Bilkent, Ankara, 06533 Turkey

## ABSTRACT

A VLSI architecture for real-time digital receive beamforming in phased array ultrasound imaging is proposed. The architecture is an inverse binary tree like structure with $\log_2(N)$ stages, where N is the number of array elements. The sampled signals from the phased array channels are coherently added in a pairwise manner within the stages of the architecture in a pipelined data flow scheme. The storage requirement for the synchronization of the asynchronously received samples corresponding to a focal point is significantly reduced. The architecture is modular and has a regular communication scheme, which makes the VLSI implementation rather easy and straightforward.

## 1. INTRODUCTION

In the reconstruction of phased array ultrasound sector images, short bursts of ultrasound are transmitted, and echoes reflected from internal structures of the body are received by a transducer phased array. The imaging plane is scanned by steering and focusing the array, called beamforming, in both the transmit and receive modes of the array [1]. In transmit mode, the beamforming process is performed per scan slice, while in receive mode it is dynamically repeated for every image point. Design of the transmit beamforming circuitry is relatively easy, since each transducer can be fired by digital timing circuitry, while design of the receive beamforming circuitry is an involved task and has been the subject of considerable research [2, 3].

Receive beamforming requires coherent addition of the signals received by the array elements. Receive beamforming hardware based on analog circuitry is bulky and expensive [1, 4]. Due to recent developments in integrated circuit technology, real-time digital receive beamforming techniques have become attractive (Fig. 1) [5, 6, 7, 8, 9].
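To make "coherent addition" concrete, the following is a minimal Python sketch of delay-and-sum for a single focal point. It is an illustration under stated assumptions and not the circuit described in this paper: the linear array geometry, the rounding of delays to whole samples, and the names `focus_delays`, `delay_and_sum`, `pitch`, `speed`, and `fs` are all hypothetical.

```python
import math

def focus_delays(num_elements, pitch, focus_x, focus_z, speed, fs):
    """Per-element receive delays, in whole samples, for one focal point.

    Assumed geometry (not from the paper): elements lie on the x axis,
    centred on the origin, spaced `pitch` metres apart.
    """
    xs = [(i - (num_elements - 1) / 2) * pitch for i in range(num_elements)]
    dists = [math.hypot(focus_x - x, focus_z) for x in xs]
    nearest = min(dists)
    # Delay of each element relative to the element closest to the focus.
    return [round((d - nearest) / speed * fs) for d in dists]

def delay_and_sum(channels, delays, t):
    """Add the samples from all channels that belong to the same focal point.

    `channels[i][n]` is sample n of element i; `t` is the sample index at
    which the echo reaches the nearest element.
    """
    return sum(ch[t + d] for ch, d in zip(channels, delays))
```

In this sketch the echo from the focal point reaches each element at a different sample index, so the samples must be re-aligned before they are added; this re-alignment is the synchronization problem that the rest of the paper addresses.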
Figure 1: Symbolic representation of phased array receive beamforming by a) analog and b) digital circuitry.

In digital receive beamforming, asynchronously received samples corresponding to a focal point must be synchronized and then added. In a straightforward approach, a "global" coherent summation scheme can be employed by using FIFO type registers at the adder front-end [5]: the samples from all channels are stacked in FIFO registers at each channel for synchronization, and then all of them are added. This scheme is depicted in its basic form in Fig. 1. However, since the FIFO and adder sizes increase dramatically with the number of array elements, the scheme is not very feasible for implementation at the board level or in VLSI. Alternatively, the samples can be added recursively using partial sum registers [7]. This technique, however, is difficult to implement because it requires very fast additions.

Hence, an efficient receive beamforming architecture should possess: i) a reasonably small storage requirement for data synchronization, ii) flexibility in applications with different numbers of array elements, iii) the capability to operate at real-time rates, and iv) VLSI feasibility for custom chip implementation. We propose a VLSI receive beamformer architecture which satisfies all of these constraints.

In this paper, the receive beamforming architecture is presented first. Then, the VLSI implementation of the architecture is outlined. Finally, some possible applications of the designed receive beamformer are discussed.

## 2. ARCHITECTURE

An efficient receive beamforming architecture can be obtained by employing a "local" coherent addition technique, where the total coherent summation of all samples corresponding to a focal point is obtained by a sequence of pairwise partial coherent summations. For an N channel system, at first, N/2 partial coherent summations are obtained.
Then, adjacent pairs of these partial sums are coherently added, resulting in N/4 new partial coherent sums. This procedure is repeated until the number of new partial coherent sums becomes unity, which is the total coherent summation of all samples.

This approach results in an inverse binary tree like architecture for the receive beamformer [9]. For an N element array, the network consists of $\log_2(N)$ stages, and the $k$th stage has $N/2^k$ ($k = 1, 2, \ldots, \log_2(N)$) processing units (see Fig. 2). Each unit consists of FIFO registers and a full adder for coherent summation of its two input data. The FIFO length is different for each stage; since there is no regularity in the arrival times of the signals corresponding to different focal points, the worst case FIFO lengths for the stages are determined by computer simulation of the phased array imaging system with 90° sector scanning.

VLSI implementation of the proposed architecture for N > 64 seems to be difficult because of the large chip area and the large number of I/Os [9]. Fortunately, by using the modularity of the architecture and by choosing the FIFO sizes properly, the architecture can be implemented as a single VLSI chip for a reasonably small N, and the chip can retain the flexibility to be used in forming networks for larger N.

Figure 2: Receive beamformer architecture.

We have designed a single chip VLSI receive beamformer for N = 16. The chip consists of the lowest four stages of the network for N = 256, outlined by a dashed rectangle in Fig. 2. Since the lowest four stages have the longest FIFOs, one can connect a number of the designed chips to form a network for $N \leq 256$.

The processing unit takes two input data, A and B, along with two status bits, SA and SB, and generates the coherent summation of the inputs, DO, with a corresponding output status bit, SO (Fig. 3). Each status bit indicates that the data is valid.
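The pairwise summation through the tree can be sketched behaviourally in Python as follows. This is a simplified model under stated assumptions, not the chip's logic: each valid sample is represented as a hypothetical `(arrival_time, value)` pair, pipeline latency and word widths are ignored, and the names `processing_unit` and `beamform_tree` are illustrative. The number of channels is assumed to be a power of two.

```python
from collections import deque

def processing_unit(stream_a, stream_b):
    """Add the k-th samples of two inputs, buffering whichever arrives first.

    Each stream is a list of (arrival_time, value) pairs, one per focal
    point, in arrival order. Returns (output_stream, peak_fifo_depth).
    """
    events = sorted([(t, 0, v) for t, v in stream_a] +
                    [(t, 1, v) for t, v in stream_b])
    fifos = (deque(), deque())  # waiting samples from input A and input B
    out, peak = [], 0
    for t, side, value in events:
        partner = fifos[1 - side]
        if partner:
            # The matching sample is already waiting: add and emit now.
            out.append((t, partner.popleft() + value))
        else:
            # This input is the earlier one: hold it until its partner arrives.
            fifos[side].append(value)
            peak = max(peak, len(fifos[side]))
    return out, peak

def beamform_tree(channels):
    """Reduce N = 2**m channel streams through log2(N) stages of pairwise adds.

    Returns the final stream of total sums and the peak FIFO depth per stage.
    """
    peaks = []
    streams = list(channels)
    while len(streams) > 1:
        results = [processing_unit(streams[i], streams[i + 1])
                   for i in range(0, len(streams), 2)]
        streams = [out for out, _ in results]
        peaks.append(max(depth for _, depth in results))
    return streams[0], peaks
```

In this model only one of the two queues in a unit is ever non-empty at a time, so a single physical FIFO per unit would suffice if the earlier input is routed into it. The per-stage peak depths returned by `beamform_tree` are the kind of worst case figure that a simulation of the imaging geometry would provide for sizing the FIFOs.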
For reliable real-time operation, the unit is designed in three pipelined stages: cross switch, FIFO, and adder stages. The cross switch is a finite state machine which feeds the earlier of the inputs, A or B, to the subsequent FIFO. After synchronization by the FIFO, the data are fed to the full adder. The unit is reset by an external reset signal (R) at the beginning of operation, so that the cross switch and FIFO pointers are set appropriately. Two nonoverlapping clocks ($\phi_1$ and $\phi_2$) are used to control the pipelined operations and data flow. Fifteen processing units are connected in an inverse binary tree structure to form the core of the chip.

Figure 3: Functional structure of one processing unit.

## 3. VLSI IMPLEMENTATION

The proposed receive beamformer architecture is implemented in 1.5 $\mu m$ CMOS technology using full-custom VLSI design techniques (Fig. 4). In the VLSI design of the chip, magic, hspice, and irsim are used for layout editing, timing simulation, and logic simulation, respectively. Testing of the chip is easily accomplished by the functional test technique, since the operations of the units can be selectively probed by issuing proper test vectors.

The maximum throughput of the chip is 40 mega coherent additions per second, each over 16 data words of 16 bits. The chip has 16 multiplexed inputs and one multiplexed output, each 16 bits plus one status bit. The overall result of the coherent addition is truncated from 20 bits to 16 bits, while the maximum dynamic range requirement is about 12 bits. The chip has a total of 100 pins and about 40 K transistors, occupying a core area of $6.7 \times 6.7$ mm<sup>2</sup>.

## 4. APPLICATIONS

The designed beamformer chip can perform real-time receive beamforming operations for 16 inputs. The chip also offers flexibility in applications with different numbers of array elements. If the number of array elements is larger than 16, then a number of chips can be used to form the receive beamforming scheme.
For $16 < N \leq 256$, the chip count is the smallest integer not less than $(1 + N/16)$. For example, the beamforming schemes for N = 128 and N = 256 require 9 and 17 chips, respectively, where the connections of the chips are shown in Fig. 5.

Figure 4: Designed VLSI beamformer chip.

Figure 5: Connections of the receive beamforming chips for 256 and 128 channels.

The designed chip can also be used in digital signal processing applications which involve a large number of fast, parallel addition operations. Two examples of such applications are the hardware realization of FIR filters with coefficients in the discrete powers-of-two space [10], and the hardware realization of estimation algorithms based on the sum of absolute differences [11, 12].

## 5. CONCLUDING REMARKS

A VLSI architecture for a real-time digital receive beamformer is proposed, where the sampled signals from the phased array channels are coherently added in a pairwise manner within the stages of the network in a pipelined data flow scheme. The proposed architecture possesses all the features of an efficient receive beamforming scheme. The architecture is modular and has a regular communication scheme, which makes the VLSI implementation rather easy and straightforward. It significantly reduces the storage requirement for synchronization compared to the global coherent summation scheme. For N = 256 using 17 chips, the storage requirement is reduced to about 0.28 of that of the global scheme (38 Kbits vs 138 Kbits). The designed beamformer can be used in the digital front-end hardware of real-time medical ultrasound imaging systems.

## ACKNOWLEDGMENTS

This work is partially supported by NATO's Scientific Affairs Division in the framework of the Science for Stability Programme, and by the Turkish Scientific and Research Council, TUBITAK. The authors would like to thank C. Aydın, M. Ş. Toygar, İ. A. Baktır, R. Tahboub, E. Erzin, F. Kılıç, and M. H. Asyalı for their cooperation in the VLSI implementation of the chip.

## References

- [1] T. A.
Shoup and J. Hart, "Ultrasonic imaging systems," in *Proc. of 1988 IEEE Ultrason. Symp.*, pp. 863-871, 1988.
- [2] M. E. Schafer and P. A. Lewin, "The influence of front-end hardware on digital ultrasonic imaging," *IEEE Trans. Sonics Ultrason.*, vol. 31, pp. 295-306, July 1984.
- [3] M. O'Donnell, "Applications of VLSI circuits to medical imaging," *Proc. IEEE*, vol. 76, pp. 1106-1114, September 1988.
- [4] G. Manes et al., "Synchronous dynamic focusing for ultrasound imaging," *IEEE Trans. Ultrason. Ferroelec. Freq. Contr.*, vol. 35, pp. 14-21, January 1988.
- [5] J. P. Stonestorm and W. A. Anderson, "Custom NMOS chip for medical ultrasound," *VLSI Design*, pp. 44-49, May 1982.
- [6] T. H. Song and S. B. Park, "A new digital phased array system for dynamic focusing and steering with reduced sampling rate," *Ultrasonic Imaging*, vol. 12, pp. 1-16, 1990.
- [7] R. M. Lutolf, A. Vieli, and S. Basler, "Ultrasonic phased-array scanner with digital echo synthesis for Doppler echocardiography," *IEEE Trans. Ultrason. Ferroelec. Freq. Contr.*, vol. 36, pp. 494-506, September 1989.
- [8] M. O'Donnell et al., "Real-time phased-array imaging using digital beam forming and autonomous channel control," in *Proc. of 1990 IEEE Ultrason. Symp.*, pp. 1499-1502, 1990.
- [9] M. Karaman, A. Atalar, and C. Aykanat, "A front-end digital hardware architecture for real-time ultrasound imaging," in E. Arıkan, ed., *Communication, Control, and Signal Processing*, pp. 1612-1618. Elsevier Science Publishers, Ankara, 1990.
- [10] Y. C. Lim and S. R. Parker, "FIR filter designed over a discrete powers-of-two coefficient space," *IEEE Trans. Acoust., Speech, Signal Processing*, vol. 31, pp. 583-590, June 1983.
- [11] L. N. Bohs and G. E. Trahey, "A novel method for angle independent ultrasonic imaging of blood flow and tissue motion," *IEEE Trans. Biomed. Eng.*, vol. 38, pp. 280-286, March 1991.
- [12] M. Karaman, A. Atalar, and H.
Köymen, "Adaptive digital beamforming for real-time phased array ultrasound imaging," in *Proc. of 1991 IEEE Ultrason. Symp.*, 1991.