# FERMI - A digital Front End Readout Micro-system for Calorimetric detectors at LHC

M. Hansen, CERN, Geneva, Switzerland (email: magnus.hansen@cern.ch)

#### Abstract

The activities within the FERMI Collaboration were completed by the successful assembly of multi-chip module demonstrators fulfilling the technical requirements of the LHC generation of high performance detectors like CMS ECAL. Throughout the project, the Collaboration has been concentrating on providing architectures that fulfil the requirements set by the interested experiments in the domain of calorimeter readout. Currently, readout system architectures based on experience acquired through FERMI are being implemented in CMS and the ATLAS Tiles calorimeter.

The final status of the project is presented, together with a brief description of some ASICs developed by the FERMI collaboration.

### **1. INTRODUCTION**

A calorimeter at LHC spans a dynamic range of the order of 1 to 10<sup>5</sup>. This corresponds to at least 16 bits in digital representation. The resolution requirement is of the order of 10 to 12 bits for the electromagnetic calorimeters, and slightly less for the hadronic ones. The LHC accelerator has an interaction rate of close to 40 MHz. To maximise the probability of detecting, and later confirming, the very rare events searched for at LHC, the readout system has to be free of dead time. The space inside and around the experiment is very limited, thus demanding a very high degree of integration.
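The relation between the quoted dynamic range and the number of bits can be checked with a short calculation. This is only an arithmetic sketch; the function name is chosen here for illustration.

```python
import math

def bits_for_dynamic_range(ratio: float) -> float:
    """Number of bits needed to span a linear dynamic range of 1:ratio."""
    return math.log2(ratio)

# Dynamic range of 1 to 10^5, as quoted in the text
bits = bits_for_dynamic_range(1e5)
print(f"{bits:.1f} bits")
```

The result lies between 16 and 17 bits, which is consistent with the 16 to 17 bit range quoted for the compressor in section 2.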

The FERMI solution is to use MCM-D technology in order to cope with these requirements.

The initial concept was taken directly from the requirements stated above. These come from assumptions in the ECFA studies (1990) [1].

#### 2. THE FERMI CONCEPT

The objective for the FERMI collaboration is to develop an interface between the detector element and the global DAQ system.

The FERMI concept, see block diagram in figure 1, contains an analogue dynamic range compressor in order to cope with the 16 to 17 bit dynamic range. A sampling analogue to digital converter samples at the bunch crossing frequency. To de-compress the digitised data, a Look-Up Table (LUT) is chosen. This LUT, implemented using a RAM, contains the inverse transfer function of the whole electronics chain in front of the Analogue to Digital conversion. The LUT can compensate for all first order defects in the chain. The de-compressed data is used for the trigger primitive generation, and is at the same time continuously stored in a digital pipeline with programmable length, while waiting for the level 1 trigger decision.
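The LUT-based de-compression can be sketched as follows. The two-segment compression curve, the normalised signal range and all function names are assumptions made for illustration; they are not the FERMI transfer function, which in practice would be obtained from calibration.

```python
ADC_BITS = 10                      # 10-bit compressed code
N_CODES = 1 << ADC_BITS

def compress(x: float) -> float:
    """Hypothetical monotonic piecewise-linear compressor, [0, 1] -> [0, 1]."""
    knee = 0.1
    if x < knee:
        return 5.0 * x                       # high gain for small signals
    return 0.5 + (x - knee) * 0.5 / 0.9      # low gain above the knee

def decompress(y: float) -> float:
    """Analytic inverse of compress()."""
    if y < 0.5:
        return y / 5.0
    return 0.1 + (y - 0.5) * 0.9 / 0.5

def digitise(y: float) -> int:
    """Ideal ADC: map a value in [0, 1] to a code in 0..N_CODES-1."""
    return min(N_CODES - 1, int(y * N_CODES))

def build_lut() -> list[float]:
    """One entry per ADC code: the input level at the centre of that code."""
    return [decompress((code + 0.5) / N_CODES) for code in range(N_CODES)]

LUT = build_lut()

# Round trip: compress, digitise, then linearise with a single table look-up
x_in = 0.03
x_out = LUT[digitise(compress(x_in))]
```

Because the table is indexed directly by the ADC code, the linearisation costs one memory access per sample, which is what allows it to run at the bunch crossing rate.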



Fig. 1. FERMI concept block diagram

The de-compressed data from all channels in the basic unit is summed and filtered before being sent to the trigger process. The filter performs two tasks:

- 1) Extract the total energy and
- 2) Assign the origin in time.

These two tasks have to be concurrent, and of course in real time, synchronous with the bunch crossings. The latency of the adder and of the filter is constant. In the level 1 trigger processor, a selection operator is applied to the trigger primitive data extracted from all the front-ends after a known delay. The result of the selection process in the level 1 trigger is sent back to the front end. The front end receives the result from the level 1 trigger process at the same time as the corresponding data is coming out from the pipeline. If the result is negative, the data is discarded. If positive, a time frame of about 10 consecutive time samples is transferred to the event buffer, together with event identifiers, waiting for the final readout.
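The pipeline and level-1 accept mechanism described above can be modelled in a few lines. This is a behavioural sketch only: the class name, the latency of 20 crossings and the choice to start the frame at the triggered crossing are assumptions, not properties of the FERMI ASICs.

```python
from collections import deque

class ChannelPipeline:
    """Fixed-latency sample pipeline with a level-1 accept path (illustrative)."""

    def __init__(self, latency: int, frame_len: int = 10):
        if latency + 1 < frame_len:
            raise ValueError("pipeline must hold at least one full time frame")
        self.latency = latency
        self.frame_len = frame_len
        self.pipeline = deque(maxlen=latency + 1)   # oldest sample at index 0
        self.event_buffer = []                      # (bunch crossing id, frame)
        self.bx = 0                                 # bunch crossing counter

    def clock(self, sample: int, l1_accept: bool = False) -> None:
        """Store one sample; on accept, copy the frame starting `latency` crossings ago."""
        self.pipeline.append(sample)
        if l1_accept and len(self.pipeline) == self.pipeline.maxlen:
            frame = list(self.pipeline)[: self.frame_len]
            self.event_buffer.append((self.bx - self.latency, frame))
        self.bx += 1


# The trigger decision for crossing 30 arrives 20 crossings later, at t = 50
chan = ChannelPipeline(latency=20)
for t in range(100):
    chan.clock(sample=t, l1_accept=(t == 50))
```

Samples that are never accepted simply fall off the end of the pipeline, which corresponds to the data being discarded on a negative trigger decision.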

When the data is finally read out from the event buffer, a readout filter is applied to each time frame in order to calculate the energy deposition. The task for this filter is delicate. It has to extract, as accurately as possible, the absolute value corresponding to the energy deposition originating from the bunch crossing that corresponds to the level-1 trigger accept. It also has to suppress energy deposits originating from other bunch crossings by as large a factor as possible.

#### **3. THE FERMI BUILDING BLOCKS**

The FERMI system implementation consists of a number of channel blocks, each one serving a single channel, and a service block, serving all channel blocks. The channel block contains an analogue dynamic range compressor, a sampling ADC, a digital decompression stage, a digital pipeline, and a set of event buffers. The service block consists of a trigger part and a DAQ part. On the trigger side, an adder sums the data from all channels. A real-time digital filter extracts the energy and time information from the channel sum for the first level trigger process. The DAQ side contains a readout filter and an FPGA readout controller.

## 3.1 The Analogue Dynamic Range Compressor

The chosen architecture uses four differential amplifier stages with a gain of 1, 2, 5, and 16.

The four stages are arranged in parallel and receive the same input signal. The output from each stage is current limited, so that the stages saturate at chosen limits. The current outputs from the four stages are summed in an output stage. The whole arrangement gives a piecewise linear transfer function.
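A minimal numerical model shows how summing saturating stages produces a piecewise linear curve. The gains are those quoted above; the saturation limits are hypothetical values chosen only so that the summed output reaches 2 at an input of 1, and only positive inputs are considered.

```python
GAINS = (1.0, 2.0, 5.0, 16.0)     # stage gains quoted in the text
LIMITS = (1.0, 0.5, 0.3, 0.2)     # hypothetical saturation level of each stage

def compressor_output(v_in: float) -> float:
    """Sum of four parallel gain stages, each clipped at its own limit."""
    return sum(min(g * v_in, lim) for g, lim in zip(GAINS, LIMITS))

def small_signal_gain(v_in: float, dv: float = 1e-6) -> float:
    """Numerical slope of the transfer function at v_in."""
    return (compressor_output(v_in + dv) - compressor_output(v_in)) / dv

# The slope drops each time one more stage saturates
for v in (0.005, 0.03, 0.1, 0.5):
    print(f"v_in = {v:5.3f}  slope = {small_signal_gain(v):5.1f}")
```

With these limits the slope is 24 for the smallest signals, where all four stages are active, and falls to 1 once only the unity-gain stage remains unsaturated.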

The compressor circuit was prototyped in a mask programmable transistor array from Gennum. It has a bandwidth of about 60 MHz and an output noise of 250 $\mu$V at a full scale of 2 V, so it could give a clean input signal even to a 12-bit ADC. The theoretical dynamic range with a 10-bit ADC is slightly above 16 bits, and with a 12-bit ADC about 17.5 bits. This was also proven in bench tests.

#### 3.1.1 Dynamic Range Compression Constraints

A study of the actual requirements on a compressing circuit has been done [9]. The conclusions apply to any kind of analogue compression based on gain stages with different gain factors. Table 1 shows the constraints on bandwidth and slew rate.

Table 1. Bandwidth and slew rate requirements. Input: Exp. pulse, 50 ns shaping, 2 V full scale.

| V<sub>diff</sub> | Error  | 3 dB BW | Slew rate |
|------------------|--------|---------|-----------|
| 2 mV             | 0.1%   | 54 MHz  | 310 V/µs  |
| 1 mV             | 0.05%  | 75 MHz  | 450 V/µs  |
| 0.5 mV           | 0.025% | 106 MHz | 600 V/µs  |
| 0.25 mV          | 0.012% | 148 MHz | 850 V/µs  |

A design using the Ericsson P28 bipolar technology has been developed. SPICE simulations show that this performance can be reached [10].

## 3.2 The PSA-ADC

When the FERMI project was approved in 1991, one of the main issues was to design a low power sampling ADC able to run at bunch crossing speed.

A very tempting architecture, mainly because of its low latency, is the flash ADC. However, its high power consumption ruled it out.

Another architecture, using only one comparator, thus consuming a minimum of power, is the Successive Approximation ADC (SA-ADC).

The disadvantage of this architecture is of course the latency, and the fact that, for the 10-bit case, only one conversion can be done every 10 clocks. The latter can however be solved by stacking 10 complete ADCs in parallel and interleaving the sampling and conversion [2]. This architecture was chosen, see figure 2. The architecture is named Parallel Successive Approximation ADC, or PSA-ADC. Four more SA-ADCs are added in order to permit two auto-zero cycles of the sample-and-hold, one sampling cycle, and one relaxation cycle before the ten conversion cycles. A block diagram is found in figure 3.
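The principle can be illustrated with an idealised model: each cell performs a bit-by-bit successive approximation, and the cells are started round-robin, one per clock. The function names and the ideal comparator are assumptions for illustration; the cycle count of 14 follows from the ten conversion cycles plus the four extra cycles described above.

```python
N_BITS = 10
N_CELLS = 14   # 10 conversion + 2 auto-zero + 1 sampling + 1 relaxation cycles

def sar_convert(v_in: float, v_ref: float = 1.0, n_bits: int = N_BITS) -> int:
    """Successive approximation: one comparator decision per clock, MSB first."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if v_in >= trial * v_ref / (1 << n_bits):   # ideal comparator decision
            code = trial
    return code

def cell_for_clock(clock: int) -> int:
    """Round-robin assignment: each clock tick starts a cycle in the next cell."""
    return clock % N_CELLS

def cell_rate_mhz(array_rate_mhz: float) -> float:
    """Conversion rate required of each individual SA-ADC cell."""
    return array_rate_mhz / N_CELLS

print(sar_convert(0.5))        # mid-scale input
print(cell_rate_mhz(70.0))     # per-cell rate for a 70 MHz array
```

With 14 interleaved cells, an array sampling at 70 MHz only requires each cell to complete one conversion every 14 clocks, in line with the 5 MS/s cell rate given in the title of reference [2].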

The PSA-ADC was prototyped in the AMS 1.2 $\mu$m CMOS technology. With a core consumption of less than 200 mW for a full 10-bit ADC running up to 70 MHz, this prototype is a masterpiece of its time.





At a 40 MHz sampling rate, the performance was measured to be better than 9.8 effective bits statically and about 9.4 effective bits dynamically.

#### 3.3 The Analogue ASIC

In order to decrease the number of chips on the MCM, the analogue dynamic range compressor has been integrated with a further developed PSA-ADC. The complete ASIC is implemented using the AMS 1.2 $\mu$m BiCMOS technology, see photo in figure 3.



Fig. 3. Photo of the Analogue ASIC

Despite minor layout errors, which were quickly found and corrected, the analogue ASIC is a success.

The complete ASIC, including the compressor and the PSA-ADC, consumes in the order of 190 mW at 40 MHz sampling rate. The effective dynamic range at 40 MHz sampling rate is about 15 bits at 9 effective bits of resolution.

One problem was encountered. The simulated 3 dB bandwidth of over 80 MHz for the compressor is more than adequate [9]. The measured bandwidth of the ASIC itself, however, turned out to be just above 20 MHz. This adds some second order effects to the digitised pulse shape. A parallel project using the same technology encountered the same problem, pointing towards a process parameter error.

### 3.4 The Digital Pipeline and Event Buffer

The heart of the FERMI concept is the Digital Pipeline. Several different versions of digital pipeline ASICs have been developed.

The early versions include a 1kword Look-Up Table for linearisation of 10-bit compressed data, a combined Pipeline and event buffer RAM block, and control logic. A high degree of fault tolerance, imposed by the harsh environment inside the LHC experiments, is implemented. The functionality in the later versions is tracking the evolution in the LHC experiments. In the end this leads to the suppression of nearly all redundancy, instead concentrating on error detection in order to preserve the data integrity.

#### 3.4.1 The First Channel Chip

The architecture consists basically of three functional blocks: lineariser, pipeline/derandomiser, and control. It serves three channels and is implemented in 1 $\mu$m CMOS from AMS. Very complex, containing about one million transistors, this ASIC is a pure lab product. Although functionally correct, it suffers badly from simultaneous switching noise, so the chip cannot run at full voltage, and consequently not at the specified speed.

#### 3.4.2 The Three-Channel Chip

The three-channel chip has the same basic functionality as the first channel chip. All functions that are not vital for test beam purposes are removed. The chip is implemented in the ES2 0.7 $\mu$m CMOS process. The three-channel chip is fully functional, and has been used with success in trigger tests together with the CMS trigger and the CMS ECAL.

#### 3.4.3 The One Channel ASIC

The One Channel ASIC is a major rework of the channel functionality. It serves one single channel. Instead of having a combined pipeline and derandomiser RAM, as in the first channel chip, it has a DPRAM for the pipeline and eight separate event buffers in the derandomiser. A block diagram is found in figure 4.

On the input, a RAM of 1k times 23 data bits is used as LUT. The calibration constants written in the LUT are ECC encoded, capable of 2-bit error detection and 1-bit error correction.
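The text does not specify which code is used in the ASIC. One common way to obtain 1-bit correction together with 2-bit detection is an extended Hamming code; the sketch below shows that technique on a 4-bit word purely as an illustration, with function names chosen here. It does not handle three or more bit errors.

```python
def secded_encode(data_bits: list[int]) -> list[int]:
    """Extended Hamming encode; index 0 of the result is the overall parity bit."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:          # number of Hamming parity bits
        r += 1
    n = m + r
    word = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            word[pos] = next(it)
    for p in range(r):
        mask = 1 << p                    # parity bit covers positions with this bit set
        word[mask] = sum(word[pos] for pos in range(1, n + 1) if pos & mask) % 2
    word[0] = sum(word) % 2              # overall parity for double-error detection
    return word

def secded_decode(word: list[int]) -> tuple[list[int], str]:
    """Return (data bits, status), status being 'ok', 'corrected' or 'double error'."""
    word = list(word)
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos              # XOR of the positions holding a 1
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        word[syndrome] ^= 1              # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        status = "double error"          # detected but not correctable
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status

data = [1, 0, 1, 1]
code = secded_encode(data)
corrupted = list(code)
corrupted[5] ^= 1                        # inject a single-bit error
recovered, status = secded_decode(corrupted)
```

A single flipped bit is located by the syndrome and corrected, while two flipped bits leave the overall parity even with a non-zero syndrome and are therefore only flagged.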



Fig. 4. Block diagram of the channel ASIC

The pipeline is divided into 8 identical blocks of 32 locations. Any even number of blocks between 2 and 8 can be used. The unused blocks can serve as redundancy, and can thus replace a failing block.
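The available pipeline configurations follow directly from these numbers. In the short sketch below the function names are illustrative, and the conversion to time assumes one location per bunch crossing at 40 MHz.

```python
BLOCK_LOCATIONS = 32
N_BLOCKS = 8

def pipeline_depth(blocks_in_use: int) -> int:
    """Pipeline depth in locations for a legal block configuration."""
    if blocks_in_use not in (2, 4, 6, 8):
        raise ValueError("an even number of blocks between 2 and 8 is required")
    return blocks_in_use * BLOCK_LOCATIONS

def spare_blocks(blocks_in_use: int) -> int:
    """Blocks left over as replacements for failing ones."""
    pipeline_depth(blocks_in_use)        # validates the argument
    return N_BLOCKS - blocks_in_use

def pipeline_latency_us(blocks_in_use: int, bx_rate_mhz: float = 40.0) -> float:
    """Time covered by the pipeline, assuming one location per bunch crossing."""
    return pipeline_depth(blocks_in_use) / bx_rate_mhz

for blocks in (2, 4, 6, 8):
    print(blocks, pipeline_depth(blocks), spare_blocks(blocks),
          pipeline_latency_us(blocks))
```

Under the 40 MHz assumption, using all eight blocks covers 256 crossings, while a configuration with four blocks leaves four blocks as spares.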

All eight derandomiser buffers are identical, and each one can be disabled in case of failure.

The one channel ASIC is the last channel ASIC being constructed to reside inside the experiment. Fully reconfigurable, it has a large fault-tolerance.

The ASIC is implemented in AMS 0.8 $\mu$m CMOS technology. A photo of the ASIC is found in figure 5.



Fig. 5. Photo of the One Channel ASIC

#### 3.4.4 The Pipeline ASIC

In parallel with the one channel chip, another pipeline ASIC was developed.

It contains a minimal functionality, and is useful for test beam purposes because of its limited complexity and ease of use.

With a 160-clock programmable pipeline and 5 event buffers, it has been used in ATLAS-Tiles Module0 tests, and in CMS ECAL Prototype -97.

#### 3.5 Trigger feature extraction

The trigger feature extraction function of FERMI consists of an adder and a level-1 filter ASIC. A Finite Impulse Response (FIR) structure was chosen after a systematic evaluation of different architectures.

The level-1 trigger filter is designed to provide accurate energy information and bunch crossing assignment for the global level-1 trigger. It operates on a sum of channels, and consists of two parallel FIR filters, each with six elementary stages (taps) and a three-point maximum finder, see the block diagram in figure 6. The most recent level 1 filter is implemented in the AMS 0.8 $\mu$m CMOS technology. A photo is found in figure 6a.



Fig. 6. Block diagram of the level 1 filter

The energy extraction FIR can be optimised to extract the energy in the presence of certain artefacts. Typically it is an averaging operator with a relatively wide response on pulses in the time domain in order to improve the noise suppression.

The timing extraction FIR is optimised to produce a sharp maximum for each pulse even if it partially overlaps with another one. It is combined with a maximum finder in order to assign the output from the energy filter to a bunch crossing.
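The interplay of the two FIR filters and the maximum finder can be sketched as below. All coefficients, the test pulse and the threshold are hypothetical; the threshold is added here only to suppress spurious maxima on a flat baseline and is not taken from the text.

```python
def fir(samples: list[float], coeffs: list[float]) -> list[float]:
    """Causal FIR: y[n] = sum_k coeffs[k] * samples[n - k], zero before the start."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

def level1_filter(samples, energy_coeffs, timing_coeffs, threshold=0.0):
    """Return (index, energy) wherever the timing FIR has a three-point maximum."""
    energy = fir(samples, energy_coeffs)
    timing = fir(samples, timing_coeffs)
    hits = []
    for n in range(1, len(samples) - 1):
        is_peak = timing[n - 1] < timing[n] >= timing[n + 1]
        if is_peak and timing[n] > threshold:
            hits.append((n, energy[n]))
    return hits

# Hypothetical six-tap coefficient sets and test pulse
ENERGY_COEFFS = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]      # wide, averaging-like response
TIMING_COEFFS = [0.0, 0.0, -0.5, 1.0, -0.5, 0.0]    # sharpening response
pulse = [0, 0, 1, 3, 4, 3, 1, 0, 0, 0, 0, 0]

hits = level1_filter(pulse, ENERGY_COEFFS, TIMING_COEFFS)
print(hits)
```

In this example the timing filter peaks three samples after the pulse maximum, at which point the six-sample energy window covers the whole pulse, so the energy value is tagged with a single, well-defined clock cycle.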



Fig. 6a and 6b. Photos of a Level 1 and a readout filter ASIC

The coefficient optimisation strategy is based on either analytic calculation or an iteration method. The analytic solution is equivalent to matched FIR filtering. With the iteration strategy, the filter coefficients are obtained by minimising the mean squared error (MSE) between the desired and the actual output for a specific input.
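An iterative MSE minimisation of this kind can be illustrated with plain gradient descent. The text does not say which iteration method was used, so the algorithm, step size, input pulse and desired output below are all assumptions made for illustration.

```python
def fir(samples, coeffs):
    """Causal FIR: y[n] = sum_k coeffs[k] * samples[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]

def mse(samples, coeffs, desired):
    """Mean squared error between the filter output and the desired output."""
    out = fir(samples, coeffs)
    return sum((y - d) ** 2 for y, d in zip(out, desired)) / len(desired)

def optimise_coeffs(samples, desired, n_taps=6, step=0.1, n_iter=2000):
    """Batch gradient descent on the MSE, starting from all-zero coefficients."""
    coeffs = [0.0] * n_taps
    n = len(samples)
    for _ in range(n_iter):
        out = fir(samples, coeffs)
        err = [y - d for y, d in zip(out, desired)]
        for k in range(n_taps):
            grad = 2.0 / n * sum(err[i] * samples[i - k] for i in range(k, n))
            coeffs[k] -= step * grad
    return coeffs

# Hypothetical input pulse (peak normalised to 1) and desired output:
# a single sample carrying the pulse amplitude at index 5.
pulse = [0.0, 0.25, 0.75, 1.0, 0.75, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
desired = [0.0] * len(pulse)
desired[5] = 1.0

initial_mse = mse(pulse, [0.0] * 6, desired)
coeffs = optimise_coeffs(pulse, desired)
final_mse = mse(pulse, coeffs, desired)
print(initial_mse, final_mse)
```

For a realistic optimisation the training input would include noise, timing jitter and pile-up, so that the resulting coefficients trade sharpness against noise suppression.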

The performance of filter F1 has been simulated using simulated Liquid Argon calorimeter signals with a sample timing uncertainty of 2ns RMS, and electronics noise with 70 MeV RMS, as well as the effects introduced by the compressor, the ADC and the LUT. The amplitude resolution is shown in figure 7.



Fig. 7. Energy resolution of the level 1 filter.

#### 3.6 Readout Feature Extraction

The readout filter is designed to extract a value of the energy with the highest precision allowed by the set-up and the experimental conditions. It takes as input the time frame for an individual channel, and returns a single absolute energy value.



Fig. 8. The readout filter architecture: FIR-OS with 3 coefficient banks

The filter has three parallel FIR filters and an order statistics (OS) operator. The OS operator is programmed to select the largest, the median, or the smallest output of the three filters for every time frame. A block diagram is found in figure 8. The filter coefficients and the OS mode are obtained using iteration, as the non-linear filter structure is too complex to optimise using analytical methods. The most recent readout filter is implemented in the AMS 0.8 $\mu$m CMOS technology. A photo is found in figure 6b.
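The FIR-OS structure reduces to three weighted sums over the time frame followed by a selection. The sketch below uses hypothetical coefficient banks and a hypothetical 10-sample frame; the mode names are chosen here for illustration.

```python
def fir_output(frame: list[float], coeffs: list[float]) -> float:
    """Weighted sum of one time frame with one coefficient bank."""
    return sum(c * s for c, s in zip(coeffs, frame))

def fir_os(frame: list[float], banks: list[list[float]], mode: str = "median") -> float:
    """Apply three FIR banks and select one output by order statistics."""
    if len(banks) != 3:
        raise ValueError("this sketch expects exactly three coefficient banks")
    low, mid, high = sorted(fir_output(frame, c) for c in banks)
    return {"min": low, "median": mid, "max": high}[mode]

# Hypothetical 10-sample time frame and coefficient banks
frame = [0, 0, 1, 3, 4, 3, 1, 0, 0, 0]
BANKS = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],            # peak sample only
    [0, 0, 0, 0.5, 0, 0.5, 0, 0, 0, 0],        # neighbours of the peak
    [0, 0, 0, 0.25, 0.5, 0.25, 0, 0, 0, 0],    # weighted three-point average
]

for mode in ("min", "median", "max"):
    print(mode, fir_os(frame, BANKS, mode))
```

Selecting the median makes the result insensitive to a single bank whose output is pulled away by an artefact such as pile-up, which is one way to see why the non-linear structure can outperform a single FIR filter.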

The FIR-OS filter structure offers a greater suppression factor for the different artefacts present in the acquired data compared to a single FIR filter. It also offers an efficient fault tolerant architecture: if a FIR unit fails, it can be switched off from the system, thus gently degrading the global performance.

The performance of the readout filter has been evaluated using the simulated detector sequence described above. The complete system response measured at the output of the filter is illustrated in figure 9. The data set is the same as for the level-1 filter performance graph above. Please note that the comparison has been done with a detector with a higher performance.



Fig. 9. Energy resolution of the readout filter.

## 4. TESTS IN BEAM

Beam tests have been carried out in collaboration with ATLAS-Lar/RD34 [10], ATLAS-Tiles [11], and CMS ECAL/Trigger [12].

The most recent results come from CMS ECAL, testing the analogue ASIC, the three-channel chip and the level-1 trigger filter, together with level-1 trigger circuits.

The detector element was a CMS ECAL prototype crystal matrix.

#### 4.1 Trigger feature extraction

In the three-channel version of the Channel ASIC an adder is included, thus creating a sum of the three linearised channels. The ES2 Level-1 Filter ASIC contains a 3-input adder, providing the final summation of the 3×3 trigger tower, with the output feeding the Energy and Time FIR filters. The output is sent to the level-1 trigger. As the charge-ADC system has no provision to generate trigger primitives, all trigger tests were made only with the FERMI set-up.

The trigger filter function can be seen in figure 10.



Fig. 10. Three plots describing the Level-1 filter function

The left plot shows data taken with the filter in transparent mode, i.e. with all coefficients but one set to zero, showing the composite (6-channel strip) signal as a function of time. The second plot is the output from the energy filter with the coefficients configured to fit the pulse shape, and the third plot shows the energy filter output conditioned by the timing filter and its peak finder.

The energy resolution of the trigger feature extraction was measured at three different energies. The measurements show that the energy resolution of the trigger feature extraction ranges from $\sigma/E$ = 3% at low energies down to 3.5% at 50 GeV. This is more than adequate for a first-level trigger process at LHC.

The time resolution of the individual channels with respect to the central crystal, where the beam hits the matrix, was measured to be below 4ns RMS.

### 4.2 Readout resolution

On the readout side, a direct comparison between the charge-ADC reference electronics and the FERMI readout system was done. Figure 11 shows the comparison at four different energies, with an almost perfect correlation between the two. Figure 12 shows the fractional resolution of the two systems as a function of energy in the range 35 to 150 GeV.

The global results show that the FERMI concept, using a continuous dynamic range compression followed by a sampling ADC, would give a satisfactory result for all planned calorimeters at the LHC experiments.



Fig. 11. Comparison of FERMI and QADC readout at four energies



Fig. 12. Fractional resolution for FERMI and QADC

#### **5. THE MULTI-CHIP MODULE**

A number of Multi-Chip Modules (MCM) have been built in order to evaluate the feasibility of the MCM technology. The choice is to use a silicon substrate and mount the different ASICs using a flip-chip technology. The basic advantage of the MCM, as of other hybrids, is that each function can be implemented in the most suitable technology. The great advantages of the silicon substrate are its excellent thermal properties: good heat conduction and a perfect match in thermal expansion with the ASICs, which are themselves built on silicon.

Two complete Multi-Chip Modules, or MCMs, have been successfully assembled. The first, to be considered as the termination of FERMI phase 1, has integrated ADCs.

The second is to be considered as the final demonstrator. It uses external ADCs, as required in all LHC implementations, and is specified to fulfil the requirements of CMS ECAL. It has been successfully tested in the lab, see figures 13 and 14.



Fig. 13. ASIC placement on the MCM



Fig. 14. The MCM

The result shows that the technology is fully understood, and that a data acquisition system based on MCM technology is a valid option.

#### **6. FUTURE**

The results obtained by the FERMI collaboration are available for any detector collaboration. The different building blocks can be used, or only the generic concept. For the LHC experiments, no readout system will, of course, carry the name FERMI. Instead, systems optimised for the actual requirements will be implemented. Many detector groups are currently finalising readout systems at least partly based on experience from the FERMI collaboration.

#### REFERENCES

- [1] V. G. Goggi, B. Lofstedt, "Digital front-end electronics for calorimetry at LHC", Proc. ECFA Large Hadron Collider Workshop, CERN 90-10, ECFA 90-133, vol. 3 (1990) pp. 190-200.
- [2] C. Svensson, J. Yuan, "A 10-bit 5-MS/s successive approximation ADC cell used in a 70-MS/s ADC array in 1.2 μm CMOS", IEEE Journal on Solid State Circuits and Systems, Vol. 29, No. 8 (1994) pp. 866-873.
- [3] L. Dadda, S. Inkinen, V. Piuri, "A processor for calorimetry at the Large Hadron Collider in the FERMI project", Proc. IEEE ASAP94, San Francisco, August 1994.
- [4] S. J. Inkinen et al., "Nonlinear Filters for Pulse Amplitude Extraction in FERMI", Proc. IEEE Nuclear Science Symposium (1994) pp. 687-691.
- [5] H. Alexanian et al., "FERMI: a digital Front End and Readout MIcrosystem for high resolution calorimetry", NIM A 357 (1995) pp. 306-317.
- [6] H. Alexanian et al., "Optimised digital feature extraction in the FERMI microsystem", NIM A 357 (1995) pp. 318-328.
- [7] S. J. Inkinen, Tampere Univ. (1995), Thesis.
- [8] B. Lofstedt, W. Kurzbauer, "Investigations of the dynamic compression principles for fast detector pulses", NIM A 396 (1997) pp. 198-213.
- [9] W. Kurzbauer, Univ. of Graz (1997), Thesis.
- [10] F. Astesan et al., "Study of the properties of the noise seen by the FERMI system in the Liquid Argon Electromagnetic Calorimeter Test-Beam setup in May-June 1996", ATL-A-PN-45.
- [11] S. Agnvall et al., "Evaluation of FERMI Read-out of the ATLAS Tilecal Prototype", 28 Apr 1997, ATL-L-PN-116.
- [12] R. Benetta et al., "Beam Tests of the Trigger and Digital Processing Electronics for the Electromagnetic Calorimeter of the CMS Experiment", CMS NOTE-1998/008.