FMCW Radar-Based Measurement Principles
Figure 1 shows the system schematic for simultaneous video and radar vital sign monitoring. The FMCW device selected in this experiment was a mm-wave radar (IWR1843, Texas Instruments) with an operational frequency range of 77–81 GHz. Of the three transmitting and four receiving antennas on the radar, only the most closely spaced transmit/receive (TX/RX) antenna pair was used. Chirps are generated by a waveform generator and transmitted via a self-oscillation circuit, a mixer and a pre-amplifier to the transmit antenna. The received signals are passed through a low-noise amplifier, a low-pass filter, the digital signal processing unit and the analogue-to-digital converter (ADC) before further processing on the computer.
The principles of FMCW radar have been explained in detail in multiple studies9–12. Conventionally, FMCW radar processing performs a range FFT on sets of chirps to extract distance information and generate a range-bin map. In vital sign measurement, the amplitude of chest movement is 4–12 mm, while the amplitude of heart pulses ranges from 0.1–0.5 mm21. To resolve these small-scale movements, the phase shift \({\Delta }\phi\) between two consecutive chirp signals is calculated first.
The displacement \({\Delta }R\) of the object can then be calculated as:
\({\Delta }R=\frac{\lambda }{4\pi }{\Delta }\phi\)
(1)
where \(\lambda\) is the central wavelength of the radar.
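Equation (1) can be sketched directly in code. The 79 GHz centre frequency below is an assumption derived from the radar's 77–81 GHz sweep, chosen only for illustration.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def phase_to_displacement(delta_phi, f_centre=79e9):
    """Eq. (1): displacement from the phase shift between consecutive chirps."""
    lam = C / f_centre           # central wavelength (~3.8 mm at 79 GHz)
    return lam / (4 * np.pi) * delta_phi
```

A full 2π phase wrap corresponds to λ/2 ≈ 1.9 mm, so millimetre-scale chest motion spans multiple wraps while sub-millimetre heart pulses remain well within one, which is why unwrapping precedes the rate estimation.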
As shown in Fig. 2a, the primary processing step is phase unwrapping on the phase term of a set of chirps to recover the correct phase information. A 20-second sliding window is then applied to compute vital signs second by second, as shown in Fig. 2b. As in previous studies, phase randomness and spike noise are removed by computing the phase difference and applying energy-based thresholding11,12.
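The unwrap, difference and spike-removal steps can be sketched as below; the 3-sigma amplitude threshold is an illustrative choice, not the exact energy threshold of refs. 11 and 12.

```python
import numpy as np

def preprocess_phase(raw_phase, spike_factor=3.0):
    """Unwrap the chirp-to-chirp phase, difference it, and suppress spikes.

    spike_factor * std is an illustrative stand-in for the
    energy-based threshold used in the cited studies.
    """
    phase = np.unwrap(raw_phase)           # remove +/- pi wrap discontinuities
    dphi = np.diff(phase)                  # differencing suppresses slow drift
    thresh = spike_factor * np.std(dphi)   # amplitude threshold for spikes
    dphi[np.abs(dphi) > thresh] = 0.0      # zero out spike noise
    return dphi
```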
Afterwards, a digital fourth-order Butterworth filter is applied in the 0.2–0.6 Hz band for respiration rate and the 1–4 Hz band for heart rate extraction, respectively. The resulting waveforms represent the heartbeat and breathing patterns in the data segment, as shown in Fig. 2c.
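The two band-pass stages can be sketched with SciPy. The 20 Hz slow-time sampling rate is an assumption for illustration, and zero-phase `filtfilt` is used here as a convenient stand-in for whatever causal implementation the pipeline uses.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def band_filter(x, low, high, fs, order=4):
    """Fourth-order Butterworth bandpass (applied zero-phase via filtfilt)."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)

fs = 20.0                                   # assumed slow-time sample rate, Hz
t = np.arange(0, 20, 1 / fs)                # one 20-second window
x = np.sin(2 * np.pi * 0.3 * t) + 0.3 * np.sin(2 * np.pi * 1.2 * t)
breathing = band_filter(x, 0.2, 0.6, fs)    # respiration band
heartbeat = band_filter(x, 1.0, 4.0, fs)    # heart band
```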
As breathing and cardiac cycles tend to be periodic, the vibration frequency can be extracted by applying a second FFT. The data is zero-padded with three times its own length (to four times the original size) to provide more data points in the resulting spectrum. Conventionally, the largest spectral magnitudes of the resulting FFT spectrum correspond to the heart rate (HR, heartbeat frequency) and breathing rate (BR, breathing frequency). However, due to spectral noise caused by motion disruption, the largest magnitude may not always produce the correct heart rate estimate. Here we propose a novel image segmentation-based method to address this issue.
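The conventional peak-picking baseline described above can be sketched as:

```python
import numpy as np

def dominant_rate_bpm(x, fs):
    """Largest spectral peak after zero-padding with 3x the data size."""
    n = 4 * len(x)                                  # original length + 3x zeros
    spectrum = np.abs(np.fft.rfft(x - np.mean(x), n=n))
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return 60.0 * freqs[np.argmax(spectrum)]        # Hz -> BPM
```

For a clean 20-second segment this returns the correct rate; it is exactly the estimator that becomes unreliable under motion disruption, motivating the segmentation-based alternative below.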
Video-based Measurement Principles
Intensity-based face video vital sign extraction stems from photoplethysmography (PPG), a common and low-cost optical technique. Since light is more strongly absorbed by blood than by the surrounding tissues, the periodic changes in blood flow can be detected by PPG sensors as changes in the intensity of light22. Thus, the subtle changes in light intensity on human skin can be captured by a digital camera. The face-video vital sign extraction process is illustrated in Fig. 2.
It has been shown that facial regions around the forehead and cheeks tend to be more reliable for cardiovascular pulse signal extraction23. Areas close to the eyes and mouth are less suitable as they are likely to be affected by facial muscle movement17. Breathing pattern signals are most reliable around the chest area, corresponding to ventilation movement. As shown in Fig. 2a, the signals for HR and BR are extracted from a manually selected face region-of-interest (ROI) and a box ROI on the chest area, respectively.
First, spatial averaging is employed to improve the SNR of the raw signal containing cardiovascular pulse information and enhance the subtle colour changes5–8: the pixel values of each colour channel in the selected ROI are averaged for each video frame to overcome sensor and quantisation noise. Unlike conventional PPG-based devices that use near-infrared light, a colour camera captures the blood volumetric variation in three colour channels. Because haemoglobin and oxyhaemoglobin in blood both absorb more strongly in the green channel than in the red channel24, the green channel is selected to deliver the optimal SNR, as shown in Fig. 2b. The same 20-second sliding window is applied.
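The spatial-averaging step can be sketched as follows; the `(T, H, W, 3)` RGB frame layout and the ROI tuple are assumed conventions for illustration.

```python
import numpy as np

def green_channel_trace(frames, roi):
    """Spatially average the green channel inside an ROI, frame by frame.

    frames: array of shape (T, H, W, 3) in RGB order (assumed).
    roi:    (top, bottom, left, right) pixel bounds of the selected region.
    """
    top, bottom, left, right = roi
    patch = frames[:, top:bottom, left:right, 1]    # green channel only
    return patch.astype(float).mean(axis=(1, 2))    # one sample per frame
```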
The obtained signal waveform comprises "AC" components that are synchronous with each heartbeat and breath. The slowly varying "DC" baseline corresponds to subtle changes in illumination and head motion, present even in a strictly controlled environment. Thus, a detrending filter is required to suppress the low-frequency, non-stationary trends of the raw signal10,13. This filter is effectively a high-pass filter with negligible latency. A moving-average filter is then applied to remove random noise caused by sudden light-intensity changes or motion in the frame sequence.
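The detrend-then-smooth stage can be sketched as below. SciPy's linear `detrend` is a simple stand-in for the detrending filter of refs. 10 and 13, and the 5-sample moving-average window is an illustrative choice.

```python
import numpy as np
from scipy.signal import detrend

def clean_trace(x, win=5):
    """Remove the slow 'DC' baseline, then smooth frame-to-frame noise.

    Linear detrending here approximates the cited detrending filter;
    win is an assumed moving-average length.
    """
    x = detrend(x)                         # suppress slow baseline drift
    kernel = np.ones(win) / win            # moving-average smoothing
    return np.convolve(x, kernel, mode="same")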
Finally, the same Butterworth bandpass filters described in the FMCW radar signal processing section are applied to generate the heart and breathing waveforms, as shown in Fig. 2c. The HR and BR oscillations are the most periodic components in their frequency bands, so they can be located as the most prominent power magnitudes in the spectrum after applying the FFT. As with the FMCW radar, motion disruption and changes in illumination cause the spectral peaks to lack continuity and accuracy in video-based HR and BR measurement.
Graph-based Image Segmentation Estimation Method
This study applies the STFT to the acquired 20-second data segments for both the camera and FMCW radar measurements. After converting the frequency unit from Hz to BPM, the spectra from the data segments form an STFT spectrogram that visualises the vibration signal strength, as shown in Fig. 3. Applying an image segmentation algorithm to the STFT spectrograms compensates for disruptions and reduces the impact of external factors compared with the conventional signal processing method.
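The spectrogram construction can be sketched with SciPy's STFT; the 20 Hz sampling rate and the Hann window that `stft` uses by default are assumptions for illustration.

```python
import numpy as np
from scipy.signal import stft

def bpm_spectrogram(x, fs, win_sec=20, step_sec=1):
    """STFT magnitude spectrogram with the frequency axis in BPM."""
    nper = int(win_sec * fs)                        # 20-second window
    hop = int(step_sec * fs)                        # one new column per second
    f, t, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper - hop)
    return f * 60.0, t, np.abs(Z)                   # Hz -> BPM
```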
The estimation method we adopted is graph-based image segmentation19,20. The method was originally developed for segmenting retinal layers in cross-sectional Optical Coherence Tomography (OCT) images19. As the algorithm generalises to segmenting layered structures, it is an ideal method for extracting the vital signs in STFT spectrograms.
To summarise the segmentation method: each spectrogram is treated as an image whose pixels are represented as the nodes of a graph, connected by edges. To cut the graph into segments, a route between designated start and end nodes is found by assigning weights to the edges.
In the literature, graph weights are often derived from both the geometric distance and the intensity difference between graph nodes25. However, due to zero-padding in the FFT and filtering, transitions between adjacent pixels in the spectrograms are smooth, allowing us to consider only the intensity difference for weight calculation. Notably, the intensity of a pixel corresponds to its spectral magnitude. In the spectrograms, the signals to be extracted are layer-like and primarily horizontal. Therefore, the intensity difference \({w}_{xy}\) can be represented by the vertical gradients of the image:
\({w}_{xy}= 2-\left({g}_{x}+ {g}_{y}\right)+{w}_{min}\)
(2)
where \({g}_{x}\) and \({g}_{y}\) are the vertical gradients of the image at nodes \(x\) and \(y\), respectively, \({w}_{xy}\) is the weight assigned to the edge between nodes \(x\) and \(y\), and \({w}_{min}\) is the minimum weight of the graph, added for system stabilisation.
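Equation (2) translates directly to code. Normalising the gradients to [0, 1] is an assumed convention carried over from the OCT formulation, not stated explicitly here.

```python
def edge_weight(g_x, g_y, w_min=1e-5):
    """Eq. (2): edge weight from the vertical gradients at nodes x and y.

    With gradients normalised to [0, 1] (an assumption), weights range
    from ~w_min on strong layer boundaries up to 2 + w_min in flat
    background, so the shortest-path search favours the layer.
    """
    return 2.0 - (g_x + g_y) + w_min
```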
An optimal path is formed when the sum of the assigned weights is minimal, which in this case is found using Dijkstra's algorithm26. The path is passed through a median filter and can then be taken as the extracted vital sign. Since the spectrograms are likely to contain a single-layered structure after the pre-extraction processing, no search-region limitation is necessary. The segmentation-based method can be applied to spectrograms generated by both the radar and the camera to extract vital sign readings. The processing flow is demonstrated in Fig. 4.
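The whole segmentation step can be sketched end to end: vertical gradients, Eq. (2) weights on left-to-right neighbour edges, Dijkstra's algorithm between virtual start/end nodes, then a median filter. The neighbourhood wiring (each pixel connects to its three right-hand neighbours) and the virtual-node connections follow the OCT formulation only loosely and are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.signal import medfilt

def extract_layer(img, w_min=1e-5):
    """Minimum-weight left-to-right path through a spectrogram (sketch)."""
    rows, cols = img.shape
    g = np.gradient(img.astype(float), axis=0)      # vertical gradient
    g = (g - g.min()) / (np.ptp(g) + 1e-12)         # normalise to [0, 1]
    src, dst, data = [], [], []
    for r in range(rows):
        for c in range(cols - 1):
            for dr in (-1, 0, 1):                   # three right-hand neighbours
                r2 = r + dr
                if 0 <= r2 < rows:
                    src.append(r * cols + c)
                    dst.append(r2 * cols + c + 1)
                    data.append(2.0 - (g[r, c] + g[r2, c + 1]) + w_min)
    n = rows * cols
    start, end = n, n + 1                           # virtual start/end nodes
    for r in range(rows):                           # cheap entry/exit edges
        src += [start, r * cols + cols - 1]
        dst += [r * cols, end]
        data += [w_min, w_min]
    G = csr_matrix((data, (src, dst)), shape=(n + 2, n + 2))
    _, pred = dijkstra(G, indices=start, return_predecessors=True,
                       directed=True)
    node, path = pred[end], []
    while node != start:                            # backtrack the optimal path
        path.append(node // cols)                   # keep the row per column
        node = pred[node]
    return medfilt(np.array(path[::-1], dtype=float), kernel_size=3)
```

On a spectrogram, the returned row indices per time column map directly to frequency (and hence BPM) values, giving one vital sign reading per second.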