1 Introduction

An ‘air-gap’ is a measure taken to keep a computer network (or other IT devices) disconnected from public networks such as the Internet. In air-gap isolation, there is no wired or wireless connection between the internal network and the outside world. Given this high level of separation, attackers cannot breach the network and steal data using remote attacks launched over the Internet. Military networks, as well as networks within financial organizations, critical infrastructure, and commercial industries [1], are known to be air-gapped due to the sensitive data they store and process. Despite this isolation, an air-gap does not provide hermetic protection from breaches. Several incidents in which air-gapped networks were compromised have been published in recent years [2]. Such networks can be infected through a malicious insider, stolen credentials, physical access, and so on [3].

Once the attacker has a foothold in the target network, he/she may want to exfiltrate valuable data. To that end, the attacker has to overcome the physical isolation by bridging the air-gap. Over the years, different types of covert channels have been proposed by security researchers, enabling exfiltration through an air-gap. Electromagnetic methods that exploit electromagnetic radiation from different components of the computer [4, 5] are likely the oldest type of covert channel researched. Various types of optical [6], thermal [7], and acoustic [8, 9] out-of-band communication channels have also been suggested.

1.1 Speakerless Computers

Most acoustic covert channels require a speaker (as a transmitter) and a microphone (as a receiver) to be installed in the air-gapped computers in order to enable bi-directional covert communication. Malware can encode data over sonic or ultrasonic frequencies and broadcast it through the computer speaker. Another computer with a microphone can receive the transmissions, decode the data, and send it to the attacker. To prevent such an attack, security policies may prohibit the use of speakers and microphones in a secure network, a measure also referred to as an ‘audio-gap’ [10, 11]. Keeping speakers disconnected from sensitive computers can effectively mitigate speaker-based acoustic covert channels [12].

In this paper, we introduce ‘DiskFiltration’, an acoustic covert channel that works even when speakers (or other audio-related hardware) are not present in the infected computer. Our method exploits the noise intrinsically emitted by the hard disk drive (HDD), which exists in most computers today. We show that malicious code on a compromised computer can perform ‘seek’ operations such that the HDD’s moving head (the actuator) generates noise patterns within a certain frequency range. Arbitrary binary data can therefore be modulated onto these acoustic signals, which can then be received by a nearby device equipped with a microphone (e.g., a smartphone), decoded, and finally sent to the attacker.

1.2 HDD, SSD, and SSHD

Three primary types of mass storage drives exist today:

Hard Disk Drive (HDD).

The hard disk drive uses a mechanical arm (actuator) with a read and write head to access information at the correct location on a spinning magnetic platter. The hard disk drive is the most prevalent mass storage medium used in PCs, servers, legacy systems, and laptops ([13], 2017 forecast).

Solid State Drive (SSD).

Solid state drives store the data in interconnected flash memory chips (e.g., NAND based flash chips), a type of non-volatile memory. There are no moving or mechanical parts to an SSD, and hence they emit virtually no noise.

Solid State Hybrid Drive (SSHD).

Solid state hybrid drives (SSHDs) combine HDD and SSD technology in the same unit. The flash is used as a cache buffer for frequently used data, while the rest of the data is stored on the magnetic media. An SSHD contains the HDD’s mechanical parts and hence emits noise when data stored on the HDD component is accessed.

Our method is based on the acoustic signals generated by the hard drive’s mechanical parts, and is therefore relevant to HDDs and SSHDs, but not to SSDs. Generally speaking, SSDs are considered to have advantages over HDDs in terms of data access speed, unit size, and reliability. Despite the increasing adoption of SSDs, HDDs are still the best-selling storage devices, mainly due to their low cost. In 2015, 416 million HDD units were sold worldwide, compared to 154 million SSD units. HDDs still dominate the storage market, and most PCs, servers, legacy systems, and laptops are installed with HDDs [13]. This means that our covert channel is available on most of today’s desktops and servers.

The rest of this paper is structured as follows. In Sect. 2 we present related work. Section 3 discusses the attack model. Section 4 introduces the anatomy of hard disk drives, and Sect. 5 discusses their acoustic characteristics. Section 6 describes the implementation of the transmitter and receiver. Section 7 presents evaluation results. Section 8 proposes countermeasures, and we conclude in Sect. 9.

2 Related Work

Covert channels allowing exfiltration of data from air-gapped computers can be categorized as electromagnetic, optical, thermal, or acoustic. The general technique of spying on information systems through leaking emanations is also referred to as a TEMPEST attack [14].

Electromagnetic.

Electromagnetic emanations from different computer components have been investigated as a medium for data transmission for more than twenty years. Intentional emissions from a computer screen were first discussed by Kuhn and Anderson [4] and Thiele [15]. More recently, AirHopper malware [16, 17] used the video cable to generate FM radio transmissions in order to leak data to a nearby mobile phone. In the same manner, GSMem [5] exploits electromagnetic radiation generated by the computer bus to transmit data over the air-gap. Other electromagnetic methods are discussed in [18].

Optic.

Optical methods are less discussed in the context of covert channels, since they are visible to the surrounding environment. Data leakage through keyboard LEDs was proposed in [19]. VisiSploit, a covert optical method, was proposed by Guri et al. [20]; in this method, data is leaked from the LCD screen to a remote camera via an invisible image projected on the screen. Other optical methods suggested for exfiltrating data from air-gapped computers use hard disk drive LEDs [6] and router LEDs [21].

Thermal.

Air-gap communication using heat emissions was proposed in [7]. In a method called BitWhisper, the authors demonstrated slow communication between adjacent air-gapped computers via heat exchange. Thermal covert channels on modern multicores (in the same system) have been thoroughly studied by Bartolini et al. [22].

Acoustic.

Acoustic methods are based on leaking data over sound waves at sonic and ultrasonic frequencies. Data transmission over audio was first reviewed by Madhavapeddy et al. in 2005 [23], who discussed audio-based communication between two computers. In 2013, Hanspach and Goetz [24] used near-ultrasonic sound waves to establish a covert channel between air-gapped systems equipped with speakers and microphones. They implemented a botnet that communicated between computers at a distance of 19.7 m with a bandwidth of 20 bits/s. The work in [25] extends the ultrasonic covert channel to smartphones, demonstrating how data can be transferred up to 30 m away. Interestingly, in 2013, security researchers claimed to have found BIOS-level malware in the wild (dubbed BadBios) that communicates between air-gapped laptops using ultrasonic sound [26].

Notably, speakers are sometimes forbidden on certain computers by regulations and security practices [10]. In 2016, Guri et al. introduced Fansmitter, malware that facilitates the exfiltration of data from an air-gapped computer via noise intentionally emitted from the PC fans [9]. In this method, the computer does not need to be equipped with audio hardware or an internal or external speaker. Our method uses the acoustic signals emitted from the hard disk drive (HDD). Although it is well known that HDDs generate acoustic noise, this noise has not previously been studied and analyzed in the context of covert channels.

3 Attack Model

DiskFiltration, as an acoustic covert channel, can be used to leak data from air-gapped computers. However, this covert channel can also be used with Internet-connected (non-air-gapped) computers whose network traffic is intensively monitored by network-based intrusion detection systems (IDS), intrusion prevention systems (IPS), and data leakage prevention (DLP) systems. In these cases, exfiltration of data through Internet traffic may be detected, and hence the attacker may want to resort to an out-of-band covert channel.

The adversarial attack model consists of a transmitter and a receiver. The transmitter is usually an ordinary desktop computer or server with at least one HDD installed. The receiver is a nearby device with audio recording capabilities. It can be a smartphone placed on the table, a smartwatch on the user’s wrist, or a nearby PC with a microphone. Infecting highly secure networks can be accomplished, as demonstrated by incidents such as Stuxnet [27], Agent.Btz [28], and others [29]. Infecting a mobile phone or other recording device can be accomplished via different attack vectors, such as emails, SMS/MMS, malicious apps, and so on [30].

The malware installed on the computer gathers the data to exfiltrate (e.g., passwords or encryption keys) and then transmits it using acoustic signals emitted from the HDD. The acoustic signals are generated by performing intentional seek operations, which cause the HDD actuator arm to make mechanical movements. The nearby receiver receives the transmission, decodes the data, and transfers it to the attacker via the Internet (over Wi-Fi networks or mobile data) or via SMS.

4 Anatomy of a Hard Disk Drive

In this section we provide the technical background necessary to understand the way DiskFiltration works. A more comprehensive description of HDD functionality and its internal operation can be found in [31].

The internal view of a hard disk drive is shown in Fig. 1. Hard disk drives store data on disks, or platters, coated with magnetic material (Fig. 1A). The platters rotate at various speeds, depending on the type of HDD. Modern consumer-grade HDDs commonly spin at 5400 or 7200 revolutions per minute (RPM), while some enterprise drives reach 15,000 RPM. The motor that rotates the platters is the spindle motor (Fig. 1B), and it spins at a constant speed corresponding to the RPM of the HDD. Notably, this motor is one of the constant sources of noise from an HDD. Data is read from and written to the platters using read-and-write heads (Fig. 1C). These heads are positioned very close to the magnetic surface (at a distance of a few nanometers) and can detect (read) or change (write) the magnetization of the material passing under them. Modern HDDs have several stacked platters, and each platter surface typically has its own read-and-write head. All of the read-and-write heads are attached to the actuator arm (Fig. 1D). During read and write operations, the actuator (Fig. 1F) rotates the actuator axis (Fig. 1E), which moves the read-and-write heads in an arc across the platters as they spin. The mechanical movements of the actuator generate noise at different levels and frequencies. Video clips showing HDD internal parts during operation can be found online [32].

Fig. 1. A hard disk drive’s internal parts.

Data is stored magnetically on concentric circles on the platter surface known as tracks (Fig. 2A). Corresponding tracks on all surfaces of a drive (on all platters) make up a cylinder. Two fundamental terms of disk geometry are the geometrical sector and the disk sector. A geometrical sector (Fig. 2B) is the section of a disk bounded by two radii and the arc between them. A disk sector (Fig. 2C) is the intersection of a track and a geometrical sector. Logically, the disk sector is the minimum storage unit of a hard drive. ‘Seek’ is the operation of moving the actuator arm to the specific track where data needs to be read or written. The time it takes to move the head to the desired track is called the seek time. As we describe in the following section, the movement of the head assembly on the actuator arm during a seek operation emits acoustic noise.

Fig. 2. Basic geometry of an HDD platter.

5 HDD Acoustics

An HDD emits noise at different frequencies and intensity levels which are produced by the movements of its internal parts. Notably, although there have been several studies on the acoustic characteristics of a hard drive, the noise emission mechanisms and the precise source of such emissions have not been comprehensively modeled [31, 33].

There are two primary sources of acoustic noise inside a drive: the motor and the actuator. These sources correspond to two types of noise, as explained below.

Idle acoustic noise is defined as the noise generated when the HDD spins the disks (platters). Idle noise is generated mainly by the spindle motor and the ball bearings inside the motor. The main frequency of idle noise can be calculated by \( IdleMainFreq = RPM/60 \), where \( RPM \) is the HDD rotation speed. Figure 3 shows the spectrogram of the idle acoustic noise generated by a Western Digital HDD spinning at 7200 RPM. The primary tone is generated at \( 7200/60 = 120\,{\text{Hz}} \) and can be seen in the spectrogram as a highlighted continuous frequency peak.
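
Applying the same formula to the other rotational speeds mentioned in Sect. 4 gives the expected idle tones (a worked example based on nominal RPM; an individual drive may deviate slightly):

$$ IdleMainFreq = \frac{RPM}{60}: \quad \frac{5400}{60} = 90\,{\text{Hz}}, \quad \frac{7200}{60} = 120\,{\text{Hz}}, \quad \frac{15000}{60} = 250\,{\text{Hz}} $$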

Fig. 3. Spectrogram of the idle acoustic noise generated by an HDD with an RPM of 7200.

Seek acoustic noise is generated by the actuator motor and its movement during seek, read, and write operations. This noise is produced during file system activities (e.g., file reads and writes) and is usually louder than the static idle acoustic noise. Unlike idle acoustic noise, seek noise depends on many factors (magneto-electric interactions, vibrations, and so on); hence, its exact tone frequency cannot be calculated by a formula [31, 33]. The seek tone frequency regions (expected to lie below \( 6\,{\text{kHz}} \) [33]) can instead be identified through waveform analysis. In this research, we exploit frequency regions that are probably rooted in the hard disk seek time, and in particular in its shortest component, the track-to-track seek time, which is the time required to move between adjacent tracks. As shown later, the most informative frequency region detected in our experiments is around 2080 Hz, which corresponds to a 0.48 ms track-to-track seek time.
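
The stated correspondence between 2080 Hz and 0.48 ms assumes one track-to-track head movement per tone period, so that the track-to-track seek time \( t_{tt} \) is simply the reciprocal of the tone frequency (a worked check of the figures above, not an independent measurement):

$$ t_{tt} = \frac{1}{f} = \frac{1}{2080\,{\text{Hz}}} \approx 4.8 \times 10^{-4}\,{\text{s}} = 0.48\,{\text{ms}} $$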

5.1 Noise Reduction Technologies

Many HDD manufacturers include a feature called automatic acoustic management (AAM) [34], which aims at reducing seek acoustic noise. Such technologies (e.g., Western Digital IntelliSeek [35]) use sophisticated algorithms to regulate the acceleration and positioning of the HDD actuator so that the emitted noise is reduced. This feature can be enabled and disabled with appropriate software or through an API to the HDD controller [36]. During our experiments we did not modify the AAM setting, which is usually on by default, in order to keep our covert channel as stealthy and quiet as possible and evade detection by the user.

5.2 Acoustic Signal Generation

The idle acoustic noise emitted from disk rotation is static and cannot be controlled by software. In order to modulate binary data, we exploit the seek acoustic noise generated by the movements of the actuator. By regulating (starting and stopping) a sequence of seek operations, we control the acoustic signal emitted from the HDD, which in turn can be used to modulate binary ‘0’ and ‘1’. Next, we examine the seek acoustic noise generated by three types of operations: read, write, and seek.

‘Read’ and ‘Write’ Operations.

Figure 4 shows the spectrograms of acoustic waveforms generated by the HDD during read (left image) and write (right image) operations, as recorded from outside the computer chassis. In this test we read the content of a 100 MB binary file into a memory buffer, and write 100 MB of random bytes to a file on the disk. During the tests, the cache was disabled to guarantee physical disk access. Read and write operations cause acoustic bursts, seen as a general increase in intensity across frequencies for a short time period. During most of these operations, the head stays in the same position.

Fig. 4. Spectral views of read (left) and write (right) operations, with minimal head movements.

‘Seek’ Operations.

We also examine the acoustic noise emitted by seek operations when the actuator moves between tracks at different distances. Figure 5 shows the acoustic waveform generated by the HDD during seek operations, as recorded from outside the computer chassis. In this test we perform three types of seek operations: (1) seeking and reading repeatedly from the first and last sectors, (2) seeking and reading between two consecutive tracks, and (3) seeking and reading between two consecutive sectors. The seek and read operations produce an acoustic signal spread across the entire 0–6000 Hz range. There were no significant acoustic differences (in frequency or amplitude) between the three types of seek operations. This indicates that, in order to emit a noticeable level of noise, it is sufficient to perform seek operations between any two tracks. In our tests we used seek operations between the first and last tracks of the HDD.

Fig. 5. Spectral view of ‘seek’ operations between different tracks.

6 Implementation

In this section we describe the implementation of the DiskFiltration transmitting software, including signal generation, data modulation, and bit framing. We also describe the implementation of a receiver as an Android app for smartphones.

6.1 Transmitter

A program can perform disk operations with two types of addressing: file system addressing and direct disk addressing. In file system addressing, the running process specifies the name of the file to read from or write to. In direct addressing, the process specifies the physical location on the HDD for the required I/O operation, e.g., a sector number to read from or write to. Modern OSs, such as Windows and Linux, provide APIs for both types of addressing, including direct disk addressing [37]. Technically, this means that user-level processes can generate the acoustic signals by performing seek operations, specifying either sector numbers or locations within files. Notably, file-level operations may not require any special permissions (e.g., root); for example, any process may be able to read and write files in temporary or working folders. We chose to use the seek operation, as it generates the highest level of acoustic signal. The transmitter is a C program that performs direct addressing using the fopen() and fseek() C library functions [38]. For testing, we also implemented a shell script version of the transmitter using the Linux dd command-line utility [39], a low-level utility that can perform a wide range of HDD operations at the file or block level.

6.2 Data Modulation

To transmit binary data we used simple on-off keying (OOK) modulation. In this digital modulation scheme, data is represented by the presence or absence of a carrier at a specified frequency \( F_{c} \). More specifically, a binary ‘1’ is represented by the presence of the carrier for a duration of \( T_{1} \), while its absence for a duration of \( T_{0} \) represents a binary ‘0’. Algorithm 1 shows pseudocode for the part of our C program that handles the transmission of a bit \( b \).

Algorithm 1. Pseudocode of the transmitBit procedure.

The transmitBit procedure receives the ‘0’ and ‘1’ transmission times (T0, T1) and two sector numbers for the seek operation (BEGIN_SEC, END_SEC). As explained in the discussion of signal generation, moving the actuator between sectors positioned on different tracks produces the highest level of noise. If the bit to transmit is ‘0’, the procedure simply sleeps for duration T0. If the bit to transmit is ‘1’, the procedure invokes seek operations, causing the head to move repeatedly between BEGIN_SEC and END_SEC for duration T1.

Bit Framing.

As explained, unlike idle acoustic noise, seek acoustic noise may vary depending on the type of HDD, and can differ even between HDDs of the same model. Although the general range of the seek tone frequency is known (e.g., 0–6 kHz [33]), the exact tone frequency cannot be calculated by a formula. This implies that a potential receiver (e.g., an application on a smartphone) must first scan the frequency range in order to find the carrier used for the on-off keying modulation. In addition, \( T_{0} \) and \( T_{1} \) may be set differently on each transmitter and may be unknown to the receiver in advance. To help the receiver dynamically synchronize with the transmitter parameters, we transmit data in small frames. Each frame consists of a preamble sequence of four bits, a cyclic redundancy check (CRC) of 8 bits, and a payload of 36 bits (Table 1).

Table 1. A frame consisting of a four-bit preamble, an 8-bit CRC, and a 36-bit payload

Preamble (4 bits) | CRC (8 bits) | Payload (36 bits)

The preamble consists of the ‘1010’ sequence and is used by the receiver to periodically determine the carrier frequency. In addition, the preamble allows the receiver to identify the beginning of a transmission and extract other channel parameters, such as \( T_{0} \) and \( T_{1} \). The CRC is computed over the 36-bit payload and placed after the preamble. The receiver calculates the CRC of the received payload; if it differs from the received CRC, an error is detected. For reliable separation of frames, we add a time delay of two bits between the transmissions of consecutive frames.
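
The frame layout above can be sketched in Python as follows. The paper does not specify the CRC polynomial, initial value, or bit order, so this sketch assumes CRC-8 with polynomial 0x07 (x^8 + x^2 + x + 1), initial value 0, processed MSB-first. The helper names `build_frame` and `check_frame` are hypothetical. The sketch covers only the bit layout and the receiver-side CRC check, not the acoustic transmission itself.

```python
PREAMBLE = [1, 0, 1, 0]   # '1010' preamble (Sect. 6.2)
CRC_BITS = 8
PAYLOAD_BITS = 36
CRC8_POLY = 0x07          # assumption: the paper does not name a polynomial


def crc8_bits(bits, poly=CRC8_POLY):
    """Bitwise CRC-8 (initial value 0, MSB-first) over a list of 0/1 values."""
    crc = 0
    for bit in bits:
        top = (crc >> 7) & 1            # bit about to be shifted out
        crc = (crc << 1) & 0xFF
        if top ^ bit:                   # polynomial division step
            crc ^= poly
    # Return the 8-bit remainder as a list of bits, MSB first
    return [(crc >> i) & 1 for i in range(CRC_BITS - 1, -1, -1)]


def build_frame(payload):
    """Preamble (4 bits) + CRC of the payload (8 bits) + payload (36 bits)."""
    if len(payload) != PAYLOAD_BITS or any(b not in (0, 1) for b in payload):
        raise ValueError("payload must be a list of 36 bits")
    return PREAMBLE + crc8_bits(payload) + list(payload)


def check_frame(frame):
    """Receiver side: return the payload if preamble and CRC match, else None."""
    if len(frame) != len(PREAMBLE) + CRC_BITS + PAYLOAD_BITS:
        return None
    if frame[:len(PREAMBLE)] != PREAMBLE:
        return None
    start = len(PREAMBLE)
    received_crc = frame[start:start + CRC_BITS]
    payload = frame[start + CRC_BITS:]
    return payload if crc8_bits(payload) == received_crc else None


if __name__ == "__main__":
    payload = [1, 0] * 18               # arbitrary 36-bit example payload
    frame = build_frame(payload)
    print(len(frame), check_frame(frame) == payload)   # 48 True
    corrupted = frame.copy()
    corrupted[-1] ^= 1                  # flip one payload bit
    print(check_frame(corrupted))       # None (single-bit errors are detected)
```

Placing the CRC before the payload mirrors the order described in the text. Any CRC polynomial with more than one term detects every single-bit error, which is the failure mode a marginal OOK decision is most likely to produce.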

Stealth.

As noted, modern HDDs include a feature called AAM [34], which reduces seek acoustic noise. To keep the covert channel as stealthy as possible, we did not modify the AAM setting, resulting in quiet HDD operation. Our experiments show that in modern HDDs, the generated acoustic signals blend with the background noise and are not noticeable by the user. Users may notice HDD activity by seeing the HDD’s blinking LED or hearing unusual seek noise. However, such occurrences are unlikely to raise suspicion, since they are not out of the ordinary: the HDD is routinely active due to swapping, indexing, backups, and other background operations.

6.3 The Receiver

Directly decoding the acoustic information from the raw waveform is not efficient, since the relevant information encoded by the induced HDD operations is concentrated in narrowband frequency regions. The signal-to-noise ratio (SNR) of the captured waveform can be significantly improved by exploiting these informative spectral regions rather than the whole frequency spectrum. In this research, the regions are determined experimentally, because theoretically modeling the position of the spectral peaks is quite complex, as discussed earlier.

To analyze our encoding, we estimated the SNR in the frequency domain (as opposed to the time domain) as follows. The “signal” level \( (X) \) is estimated by summing the magnitudes of the Fourier transform bins within our defined informative regions \( (R) \) during induced seek operations (bit ‘1’). The noise level \( (N) \) is estimated in the same way during an idle interval (bit ‘0’). The SNR (in dB) is therefore the logarithmic ratio of these two quantities:

$$ SNR_{R} = 20\log_{10}\left( \frac{\sum_{k \in R} \left| X_{k} \right|}{\sum_{k \in R} \left| N_{k} \right|} \right) $$

The signal adds up coherently in the frequency domain, whereas noise adds up incoherently. Therefore, to maximize the SNR, \( R \) should encompass the most informative frequency bins. Note that the windowing settings and the number of spectral bins should be optimized in order to avoid spectral leakage (a single frequency spreading into adjacent bins) and improve the SNR characteristics.

Figure 6 depicts the power spectral density (PSD) for seek and idle waveform excerpts from the seek and read operations. The PSD reflects the average power of the signal, on a logarithmic scale, within a specific time-frequency region. It is expressed in dB relative to the auditory threshold. This waveform was captured very close to the source at 44.1 kHz and resampled to 16 kHz. The fast Fourier transform (FFT) was calculated for 160 bins, spanning 50 Hz each. Spectral peaks are clearly visible in the graphs, and the strongest low-frequency peaks correspond to the 120 Hz fundamental frequency generated by a 7200 RPM HDD. It can be observed that the 2050–2100 Hz region is the most informative in terms of SNR, which means that SNR estimation should focus on this region. For instance, a direct SNR calculation over the whole waveform (using all frequency bins) yields an SNR of 1.5 dB, as opposed to 12.0 dB obtained by setting R to 2050–2100 Hz, using the bin corresponding to the highest SNR.
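
The band-limited SNR defined above can be sketched in pure Python, restricted to the 2050–2100 Hz region. This sketch rests on four assumptions:

- Frames are N = 320 samples at 16 kHz. That choice gives the 160 bins of 50 Hz described above, with bin k centred at 50k Hz, so the region maps to bins 41 and 42.
- A direct DFT is used only to keep the example dependency-free. In practice an FFT routine such as `numpy.fft.rfft` would be used.
- The signal frame comes from a ‘1’ (seek) interval and the noise frame from a ‘0’ (idle) interval.
- `band_bins` and `band_snr_db` are hypothetical names.

```python
import cmath
import math


def dft_magnitudes(frame):
    """|X_k| for k = 0 .. N/2 - 1 via a direct DFT (O(N^2), fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def band_bins(f_lo, f_hi, fs, n):
    """Indices of DFT bins whose centre frequency k*fs/n lies in [f_lo, f_hi]."""
    width = fs / n
    return [k for k in range(n // 2) if f_lo <= k * width <= f_hi]


def band_snr_db(signal_frame, noise_frame, region):
    """SNR_R = 20*log10(sum_{k in R} |X_k| / sum_{k in R} |N_k|)."""
    x = dft_magnitudes(signal_frame)
    noise = dft_magnitudes(noise_frame)
    return 20 * math.log10(sum(x[k] for k in region) / sum(noise[k] for k in region))


if __name__ == "__main__":
    fs, n = 16_000, 320                       # 50 Hz per bin, 160 bins
    region = band_bins(2050, 2100, fs, n)
    print(region)                             # [41, 42]
    # Synthetic stand-ins: a 2050 Hz tone during 'seek', a 10x weaker one during 'idle'
    seek = [math.sin(2 * math.pi * 2050 * t / fs) for t in range(n)]
    idle = [0.1 * s for s in seek]
    print(round(band_snr_db(seek, idle, region), 1))   # 20.0
```

The synthetic tone only exercises the arithmetic. Real seek noise is broadband, so the measured ratio depends on how much seek energy falls inside the chosen bins relative to the idle background.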

Fig. 6. PSD SNR as a function of frequency.

Signal Decoding.

From a signal processing perspective, our decoder is an envelope detector of the waveform energy in the frequency region identified above. In particular, we band-pass filter the received waveform between 2050 and 2100 Hz and then smooth the narrowband signal by convolving it with an analysis window to estimate its intensity. The window length should be adjusted according to the bit transmission rate. There are alternative ways of tracking the evolving PSD of a signal in a specific frequency region. One could use the Goertzel algorithm [40], which efficiently evaluates individual DFT frequency bins without computing a full FFT. Another option would be to use auto-regressive (AR) models [41], which are very useful for describing time-varying random processes. Generally speaking, AR models offer better frequency resolution but are slower than the FFT. These trade-offs should be considered in system design and are left for future research.
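
The following Python code sketches the Goertzel alternative mentioned above. It is not the band-pass/convolution decoder the paper uses. It computes the energy in the 2050–2100 Hz region for consecutive non-overlapping windows and thresholds that energy into carrier-on/carrier-off decisions. Three parameters are assumptions:

- the window length;
- the 50 Hz spacing of the probed frequencies;
- the fixed threshold.

In a real receiver the window would be tied to T0/T1 and the threshold derived from the levels observed during the preamble. `goertzel_power`, `band_envelope`, and `to_on_off` are hypothetical names.

```python
import math


def goertzel_power(samples, target_hz, fs):
    """|X_k|^2 for the DFT bin nearest target_hz, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_hz / fs)           # nearest integer bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def band_envelope(samples, fs, window, band=(2050, 2100), step_hz=50):
    """Band energy for each non-overlapping window (sum of probed bin powers)."""
    freqs = range(band[0], band[1] + 1, step_hz)
    return [
        sum(goertzel_power(samples[i:i + window], f, fs) for f in freqs)
        for i in range(0, len(samples) - window + 1, window)
    ]


def to_on_off(envelope, threshold):
    """1 where the band energy exceeds the threshold (carrier present), else 0."""
    return [1 if e > threshold else 0 for e in envelope]


if __name__ == "__main__":
    fs, window = 16_000, 320
    gate = [1, 0, 1, 0]                     # synthetic on-off pattern, one window each
    samples = [
        gate[t // window] * math.sin(2 * math.pi * 2050 * t / fs)
        for t in range(window * len(gate))
    ]
    env = band_envelope(samples, fs, window)
    print(to_on_off(env, threshold=1_000))  # [1, 0, 1, 0]
```

Goertzel costs O(N) per probed bin, so probing only the two bins in the informative region is cheaper than a full FFT when the band of interest is this narrow.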

Receiver Implementation.

The acoustic transmissions can be received by a nearby computer with a microphone, a smartphone placed on the desk, or other types of recording devices. This subsection briefly describes the receiver implementation. Note that audio sampling and on-off keying (OOK) demodulation are widely used in communication and hence are not considered the main contribution of this paper. We refer interested readers to a detailed theoretical explanation and available source code [42].

We implemented a receiver as an app installed on Samsung Galaxy (S4, S5, and S6) mobile phones with a standard microphone, sampling at \( 44.1\;{\text{kHz}} \). The main functions of the receiver are (1) audio sampling, (2) moving-window FFT, (3) preamble detection, and (4) payload demodulation. The receiver continuously samples the audio signal from the recording device, usually the built-in microphone. Technically, this is done using the AudioRecord class in the Android framework [43]. We then transform the signal to the frequency domain using the fast Fourier transform. In its PREAMBLE state, the code continuously tries to detect a preamble by scanning for the sequence “1010” (signal, no signal, signal, no signal). Once a preamble is detected, the channel properties (e.g., transmission time, noise, etc.) are saved, and the state is set to PAYLOAD. In the PAYLOAD state, the code demodulates a sequence of 32 bits using the OOK scheme and then returns to the PREAMBLE state. Note that error detection mechanisms, as well as the handling of signal loss, are omitted from the description above.
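
The PREAMBLE/PAYLOAD logic described above can be sketched as a small state machine operating on already-demodulated bit decisions (e.g., one on/off decision per bit period). This is a simplified illustration. The number of bits read after a preamble is passed in as `body_len`. Timing recovery, T0/T1 estimation, and signal-loss handling are omitted. `decode_stream` is a hypothetical name.

```python
PREAMBLE = (1, 0, 1, 0)   # signal, no signal, signal, no signal


def decode_stream(bits, body_len):
    """Two-state receiver: hunt for the '1010' preamble, then collect body_len bits."""
    state = "PREAMBLE"
    recent = []           # sliding window of the last len(PREAMBLE) bits
    body = []
    frames = []
    for b in bits:
        if state == "PREAMBLE":
            recent = (recent + [b])[-len(PREAMBLE):]
            if tuple(recent) == PREAMBLE:
                # A real receiver would also save channel properties (T0, T1, noise) here
                state, body = "PAYLOAD", []
        else:  # PAYLOAD state: demodulated bits are appended as they arrive
            body.append(b)
            if len(body) == body_len:
                frames.append(body)
                state, recent = "PREAMBLE", []
    return frames


if __name__ == "__main__":
    stream = [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,   # idle, preamble, 6-bit body
              0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1]   # inter-frame gap, preamble, body
    print(decode_stream(stream, body_len=6))
    # [[1, 1, 0, 0, 1, 1], [0, 1, 1, 0, 0, 1]]
```

The preamble is searched only in the PREAMBLE state, so a “1010” pattern inside a body does not restart the frame. An incomplete body at the end of the stream is discarded.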

7 Evaluation

In this section we present the evaluation results based on our experiments and analysis.

In our experiments, we used desktop computers installed with the transmitting application as transmitters. The application can be configured to use ‘read’, ‘write’, or ‘seek’ operations, and to operate with specified transmission times and predefined sector numbers. During the experiments we tested five different desktop workstations with five types of internal HDDs. In addition, we tested external HDDs. The computers and HDDs used in the tests are listed in Table 2.

Table 2. Desktop computers and HDD models tested

During our experiments we did not modify the HDD’s automatic acoustic management (AAM) setting, which is usually on by default, in order to keep our covert channel as stealthy and quiet as possible and evade detection by the user. During all of the experiments the HDDs were firmly installed within the computer cases in their usual internal drive bays (except for the external HDDs). Before the experiments we verified that the computer cases were firmly closed.

We ran the transmitter on desktop computers running 64-bit Linux Ubuntu 14.04.3 with kernel 3.13.0. We implemented the receiver as an app for the Android OS. All of our tests were conducted using Samsung Galaxy (S4, S5, and S6) smartphones running stock Android. Our testing environment was a computer lab with ordinary background noise, seven workstations, several network switches, and an active air conditioning system.

Figures 7 and 8 show the acoustic waveform generated by HDD-L, as received by a stationary smartphone placed one meter and two meters, respectively, from the transmitter. In both tests we used the ‘seek and write’ method for the transmission. Using on-off keying modulation, we transmitted a payload of “101010” with \( T_{0} = 2 \) s and \( T_{1} = 1 \) s. The received waveform was band-pass filtered between 2050 and 2100 Hz.
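
With OOK, the airtime of a payload depends on its bit values. The “101010” payload above contains three ‘1’ bits and three ‘0’ bits, so its transmission time and average bit rate follow directly from \( T_{0} \) and \( T_{1} \). This worked example ignores the preamble and the inter-frame gap:

$$ T_{total} = 3T_{1} + 3T_{0} = 3(1\,{\text{s}}) + 3(2\,{\text{s}}) = 9\,{\text{s}}, \qquad \frac{6\ {\text{bits}}}{9\,{\text{s}}} \approx 0.67\ {\text{bit/s}} $$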

Fig. 7. Spectral view of the signal emitted from HDD-L, as received from a distance of one meter.

Fig. 8. Spectral view of the signal emitted from HDD-L, as received from a distance of two meters.

Figure 9 shows the acoustic waveforms generated in four tests, all using the ‘seek and read’ method (via dd) and received by a stationary smartphone:

- HDD-O (Fig. 9a), at a distance of one meter, with \( T_{0} = T_{1} = 5 \) s;
- HDD-A (Fig. 9b), at one meter, with \( T_{0} = T_{1} = 3 \) s;
- HDD-I (Fig. 9c), at one meter, with \( T_{0} = T_{1} = 3 \) s;
- HDD-G (Fig. 9d), at 0.5 m, with \( T_{0} = T_{1} = 3 \) s.

The received waveform was band-pass filtered between 2050 and 2100 Hz. In all tests we used on-off keying modulation to transmit a payload of “101010”. The 128-bit payloads transmitted during the tests were successfully received and demodulated by the smartphone receiver placed up to two meters away, with a bit error rate (BER) of 0–10%. At distances greater than two meters, the BER increased significantly, mainly because of the background noise in the lab. Extending the range with specialized receivers (e.g., microphone arrays) is left for future work.

Fig. 9. Spectral view of the signal emitted from four hard drives (HDD-O, HDD-A, HDD-I, and HDD-G).

7.1 Casual Noise Emission

Since our covert channel is based on HDD activity, casual file operations of other running processes may interfere with and interrupt the transmissions. Our experiments show that most applications generate short bursts of noise that cause only moderate interruptions to the transmission. Figure 10 shows the acoustic waveform generated by HDD-I over a 22 s period while the computer was idle, playing a video, and compiling code. During the idle period, only the default system processes, system services, and shell commands were active, with no additional user applications running. During the video period, a high-definition (HD) video clip was played with the standard VLC media player. During the compilation period, we compiled a medium-size C project using the standard GCC compiler.

Fig. 10. Spectral view of the signal emitted from HDD-I under different workloads

As can be seen, the noise generated by casual operations is short-lived and bursty. There are two main reasons for this. First, applications usually read and write files sequentially, sector by sector. As a result, most applications rarely seek between different tracks (e.g., when a file's data is stored on the same track), which minimizes the acoustic emission from the HDD. Second, caching mechanisms (in the OS or in the HDD controller) reduce the number of physical accesses to the hard drive, which minimizes the number of seek operations and hence the acoustic emissions.
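The effect of sequential access on head movement can be illustrated with a simplified model. The sketch assumes a hypothetical uniform geometry (500 sectors per track, 100,000 tracks) with a linear sector-to-track mapping; real drives use zoned recording and sector remapping, so only the contrast between the two access patterns is meaningful here. `track_of` and `head_travel` are illustrative names.

```python
import random

SECTORS_PER_TRACK = 500   # hypothetical uniform geometry (real drives use zoned recording)
TRACKS = 100_000          # hypothetical track count


def track_of(sector):
    """Map a sector number to its track under the simplified linear layout."""
    return sector // SECTORS_PER_TRACK


def head_travel(sectors, start_track=0):
    """Return (tracks crossed by the head, number of track changes) for a read sequence."""
    travel, seeks, pos = 0, 0, start_track
    for s in sectors:
        t = track_of(s)
        if t != pos:
            travel += abs(t - pos)
            seeks += 1
            pos = t
    return travel, seeks


sequential = range(0, 1000)  # a file read sector by sector
rng = random.Random(0)
scattered = [rng.randrange(SECTORS_PER_TRACK * TRACKS) for _ in range(1000)]

print(head_travel(sequential))  # (1, 1): a single move from track 0 to track 1
print(head_travel(scattered))   # orders of magnitude more travel, about one seek per read
```

Reading the same number of sectors costs one short head movement when they are contiguous but roughly one long seek per read when they are scattered, which is the kind of activity a transmitter has to generate deliberately.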

8 Countermeasures

Countermeasures to mitigate the DiskFiltration attack can be classified into three categories: hardware based, software based, and procedural based, as summarized in Table 3.

Hardware based countermeasures. Replacing HDDs with SSDs can eliminate the threat, since SSDs have no moving parts and hence generate virtually no noise. However, replacing the hard drives in existing infrastructure may not always be practical due to the high cost [13]. In addition, most PCs, servers, legacy systems, and laptops are still shipped with HDDs [13]. Acquiring a particularly quiet type of HDD [44] or installing the HDD within a special enclosure [45] can also limit the range of the emitted noise. Other hardware products include signal detection and signal jamming systems. Noise detectors [46] monitor the background noise at specified frequency ranges; however, such detectors are usually limited to quiet environments. Jamming the HDD signal by generating static noise in the background is also possible, but it is not particularly applicable in a work environment due to the disturbance it may cause to users.

Software based countermeasures. At the software and firmware level, modern HDDs include a feature called automatic acoustic management (AAM) [34], which reduces seek noise. Ensuring that the AAM settings are at their correct values can limit the range of the emitted signals. As noted, the evaluation in this paper was performed with the default AAM settings, which are configured to their optimal values. Another solution may involve using host intrusion detection systems (HIDS) and host intrusion prevention systems (HIPS) to detect and prevent suspicious ‘seek’ patterns on HDDs. However, such software based countermeasures can be evaded by malware and rootkits at the OS kernel level [47]. In addition, distinguishing malicious read, write, and seek operations from legitimate ones may not be a trivial task.
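A HIDS rule of the kind mentioned above could, for example, look for the regular on/off timing that OOK imposes on disk activity. The following is a hypothetical heuristic, not a feature of any existing product: it assumes that a per-second count of seek-inducing I/O operations is available from some monitoring source, and its thresholds (`busy_threshold`, `min_runs`, `min_period`) are arbitrary. It flags a trace when the busy and idle runs are all whole multiples of a common period of at least two seconds.

```python
def activity_runs(busy):
    """Collapse a list of booleans into run lengths, e.g. [T, T, F] -> [2, 1]."""
    runs = []
    prev = None
    for b in busy:
        if b == prev:
            runs[-1] += 1
        else:
            runs.append(1)
            prev = b
    return runs


def looks_like_ook(seeks_per_second, busy_threshold=20, min_runs=8, min_period=2):
    """Hypothetical check: many busy/idle runs whose lengths share a regular period."""
    busy = [count >= busy_threshold for count in seeks_per_second]
    # Drop the first and last runs, which the observation window may truncate.
    inner = activity_runs(busy)[1:-1]
    if len(inner) < min_runs:
        return False
    period = min(inner)
    return period >= min_period and all(r % period == 0 for r in inner)


bits = "1011001011010010"
T_BIT = 3  # seconds per bit, as in T0 = T1 = 3 s
ook = [0] * 5 + [50 if b == "1" else 0 for b in bits for _ in range(T_BIT)] + [0] * 5
bursty = [0, 0, 40, 0, 0, 0, 0, 35, 30, 0, 0, 0, 0, 0, 0, 0,
          60, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 0]
print(looks_like_ook(ook), looks_like_ook(bursty))  # True False
```

A transmitter that adds random jitter to its bit timing, or that hides its seeks among legitimate activity, would slip past a rule this simple, which is consistent with the observation that separating malicious from legitimate seek operations is not trivial.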
Procedural based countermeasures. Procedural countermeasures involve physically separating emanating equipment from potential receivers. This approach is referred to as zone separation in United States and NATO standards [11]. Under these standards, sensitive computers are kept in restricted areas in which certain equipment is banned. In our case, smartphones and other recording devices should not be permitted in close proximity to the computer.

Table 3. Different types of countermeasures

9 Conclusion

We present a new type of acoustic covert channel, code-named DiskFiltration. In this method, an attacker can leak binary data from computers via acoustic noise emanating from hard disk drives. Unlike most existing acoustic covert channels, DiskFiltration can work on computers that are not equipped with speakers or audio hardware. Malicious code installed on the computer can perform intentional seek operations, which cause the HDD head (the actuator) to move between different tracks. These mechanical movements generate acoustic signals that can be used to modulate ‘0’ and ‘1’. The covert signals can be received by a nearby recording device. Beyond its general contribution to the field of covert channels, DiskFiltration is particularly relevant in two adversarial scenarios: (1) air-gapped networks, where there is no network connection between the computer and the Internet, and (2) computers whose Internet connections are heavily monitored by IDS and IPS systems. In these cases an attacker may resort to out-of-band covert exfiltration channels that are not monitored by existing defense measures. We provided the main technical details regarding the anatomy of modern HDDs and examined the acoustic signals generated by their basic read, write, and seek operations. Based on our observations, we designed a simple data modulation and demodulation protocol and implemented a prototype of a transmitter (for a computer) and a receiver (for smartphones). We evaluated the covert channel on different types of HDDs and computer chassis, and examined its signal quality, channel capacity, and bandwidth. The results show that DiskFiltration can be used to covertly transfer data to a distance of up to two meters (six feet) from the transmitting computer at a bit rate of 180 bits/min.