Algorithms to Antenna: RF Fingerprinting for Trusted Communications Links

In this blog, we build on the examples described in a previous blog, where we showed how to identify waveform characteristics using deep-learning networks trained with synthesized radar and communications signals. We will apply similar techniques to perform RF fingerprinting to help identify trusted and unknown transmit sources in communications systems. While our example is focused on a wireless communications application, the same techniques could be used in multifunction radar systems to identify trusted communications link status.

First a bit of background: An RF transmitter-receiver pair creates a unique RF signature at the receiver, which is made up of a combination of channel paths and RF impairments. Our deep-learning network consumes baseband in-phase/quadrature samples and identifies the transmitting radio as a trusted source. Figure 1 shows a concept diagram of trusted and unknown propagation paths between a set of transmit and receive antennas.

1. The top image shows the trusted signal between transmit and receive nodes; the bottom image illustrates unknown sources that don’t match the expected RF fingerprint. (© 1984–2021 The MathWorks, Inc.)

To start, we use a wireless local-area-network (WLAN) system for our scenario because we want to verify the system using signals we collect from a radio and WLAN is ubiquitous. This same type of workflow could be applied to radios that operate outdoors over long distances in the presence of terrain and buildings.

We train the network to identify router impersonation, which is a form of attack on a WLAN network. In such an attack, a malicious agent tries to impersonate a legitimate router and trick network users into connecting to it. Security identification solutions based on simple digital identifiers, such as media-access-control (MAC) addresses, IP addresses, and SSIDs, aren’t effective in detecting such an attack. Since these identifiers can be spoofed, a more secure solution uses other information, such as the RF signature of the radio link.

To simplify our discussion, we assume a network can identify transmitting radios if the RF impairments are dominant or the channel profile stays constant during the operation time. Most WLAN networks have fixed routers that create a static channel profile when the receiver location is also fixed. A trained deep-learning network can identify router impersonators by comparing the received signal’s RF fingerprint and MAC address pair to that of the known routers.

As with the examples in our earlier blog, we start with a workflow using synthesized data to train, validate, and test our system. In the second part of this blog, we repeat the workflow using data collected from a set of routers and radios.

Example Using Synthesized Data

We train a convolutional neural network (CNN) with simulated WLAN beacon frames from known and unknown routers for RF fingerprinting. The MAC address of received signals and the RF fingerprint detected by the CNN are compared to detect WLAN router impersonators.

We modeled an indoor space with three trusted routers with known MAC addresses. As shown in Figure 2, our scenario includes unknown routers that enter the observation area. Unknown routers are defined as "harmless" radios.

2. Scenario for synthesized example with three trusted routers and a collection of unknown routers. (© 1984–2021 The MathWorks, Inc.)

We collected unknown router data by moving a Wi-Fi router around to generate many channel profiles (in this case RF fingerprints). This enabled the network to learn that any fingerprint not marked as known is unknown, thus creating an "unknown" category. Some of these unknown sources are harmless, and some are modeled as router impersonators. The observer node collects non-high-throughput (non-HT) beacon signals from these routers and uses the legacy long training field (L-LTF) to identify the RF fingerprint.

The transmitted L-LTF signals are the same for all routers, so there aren’t any data dependencies. Since the routers and the observer are fixed, the RF fingerprints (shown in Figure 2 as RF1, RF2, and RF3) are a combination of multipath channel profiles and RF impairments that don’t vary in time. The “unknown” router data is a collection of random RF fingerprints that differ from those of the known routers.

Figure 3 (top) shows a user connected to a router and a mobile hot spot. The observer receives beacon frames and decodes the MAC address. The observer node also extracts the L-LTF signal and uses this signal to classify the RF fingerprint of the source of the beacon frame.

If the MAC address and the RF fingerprint match, as in the case of Router 1, Router 2, and Router 3, the observer declares the source a “known” router. If the MAC address of the beacon isn’t in the database and the RF fingerprint doesn’t match any of the known routers, as in the case of a mobile hot spot, the observer declares the source an “unknown” router.

Figure 3 (bottom) shows a router impersonator that replicates the MAC address of a known router and transmits beacon frames. The hacker can jam the original router and force the user to connect to the evil twin. The observer receives the beacon frames from the impersonator and decodes the MAC address. The decoded MAC address matches the MAC address of a known router, but the RF fingerprint doesn’t match. The observer declares the source a router impersonator.
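The observer's decision rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the example: the MAC addresses, the class labels, and the names `KNOWN_ROUTERS` and `classify_source` are hypothetical, and the handling of a MAC address that isn't in the database but whose fingerprint looks like a known router is an assumption (here it is still reported as "unknown").

```python
from typing import Dict

# Hypothetical database: MAC address of each trusted router mapped to the
# fingerprint class the trained network is expected to predict for it.
KNOWN_ROUTERS: Dict[str, str] = {
    "AA:BB:CC:00:00:01": "router1",
    "AA:BB:CC:00:00:02": "router2",
    "AA:BB:CC:00:00:03": "router3",
}


def classify_source(mac: str, predicted_fingerprint: str) -> str:
    """Combine the decoded MAC address with the network's fingerprint label."""
    expected = KNOWN_ROUTERS.get(mac.upper())
    if expected is None:
        # MAC address is not in the database (assumption: always "unknown").
        return "unknown"
    if predicted_fingerprint == expected:
        # MAC address and RF fingerprint agree.
        return "known"
    # Known MAC address, but the RF fingerprint belongs to something else.
    return "impersonator"


print(classify_source("AA:BB:CC:00:00:01", "router1"))   # known
print(classify_source("AA:BB:CC:00:00:01", "unknown"))   # impersonator
print(classify_source("DE:AD:BE:EF:00:00", "unknown"))   # unknown
```

The point of the sketch is that neither input is trusted on its own: the MAC address selects which fingerprint to expect, and the network's prediction confirms or contradicts it.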

3. Shown are known and unknown sources (top), and unknown sources that don’t match the expected RF fingerprint (bottom). (© 1984–2021 The MathWorks, Inc.)

To train our deep-learning network, we generate a dataset of 5,000 non-HT WLAN beacon frames for each router. We use MAC addresses as labels for the known routers; the remaining are labeled as “unknown.” Our dataset is broken into training (80%), validation (10%), and test (10%) sets.
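As a rough illustration of the 80/10/10 split, the following Python sketch shuffles the 5,000 frames of one router and partitions them. The function name `split_frames`, the fixed seed, and the use of integer frame indices in place of real waveforms are assumptions made for the sketch.

```python
import random


def split_frames(frames, train=0.8, val=0.1, seed=0):
    """Shuffle frames, then split them into training, validation, and test lists."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


frame_ids = range(5000)  # stand-in for the 5,000 beacon frames of one router
train_set, val_set, test_set = split_frames(frame_ids)
print(len(train_set), len(val_set), len(test_set))  # 4000 500 500
```

Shuffling before splitting keeps each subset representative of the full range of channel realizations rather than of the order in which the frames were generated.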

Wi-Fi routers that implement 802.11a/g/n/ac protocols transmit beacon frames in the 5-GHz band to broadcast their presence and capabilities using the orthogonal frequency-division multiplexing (OFDM) non-HT format. The beacon frame consists of two main parts: preamble (SYNC) and payload (DATA). The preamble also has two parts: short training and long training.

In this example, the payload contains the same bits except for the MAC address of each radio. The CNN uses the L-LTF part of the preamble as training units. Reusing the L-LTF signal for RF fingerprinting provides an overhead-free fingerprinting solution.
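To make the idea of reusing the L-LTF concrete, here is a minimal Python sketch that slices the L-LTF out of a received frame and converts it to in-phase/quadrature pairs. It assumes a 20-MHz sample rate, where the 8-µs short training field and the 8-µs L-LTF each span 160 samples, and it assumes the frame is already synchronized so that its first sample is the start of the preamble. The function names and the placeholder frame are hypothetical.

```python
L_STF_SAMPLES = 160  # short training field: 8 us at an assumed 20-MHz sample rate
L_LTF_SAMPLES = 160  # long training field: 8 us at the same rate


def extract_lltf(frame):
    """Return the L-LTF portion of a synchronized non-HT frame."""
    start = L_STF_SAMPLES
    stop = start + L_LTF_SAMPLES
    if len(frame) < stop:
        raise ValueError("frame is too short to contain the full L-LTF")
    return frame[start:stop]


def to_iq_rows(samples):
    """Convert complex baseband samples to [I, Q] pairs for the network input."""
    return [[s.real, s.imag] for s in samples]


# Placeholder frame: 400 complex samples whose values encode the sample index.
frame = [complex(n, -n) for n in range(400)]
features = to_iq_rows(extract_lltf(frame))
print(len(features), len(features[0]))  # 160 2
```

Because every router sends the same L-LTF, any difference between two extracted fields comes from the channel and the RF hardware rather than from the transmitted data.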

The challenge when synthesizing data is to make it match as closely as possible to data that a radio would see from received real-world signals. For this, we synthesize our data by passing each frame through a Rayleigh multipath fading channel with a range of delay profiles and average path gains. We also add white Gaussian noise to each dataset. RF impairments are added, including phase noise, frequency offsets, and dc offsets.
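The impairment chain can be approximated with plain Python, as in the sketch below. It is not the model used in the example: the sample-spaced static taps, the random-walk phase-noise model, the order in which the impairments are applied, and every numeric value are assumptions chosen only to show the mechanics.

```python
import cmath
import math
import random


def rayleigh_taps(avg_path_gains_db, rng):
    """Draw one static set of complex Gaussian (Rayleigh-magnitude) path gains."""
    taps = []
    for gain_db in avg_path_gains_db:
        sigma = math.sqrt(10 ** (gain_db / 10) / 2)  # std dev per I/Q dimension
        taps.append(complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
    return taps


def apply_multipath(x, taps):
    """Convolve the signal with the channel taps, truncated to the input length."""
    return [
        sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(x))
    ]


def apply_impairments(x, fs, freq_offset_hz, phase_noise_std, dc_offset, rng):
    """Apply frequency offset, random-walk phase noise, and DC offset."""
    out, phase = [], 0.0
    for n, s in enumerate(x):
        phase += rng.gauss(0, phase_noise_std)  # assumed random-walk model
        rotation = cmath.exp(1j * (2 * math.pi * freq_offset_hz * n / fs + phase))
        out.append(s * rotation + dc_offset)
    return out


def add_awgn(x, snr_db, rng):
    """Add complex white Gaussian noise at the requested signal-to-noise ratio."""
    power = sum(abs(s) ** 2 for s in x) / len(x)
    sigma = math.sqrt(power / 10 ** (snr_db / 10) / 2)
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in x]


rng = random.Random(1)
tx = [cmath.exp(1j * math.pi * n / 4) for n in range(160)]  # placeholder waveform
taps = rayleigh_taps([0.0, -3.0, -6.0], rng)  # assumed average path gains in dB
rx = apply_multipath(tx, taps)
rx = apply_impairments(rx, fs=20e6, freq_offset_hz=5e3,
                       phase_noise_std=1e-3, dc_offset=0.01 + 0.01j, rng=rng)
rx = add_awgn(rx, snr_db=20, rng=rng)
```

Drawing a fresh set of taps and impairment values for each "unknown" frame, while holding them fixed for each known router, is one way to reproduce the static-versus-random fingerprint distinction described earlier.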

Figure 4 shows the results after we classify the test frames and calculate the final accuracy of the neural network. The network is able to correctly identify 100% of the RF fingerprints.

4. Classification results using synthesized data for training, validation, and testing. (© 1984–2021 The MathWorks, Inc.)

Example Using Data Collected from a Radio

Now we repeat the workflow, but instead of applying synthesized data to train and test the system, we will use data collected from WLAN beacon frames from real routers using a software-defined radio (SDR). We use a second SDR to transmit unknown beacon frames and capture them. The deep-learning network is trained with these captured signals.

Figure 5 shows the setup that’s built with routers and radios. We used the ADALM-PLUTO SDR platform from Analog Devices.

5. This is an example scenario of three trusted routers and a collection of unknown routers using hardware. (© 1984–2021 The MathWorks, Inc.)

This example uses data from four known routers. The dataset contains 3,600 frames per router, where 90% is used as training frames and 10% as test frames. Figure 6 shows the results when we train our network and test our system solely with data from a radio. Again, the network is able to identify the correct RF fingerprints in all cases.

6. Classification results using data collected from a radio for training, validation, and testing. (© 1984–2021 The MathWorks, Inc.)

We also tested the system by using an ADALM-PLUTO SDR to spoof routers, generating the beacon signals with WLAN Toolbox functions. The following is a sample output of this experiment:

As you can see, the system can identify impersonators and unknown routers.

We were able to prove out our algorithm and workflow steps using synthesized data, which helped us to move seamlessly to a hardware-based system. This same type of workflow can be applied to other communications systems as well.

To learn more about the topics covered in this blog, see the examples below or email me at [email protected]:

Design a Deep Neural Network with Simulated Data to Detect WLAN Router Impersonation (example): Learn how to design an RF fingerprinting CNN with simulated data. You train the CNN with simulated WLAN beacon frames from known and unknown routers for RF fingerprinting.
Test a Deep Neural Network with Captured Data to Detect WLAN Router Impersonation (example): Learn how to train an RF fingerprinting CNN with captured data. You capture WLAN beacon frames from real routers using an SDR.
Radar Waveform Classification Using Deep Learning (example): Learn how to classify radar waveform types of generated synthetic data using the Wigner-Ville distribution and a deep CNN.

See additional 5G, radar, and EW resources, including those referenced in previous blog posts.