CUDA Technology Reduces FDTD Simulation Time

Oct. 8, 2013
CUDA technology can significantly reduce the computational time of finite-difference time-domain (FDTD) electromagnetic simulations.

The finite-difference time-domain (FDTD) method is widely used for electromagnetic (EM) simulations because of its accuracy, flexibility, and simplicity. Those benefits come at the cost of long computational times. According to Remcom, NVIDIA’s Compute Unified Device Architecture (CUDA) technology can reduce computational time by more than two orders of magnitude compared with conventional CPU-based computing. The company's paper, “Accelerating the Finite Difference Time Domain (FDTD) Method with CUDA,” describes the migration of a three-dimensional FDTD method from a traditional C implementation to NVIDIA’s CUDA architecture.

The six-page paper covers the challenges and techniques involved in recasting the FDTD algorithm in a form that can exploit modern graphics processing units (GPUs) through the CUDA framework. A GPU implementation runs thousands of threads simultaneously, so reaching maximum speed requires special design considerations. With a proper understanding of CUDA, the paper states, a GPU implementation can run more than two orders of magnitude faster than one on a traditional central processing unit (CPU).
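To make the starting point concrete, here is a minimal sketch of the kind of C update loop that such a migration begins with. It is not taken from Remcom's paper: it assumes a one-dimensional grid in normalized units rather than the paper's three-dimensional method, and the function name `fdtd_1d_step` and the single coefficient `coef` are hypothetical.

```c
#include <stddef.h>

/* One leapfrog time step on a 1D FDTD grid (illustrative sketch only).
 * ez holds n electric-field samples; hy holds n - 1 magnetic-field
 * samples staggered between them. coef is a hypothetical normalized
 * update coefficient (at most 1 for stability in this 1D scheme). */
void fdtd_1d_step(double *ez, double *hy, size_t n, double coef)
{
    /* Magnetic-field update: each cell reads only electric-field values. */
    for (size_t i = 0; i + 1 < n; ++i)
        hy[i] += coef * (ez[i + 1] - ez[i]);

    /* Electric-field update on interior nodes; ez[0] and ez[n - 1]
     * are left untouched, which acts as a fixed boundary. */
    for (size_t i = 1; i + 1 < n; ++i)
        ez[i] += coef * (hy[i] - hy[i - 1]);
}
```

Within each loop, no iteration depends on another iteration's result, which is why the work can be spread across many threads with one cell per thread.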

Although GPUs were originally used solely to drive graphical displays, they have evolved into powerful computational devices. The Tesla C1060 GPU, for example, can yield significant performance gains over a 2.66-GHz Intel Core 2 Quad processor. The paper provides additional details on CUDA GPUs, including their architecture and the several types of memory they offer. Functions targeted for the GPU are implemented in CUDA as kernels, which are written in a C-like syntax. The paper also describes the optimizations that were implemented, which reduced memory operations to less than 14% of the original total. These optimizations, along with some others, were applied to the FDTD algorithm, which was then integrated into Remcom’s XFdtd software.
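The kernel idea can be sketched in plain C by writing the update for a single cell as its own function. This is an illustrative assumption rather than the paper's code: the names `cell_index`, `update_ex_cell`, and `update_ex_all`, the x-fastest memory layout, and the single coefficient `coef` are all hypothetical, and none of the paper's memory optimizations are shown. In CUDA, the per-cell function would correspond to the kernel body, with the index derived from the built-in `blockIdx`, `blockDim`, and `threadIdx` variables instead of a loop counter.

```c
#include <stddef.h>

/* Flatten (i, j, k) into a linear index with x varying fastest. */
static size_t cell_index(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return (k * ny + j) * nx + i;
}

/* Kernel-style body: updates Ex for exactly one cell, identified by idx.
 * Cells with no lower neighbor in y or z are skipped. */
static void update_ex_cell(size_t idx, double *ex,
                           const double *hy, const double *hz,
                           size_t nx, size_t ny, size_t nz, double coef)
{
    size_t i = idx % nx;
    size_t j = (idx / nx) % ny;
    size_t k = idx / (nx * ny);

    if (k >= nz || j == 0 || k == 0)
        return; /* out of range, or a required neighbor is missing */

    /* Discrete form of dEx/dt proportional to dHz/dy - dHy/dz. */
    ex[idx] += coef * ((hz[idx] - hz[cell_index(i, j - 1, k, nx, ny)])
                     - (hy[idx] - hy[cell_index(i, j, k - 1, nx, ny)]));
}

/* CPU stand-in for a kernel launch: visit every cell in turn.
 * On a GPU, each call below would instead run in its own thread. */
void update_ex_all(double *ex, const double *hy, const double *hz,
                   size_t nx, size_t ny, size_t nz, double coef)
{
    for (size_t idx = 0; idx < nx * ny * nz; ++idx)
        update_ex_cell(idx, ex, hy, hz, nx, ny, nz, coef);
}
```

Because each call writes only `ex[idx]` and reads only the magnetic-field arrays, the calls can run in any order, which is the property that makes the one-thread-per-cell mapping safe.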

A modern cellular phone design that included all major device components was chosen for the final test simulation. The CPU baseline ran on a single thread, while the GPU implementation used all four Tesla C1060s. In these tests, the GPU implementation consistently ran more than two orders of magnitude faster than the CPU implementation.

Remcom, Inc., 315 S. Allen St., Ste. 416, State College, PA 16801; (814) 861-1299.

About the Author

Chris DeMartino | Sales and Applications Engineer, Modelithics

Chris DeMartino began working in the RF/microwave industry in 2004, developing and testing a variety of RF/microwave components and assemblies for both commercial and military programs. In May 2015, DeMartino joined Microwaves & RF magazine, where he served as the technical editor until December 2019. In December 2019, he joined Modelithics as the company’s sales and applications engineer. Chris has a B.S. in Electrical Engineering from the State University of New York at Binghamton and an M.S. in Electrical Engineering from Polytechnic University.
