Kim DohYon

Hardware Engineer & FPGA Specialist

Electronic and Electrical Engineering student specializing in FPGA development, High-Level Synthesis optimization, and AI accelerator design. Currently researching workload-aware interconnect networks for AI accelerators, with a focus on the communication bottlenecks that arise in systems at large language model scale.

Sungkyunkwan University, Suwon, Republic of Korea

Research Objective

My research objective is to design workload-aware interconnect networks for AI accelerators, addressing the communication bottlenecks that arise at large language model (LLM) scale. As chips scale to accommodate massive models, the growing number of compute units and routers adds substantial complexity to network management, creating severe congestion that traditional methods, often too rigid or too computationally expensive, cannot handle efficiently.

Education

Sungkyunkwan University

Republic of Korea

Expected Graduation: Aug. 2026

B.S. in Electronic and Electrical Engineering

Double Major in Advanced Semiconductor Engineering

Overall GPA: 3.54/4.0

Gyeonggibuk Science High School

Republic of Korea

Mar. 2017 – Feb. 2020

A specialized high school for academically gifted students in mathematics and science.

Experience

Rebellions

Hardware Engineer Intern – Tape-out Participation

Jun. 2025 – Present

NPU Memory Subsystem & NoC Analysis (Oct. 2025 – Present)

  • Developing an automated SRAM compiler to generate parameterized memory wrappers with built-in clock domain crossing logic and error protection, improving reliability for datacenter-scale NPUs (a toy generator sketch follows this list).
  • Analyzing on-chip mesh router topology to identify deadlock points; collaborating on architectural solutions for deadlock-free packet routing (see the XY-routing sketch below).
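
As a rough illustration of the SRAM-compiler idea in the first bullet, the toy C++ generator below emits the shell of a parameterized Verilog wrapper from a name/width/depth spec. All names are invented, and the real compiler's CDC synchronizers and error-protection logic are reduced to a comment.

```cpp
#include <cstdio>

// Toy generator sketch (invented names): print the shell of a parameterized
// Verilog SRAM wrapper. The production compiler additionally inserts
// two-flop CDC synchronizers and ECC encode/decode, elided here.
void emit_sram_wrapper(const char* name, int width, int depth) {
    printf("module %s #(parameter W = %d, D = %d) (\n", name, width, depth);
    printf("  input  wire          wclk,\n");
    printf("  input  wire          rclk,\n");
    printf("  input  wire [W-1:0]  wdata,\n");
    printf("  output wire [W-1:0]  rdata\n");
    printf(");\n");
    printf("  // CDC synchronizers and ECC encode/decode generated here\n");
    printf("endmodule\n");
}

int main() {
    emit_sram_wrapper("npu_sram_wrapper", 64, 1024);  // hypothetical instance
}
```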
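
For the deadlock analysis in the second bullet, a useful reference point (not the proprietary design under analysis) is dimension-ordered XY routing, the textbook deadlock-free scheme for 2D meshes: packets fully traverse the X dimension before turning into Y, so the channel dependency graph contains no cycles.

```cpp
#include <cstdio>

// Classic dimension-ordered (XY) routing for a 2D mesh NoC. Because no
// packet ever turns from Y back into X, routing-induced deadlock is ruled
// out by construction. Shown only as a reference scheme.
enum class Port { Local, East, West, North, South };

Port xy_route(int x, int y, int dest_x, int dest_y) {
    if (x != dest_x)                      // resolve X dimension first
        return (dest_x > x) ? Port::East : Port::West;
    if (y != dest_y)                      // then resolve Y dimension
        return (dest_y > y) ? Port::North : Port::South;
    return Port::Local;                   // arrived: eject to local PE
}

int main() {
    // Route a packet from router (0,0) to (2,1): expect East, East, North.
    int x = 0, y = 0;
    const int dx = 2, dy = 1;
    while (true) {
        Port p = xy_route(x, y, dx, dy);
        if (p == Port::Local) break;
        if (p == Port::East) ++x; else if (p == Port::West) --x;
        else if (p == Port::North) ++y; else --y;
        printf("hop to (%d,%d)\n", x, y);
    }
}
```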

Hash Algorithm Accelerator (Jun. 2025 – Oct. 2025)

  • Led logic design in a three-person team to develop a SHA-512 accelerator using HLS and OpenCL, spanning from microarchitecture exploration to FPGA deployment.
  • Achieved 2.4 GH/s throughput and improved performance-per-watt by 14.2% over the RTX 4080 via loop unrolling and dataflow pipelining (illustrated in the sketch after this list).
  • Architected a distributed system leveraging dual-FPGA clustering to scale throughput for high-volume client workloads.
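
A minimal Vitis HLS-style sketch of the loop-level techniques named above, not the production accelerator: the round constants K and the message schedule W are assumed to arrive from an upstream stage, and the pragmas are illustrative.

```cpp
#include <cstdint>

typedef uint64_t u64;

// 64-bit rotate right, the primitive under SHA-512's Sigma functions.
static u64 rotr(u64 x, unsigned n) { return (x >> n) | (x << (64 - n)); }

// Generic SHA-512 compression sketch: W is the 80-entry message schedule
// and K the FIPS 180-4 round constants, both produced upstream. Fully
// unrolling the round loop turns it into a chain of combinational stages;
// throughput then comes from pipelining independent candidate messages
// through that chain, which is where dataflow pipelining pays off.
void sha512_compress(const u64 W[80], const u64 K[80], u64 state[8]) {
#pragma HLS ARRAY_PARTITION variable=state complete
    u64 a = state[0], b = state[1], c = state[2], d = state[3];
    u64 e = state[4], f = state[5], g = state[6], h = state[7];

ROUNDS:
    for (int t = 0; t < 80; ++t) {
#pragma HLS UNROLL
        u64 S1  = rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41);
        u64 ch  = (e & f) ^ (~e & g);
        u64 t1  = h + S1 + ch + K[t] + W[t];   // wide 5-input addition
        u64 S0  = rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39);
        u64 maj = (a & b) ^ (a & c) ^ (b & c);
        u64 t2  = S0 + maj;
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }

    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```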

Intelligent Signal Processing Laboratory

Undergraduate Researcher

Sungkyunkwan University · Advisor: Professor Wansu Lim

2024 – Present

Conducting research on hardware-software co-design, ranging from algorithm development to silicon bring-up, and gaining an end-to-end perspective on computational accelerators and efficient hardware solutions.

KATUSA

Sergeant, CBRN Non-Commissioned Officer in Charge (NCOIC)

Korean Augmentation to the U.S. Army

Aug. 2022 – Feb. 2024

Served as a liaison and interpreter between U.S. government agencies (DTRA, DOE) and Republic of Korea military officials throughout the full term of service.

Research Experience

Scalable Distributed Acceleration

FPGA hardware setup with Xilinx ALVEO U250 cards

SHA-512 Accelerator on Xilinx Alveo U250: Served as logic designer in a three-person team, architecting and implementing a SHA-512 accelerator. Achieved 2.4 GH/s throughput and 14.2% improvement in performance per watt compared to RTX 4080 baseline.

Initially employed explicit stage-by-stage pipelining to maximize operating frequency, but identified severe routing congestion due to excessive flip-flop usage. Resolved this by minimizing pipeline stages through partitioning wide arithmetic logic and migrating critical adders from LUT fabric to DSP slices.
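
A hedged sketch of that fix using Vitis HLS's BIND_OP directive, with invented variable names: a wide multi-input addition (such as SHA-512's five-input t1 sum) is split into a balanced tree of two-input adds, each bound to DSP slices so the carry chains leave the LUT/FF fabric and routing pressure drops.

```cpp
#include <cstdint>

typedef uint64_t u64;

// Illustrative only: partition one long 4-input addition into a balanced
// tree of 2-input adds so each stage maps onto a DSP cascade instead of a
// deep LUT carry chain, freeing the flip-flops that congested routing.
u64 wide_add4(u64 a, u64 b, u64 c, u64 d) {
#pragma HLS INLINE off
    u64 s0 = a + b;
#pragma HLS BIND_OP variable=s0 op=add impl=dsp
    u64 s1 = c + d;
#pragma HLS BIND_OP variable=s1 op=add impl=dsp
    u64 sum = s0 + s1;
#pragma HLS BIND_OP variable=sum op=add impl=dsp
    return sum;
}
```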

Deployed the design on a dual-FPGA server and validated stability through multi-day stress tests, gaining direct experience in constructing distributed systems.

Hardware-Aware Neural Architecture Search (NAS)

FPGA development board for UAV-NAS

UAV-NAS Project: Focused on hardware-algorithm co-design for efficient drone detection. Initiated this work to address the structural irregularity of existing models, which hinders efficient edge deployment.

Key insight: The repetitive structure of NAS-derived cells allows for serialized execution of compute units, significantly reducing energy consumption. By deploying the resulting network on a Xilinx KV260 FPGA, achieved an 88.6% energy reduction compared to CPU inference.
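
The insight can be expressed as a short conceptual sketch (types and names invented, kernel body reduced to a stub): one physical cell engine is time-multiplexed across all layers, so compute logic is instantiated once and only the weights change per step.

```cpp
#include <cstddef>
#include <vector>

struct Tensor { std::vector<float> data; };
struct CellWeights { std::vector<float> w; };

// Stand-in for the fixed DARTS cell graph executed by the single hardware
// cell engine; the real kernel is elided.
Tensor run_cell(const Tensor& in, const CellWeights& w) {
    Tensor out = in;
    for (std::size_t i = 0; i < out.data.size() && i < w.w.size(); ++i)
        out.data[i] *= w.w[i];
    return out;
}

// Serialized execution: the same engine processes layer 0, 1, 2, ... and
// only the weight buffer changes between iterations, so compute logic is
// instantiated once rather than once per layer.
Tensor run_network(Tensor x, const std::vector<CellWeights>& layers) {
    for (const CellWeights& w : layers)
        x = run_cell(x, w);
    return x;
}
```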

Publications & Preprints

"UAV-NAS: Novel UAV Identification using Neural Architecture Search on FPGAs"

Doh Yon Kim et al., IEEE Transactions on Industrial Informatics (Under Review)

  • Designed the algorithms and HLS optimizations for a neural architecture search framework.
  • Achieved 88.6% energy reduction compared to CPU implementations while maintaining real-time throughput.

FPGA · Neural Architecture Search · Edge Computing · Energy Optimization

Projects

Llama2.c Acceleration on FPGAs

Dec. 2024 – Aug. 2025

Finalist, Creative Innovation Research Program. Investigated HLS optimization strategies to accelerate large language models on FPGA platforms, focusing on memory bandwidth bottlenecks.
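
A sketch of the access pattern this project targeted, with an invented kernel and parameters rather than the project's actual code: llama2.c inference is dominated by matrix-vector products whose weights stream once from DRAM, so the usual HLS remedies are caching the activation vector on chip and issuing wide bursts for the weight reads.

```cpp
constexpr int DIM = 4096;  // hypothetical model dimension, not llama2.c's

// Illustrative HLS matvec: weights are read exactly once from DRAM, so
// widening/bursting the AXI reads and caching the reused activation vector
// shifts the kernel from latency-bound scattered reads to being limited by
// raw memory bandwidth, the bottleneck this project focused on.
void matvec(const float* w, const float x[DIM], float y[DIM]) {
#pragma HLS INTERFACE m_axi port=w bundle=gmem max_read_burst_length=256
    float x_local[DIM];
#pragma HLS ARRAY_PARTITION variable=x_local cyclic factor=8
COPY_X:
    for (int i = 0; i < DIM; ++i) {
#pragma HLS PIPELINE II=1
        x_local[i] = x[i];  // activations are reused DIM times: cache once
    }

ROWS:
    for (int r = 0; r < DIM; ++r) {
        float acc = 0.f;
COLS:
        for (int c = 0; c < DIM; ++c) {
#pragma HLS PIPELINE
            // In practice the float accumulation is split into partial sums
            // to break the loop-carried dependence; elided for brevity.
            acc += w[r * DIM + c] * x_local[c];
        }
        y[r] = acc;
    }
}
```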

FPGA · LLM · HLS

High-Level Synthesis Library for Deep Learning

Dec. 2024 – Jul. 2025

Finalist, Co-Deep Learning Project. Developed a modular HLS library to streamline the deployment of deep learning models, bridging the gap between software frameworks and hardware implementation.
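
One plausible shape for such a library (all names invented): layers exposed as C++ templates over data type and dimensions, so each instantiation is specialized into dedicated hardware at synthesis time and layer shapes are checked at compile time.

```cpp
// Sketch of the library's design idea: templated layer building blocks that
// HLS specializes per instantiation, letting one source tree serve many
// model shapes without hand-written per-model RTL.
template <typename T, int IN, int OUT>
void dense(const T x[IN], const T w[OUT][IN], const T b[OUT], T y[OUT]) {
    for (int o = 0; o < OUT; ++o) {
#pragma HLS PIPELINE
        // Pipelining the row loop lets HLS unroll the dot product inside it.
        T acc = b[o];
        for (int i = 0; i < IN; ++i)
            acc += w[o][i] * x[i];
        y[o] = acc;
    }
}

template <typename T, int N>
void relu(const T x[N], T y[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL factor=4
        y[i] = (x[i] > T(0)) ? x[i] : T(0);
    }
}

// Composing layers: shapes are fixed at compile time, and HLS generates
// dedicated hardware for exactly this 64 -> 16 -> 16 pipeline.
void tiny_mlp(const float x[64], const float w[16][64],
              const float b[16], float y[16]) {
    float h[16];
    dense<float, 64, 16>(x, w, b, h);
    relu<float, 16>(h, y);
}
```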

HLS · Deep Learning · Library

Drone Signal Classification with FPGA

Jan. 2025 – Present

Selected as a finalist in the Capstone Design Fair. Implemented real-time drone signal detection and classification on an FPGA using a NAS-derived (DARTS) model for edge devices.

FPGA · Signal Processing · NAS

Awards & Recognition

Joint Service Achievement Medal

Awarded by UNC/CFC/USFK Commander (Four-Star General)

Mar. 2024

Kim DohYon during KATUSA service

Recognized for developing the Republic of Korea Army's chemical, biological, radiological, and nuclear (CBRN) response plan and implementing it during training exercises.

Skills & Languages

Technical Skills

Programming Languages

Verilog · SystemVerilog · C/C++ · Python · OpenCL · Tcl

Tools & Frameworks

Vivado · Vitis HLS · Synopsys VCS · Verdi · PyTorch

Hardware

Xilinx FPGAs (Alveo U250, Zynq) · ASIC Design Flow

Languages

English

OPIc Advanced Low (Aug. 2024) – highest proficiency level

Korean

Native
