Expert-Level Detection of Epilepsy Markers in EEG on Short and Long Timescales

Li, Jun B. Eng
Goldenholz, Daniel M. M.D., Ph.D.
Alkofer, Moritz M.Sc.
Sun, Chenxi Ph.D.
Nascimento, Fabio A. M.D.
Halford, Jonathan J. M.D.
Dean, Brian C. Ph.D.
Galanti, Mattia M.Sc.
Struck, Aaron F. M.D.
Greenblatt, Adam S. M.D.
Lam, Alice D. M.D., Ph.D.
Herlopian, Aline M.D.
Nwankwo, Chinasa M.D.
Weber, Dan D.O.
Maus, Douglas M.D., Ph.D.
Haider, Hiba A. M.D.
Karakis, Ioannis M.D., Ph.D.
Yoo, Ji Yeoun M.D.
Ng, Marcus C. M.D.
Selioutski, Olga D.O.
Taraschenko, Olga M.D., Ph.D.
Osman, Gamaleldin M.D.
Katyal, Roohi M.B.B.S.
Schmitt, Sarah E. M.D.
Benbadis, Selim M.D.
Cash, Sydney S. M.D., Ph.D.
Tatum, William O. D.O.
Sheikh, Zubeda M.D.
Kong, Wan Yee M.D., M.Sc.
Bayas, Grace B.Sc.
Turley, Niels B.Sc.
Hong, Shenda Ph.D.
Westover, M. Brandon M.D., Ph.D.
Jing, Jin Ph.D.

¹National Institute of Health Data Science, Peking University, Beijing
²Harvard Medical School, Boston
³Neurology Department, Beth Israel Deaconess Medical Center, Boston
⁴Institute for Theoretical Physics, Technical University Berlin, Berlin
⁵Neurology Department, Washington University in St. Louis, St Louis, MO
⁶Ralph H. Johnson VA Medical Center, Charleston, SC
⁷Electrical and Computer Engineering Department, Clemson University, Clemson, SC
⁸Clemson University School of Computing, Clemson, SC
⁹Public Health Sciences Department, Medical University of South Carolina, Charleston
¹⁰University of Wisconsin–Madison, Madison
¹¹Neurology Department, Massachusetts General Hospital, Boston
¹²Yale University School of Medicine, New Haven, CT
¹³Akron Children’s Hospital, Akron, OH
¹⁴St. Louis University School of Medicine, St Louis, MO
¹⁵Neurology Department, University of Chicago, Chicago
¹⁶University of Chicago Medical Center, Chicago
¹⁷Emory University School of Medicine, Atlanta
¹⁸University of Crete School of Medicine, Heraklion, Greece
¹⁹Icahn School of Medicine at Mount Sinai, New York, NY
²⁰University of Manitoba, Winnipeg, MB, Canada
²¹Stony Brook University, Stony Brook, NY
²²University of Rochester, Rochester, NY
²³University of Nebraska Medical Center, Omaha
²⁴Mayo Clinic, Jacksonville, FL
²⁵Louisiana State University Health Shreveport, Shreveport
²⁶Neurology Department, Medical University of South Carolina, Charleston
²⁷University of South Florida, Tampa
²⁸Tampa General Hospital, Tampa, FL
²⁹Neurology Department, Virginia Commonwealth University, Richmond

Dr. Jing can be contacted at [email protected] or at Beth Israel Deaconess Medical Center, Neurology Department, 330 Brookline Ave, Boston, MA 02215.

Mr. Li, Dr. Goldenholz, and Mr. Alkofer and Drs. Hong, Westover, and Jing contributed equally to this article.

Revised April 25, 2025

Accepted May 02, 2025

NEJM AI 2(7), July 2025. DOI: 10.1056/AIoa2401221

Abstract

Background

Epileptiform discharges, or spikes, in electroencephalogram (EEG) recordings are essential for diagnosing epilepsy and localizing seizure origins. Artificial intelligence (AI) offers a promising approach to automating their detection, but current models are often hindered by artifact-related false positives and typically address only event-level or only EEG-level classification, which limits their clinical utility.

Methods

We developed SpikeNet2, a deep-learning model based on a residual network architecture, and enhanced it with hard-negative mining to reduce false positives. Our study analyzed 17,812 EEG recordings from 13,523 patients across multiple institutions, including Massachusetts General Brigham (MGB) hospitals. Data from the Human Epilepsy Project (HEP) and SCORE-AI (SAI) were also included. A total of 32,433 event-level samples, labeled by experts, were used for training and evaluation. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), calibration error, and a modified area under the curve (mAUC) metric. The model’s generalizability was evaluated using external datasets.
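As an illustration of the evaluation metrics named above, the following is a minimal sketch of how AUROC, AUPRC, and a calibration error could be computed from binary expert labels and model probabilities. It is not the study's evaluation code. The function names are hypothetical, AUPRC is approximated here by average precision, and the calibration error is assumed to be a binned expected calibration error with 10 equal-width bins, a definition the abstract does not specify. The mAUC metric is omitted because the abstract does not define it.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC approximated by average precision over samples ranked by score.

    Assumption: tied scores are ordered arbitrarily by the sort.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    true_pos = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at each recalled positive
    return ap / total_pos


def expected_calibration_error(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 10
) -> float:
    """Binned calibration error: weighted mean |observed rate - mean score| per bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx].append((y, s))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)
            predicted = sum(s for _, s in b) / len(b)
            ece += (len(b) / n) * abs(observed - predicted)
    return ece


if __name__ == "__main__":
    # Toy data, not from the study: 1 = expert-labeled spike, 0 = non-spike.
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y_true, y_prob))
    print(average_precision(y_true, y_prob))
    print(expected_calibration_error(y_true, y_prob))
```

In practice, an established library such as scikit-learn would usually be preferred for AUROC and average precision; the pure-Python version is shown only to make the definitions explicit.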

Results

SpikeNet2 demonstrated strong performance in event-level spike detection, achieving an AUROC of 0.973 and an AUPRC of 0.995, with 44% of experts surpassing the model on the MGB dataset. In external validation, the model achieved an AUROC of 0.942 and an AUPRC of 0.948 on the HEP dataset. For EEG-level classification, SpikeNet2 recorded an AUROC of 0.958 and an AUPRC of 0.959 on the MGB dataset, an AUROC of 0.888 and an AUPRC of 0.823 on the HEP dataset, and an AUROC of 0.995 and an AUPRC of 0.991 on the SAI dataset, with 32% of experts outperforming the model. The false-positive rate was reduced to an average of nine spikes per hour.

Conclusions

SpikeNet2 offers expert-level accuracy in both event-level spike detection and EEG-level classification, while significantly reducing false positives. Its dual functionality and robust performance across diverse datasets make it a promising tool for clinical and telemedicine applications, particularly in resource-limited settings. (Funded by the National Institutes of Health and others.)