Validating a novel natural language processing pathway for automated quality assurance in surgical oncology: incomplete excision rates of 34 955 basal cell carcinomas

Ali, Stephen R
Dobbs, Thomas D
Jovic, Matthew
Strafford, Huw
Fonferko-Shadrach, Beata
Lacey, Arron S
Williams, Namor
Pickrell, William Owen
Hutchings, Hayley A
Whitaker, Iain S

Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK
Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, UK
Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK
Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, UK
Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK
Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
Health Data Research UK, Swansea University Medical School, Swansea University, Swansea, UK
Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
Health Data Research UK, Swansea University Medical School, Swansea University, Swansea, UK
Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
Health Data Research UK, Swansea University Medical School, Swansea University, Swansea, UK
Department of Cellular Pathology, Morriston Hospital, Swansea, UK
Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
Department of Neurology, Morriston Hospital, Swansea, UK
Patient and Population Health and Informatics Research, Swansea University Medical School, Swansea, UK
Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK
Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, UK

Correspondence to: Stephen R. Ali, Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea SA2 8PP, UK

Received January 24, 2023

Accepted February 06, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

British Journal of Surgery 110(9):p 1072-1075, September 2023. | DOI: 10.1093/bjs/znad055

Introduction

Accurate and accessible outcome data following a cancer diagnosis are crucial for robust quality assurance. Multidisciplinary team (MDT) meetings aim to improve care through group consensus, national guidance, clear documentation, and communication. However, research has highlighted limitations in their outputs, particularly in how outcomes are recorded in databases and audited. Novel technologies, such as artificial intelligence (AI), have the potential to address this, as highlighted in the Royal College of Surgeons of England ‘Future of Surgery’ commission.

Natural language processing (NLP), a form of AI, offers a novel approach to automate extraction of detailed clinical information from unstructured electronic healthcare record data, such as clinic letters, operative notes, and histopathology reports. In a recent systematic review, NLP was found to have higher sensitivity and comparable specificity in identifying postoperative complications compared to conventional administrative methods.

To date, no studies have used NLP to determine incomplete excision rates in surgical oncology. In this study, the feasibility of automatically extracting and interpreting margin status from histopathology reports using an NLP-based system was demonstrated.

Methods

A multicentre, pan-specialty, retrospective analysis was undertaken of consecutive patients with histologically confirmed basal cell carcinoma (BCC) managed with surgical excision and examined using the bread-loafing cross-sectioning technique. The study covered the 17-year period from 2004 to 2021. Cases were identified from the InterSystems TrakCare Laboratory Information Management System (InterSystems TrakCare Lab, Cambridge, Massachusetts, USA) using SNOMED RT codes for BCC. Primary, recurrent, and previously excised lesions were grouped together for analysis. Diagnostic biopsies were excluded.

Free-text pathology reports were retrieved and saved in text file format. These were then processed using a previously validated and published rule-based NLP information extraction system. Comma-separated values (CSV) files were generated from the canonical subheadings of each pathology report.
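The published extraction system is rule-based, but its actual rules are not reproduced here. As a minimal sketch of the general idea only, the Python below assumes hypothetical subheadings (CLINICAL DETAILS, MACROSCOPY, MICROSCOPY, CONCLUSION) and a hypothetical margin phrase such as "peripheral margin 2 mm". It splits a report on line-anchored subheadings, applies one regular-expression rule to the microscopy section, and writes one CSV row per report. It is not the authors' implementation.

```python
import csv
import io
import re

# Hypothetical canonical subheadings; the real system's headings and rules differ.
SUBHEADINGS = ["CLINICAL DETAILS", "MACROSCOPY", "MICROSCOPY", "CONCLUSION"]

# A subheading must sit alone on its line, optionally followed by a colon.
HEADING_RE = re.compile(
    r"^(%s):?[ \t]*$" % "|".join(map(re.escape, SUBHEADINGS)), re.MULTILINE
)
# Hypothetical rule: "<peripheral|deep> margin ... <number> mm".
MARGIN_RE = re.compile(
    r"(peripheral|deep)\s+margin\D{0,20}?(\d+(?:\.\d+)?)\s*mm", re.IGNORECASE
)


def split_sections(report: str) -> dict[str, str]:
    """Split a free-text report into {subheading: section text}."""
    matches = list(HEADING_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = report[m.end():end].strip()
    return sections


def extract_margins(text: str) -> dict[str, float]:
    """Return e.g. {'peripheral_mm': 2.0, 'deep_mm': 0.5} for margins the rule finds."""
    return {f"{kind.lower()}_mm": float(value) for kind, value in MARGIN_RE.findall(text)}


def reports_to_csv(reports: dict[str, str]) -> str:
    """Write one CSV row per report; margins the rule cannot find are left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["report_id", "peripheral_mm", "deep_mm"])
    writer.writeheader()
    for report_id, text in reports.items():
        margins = extract_margins(split_sections(text).get("MICROSCOPY", ""))
        writer.writerow({"report_id": report_id, **margins})
    return buf.getvalue()
```

Anchoring rules to subheadings keeps each pattern confined to the section where it is meaningful, which is one reason section-aware extraction tends to be more precise than searching the whole report.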

A Java™ Spring Boot (VMware, Palo Alto, California, USA) web application, hosted on Amazon Web Services (Amazon.com, Seattle, Washington, USA) Elastic Compute Cloud (EC2), was then developed, and the CSV files were imported into a relational database management system. The process for generating incomplete excision rates is shown in Fig. 1.
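The study's application is written in Java Spring Boot with an unspecified relational database. Purely to illustrate the import-then-aggregate step, the sketch below swaps in Python's built-in sqlite3 module. The lesions table, its columns, and the 0/1 coding of margin_involved are hypothetical. The query groups lesions by specialty and risk and reports the percentage with an involved margin.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per lesion; margin_involved is 1 (involved) or 0 (clear).
SCHEMA = """
CREATE TABLE IF NOT EXISTS lesions (
    report_id TEXT PRIMARY KEY,
    specialty TEXT NOT NULL,
    risk TEXT NOT NULL,
    margin_involved INTEGER NOT NULL
)
"""

# Incomplete excision rate (%) per specialty and risk stratum.
RATE_QUERY = """
SELECT specialty, risk, COUNT(*) AS n,
       100.0 * SUM(margin_involved) / COUNT(*) AS incomplete_pct
FROM lesions
GROUP BY specialty, risk
ORDER BY specialty, risk
"""


def load_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    """Import CSV rows (with a header line) into the lesions table."""
    conn.execute(SCHEMA)
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO lesions VALUES (:report_id, :specialty, :risk, :margin_involved)",
        rows,
    )
    conn.commit()


def incomplete_excision_rates(conn: sqlite3.Connection) -> list[tuple]:
    """Return (specialty, risk, n, incomplete_pct) tuples."""
    return conn.execute(RATE_QUERY).fetchall()
```

For example, with an in-memory database (sqlite3.connect(":memory:")) and two low-risk plastic surgery lesions of which one has an involved margin, that stratum would report n = 2 and an incomplete excision rate of 50.0 per cent. Stratifying by margin type (peripheral or deep) would need one more grouping column.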

Fig. 1
Schematic representation of our automated population-based quality assurance model
BCC, basal cell carcinoma.

A range of tumour, patient, and surgical factors were recorded. The British Association of Dermatologists’ (BAD) adaptation of the National Comprehensive Cancer Network guidelines on the treatment of BCC (Table S1) was modified to categorize BCCs clinicopathologically into low and high risk (Fig. S1). The primary endpoints were histological margin status, risk status, and speciality of the operating surgeon. Margin status was defined as either clear (≥1 mm) or involved (0 mm).
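The stated margin definition can be written as a small classifier. This sketch assumes margin distances have already been extracted in millimetres. Distances greater than 0 but less than 1 mm are not covered by the definition above, so they are returned as "unclassified" rather than guessed. Treating a lesion as incompletely excised when either its peripheral or its deep margin is involved is an assumption made for illustration.

```python
from typing import Optional


def classify_margin(distance_mm: Optional[float]) -> str:
    """Clear if >= 1 mm, involved if 0 mm; anything else is left unclassified."""
    if distance_mm is None:
        return "unclassified"
    if distance_mm >= 1.0:
        return "clear"
    if distance_mm == 0:
        return "involved"
    return "unclassified"  # 0 < distance < 1 mm is outside the stated definition


def lesion_status(peripheral_mm: Optional[float], deep_mm: Optional[float]) -> str:
    """Assumed rule: any involved margin means incomplete excision."""
    statuses = {classify_margin(peripheral_mm), classify_margin(deep_mm)}
    if "involved" in statuses:
        return "incomplete"
    if statuses == {"clear"}:
        return "complete"
    return "unclassified"
```

Returning an explicit "unclassified" value keeps ambiguous reports visible for clinician review instead of silently counting them as clear or involved.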

The sample size for the internal validation cohort was determined by an a priori calculation. Baseline variables and outcomes were analysed retrospectively by two independent, blinded expert clinicians, and this clinical pathway served as the reference standard for the study. Disagreements were resolved by case discussion until a consensus was reached. The single consensus clinician-derived outputs were then compared with the NLP-derived outputs. Percentage agreement and Cohen’s kappa were used as measures of agreement between NLP-derived and clinician-derived completeness of excision, risk status, and speciality of the operating surgeon. Statistical analysis was done in R version 4.1.1 (R Core Team; R Foundation for Statistical Computing, Vienna, Austria).
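Percentage agreement and Cohen’s kappa can be computed directly from the two sets of labels. The sketch below uses the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal label frequencies. The study's analysis was done in R; this Python version is only illustrative, and the confidence intervals and P values reported in the study are omitted.

```python
from collections import Counter
from typing import Hashable, Sequence


def percentage_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of items on which the two raters assign the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    p_o = percentage_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: sum over labels of the product of the raters' marginal proportions.
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)
```

In a toy example where a clinician labels 8 of 10 lesions clear and the NLP labels 7 of the same 10 clear, the raters agree on 9 of 10 (0.90) but kappa is only about 0.74, because chance agreement is already 0.62. When one category dominates, as clear margins do in BCC excisions, kappa can sit well below percentage agreement; this is one plausible reason why high percentage agreement accompanies more modest kappa values in the results below.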

Ethical committee approval was obtained from Swansea University Medical School Research Ethics Subcommittee (reference no: 2020-0025). The study was performed in accordance with the Declaration of Helsinki. Data are reported following Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD).

Results

Some 34 955 lesions in 15 657 patients were included. Baseline characteristics are shown in Table S2. The overall incomplete excision rate in this cohort was 5.5 per cent.

Incomplete excision rates generated by the web application, stratified by risk and margin, are shown in Table 1. The validation cohort comprised 6152 histopathology reports, which were used to assess the accuracy of completeness of excision, risk status, and speciality of the operating surgeon. Percentage agreement for completeness of excision was 0.99 (95 per cent c.i. 0.98 to 0.99; Cohen’s kappa = 0.74 (95 per cent c.i. 0.68 to 0.80); P < 0.001). Percentage agreement for speciality of the operating surgeon was 0.99 (95 per cent c.i. 0.99 to 0.99; Cohen’s kappa = 0.73 (95 per cent c.i. 0.69 to 0.77); P < 0.001).

Table 1 Incomplete excision rates for all specialities stratified by risk (high or low) and margin (peripheral or deep)

Using a MacBook Pro (Apple M1 Pro, 16 GB RAM), the NLP pipeline extracted and structured 2 184 309 items of information in 22.7 min, a rate of 689.7 cases/min. A single clinician extracted data at a rate of 0.25 cases/min on the validation cohort. Extrapolating this rate to 15 657 histopathology reports, it would take a clinician 29.8 weeks (8 h/day, Monday to Friday, with a 30-min rest break every 4 h) to extract the same amount of data, representing a time saving of 208 days.
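The time-saving arithmetic can be reproduced with a few lines. This sketch assumes the two 30-min breaks fall within the 8-h day, leaving 7 h (420 min) of extraction per working day; under that assumption, 15 657 reports at 0.25 cases/min take about 1044 h, or 29.8 five-day working weeks. The reported 208 days appears to correspond to 29.8 weeks expressed in calendar days (about 208.8), but that reading is an inference, not something the text states.

```python
def clinician_extraction_time(n_reports: int, cases_per_min: float,
                              day_hours: float = 8.0, break_min: float = 30.0,
                              break_every_h: float = 4.0,
                              days_per_week: int = 5) -> dict[str, float]:
    """Manual extraction time, assuming breaks are taken within the working day."""
    work_min_per_day = day_hours * 60 - break_min * (day_hours / break_every_h)
    total_min = n_reports / cases_per_min
    working_days = total_min / work_min_per_day
    weeks = working_days / days_per_week
    return {
        "total_hours": total_min / 60,
        "working_days": working_days,
        "weeks": weeks,
        "calendar_days": weeks * 7,  # assumed interpretation of "days" saved
    }


def nlp_rate(n_reports: int, minutes: float) -> float:
    """Throughput of the automated pipeline in cases per minute."""
    return n_reports / minutes
```

Calling clinician_extraction_time(15657, 0.25) gives roughly 149 working days (29.8 weeks), and nlp_rate(15657, 22.7) recovers the reported 689.7 cases/min, so the 22.7-min pipeline run is negligible by comparison.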

Post-hoc binary logistic regression showed that plastic surgeons were more likely to achieve clear margins than other specialties (Table 2). For significant terms, incomplete excision rates were then estimated by converting the log odds from the model into probabilities on a 0 to 1 scale and expressing these as percentages (Table 2). This approach adjusts for risk status when comparing specialties, rather than relying on raw incomplete excision rates.
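Converting a logistic regression output into an incomplete excision rate uses the inverse logit, p = 1 / (1 + e^(−log odds)), then multiplies by 100. In the sketch below, the coefficient names and values are hypothetical placeholders, not values from Table 2, and it assumes the modelled outcome is incomplete excision; if the model instead predicts clear margins, the incomplete excision rate would be 100 × (1 − p).

```python
import math


def logit_to_probability(log_odds: float) -> float:
    """Inverse logit: map log odds onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_incomplete_rate(intercept: float, coefs: dict[str, float],
                              covariates: dict[str, float]) -> float:
    """Percentage incomplete excision for one covariate pattern.

    Covariates absent from `covariates` are treated as 0 (the reference level).
    """
    log_odds = intercept + sum(beta * covariates.get(name, 0.0)
                               for name, beta in coefs.items())
    return 100.0 * logit_to_probability(log_odds)
```

With a hypothetical intercept of -3.0 and a hypothetical high_risk coefficient of 1.0, a low-risk lesion would have a predicted incomplete excision rate of about 4.7 per cent and a high-risk lesion about 11.9 per cent. Comparing specialties at the same covariate pattern is what makes the comparison risk-adjusted.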

Table 2 Post-hoc binary logistic regression and probabilistic modelling

Discussion

In this study, an automated population-based approach to quality assurance in surgical oncology was validated, using NLP to extract margin status from histopathology reports at scale in the most common human cancer. A web application was used to automate the analysis of incomplete excision rates, stratifying by margin and risk before undertaking post-hoc analysis of the relationship between uninvolved margins and speciality. As shown, high levels of percentage agreement (> 90 per cent) can be reached when comparing the NLP-based method with blinded expert clinicians. This was achieved in a tumour type that traditionally has poor compliance with minimum data set reporting, suggesting that agreement for other tumours would likely be similar or higher. The incomplete excision rate in this study (5.5 per cent) is close to the target in joint National Institute for Health and Care Excellence (NICE) and BAD guidance of at least 95 per cent complete excision (that is, an incomplete excision rate of no more than 5 per cent). This suggests that the output of this algorithm is valid.

Although this innovative approach allows for automated, rapid, and large-scale analysis of health data, saving significant time and resources, there remains an argument for integrating a ‘human in the loop’ at the MDT level to ensure the highest level of accuracy and patient care. The present work aligns with the vision outlined by the Topol Review, which foresees AI and other digital technologies augmenting the capabilities of healthcare professionals rather than replacing them, allowing them to focus on providing the best possible care to patients.

This work is novel: a recent systematic review of NLP in surgery identified only one other model able to extract tumour margin status, and only in a small test set. There are no reports of using NLP at a population level for quality assurance in surgical oncology. Furthermore, this study represents a large series of incomplete BCC excision rates by international standards. This was achieved in a small fraction of the time that manual extraction and processing would require.

The system developed here allows rapid and accurate assessment of a number of parameters against which clinicians, MDTs, and service providers can be assessed. This tool could be used for standard benchmarking, for confidential feedback to service providers and users, and to free up healthcare professionals' time otherwise spent in MDTs and on administrative tasks, so that they can focus on delivering high-quality patient care.

References

1. Meeting Patients' Needs: Improving the Effectiveness of Multidisciplinary Team Meetings in Cancer Services. London: Cancer Research UK (2019). https://www.cancerresearchuk.org/about-us/we-develop-policy/our-policy-on-cancer-services/improving-the-effectiveness-of-mdts-in-cancer-services (accessed 22 December 2021)
2. Commission on the Future of Surgery. London: Royal College of Surgeons of England (2019). https://www.cancerresearchuk.org/sites/default/files/full_report_meeting_patients_needs_improving_the_effectiveness_of_multidisciplinary_team_meetings_.pdf (accessed 22 December 2021)
3. Mellia JA, Basta MN, Toyoda Y, Othman S, Elfanagely O, Morris MP, et al. Natural language processing in surgery: a systematic review and meta-analysis. Ann Surg 2021;273:900–908
4. Ali SR, Strafford H, Dobbs TD, Fonferko-Shadrach B, Lacey A, Pickrell WO, et al. Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing. Front Surg 2022;9:870494
5. Nasr I, McGrath EJ, Harwood CA, Botting J, Buckley P, Budny PG, et al. British Association of Dermatologists guidelines for the management of adults with basal cell carcinoma 2021. Br J Dermatol 2021;185:899–920
6. National Institute for Health and Care Excellence. Improving outcomes for people with skin tumours including melanoma (update). The management of low-risk basal cell carcinomas in the community. https://www.nice.org.uk/guidance/csg8/evidence/2010-update-the-management-of-lowrisk-basal-cell-carcinomas-in-the-community-updatedrecommendations-and-evidence-on-this-topic-only-pdf-7022614429 (accessed 18 May 2022)
7. Topol E. The Topol Review. Preparing the healthcare workforce to deliver the digital future. An independent report on behalf of the Secretary of State for Health and Social Care. https://topol.hee.nhs.uk/wp-content/uploads/HEE-Topol-Review-2019.pdf (accessed 5 October 2022)