Voice as Analysis: Early Signals of Cancer and Benign Lesions

 
Medical Reviewer, Editor
Last reviewed: 18.08.2025
 

12 August 2025, 08:13

Researchers from Oregon Health & Science University analyzed speech recordings from the new publicly available Bridge2AI-Voice dataset and found that a simple acoustic feature can reveal vocal fold pathology: the harmonics-to-noise ratio (HNR), the ratio of "musical" overtones to noise in the voice. Its level and variability distinguished the voices of people with laryngeal cancer and benign lesions both from healthy voices and from some other voice disorders. The effect was especially clear in cisgender men; in women the differences did not reach statistical significance, which the authors attribute to the small sample size, calling for the data to be expanded. The work was published as a brief report in Frontiers in Digital Health.

Background of the study

  • Why look for "voice markers" at all. Hoarseness is a common complaint with varied causes, from colds and reflux to nodules/polyps and laryngeal cancer. Today the path to diagnosis runs through an ENT specialist and endoscopy (a camera through the nose/throat). That is accurate, but not always quickly available, and it is not suitable for home self-monitoring. Hence the need for pre-screening: a simple way to tell who should see a doctor first.
  • What is a voice biomarker? Speech is a signal that can be easily recorded on a phone. Its “pattern” can be used to judge how the vocal folds vibrate. Lesions make the vibrations uneven: more “noise” and less “music”.
  • Why new datasets are important. Previously, such works relied on small, “homemade” samples — the models were fragile. Bridge2AI-Voice is a large, multi-center, ethically collected set of audio recordings linked to diagnoses. It was created as a “common testing ground” to finally train and test algorithms on large and heterogeneous data.
  • Where the main difficulties lie:
    • The voice changes with the microphone, room noise, colds, smoking, language, sex and age.
    • Data from women are traditionally scarcer, and the female voice is higher in frequency, so the metrics behave differently.
    • No "home" test can replace an examination or make a diagnosis; at most, it helps decide whether an urgent visit to an ENT specialist is warranted.
  • Why clinics and patients need this. If a short voice recording can flag people at high risk of nodules or tumors for a priority appointment, this will speed up diagnostics, reduce unnecessary referrals, and provide a tool for self-monitoring between visits (after surgery, during therapy).
  • Where this should lead: to validated telemedicine applications/modules that:
    1. record speech in a standardized way (a phrase plus a sustained "a-a-a"),
    2. calculate basic features (HNR, jitter, shimmer, F0),
    3. recommend seeing a specialist if the profile looks alarming,
    4. track the voice over time after treatment.
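As a toy illustration of steps 2–3, here is a minimal stdlib-Python sketch (not the authors' pipeline): given per-cycle periods and peak amplitudes extracted from a sustained "a-a-a", it computes F0, local jitter and shimmer, and applies a referral threshold. The function names and cut-off values are assumptions for illustration, not from the paper.

```python
def f0_jitter_shimmer(periods, amplitudes):
    """F0 (Hz), local jitter and shimmer (relative values, 0..1)."""
    mean_t = sum(periods) / len(periods)
    f0 = 1.0 / mean_t
    # Local jitter: mean absolute difference of consecutive cycle
    # periods, divided by the mean period.
    jitter = sum(abs(a - b) for a, b in zip(periods, periods[1:])) \
        / ((len(periods) - 1) * mean_t)
    mean_a = sum(amplitudes) / len(amplitudes)
    # Local shimmer: the same idea applied to per-cycle peak amplitudes.
    shimmer = sum(abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])) \
        / ((len(amplitudes) - 1) * mean_a)
    return f0, jitter, shimmer


def should_refer(jitter, shimmer, jitter_cut=0.0104, shimmer_cut=0.0381):
    # Cut-offs loosely echo commonly cited Praat/MDVP norms
    # (jitter 1.04%, shimmer 3.81%); here they are purely illustrative.
    return jitter > jitter_cut or shimmer > shimmer_cut
```

A perfectly steady 200 Hz voice (equal periods and amplitudes) yields zero jitter and shimmer and no referral flag; alternating periods of 4.8 and 5.2 ms push local jitter to 8%, well past the toy threshold.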

The idea is simple: give the phone "an ENT doctor's ear", not to make a diagnosis, but to avoid missing those who need prompt in-person care.

What exactly did they do?

  • The researchers took the first release of the multi-center, ethically collected Bridge2AI-Voice dataset, a flagship NIH project in which voice recordings are linked to clinical information (diagnoses, questionnaires, etc.).
  • Two analytical samples were formed:
    1. "laryngeal cancer / benign lesions / healthy";
    2. "cancer or benign lesions" versus spasmodic dysphonia and vocal fold paralysis (other common causes of hoarseness).
  • Basic voice features were extracted from standardized phrases: fundamental frequency (F0), jitter, shimmer, and HNR, and the groups were compared using nonparametric statistics. Result: the most stable differences were in HNR and F0, with HNR and its variability best separating benign lesions from both healthy voices and laryngeal cancer. These signals were more distinct in men.
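The nonparametric group comparison mentioned above can be illustrated with a stdlib-Python sketch of the Mann-Whitney U statistic. The HNR values below are invented toy numbers, not data from the study.

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for sample xs vs ys; ties count as 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Toy HNR values in dB (invented for illustration only).
benign  = [12.1, 13.4, 11.8, 12.9, 10.7]
healthy = [19.5, 21.2, 18.8, 20.1, 22.3]

u = mann_whitney_u(benign, healthy)
# u / (len(benign) * len(healthy)) estimates P(benign HNR > healthy HNR);
# here it is 0, i.e. every "benign" value lies below every "healthy" one.
```

In practice one would use a library implementation that also reports a p-value (e.g. SciPy's `mannwhitneyu`); the point here is only the rank-based idea, which makes no assumption about the distribution of HNR.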

Why is this important?

  • Early screening without a probe. Currently, the path to diagnosis often means nasoendoscopy and, if suspected, biopsy. If simple acoustic features combined with AI can prioritize those who need endoscopy, patients will get to an ENT specialist sooner and unnecessary referrals will be reduced. This is a complement, not a replacement for the doctor.
  • Big data for voice. Bridge2AI-Voice is a rare project where voice is collected using uniform protocols and linked to diagnoses; the data is available to researchers via PhysioNet / Health Data Nexus. This accelerates the development of reliable voice biomarkers instead of “miracle apps” on small samples.

What is HNR?

When we speak, the vocal folds vibrate and create overtones (harmonics). But the vibration is never perfect: there is always some noise in the signal. HNR is simply how much more "music" there is in the voice than "hiss". When the folds are damaged, the vibration becomes less even: noise increases, HNR drops, and its jumps (variability) grow. This is the pattern the authors caught.
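The idea can be sketched numerically. In a simplified autocorrelation view (a toy model, not Praat's actual algorithm), the normalized autocorrelation r of the signal at a lag of one glottal period estimates the harmonic share of total power, and HNR = 10 * log10(r / (1 - r)). A stdlib-Python sketch on a synthetic "voice" (a sine plus noise):

```python
import math
import random

def hnr_db(x, lag):
    """Toy HNR estimate from normalized autocorrelation at one period lag."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    den = math.sqrt(sum(x[i] ** 2 for i in range(n)) *
                    sum(x[i + lag] ** 2 for i in range(n)))
    r = min(max(num / den, 1e-9), 1 - 1e-9)  # keep the log argument valid
    return 10 * math.log10(r / (1 - r))

random.seed(0)
fs, f0 = 8000, 200          # sample rate (Hz), fundamental frequency (Hz)
lag = fs // f0              # one glottal period = 40 samples

def voice(noise_sd):
    """A synthetic 'voice': 200 Hz sine plus Gaussian noise."""
    return [math.sin(2 * math.pi * f0 * i / fs) + random.gauss(0, noise_sd)
            for i in range(4000)]

clean = hnr_db(voice(0.05), lag)   # near-periodic vibration
rough = hnr_db(voice(0.40), lag)   # "damaged folds": much more noise
# The noisier signal yields a markedly lower HNR, mirroring the
# drop the article describes for lesioned vocal folds.
```

The exact dB values depend on the noise realization, but the ordering is robust: more noise in the cycle, lower HNR.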

Important Disclaimers

  • This is a pilot, exploratory analysis: no clinical validation yet, and the female sample was limited, which is why the effects in women did not reach significance. Larger and more diverse data are needed, along with road-testing the models across different clinics and different languages.
  • The voice is shaped by many factors: colds, smoking, reflux, the microphone, room noise. Any "home test" must account for this context, and even then it should serve as a filter for referral to an ENT specialist, not a one-click diagnosis.

What's next?

  • Expand the dataset (more women, a wider range of ages), standardize the tasks and acoustics (reading a phrase, a sustained "a-a-a", etc.), and try multimodal models (voice plus questionnaire symptoms/risk factors).
  • Link acoustic features to examination results (endoscopy, stroboscopy) and to post-treatment trajectories, so that the HNR profile can also be used for monitoring.
  • Continue “open science”: Bridge2AI-Voice is already publishing versions of the dataset and tools - this is a chance to quickly reach real pilots in clinics.

Conclusion

It is possible to "hear" vocal fold trouble in the voice, and perhaps refer the person to the right specialist sooner. For now this is only a promising clue (HNR and its variability), but thanks to big open data, voice biomarkers finally have a chance to become a reliable screening tool.

Source: Jenkins P. et al. Voice as a Biomarker: Exploratory Analysis for Benign and Malignant Vocal Fold Lesions. Frontiers in Digital Health, 2025 (accepted for publication). Data — Bridge2AI-Voice (NIH/PhysioNet).
