ÉCOLE DE TECHNOLOGIE SUPÉRIEURE
UNIVERSITÉ DU QUÉBEC
MANUSCRIPT-BASED THESIS PRESENTED TO THE ÉCOLE DE TECHNOLOGIE SUPÉRIEURE
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE
DOCTORATE IN ENGINEERING, Ph.D.
BY
Cécile LE COCQ
COMMUNICATION IN A NOISY ENVIRONMENT:
PERCEPTION OF ONE'S OWN VOICE AND SPEECH ENHANCEMENT
MONTREAL, JANUARY 11, 2010
© Cécile Le Cocq, 2010
THIS THESIS WAS EVALUATED BY A JURY COMPOSED OF:
Mr. Frédéxxx Xxxxxxx, thesis supervisor
Department of Mechanical Engineering, École de technologie supérieure
Mr. Xxxxxxxxx Xxxxxxx, co-supervisor
Department of Electrical Engineering, École de technologie supérieure
Mr. Xxxxxx Xxxxxx, president of the jury
Department of Electrical Engineering, École de technologie supérieure
Mr. Xxxxxxxx Xxxxxxxxxx, research scientist, external examiner
Laboratoire de Mécanique et d'Acoustique, CNRS France
Xxx Xxxxxxx Xxxxxxx, Ph.D., external examiner
Audiology and Speech-Language Pathology Program, University of Ottawa
IT WAS DEFENDED BEFORE THE JURY AND THE PUBLIC ON DECEMBER 14, 2009
AT THE ÉCOLE DE TECHNOLOGIE SUPÉRIEURE
FOREWORD
Industrial workers operate in noisy environments where they risk incurring severe hearing loss. In Quebec, about 400,000 workers (Girard et al., 2007) are exposed to such risks. To protect workers' hearing, the World Health Organization (WHO) has issued recommendations aimed in particular at reducing the noise dose to which workers are subjected: over a working day, the 8-hour equivalent noise level must not exceed 85 dB(A) (WHO, 2001, Ch. 4). Wearing hearing protectors is a simple solution that makes it possible to meet this recommendation, provided the type of hearing protection (earplugs, earmuffs or double protection) is chosen appropriately.
Today, a large number of different hearing protection systems are commercially available. Most often, however, these protectors are uncomfortable and, more dangerously, they interfere with communication between workers and with the perception of warning signals (Xxxxxx et al., 2000; Xxxxx, 1992). As a result, many workers wear them rarely or not at all, and are therefore not protected.
"The best earplugs are those that the industrial worker will wear" (NIOSH, 1998). To solve this problem, the company SONOMAX initiated the development of a comfortable and "intelligent" earplug: it will allow workers to keep perceiving useful information signals, such as speech or warning signals, while protecting their ears from high noise levels. The hearing protection devices currently on the market do not always allow useful information signals to be perceived while hearing is protected. For subjects with "normal" hearing, it has been shown (Xxxxxx et al., 2000; Xxxxx, 1992) that speech recognition and the perception of warning signals are generally not impaired. Nevertheless, these subjects do not always find conversation easy and comfortable, and they remove their protectors to communicate. For subjects with a hearing loss, on the other hand, the problem is slightly different, since wearing hearing protectors can in general impair the perception of speech and of useful information signals. The development of "intelligent" earplugs would be a major innovation in the field of hearing protection: subjects with "normal" hearing could converse comfortably, and those with a hearing loss could perceive useful information signals.
SONOMAX contacted the ÉTS to initiate a research partnership. The research project was then funded by SONOMAX and two granting agencies. The first was the IRSST, which awarded a doctoral scholarship to Jérémie Voix. The second was the Natural Sciences and Engineering Research Council of Canada (NSERC), which awarded funds to the ÉTS for a Collaborative Research and Development (CRD) project in partnership with SONOMAX.
The first phase of the SONOMAX research project at the ÉTS on the development of new comfortable and "intelligent" earplugs began in 2000 with the thesis work of Jérémie Voix (Voix, 2006). It dealt with the development of the earpiece, the elaboration of an objective protocol for measuring and certifying the effective attenuation provided by an in-ear hearing protector, and the prediction of the attenuation provided by a custom-made earplug fitted with a filter.
The second phase of the SONOMAX research project, which is the subject of this thesis, deals with communication in noise. It aims to study the perception of one's own voice when wearing earplug-type hearing protectors, and to propose electronic processing to improve the understanding of other people's speech.
ACKNOWLEDGEMENTS
I would first like to thank my supervisors, Mr. Xxxxxxxxxx Xxxxxxx and Mr. Xxxxxxxxx Xxxxxxx, for their unconditional support throughout these past years. I thank them for the trust they placed in me at every stage of my research.
Thanks to NSERC and to the company Sonomax for their financial support.
A huge thank you to Jérémie Voix, friend and colleague, who always gave very good advice in difficult moments.
Many thanks to the whole team at Sonomax: Xxxx-Xxxxxxx Xxxxxxx, Xxxxxxx X. Xxxxxx, Jérémie Voix, Xxxxx Xxxxxxxxx, Xxxxx Xxxxxx, Mylène Landry, and all the others, for the trust and support they gave me.
Special thanks to Xxxx-Xxxx Xxxx for their xxxxxxxxxxxxx expertise and goodwill.
Thanks to the technical staff of the ÉTS: Xxxxx Xxxxxxxxx, Xxxxxxx Xxxxxxxx and all the others, without whose help I could not have carried out my experiments.
Thanks to all my subjects and to Xxxxxxxxx Xxxxxxxx, who assisted me in my experiments.
Thanks to all my friends, Xxxxxx, Xxxxxxxx, Xxxxxxxxxx, ..., and to everyone who, at one time or another, played a part in this doctorate.
A huge thank you to my whole family, my parents, my brother and my grandparents, for their support and their love.
COMMUNICATION IN A NOISY ENVIRONMENT:
PERCEPTION OF ONE'S OWN VOICE AND SPEECH ENHANCEMENT
Cécile LE COCQ
SUMMARY
Communication in noise is an everyday problem for workers in noisy industrial environments. Many workers complain that their hearing protectors prevent them from communicating easily with their colleagues. They then tend to remove their protectors, putting their hearing at risk. This communication problem is in fact twofold: the protectors modify both the wearer's perception of his or her own voice and the understanding of other people's speech. Both aspects are considered in this thesis.
The modification of the wearer's perception of his or her own voice is partly due to the occlusion effect, which occurs when the ear canal is occluded by an earplug. This occlusion effect mainly results in an enhanced perception of low-frequency sounds internal to the human body (physiological noises) and in a modified perception of the person's own voice. To better understand this phenomenon, following an in-depth review of the existing literature, a new method for quantifying the occlusion effect was developed. Instead of exciting the subject's skull with a shaker or having the subject speak, as is classically done in the literature, it was decided to excite the subject's mouth cavity with a sound wave. The experiment was designed so that the sound wave exciting the mouth cavity does not directly excite the outer ear or the rest of the body. Determining hearing thresholds with open and occluded ears thus made it possible to quantify a subjective occlusion effect for a sound wave in the mouth cavity. These results, together with the other occlusion effect measurements reported in the literature, led to a better understanding of the occlusion effect and to an evaluation of the influence of the different transmission paths between the sound source and the inner ear.
Understanding other people's speech is impaired both by the high sound levels found in noisy industrial environments and by the attenuation of the speech signal by the hearing protectors. One possible remedy is to denoise the speech signal and then transmit it under the hearing protector. Many denoising techniques exist and are used, in particular, to denoise speech in telecommunications. In this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques is carried out to evaluate their performance in a noisy industrial environment. The tested speech signals are corrupted by industrial noises over a wide range of signal-to-noise ratios. The denoised signals are evaluated using four criteria. A large database is thus obtained and is analyzed with a selection algorithm designed specifically for this task. This first study highlighted the influence of the various wavelet denoising parameters on denoising quality and thereby identified the "classical" method that gives the best performance in terms of denoising quality. It also provided guidelines for designing a new thresholding rule suited to wavelet denoising of speech in a noisy industrial environment. This new thresholding rule is presented and evaluated in a second study. Its performance proved superior to that of the "classical" method identified in the first study for speech signals whose signal-to-noise ratio lies between −10 dB and 15 dB.
COMMUNICATION IN A NOISY ENVIRONMENT:
PERCEPTION OF ONE’S OWN VOICE AND SPEECH ENHANCEMENT
Cécile LE COCQ
ABSTRACT
Workers in noisy industrial environments are often confronted with communication problems. Many workers complain that they cannot communicate easily with their coworkers when they wear hearing protectors. As a consequence, they tend to remove their protectors, which exposes them to the risk of hearing loss. This communication problem is in fact twofold: first, hearing protectors modify the perception of one's own voice; second, they interfere with understanding the speech of others. Both aspects are examined in this thesis.
When hearing protectors are worn, the modification of one's own voice perception is partly due to the occlusion effect, which is produced when an earplug is inserted in the ear canal. This occlusion effect has two main consequences: first, low-frequency physiological noises are better perceived; second, the perception of one's own voice is modified. To better understand this phenomenon, the results in the literature are analyzed systematically and a new method to quantify the occlusion effect is developed. Instead of stimulating the skull with a bone vibrator or asking the subject to speak, as is usually done in the literature, it was decided to excite the buccal cavity with an acoustic wave. The experiment was designed in such a way that the acoustic wave which excites the buccal cavity does not directly excite the external ear or the rest of the body. The measurement of the hearing threshold with open and occluded ears was used to quantify the subjective occlusion effect for an acoustic wave in the buccal cavity. These experimental results, together with those reported in the literature, have led to a better understanding of the occlusion effect and to an evaluation of the role of each internal path from the acoustic source to the inner ear.
The intelligibility of speech from others is impaired both by the high sound levels of noisy industrial environments and by the attenuation of the speech signal by hearing protectors. A possible solution to this problem is to denoise the speech signal and transmit it under the hearing protector. Many denoising techniques are available and are often used for denoising speech in telecommunications. In this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques is conducted to evaluate their performance in noisy industrial environments. The tested speech signals are corrupted by industrial noises over a wide range of signal-to-noise ratios. The denoised speech signals are evaluated with four criteria. A large database is obtained and analyzed with a selection algorithm designed for this purpose. This first study has identified the influence of the different parameters of the wavelet denoising method on its quality, as well as the "classical" method that gives the best performance in terms of denoising quality. It has also generated ideas for designing a new thresholding rule suitable for wavelet denoising of speech in a noisy industrial environment. In a second study, this new thresholding rule is presented and evaluated. Its performance is better than that of the "classical" method identified in the first study when the signal-to-noise ratio of the speech signal is between −10 dB and 15 dB.
TABLE OF CONTENTS
INTRODUCTION
0.1 Context
0.1.1 The industrial sound environment
0.1.2 Mechanisms of hearing with open and occluded ears
0.1.3 Trade-offs between health and safety
0.2 Objective
0.3 Research problem and methodology
0.3.1 Occlusion effect
0.3.2 Speech denoising in industrial environments
0.4 Structure of the thesis
CHAPTER 1 ARTICLE #1: "SUBJECTIVE CHARACTERIZATION OF EARPLUGS' OCCLUSION EFFECT USING AN EXTERNAL ACOUSTICAL EXCITATION OF THE MOUTH CAVITY"
1.1 Introduction
1.2 Main physical explanations of the OE
1.3 The internal sound path components involved in the perception of one's own voice
1.4 Measurement method
1.4.1 Subjects
1.4.2 Experimental setup
1.4.3 Noise reduction measurement
1.5 Experimental results
1.5.1 Noise attenuation of earplugs
1.5.2 Subjective VABC OE
1.6 Characterization of the voice OE using experimental results from the presented research work and from the literature
1.6.1 Subjective VABC OE
1.6.2 Subjective BC OE
1.6.3 Objective BC OE
1.6.4 BC OE: Integration of physiological noise masking effect and synthesis
1.6.5 Objective voice OE
1.6.6 Voice OE: Synthesis
1.7 Conclusions and Recommendations
Acknowledgements
References
CHAPTER 2 ARTICLE #2: "WAVELET SPEECH ENHANCEMENT FOR INDUSTRIAL NOISE ENVIRONMENTS"
2.1 Introduction
2.2 Wavelet thresholding theory
2.2.1 Thresholding rules
2.2.2 Threshold expressions
2.2.3 Noise estimate expressions
2.3 Presentation of the studied cases
2.3.1 The signals
2.3.2 The methods
2.3.3 The criteria
2.4 Methods selection methodology
2.4.1 First step: Selecting the adequate wavelet type and the number of analysis levels
2.4.2 Second step: Selecting the denoising techniques that preserve intelligibility
2.4.3 Third step: Selecting the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
2.4.4 Fourth step: Selecting the denoising techniques that simultaneously maximize performance for all the three criteria SNRglo, SNRseg and MSE
2.4.5 Selection algorithm
2.5 Experimental results
2.5.1 First step: Choice of the analysis wavelet type and of the number of analysis levels
2.5.2 Second step: Selection of the denoising techniques that preserve intelligibility
2.5.3 Third step: Selection of the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
2.5.4 Fourth step: Selection of the denoising techniques that simultaneously maximize performance for all the three criteria SNRglo, SNRseg and MSE
2.6 Discussion
2.6.1 Methods selection methodology: A positioning among similar studies
2.6.2 Analysis of experimental results
2.6.3 Observations on independent and interdependent parameters
2.6.4 Industrial noises versus white and pink Gaussian noises
2.7 Conclusions and recommendations
Acknowledgements
References
CHAPTER 3 ARTICLE #3: "A WAVELET SPEECH THRESHOLDING RULE FOR DENOISING IN INDUSTRIAL ENVIRONMENTS"
3.1 Introduction
3.2 Proposed method
3.2.1 Thresholding rule
3.2.2 Risk estimator
3.3 Framework for the evaluation of the proposed method
3.3.1 Signals
3.3.2 Methods
3.3.3 Criteria
3.4 Experimental results
3.5 Conclusions
3.A Derivation of the risk estimator for the proposed thresholding rule
3.B Derivation of the bias of the risk estimator for the proposed thresholding rule
Acknowledgements
References
CONCLUSION
BIBLIOGRAPHY
LIST OF TABLES
Table 0.1 Vocal effort associated with the different speech levels, in dB(A), measured 1 m in front of the talker
Table 0.2 Some legislation on hearing protection (Xxxxxxx et al., 2001; Suter, 2000; Xxxxxxx, 2009)
Table 0.3 Protection levels as defined by the EN458 (1993, 1996) and CSA (2002) recommendations
Table 1.1 Main results about path identification and OE for the VBC, the VABC and the voice conduction path
Table 2.1 Signals, methods and criteria
Table 2.2 Denoising techniques
Table 2.3 Performance results for the best method for each noise
Table 3.1 Advantages and drawbacks of some of the major thresholding rules reported in the literature
Table 3.2 Signals, denoising parameters and criteria
Table 3.3 Noises
Table 3.4 Denoising techniques
Table 3.5 Performance results for the best method for each noise
LIST OF FIGURES
Figure 0.1 Spectra of two industrial noises recorded in a car factory, taken from the Noisex database (NOISEX, 1990)
Figure 0.2 Long-term average speech spectra for different vocal efforts of a male talker, measured 1 m in front of the subject
Figure 0.3 Relationship between speech intelligibility, vocal effort and signal-to-noise ratio (white noise at 70 dB (SPL))
Figure 0.4 Representation of the different sound transmission paths to the inner ear
Figure 0.5 Minimum binaural hearing threshold (MAF, minimal auditory field) after Xxxxxxxx and Xxxxxx (1956), minimum monaural hearing threshold (MAP, minimal auditory pressure) after Xxxxxx and Xxxx (1952), discomfort threshold after Wegel (1932) and pain threshold after Békésy (1960b)
Figure 0.6 Attenuation obtained for different hearing protectors selected from the NIOSH Compendium (2009) so as to show the full range of possibilities
Figure 0.7 Shape of the masking produced by different pure tones and for different levels of the masking signal
Figure 0.8 Summary diagram of the trade-offs between health and safety
Figure 1.1 Diagram of the internal sound path components involved in the perception of one's own voice
Figure 1.2 Flowchart of each experimental phase
Figure 1.3 Setup of the acoustic box: (1) parallelepipedic box in 1 inch plywood, (2) sound barrier, (3) at least 4 inches of sound absorber, (4) truncated pyramid with rectangular base and square node in 1 inch plywood, (5) PVC pipe of 1.75 inches diameter, (6) speaker, (7) fixed half spirometry filter, (8) spirometry filter adapter, (9) individual spirometry filter (bacterial / viral), (10) subject, (11) adjustable seat. (color online)
Figure 1.4 Sonomax v3 S2/M2 custom earplug with an inserted dual microphone probe (color online)
Figure 1.5 Average MADFop (−+), average MADFoc (− ), average MAMPop (− · +), and average MAMPoc (− · )
Figure 1.6 REAT experimental results [average (−×), std (− )] and REAT ANSI results (Industrial Noise Laboratory, 2007) [average ± 2 std (−− +), std (−− ♦)] (color online)
Figure 1.7 Delta between NRcb in diffused field and acoustic box experiments
Figure 1.8 Subjective VABC OE (MAMPoc − MAMPop) [average (−×), std (−−)]
Figure 1.9 Subjective BC OEs of the literature [this study (black); Xxxxxxxxx et al. (2007): (gray) occlusion with 18 mm insertion of E-A-R Classics and bone excitation at the forehead; Xxxxxx and Xxxxxxx (1983) bone excitation at the forehead: (+ black) E-A-R plug with 0.2 cm³ occluded volume, (+ dark gray) E-A-R plug with 0.5 cm³ occluded volume, (+ medium gray) V-51R plug with 0.6 cm³ occluded volume, (+ light gray) E-A-R plug with 0.8 cm³ occluded volume; Xxxxxxxx and Xxxxxxxxx (2007) bone excitation at the forehead: (black) occlusion with 7 mm insertion of foam earplug, (black) occlusion with 15 mm insertion of foam earplug]
Figure 1.10 Objective BC OEs of the literature [this study (black); Xxxxxxxxx et al. (2007): (gray) occlusion with 18 mm insertion of E-A-R Classics and bone excitation at the forehead; Xxxxxxxx and Xxxxxxxxx (2007) bone excitation at the forehead: (black) occlusion with 7 mm insertion of foam earplug, (black) occlusion with 15 mm insertion of foam earplug; Xxxxx (1986), Xxxxxx (1986): (dark gray) occlusion with full earmould impression and bone excitation at the contra-lateral mastoid]
Figure 1.11 Objective voice OEs of the literature [this study (black); Xxxxx (1986), Xxxxxx (1986): (dark gray) occlusion with earmould impression and bone excitation at the contra-lateral mastoid; Xxxxxx (1996) according to Xxxxxx (1997, 1998): (dark gray) occlusion with full xxxxxx xxxxxxxx; May (1992) according to Xxxxxx (2000) and Xxxxxx (1997, 1998): (··· ♦ dark gray) occlusion with unvented skeleton earmould]
Figure 2.1 Thresholding rules
Figure 2.2 Main flowchart of the general methodology used for methods selection
Figure 2.3 Flowchart of first step: Selecting the adequate wavelet type and the number of analysis levels
Figure 2.4 Flowchart of second step: Selecting the denoising techniques that preserve intelligibility
Figure 2.5 Flowchart of third step: Selecting the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
Figure 2.6 Flowchart of the selection algorithm
Figure 2.7 First step experimental results: Choice of the analysis wavelet type. (a) Gain in SNRglo. (b) Gain in SNRseg. (c) MSE. (d) IS
Figure 2.8 First step experimental results: Choice of the number of analysis levels. (a) Gain in SNRglo. (b) Gain in SNRseg. (c) MSE. (d) IS
Figure 2.9 Second step experimental results: Selection of denoising techniques that preserve intelligibility
Figure 2.10 Third step experimental results: Selection of denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
Figure 2.11 Correspondence between the level of decomposition and the central frequency for each tested wavelet type
Figure 2.12 Third step experimental results: Selection of denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE for white Gaussian noise
Figure 2.13 Third step experimental results: Selection of denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE for pink Gaussian noise
Figure 3.1 Thresholding rules
Figure 3.2 Selection of denoising techniques that preserve intelligibility
Figure 3.3 Selection of denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
Figure 3.4 Denoising performance, in terms of the gain in segmental SNR, for the μ-law and the CLC thresholding rule, according to different signal-to-noise ratios
Figure 4.1 Summary diagram for article #1
Figure 4.2 Summary diagram for article #2
Figure 4.3 Summary diagram for article #3
Figure 4.4 Summary diagram for the thesis
INTRODUCTION
This is a manuscript-based thesis on communication in noise. More specifically, it studies the perception of one's own voice when wearing earplug-type hearing protectors, and the understanding of other people's speech signals.
This introduction presents, in turn, the general context of the research, the objectives of the thesis, the research problem and methodology, and finally the structure of the thesis.
0.1 Context
Communication in noise is a problem faced by every worker in a noisy industrial environment (Xxxxxx et al., 2000; Xxxxx, 1992). First, the industrial sound environment and its constituent elements are presented. Next, the mechanisms of hearing with open and occluded ears are described. Finally, the trade-offs between health and safety that any work in a noisy industrial environment entails are set out.
0.1.1 The industrial sound environment
The industrial sound environment usually consists of three main types of sound signals: industrial noises, warning signals and speech (Xxxxxx et al., 2000; Xxxxx, 1992). They are presented here in turn.
0.1.1.1 Industrial noises
To date, there is no precise description of an industrial noise. Broadly speaking, the term covers the noises generated by the various machines and tools used in industry. In signal processing, signals are commonly classified according to their stationarity (Flandrin, 1998). Two categories of industrial noise can therefore be considered:
1. Non-stationary noises: by definition, noises whose spectral content varies over time; for example, hammer blows on a metal plate generate an impact noise, which is non-stationary;
2. Stationary noises: by definition, noises whose spectral content remains unchanged over time; for example, electrical transformers generate a stationary noise.
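The distinction between the two categories can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions, not a method taken from this thesis: the function `spectral_variation`, the frame length, and the two synthetic signals (a 120 Hz hum standing in for a transformer, decaying 2 kHz bursts standing in for hammer blows) are all hypothetical choices. The index measures how much the short-time magnitude spectrum departs from its time average.

```python
import numpy as np

def spectral_variation(signal, frame_len=1024):
    """Mean relative deviation of short-time magnitude spectra from their average.

    Values near 0 suggest a spectral content that stays unchanged over time
    (stationary); larger values suggest a time-varying content (non-stationary).
    """
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    mean_spectrum = spectra.mean(axis=0)
    deviation = np.abs(spectra - mean_spectrum).sum(axis=1)
    return float(np.mean(deviation / (mean_spectrum.sum() + 1e-12)))

fs = 16000                      # sampling rate in Hz (assumed)
t = np.arange(2 * fs) / fs      # two seconds of signal

# Hypothetical stationary noise: a hum with one harmonic.
hum = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)

# Hypothetical non-stationary noise: a decaying burst every 0.25 s.
n = np.arange(800)
burst = np.exp(-n / 80.0) * np.sin(2 * np.pi * 2000 * n / fs)
impacts = np.zeros_like(t)
for start in range(0, len(t) - len(burst), 4000):
    impacts[start:start + len(burst)] += burst

hum_variation = spectral_variation(hum)
impact_variation = spectral_variation(impacts)
print(f"hum: {hum_variation:.3f}, impacts: {impact_variation:.3f}")
```

With these synthetic signals the index stays close to zero for the hum and is much larger for the impact train, which matches the definitions given in the list above.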
[Figure 0.1 plot, titled "Spectrogram of industrial noises". Horizontal axis: frequency in Hz, from 125 to 8000. Vertical axis: level in dB (SPL), from 40 to 80. Curves: Noisex 21, 83 dB(A); Noisex 22, 74 dB(A).]
Figure 0.1 shows the average third-octave spectra of two stationary industrial noises recorded in a car factory, taken from the Noisex database (NOISEX, 1990).
Figure 0.1 Spectra of two industrial noises recorded in a car factory, taken from the Noisex database (NOISEX, 1990).
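As a reading aid for third-octave spectra such as those of Figure 0.1, the sketch below lists the band centre frequencies between 125 Hz and 8000 Hz and shows how band levels combine into an overall level. It rests on assumptions: the function names are hypothetical, the centres are the exact base-2 values (for example 1259.9 Hz) rather than the rounded nominal labels (1250 Hz), and the band levels used in the example are invented for illustration and are not NOISEX data. The result is an unweighted level, not the dB(A) values quoted for the two noises.

```python
import math

def third_octave_centres(f_low=125.0, f_high=8000.0):
    """Exact base-2 third-octave centre frequencies from f_low to f_high (Hz)."""
    n_low = round(3 * math.log2(f_low / 1000.0))
    n_high = round(3 * math.log2(f_high / 1000.0))
    return [1000.0 * 2.0 ** (n / 3.0) for n in range(n_low, n_high + 1)]

def overall_level(band_levels_db):
    """Energetic sum of band levels: 10*log10(sum(10^(L/10)))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in band_levels_db))

centres = third_octave_centres()
print(len(centres), round(centres[0]), round(centres[-1]))

# Two bands at the same (invented) level add up to about 3 dB more.
print(round(overall_level([70.0, 70.0]), 2))
```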
It is difficult to predict the sound level generated by a machine at a worker's ear. Doing so would require taking into account the acoustics of the room, the positions of the worker and of the machine in the room, and the noise generated by the machine. Workers in the same company are therefore not all exposed to the same sound level. According to a study by Voix et al. (2002), which pooled statistical data from several sources and for different sectors (construction, food, printing, textiles, transportation and others), 27.2% of the at-risk population in Canada (about 2,200,000 workers) is exposed to a level above 100 dB(A).
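To give a sense of scale for such exposure levels, the following sketch computes an 8-hour equivalent level from a list of exposure segments and the time for which a given level can be tolerated before the 85 dB(A) recommendation quoted in the foreword is reached. It assumes the equal-energy rule (a 3 dB increase halves the allowed duration); some regulations use a different exchange rate, and the function names are hypothetical.

```python
import math

def leq_8h(segments):
    """8-hour equivalent level from (duration_hours, level_dBA) segments.

    Assumes the equal-energy rule and no exposure outside the listed segments.
    """
    energy = sum(hours * 10.0 ** (level / 10.0) for hours, level in segments)
    return 10.0 * math.log10(energy / 8.0)

def allowed_hours(level_dba, limit_dba=85.0):
    """Daily duration at level_dba that yields an 8-hour equivalent of limit_dba."""
    return 8.0 / 10.0 ** ((level_dba - limit_dba) / 10.0)

hours_at_100 = allowed_hours(100.0)
print(f"{hours_at_100 * 60:.1f} minutes at 100 dB(A)")
print(f"{leq_8h([(hours_at_100, 100.0)]):.1f} dB(A) over 8 hours")
```

Under this assumption, an unprotected worker at 100 dB(A) reaches the recommended daily limit in roughly a quarter of an hour.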
0.1.1.2 Warning signals
Many warning signals are present in industrial settings. They differ according to their role (Tran Quoc and Hétu, 1996):
1. A danger warning: it warns workers of an imminent danger, for example a fire alarm or a vehicle backup alarm.
2. An indicative signal calling for an action: it notifies the worker of an action to be carried out, for example a telephone ringing or the sound signal emitted by a machine to indicate a malfunction.
Warning signals also differ according to their temporal and frequency structure:
1. A continuous signal: it may consist of a signal containing several harmonics that are frequency-modulated over time. Some fire alarms have this characteristic.
2. A discontinuous signal: it consists of a short signal, or a series of short signals, repeated periodically with a pause between iterations. A vehicle backup alarm is defined in this way.
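The temporal structure of a discontinuous signal can be sketched as a short tone followed by a pause, repeated periodically. The example below is purely illustrative: the function name and every parameter value (1 kHz tone, 0.5 s beep, 0.5 s pause, 16 kHz sampling rate) are assumptions and do not describe any particular backup alarm.

```python
import numpy as np

def discontinuous_alarm(fs=16000, tone_hz=1000.0, beep_s=0.5, pause_s=0.5, repeats=4):
    """Synthesize a beep/pause pattern repeated `repeats` times (all values assumed)."""
    t = np.arange(int(beep_s * fs)) / fs
    beep = np.sin(2 * np.pi * tone_hz * t)      # the short signal
    pause = np.zeros(int(pause_s * fs))         # the pause between iterations
    return np.tile(np.concatenate([beep, pause]), repeats)

alarm = discontinuous_alarm()
print(len(alarm) / 16000, "seconds")
```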
Today, there are few standards for the design and implementation of warning signals in industrial settings. Depending on the company, the same warning signal will not necessarily have the same meaning. Moreover, the number of different warning signals present in a given industrial setting keeps growing. Yet, on average, a worker can easily and quickly recognize only 7 different warning signals (Tran Quoc and Hétu, 1996; Hétu, 1994).
0.1.1.3 La parole
Plusieurs niveaux de parole ont e´te´ de´finis (Xxxxxxx, 1979) : le murmure, la voix normale, la voix e´leve´e, la voix tre`s forte, le cri et le niveau maximum de la voix. Ces niveaux ont e´te´ mesure´s a` 1 m devant le locuteur et sont pre´sente´s dans le tableau 0.1.
Table 0.1 Vocal effort associated with the different speech levels, in dB(A), measured 1 m in front of the talker
Taken from Xxxxxxx (1979)
Vocal effort | Overall level, dB(A) |
Maximum | 88 |
Shout | 82 |
Very loud | 74 |
Raised | 65 |
Normal | 57 |
Relaxed | 50 |
Whisper | 40 |
Figure 0.2 shows the long-term average speech spectra for different vocal efforts of a male talker, measured 1 m in front of the subject. Note that the shape of the speech spectrum varies with the vocal effort used by the talker (Hétu, 1994; Xxxxxxx, 1979). Figure 0.3 gives the relationship between speech intelligibility, vocal effort, and signal-to-noise ratio (Xxxxxxx, 1956): whatever the signal-to-noise ratio, a very quiet voice is hard to understand, and once the talker starts to strain the voice, the intelligibility of the speech decreases.
Figure 0.2 Long-term average speech spectra for different vocal efforts of a male talker, measured 1 m in front of the subject.
Taken from Xxxxxxx (1979)
Figure 0.3 Relationship between speech intelligibility, vocal effort, and signal-to-noise ratio (white noise at 70 dB(SPL)).
Taken from Xxxxxxx (1956)
0.1.2 Hearing mechanisms in the open and occluded ear
The hearing mechanisms in the open and occluded ear are briefly presented here. They will later help to identify possible explanations for the reduced ability to perceive, recognize, and understand useful information signals, such as alarm signals or speech, when hearing protectors are worn (see article #1). Figure 0.4 shows the different sound transmission paths to the inner ear (external and internal transmission) for an open or occluded ear, drawn on an anatomical diagram of the auditory system. A signal reaching the auditory receptor is the combination of the signals arriving through the different transmission paths between the sound source and the auditory receptor (Howell, 1985; Békésy, 1949, 1960a).
For external transmission in the open ear, the most “classic” path is the one commonly called “air” conduction: sound travels through the pinna and then the external ear canal, and is transmitted to the cochlea by the middle ear. The other type of conduction is “bone” conduction: a sound signal is usually airborne at the outset, but at any level of the auditory system it can set the bones into vibration and thus propagate to the cochlea.
For internal transmission in the open ear, which concerns all sounds originating inside the human body (for example physiological noises or the voice), two types of conduction have been identified (Békésy, 1960a; Tonndorf, 1972): first, direct conduction to the cochlea, and second, indirect conduction, which involves radiation into the external ear canal.
For external transmission in the occluded ear, that is, when an earplug-type hearing protector is inserted in the external ear canal, the “air” conduction path is modified and replaced by a set of three transmission paths (Berger, 1986):
1. An air-conduction path that remains but is reduced to air leaks at the seal of the hearing protector: the acoustic wave travels through the air between the hearing protector and the ear canal wall before propagating in the remaining part of the external ear canal;
2. A structure-borne conduction path through the hearing protector: the acoustic wave propagates through the material of the hearing protector before travelling through the air in the remaining part of the external ear canal;
3. A conduction path through vibration of the whole structure of the hearing protector: the acoustic wave acts on the outer face of the hearing protector and sets the whole protector into vibration. The inner face undergoes the same vibratory motion and radiates into the remaining part of the external ear canal.
[Figure 0.4: panels arranged in columns “Open ear” and “Occluded ear” and rows “External transmission” and “Internal transmission”]
Figure 0.4 Representation of the different sound transmission paths to the inner ear.
Xxxxxx (1986) distinguished these last two paths, but from an acoustical point of view they are the same type of structure-borne transmission through the earplug. Moreover, during this structure-borne transmission, the earplug radiates not only into the air-filled part of the ear canal but also into the skin and the underlying structures, such as cartilage and bone, to which it is coupled. “Bone” conduction is therefore modified as well.
For internal transmission, the signal radiated into the external ear canal partly dissipates to the outside when the ear is open. When the ear is occluded, this energy is “trapped” inside and is perceived by the person as the occlusion effect (Xxxxxxxx et al., 2003). At low frequencies, the occlusion effect results in improved auditory perception and an increased sound level in the ear canal (Stenfelt et al., 2003). When a person speaks, they perceive their voice both through its propagation outside the head (external transmission) and through the internal path (internal transmission). A person wearing hearing protectors does not perceive their voice as they do with open ears, for two reasons. First, the propagation of the voice by “air” conduction is attenuated by the hearing protector. Second, the occlusion effect (Xxxxxxxx et al., 2003) mentioned above increases the perception of the low frequencies of their own voice.
0.1.3 Trade-off between health and safety
Section 0.1.1 briefly outlined the complexity of the sound environment in industrial settings. Three types of signals are present: industrial noise, alarm signals, and speech. Because of their sound level and their often continuous presence, industrial noises dominate. The overall sound level in an industrial setting is therefore essentially equal to the level due to industrial noise alone. These high sound levels can cause hearing loss in workers who spend long periods in this environment. The need to protect their hearing for health reasons is presented in a first part. Although industrial noise dominates industrial sound environments, its perception is not what matters most for worker safety; what matters is the perception of alarm signals and the intelligibility of speech. These two issues are presented in the second part.
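The claim that the overall level is essentially that of the industrial noise alone follows from the energetic summation of levels. The sketch below assumes incoherent sources and uses hypothetical levels (95, 80 and 65 dB) chosen only for illustration; the function name `combine_levels` is not from the thesis.

```python
import math

def combine_levels(levels_db):
    """Energetic (incoherent) sum of several sound levels given in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

# Hypothetical levels: industrial noise, alarm signal, speech.
noise_db, alarm_db, speech_db = 95.0, 80.0, 65.0
total_db = combine_levels([noise_db, alarm_db, speech_db])

# The total exceeds the noise level by only a fraction of a decibel.
print(f"overall level: {total_db:.2f} dB, noise alone: {noise_db:.2f} dB")
```

With these values the alarm and speech signals add well under 1 dB to the total, which is why the dominant source effectively sets the overall level.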
0.1.3.1 Hearing protection
Human hearing is a sensitive sense with limits at both very low and very high sound levels. Figure 0.5 shows the minimum hearing thresholds in a diffuse field (MAF, minimum audible field) and under earphones (MAP, minimum audible pressure), together with the discomfort and pain thresholds, expressed as sound pressure level in dB(SPL).
To protect workers’ hearing, laws have been introduced to limit the duration and level of noise exposure. Beyond these limits, workers must protect their hearing with hearing protectors. There is no international legislation on hearing protection, and each country sets its own regulations. Table 0.2 presents the regulations of a few countries.
Figure 0.5 Minimum binaural hearing threshold (MAF, minimum audible field) after Xxxxxxxx and Xxxxxx (1956), minimum monaural hearing threshold (MAP, minimum audible pressure) after Xxxxxx and Xxxx (1952), discomfort threshold after Wegel (1932), and pain threshold after Békésy (1960b).
Taken from Xxxxxxx (2004)
Table 0.2 Selected regulations on hearing protection (Xxxxxxx et al., 2001; Xxxxx, 2000; Xxxxxxx, 2009)
Origin of the regulation | Average exposure level over 8 h, dB(A) |
WHO (Xxxxxxx et al., 2001) | 85 |
Canada (Federal) | 87 |
Canada (Ontario, Quebec, New Brunswick) | 90 |
USA | 90 |
France | 85 |
According to the World Health Organization (WHO), an average exposure level of 70 dB(A) over 24 hours causes no hearing loss for most of the population (WHO, 1999), which is equivalent to a level of 75 dB(A) over 8 hours if the remaining 16 hours are at a negligible level. The WHO recommends that above an average exposure level of 85 dB(A) over 8 hours (Xxxxxxx et al., 2001), workers protect their hearing with hearing protectors. Protecting one’s hearing is not enough, however; it must be protected properly. Not all hearing protectors (earplugs or earmuffs) provide the same protection. Figure 0.6 shows the attenuation of 4 different hearing protectors taken from the NIOSH Compendium (2009): one with low attenuation, one with high attenuation, and two with intermediate attenuation whose attenuation varies differently with frequency.
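The equivalence between 70 dB(A) over 24 hours and 75 dB(A) over 8 hours can be checked with the equal-energy rule, under which the same acoustic energy delivered over a shorter time corresponds to a higher average level. The sketch below assumes that rule and that the remaining hours contribute negligible energy; `equivalent_level` is an illustrative name, not one used in the thesis.

```python
import math

def equivalent_level(level_db, duration_h, target_duration_h):
    """Level over target_duration_h carrying the same energy as
    level_db sustained over duration_h (equal-energy rule)."""
    return level_db + 10 * math.log10(duration_h / target_duration_h)

# 70 dB(A) averaged over 24 h, concentrated into an 8 h shift.
level_8h = equivalent_level(70.0, 24.0, 8.0)
print(f"{level_8h:.1f} dB(A) over 8 h")
```

The correction term is 10 log10(24/8), about 4.8 dB, so the result rounds to the 75 dB(A) quoted above.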
Figure 0.6 Attenuation of different hearing protectors selected from the NIOSH Compendium (2009) so as to illustrate the full range of possibilities.
Some hearing protectors provide an attenuation (measured in the laboratory) of up to 40 dB, while others are limited to 10 dB. If the chosen protection is insufficient, the worker risks hearing loss. Conversely, if the chosen protection is excessive, the worker generally has nothing to fear for their hearing but may become isolated from the surrounding sound environment. The worker then risks missing useful information signals, whether the noise of a machine running out of control, a speech signal, or an alarm signal, which endangers their safety. Table 0.3 presents, according to the EN458 (1993, 1996) and CSA (2002) recommendations, the pressure level that must be present under the protector for adequate protection.
Table 0.3 Protection levels as defined by the EN458 (1993, 1996) and CSA (2002) recommendations
Residual pressure level under the protector, dB(A) | Protection level |
85 + | Insufficient |
80 - 85 | Acceptable |
75 - 80 | Optimal or ideal |
70 - 75 | Acceptable |
Below 70 | Overprotection |
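The bands of table 0.3 can be written as a small lookup, sketched below. Two points are assumptions made for illustration only: the table does not say which band a boundary value (70, 75, 80 or 85) belongs to, so each lower bound is treated as inclusive here, and the residual level is estimated by subtracting a single-number attenuation from a hypothetical ambient level of 95 dB(A), which ignores the frequency dependence shown in figure 0.6.

```python
def protection_rating(residual_dba):
    """Classify the residual level under the protector (table 0.3).
    Lower bounds are treated as inclusive, which is an assumption."""
    if residual_dba >= 85:
        return "insufficient"
    if residual_dba >= 80:
        return "acceptable"
    if residual_dba >= 75:
        return "optimal"
    if residual_dba >= 70:
        return "acceptable"
    return "overprotection"

ambient_dba = 95.0  # hypothetical workplace level
for attenuation_db in (10.0, 17.0, 40.0):
    residual = ambient_dba - attenuation_db  # simplified single-number estimate
    print(attenuation_db, residual, protection_rating(residual))
```

In this simplified example, the 10 dB protector leaves the worker under-protected while the 40 dB protector falls into overprotection, which mirrors the trade-off described in the text.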
0.1.3.2 Perception of useful information signals
A useful information signal is always better perceived in silence than in noise. This section presents the influence of noise and of wearing hearing protectors on the perception of useful information signals (alarm signals and/or speech). It relies on the notions of perceiving, recognizing, and understanding useful information signals. These three terms denote graded levels of signal interpretation. Perceiving means hearing something without knowing what it is. Recognizing means identifying a sound as, for example, a man’s speech signal, without understanding what is said. Finally, understanding implies that the subject has first perceived and recognized the sound signal and has, in addition, grasped the full meaning of the message.
Saturation of the auditory system
Note that the human auditory system, like any sound acquisition system, can saturate: if the overall sound level is too high, the cochlea can no longer process the incoming information correctly, and the ability to perceive, and above all to recognize and understand, useful information signals is reduced (Xxxxxxxx and Casali, 2000).
Wearing hearing protectors generally lowers the overall sound level in the ear and thus removes cochlear saturation. If the attenuation of hearing protectors were constant across all octave bands, they would improve the ability to perceive, recognize, and understand useful information signals. As figure 0.6 shows, however, the attenuation of hearing protectors is not constant across frequencies; it tends instead to be greater at high frequencies than at low frequencies. This could reduce the ability to perceive, recognize, and understand useful information signals, particularly for people with hearing loss (Xxxxxxxx and Casali, 2000), because high frequencies are very important for speech intelligibility.
Masking of useful information signals
When two sound signals are present simultaneously in an environment, they tend to mask each other. The useful information signal is called the masked signal, and the noise is called the masker. Figure 0.7 shows the masking patterns produced by different pure tones at different masker levels.
Figure 0.7 Masking patterns produced by different pure tones at different masker levels.
Taken from Xxxxxxx (2004)
The perception threshold of the masked signal increases linearly with the sound level of the masker when both signals are centered on the same frequency (Xxxxxxx, 2004; Xxxxxxxx and Xxxxxx, 2000). For a narrow-band noise masker, the frequency range of masking widens as the masker level increases (Xxxxxxx, 2004; Xxxxxxxx and Xxxxxx, 2000). Moreover, the frequency range of masking is symmetric only at low masker levels and quickly becomes asymmetric: toward lower frequencies, the range is very narrow and falls off very quickly; toward higher frequencies, the masking range widens as the masker level increases (Xxxxxxx, 2004; Xxxxxxxx and Xxxxxx, 2000). Thus, at high level (on the order of 80 to 100 dB(A)), a narrow-band noise centered on 200 Hz produces the same frequency range of masking as a broadband noise of equivalent energy level (Gelfand, 2004; Xxxxxxxx and Xxxxxx, 2000).
Industrial noises generally have high energy at low frequencies. For the two industrial noises shown in figure 0.1, the most energetic part of the spectrum lies below 1000 Hz, with an energy peak in the third-octave bands centered on 250 and 200 Hz. The masking produced by these noises therefore covers almost the entire audible spectrum and impairs the ability to perceive, recognize, and understand useful information signals. Wearing hearing protectors reduces this masking, since the masker level is lowered. However, hearing protectors generally have a non-constant attenuation that is greater at high frequencies (see figure 0.6). The reduction in masking provided by hearing protectors is then less beneficial for the worker, especially one with hearing loss (Tran Quoc and Hétu, 1996; Xxxxx, 1992).
Perception of alarm signals
As presented above, cochlear saturation due to high sound levels and masking due to high industrial noise reduce the ability to perceive, recognize, and understand alarm signals. Wearing hearing protectors will not necessarily improve these abilities, especially for workers with hearing loss (Tran Quoc and Hétu, 1996; Xxxxx, 1992). Moreover, most of the time the design of alarm signals and the choice of their sound level are left to chance rather than being based on the sound environment (Tran Quoc and Hétu, 1996; Hétu, 1994). An adequate sound level for an alarm signal lies between +10 dB and +25 dB above the masked threshold for a person with normal hearing (Hétu, 1994), or between +13 dB and +25 dB above the noise with an absolute maximum level of 105 dB(SPL) (Tran Quoc and Hétu, 1996). A study of the alarm signals in a steel plant showed that only 50% of them had an adequate sound level; 15% were too weak to be perceived, and 25% were too loud and could cause annoyance and auditory fatigue (Hétu, 1994). This safety problem cannot be entirely solved by “smart” hearing protectors that would detect alarm signals and relay them to the worker: an alarm signal buried in the sound environment will not be detected any better by a microphone system than by the human ear. Tran Quoc and Hétu (1996) proposed a set of rules to help design alarm signals for a given industrial environment and thereby ensure that workers hear them, whether or not they have hearing loss. As things stand, alarm signals generally do not follow these design rules. Workers may therefore have difficulty perceiving these signals, whether or not they wear their hearing protectors. Even an electronic system equipped with a microphone could struggle to detect them, because their level is too low or their frequency characteristics are not sufficiently distinct from the surrounding industrial noise.
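The level criterion attributed above to Tran Quoc and Hétu (1996) can be expressed as a simple check. This is a minimal sketch that compares single broadband levels, which is a simplification assumed here; the function and parameter names are illustrative and the example levels are hypothetical.

```python
def alarm_level_ok(alarm_db, noise_db,
                   min_margin_db=13.0, max_margin_db=25.0,
                   ceiling_db=105.0):
    """True if the alarm is +13 to +25 dB above the noise
    and does not exceed the 105 dB(SPL) absolute maximum."""
    margin = alarm_db - noise_db
    within_margin = min_margin_db <= margin <= max_margin_db
    return within_margin and alarm_db <= ceiling_db

print(alarm_level_ok(100.0, 85.0))  # margin 15 dB, below the ceiling
print(alarm_level_ok(95.0, 90.0))   # margin 5 dB: too weak
print(alarm_level_ok(110.0, 90.0))  # margin 20 dB but above the ceiling
```

The third case shows why very loud environments are difficult: once the noise exceeds 92 dB, no alarm level can satisfy both the +13 dB margin and the 105 dB(SPL) ceiling.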
Speech intelligibility
Speech intelligibility is considered here from two points of view: first, the perception of other people’s speech; second, the perception of one’s own voice.
For the perception of other people’s speech, as for alarm signals, cochlear saturation due to high sound levels and masking due to high industrial noise reduce the perception and intelligibility of speech, which are not necessarily improved by wearing hearing protectors (Tran Quoc and Hétu, 1996; Xxxxx, 1992). Moreover, in a noisy environment, a person who is not wearing protectors tends to raise their voice to be heard. If the person has to shout, the intelligibility of their speech decreases (Hétu, 1994; Xxxxxxx, 1956).
As for the perception of one’s own voice, a worker wearing hearing protectors perceives their own voice differently (see section 0.1.2) and tends to alter the spectral content of the voice to recover its “normal” sound, which can also reduce the intelligibility of their speech for another listener. In addition, the occlusion effect gives earplug wearers the impression that they are speaking louder than they actually are; they then tend to lower the overall level of their voice. The signal-to-noise ratio of the voice relative to the ambient noise decreases, which reduces the intelligibility of their speech for another listener accordingly (Xxxxxxx, 2004; Xxxxxxxx and Xxxxxx, 2000).
0.1.3.3 Summary: trade-off between health and safety
Figure 0.8 presents a summary diagram of the trade-offs between health and safety.
At high sound levels, wearing hearing protectors is essential to protect workers’ hearing; care must be taken to choose adequate protection, enough to protect hearing without falling into overprotection, which creates risks for the worker’s safety. In some cases, however, particularly for workers with hearing loss, wearing even well-chosen hearing protectors can reduce the ability to perceive, recognize, and understand useful information signals; the worker’s safety is then compromised. For the perception of alarm signals, part of the recognition difficulty stems from poor design of these signals, which end up buried in the noise. As for speech intelligibility, the sources of the comprehension difficulties lie both with the talker, who alters the spectrum of their voice, and with the listener, who wears hearing protectors.
[Figure 0.8: diagram with the labels Health; Safety; Communication; Limits of hearing; Occlusion effect; Machine noise; Alarm signals; Hearing protection; Speech; Choice of hearing protectors; Masking; Saturation]
Figure 0.8 Summary diagram of the trade-offs between health and safety
0.2 Objective
The main objective of this thesis is to improve workers’ verbal communication in noise in industrial sound environments, from the point of view of both the talker and the listener. For the talker, the perception of their own voice should be altered little or not at all by wearing hearing protectors, so that they can speak normally and remain highly intelligible. For the listener, the aim is to improve the intelligibility of other people’s speech in industrial sound environments; to do so, the talker’s speech is denoised and played back under the hearing protectors worn by the listener. Two sub-objectives thus follow from the main objective.
1. For the talker, in order to improve the perception of their voice when wearing hearing protectors, the occlusion effect is studied to understand how it alters the perception of one’s own voice.
2. For the listener, in order to improve the intelligibility of other people’s speech, speech denoising in industrial sound environments is studied and evaluated. The case of speech denoising by wavelet thresholding is considered here.
The first sub-objective deals with physiological aspects of hearing, studied in an experimental and fundamental manner. The second sub-objective consists of applied and theoretical research in wavelet signal processing. The first sub-objective is addressed in the first article of the thesis. The second is studied in articles #2 and #3.
0.3 Problem statement and methodology
This section presents and explains the problem statement and the methodology followed throughout the thesis for each of the two parts.
0.3.1 Occlusion effect
The question to be answered is the following: how does wearing in-ear hearing protectors change the internal perception of one’s own voice?
The first step was to look for elements of an answer in the literature. The occlusion effect emerged as one of the factors that modify the internal perception of one’s own voice. It has been studied extensively for bone-conduction excitation, so subjective and objective quantifications of the occlusion effect are available for that case. An objective quantification of the occlusion effect present when a subject speaks has also been carried out. However, the literature provides no subjective quantification of the occlusion effect for the subject’s own voice.
To fill this gap, a new technique for quantifying the occlusion effect was designed: the experimental equipment was developed so that a sound wave emitted in the subject’s mouth cannot reach the cochlea by the external path and is transmitted only by the internal path. Both a subjective and an objective quantification of the resulting occlusion effect were planned and carried out experimentally.
As for the objective quantification, once all the data had been collected for the different subjects, the results proved to be wrong above 1,000 Hz: exactly the same curve was obtained for all subjects. We then realized that this was electrical noise picked up on the microphones during acquisition. These experimental results were therefore unreliable and unusable. The decision was made to complete the thesis without these measurements so as not to delay its submission and defense.
The results of the subjective quantification were analyzed. A characterization of the occlusion effect, of the internal perception of the voice, and of the influence of the different conduction paths was undertaken from these results and from the quantifications already available in the literature. Article #1 presents the experimental protocol, the results obtained, and the conclusions drawn.
0.3.2 Speech denoising in industrial environments
Many speech denoising methods exist and are mainly used in telecommunications. In a noisy industrial environment, the acoustic constraints are different, so it is difficult to predict a priori how a speech denoising algorithm designed for telecommunications will perform in an industrial setting.
Among the speech denoising methods found in the literature, wavelet denoising was chosen for several reasons. First, the wavelet transform performs a frequency decomposition close to that of the cochlea. In addition, industrial noises are generally not white, and several different noises are present within the same company. Wavelet-based processing can adapt to the worker’s movements and to the frequency components of the noise.
Consequently, the “classical” wavelet speech denoising techniques were tested and evaluated in a noisy industrial environment. In total, 1,296 wavelet denoising methods were tested on 8,200 noisy speech signals. Given the size of the resulting database, a subjective study of performance was no longer feasible. Four criteria were therefore used to quantify the performance of the results. A selection algorithm designed specifically for this purpose then made it possible to carry out an objective analysis and identify the method or methods that appear most promising for denoising a speech signal in a noisy industrial environment.
This exploratory study highlighted the influence of the various wavelet denoising parameters and subsequently made it possible to propose and design a new thresholding rule specifically adapted to noisy industrial environments.
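To make the idea of denoising by wavelet thresholding concrete, the sketch below applies classical soft thresholding to the detail coefficients of a one-level Haar transform. It is only an illustration of the general technique: the choice of the Haar wavelet, the single decomposition level, the threshold value and all function names are assumptions, and this is not the thresholding rule proposed in the thesis.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def soft_threshold(coeff, threshold):
    """Shrink a coefficient toward zero by `threshold` (soft thresholding)."""
    return math.copysign(max(abs(coeff) - threshold, 0.0), coeff)

def denoise(x, threshold):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)

# Hypothetical samples: a slowly varying signal with small fluctuations.
noisy = [1.0, 1.2, 3.0, 2.8, 5.1, 4.9, 7.0, 7.2]
print(denoise(noisy, threshold=0.5))
```

Small detail coefficients, assumed here to be mostly noise, are set to zero while larger ones are kept in reduced form. A practical system would use several decomposition levels and a threshold estimated from the noise, which is where the choice of parameters studied in the thesis comes in.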
0.4 Structure of the thesis
This thesis is structured as follows. The first chapter consists of article #1, entitled “Subjective characterization of earplugs’ occlusion effect using an external acoustical excitation of the mouth cavity”, which addresses the first sub-objective of the thesis. The second sub-objective is addressed by two articles, presented in chapters 2 and 3. The second chapter is article #2, entitled “Wavelet speech enhancement for industrial noise environments”. This article covers the exploratory study and the selection algorithm designed to analyze the performance of the results. The third chapter is article #3, entitled “A wavelet speech thresholding rule for denoising in industrial environments”. It details the proposed new thresholding rule and presents the performance it achieves. Finally, a general conclusion reviews the work carried out and the avenues to explore in the future.
CHAPTER 1
ARTICLE #1
“SUBJECTIVE CHARACTERIZATION OF EARPLUGS’ OCCLUSION EFFECT USING AN EXTERNAL ACOUSTICAL EXCITATION OF THE MOUTH CAVITY”
Cécile Le Cocq, Frédéxxx Xxxxxxx, Xxxxxxxxx Xxxxxxx, École de technologie supérieure, Xxxxxxxxxxx xx Xxxxxxx,
Xxxxxxxxx (Xxxxxxx), Xxxxxx, X0X 0X0
This article has been submitted to the Journal of the Acoustical Society of America, 16 September 2009.
Résumé
L’occlusion du conduit auditif par le port d’une prothèse ou d’un protecteur auditif crée un inconfort chez les usagers qui est dû, entre autres, à une modification de la perception de sa propre voix. Ce phénomène s’appelle effet d’occlusion (OE - occlusion effect). Cet article propose une caractérisation de l’effet d’occlusion associé à la voix basée sur un modèle de transmission interne du son divisé en un chemin de conduction de la voix par le corps (VBC - voice body conduction) et en un chemin de conduction de la voix par l’air et le corps (VABC - voice air and body conduction). Une mesure subjective de ce dernier chemin est proposée : elle utilise un haut-parleur placé à l’entrée de la bouche. Les résultats issus de cette mesure et ceux rapportés dans la littérature sont utilisés afin de caractériser les chemins internes de conduction de la voix. En basses fréquences, lorsque les oreilles sont occluses, le chemin indirect domine le chemin direct et les OE objectif et subjectif sont positifs. En hautes fréquences, lorsque les oreilles sont non occluses, le chemin VBC direct domine le chemin VBC indirect, alors que lorsque les oreilles sont occluses, c’est le chemin VABC indirect qui domine le chemin VABC direct. À ces fréquences, les OE objectifs relatifs aux chemins VBC et VABC et à la voix ainsi que l’OE subjectif associé au chemin VABC sont négatifs, tandis que l’OE subjectif relatif au chemin VBC est de l’ordre de zéro.
Abstract
The occlusion of the ear canal by hearing aids or hearing protectors results in an occlusion effect (OE), which causes discomfort to users because it changes the perception of their own voice. In this paper, a characterization of the voice OE is proposed, based on a transmission scheme where the internal sound transmission path is subdivided into the voice body conduction (VBC) path and the voice air and body conduction (VABC) path. A subjective measurement of the latter path is presented, in which a loudspeaker is placed at the mouth entrance. The results from these measurements and others reported in the literature are used to characterize the internal voice paths. At low frequencies, in an occluded ear, the indirect path is dominant over the direct one and the objective and subjective OE are positive. At high frequencies, in an open ear, the VBC direct path is dominant over the indirect one, while, in an occluded ear, the VABC indirect path dominates the direct one. At these frequencies, the objective VBC, VABC and voice OE as well as the subjective VABC OE are negative, while the subjective VBC OE is close to zero.
1.1 Introduction
The “hollow voice” occlusion effect (OE) (Xxxxxxx, 1988) is a known modification of one’s own voice perception when wearing a hearing aid or a hearing protector. This effect causes discomfort that sometimes leads people to remove their device. Hearing aid wearers are then isolated in their silence, while hearing protector wearers are at risk of hearing damage. As a step towards resolving this problem, the present research aims to improve the characterization of the voice OE.
Different kinds of OE can be found in the literature. Usually the bone conduction (BC) OE is considered (Békésy, 1960a; Xxxxx, 0000; Xxxxxx, 1986; Xxxxxxxx et al., 2003; Xxxxxxxx and Xxxxxxxxx, 2007; Xxxxxxxxx et al., 2007). Some studies also considered the voice OE (Lundh, 1986; Xxxxxx, 1986; Xxxxxx, 1997, 1998; Xxxxxx, 2000). Moreover, the OE can be quantified in an objective or subjective manner. These different OEs have been defined in the literature (Xxxxxxxx et al., 1966; Xxxxxxxx, 0000; Xxxxxx, 1997, 1998; Xxxxxxxx et al., 2002; Xxxxxxxxx et al., 2007; Xxxxxxxx and Xxxxxxxxx, 2007; Xxxxxx, 1986) and are presented here under a unified terminology that will be used throughout this article.
• Objective BC OE: The objective BC OE can be defined as the difference of the sound pressure level (SPL) in the ear canal between occluded and open ear conditions when the subject is submitted to a bone vibrator.
• Subjective BC OE: The subjective BC OE can be defined as the difference of the hearing perception between occluded and open ear conditions when the subject is submitted to a bone vibrator.
• Objective voice OE: The objective voice OE can be defined as the difference of the SPL in the ear canal between occluded and open ear conditions when the subject is speaking. The subject is supposed to speak at the same level whether the ear is occluded or not.
• Subjective voice OE: The subjective voice OE can be defined as the difference of the voice perception between occluded and open ear when the subject is speaking.
Although all these OEs have been defined, not all of them have been quantified. The objective BC OE (Xxxxxxxx et al., 2003; Xxxxxxxx and Xxxxxxxxx, 2007; Xxxxxxxxx et al., 2007; Xxxxx, 0000; Xxxxxx, 1986) and the subjective one (Békésy, 1960a; Xxxxxxxxx et al., 2007; Xxxxxxxx and Xxxxxxxxx, 2007) have been quantified by several researchers. The objective voice OE (Dillon, 2000; Xxxxxx, 1997, 1998; Xxxxx, 0000; Xxxxxx, 1986) has also been quantified. The subjective voice OE has been examined by Xxxxxx (1997), who evaluated the annoyance experienced by subjects when they wear their hearing aids. However, no real quantification (threshold or loudness) of this effect was done in her study.
The purpose of the present paper is to quantify the subjective voice OE and to analyze its different aspects. The subjective voice OE can hardly be obtained by the classical method of hearing thresholding, but it could be measured by the loudness balance method. As this second possibility is more time consuming and more demanding for the subjects and probably less accurate, a new hearing thresholding measurement has been developed. The voice in the subject’s mouth is simulated by means of an external speaker.
The article is organized as follows. In section 1.2 the main physical explanations of the OE are presented for later use in the data analysis. Section 1.3 describes the different internal sound path components involved in the perception of one’s own voice and introduces the possible measurements and their associated OEs. Some measurements were previously available in the literature whereas others were designed and implemented in the present research. The new measurement method is presented in section 1.4. The experimental results are in section 1.5. In section 1.6, the voice OE is characterized using the present results and those from the literature. The conclusions and recommendations are drawn in section 3.5.
1.2 Main physical explanations of the OE
The purpose of this section is to briefly present the main physical mechanisms that are generally agreed upon, in the literature, to explain the OE. The upcoming section on the characterization of the voice OE (section 1.6) will refer to these physical mechanisms whenever they can be used to explain the measurement results.
The OE is most generally observed and explained at low frequencies (up to about 2 kHz) (Stenfelt et al., 2003): it is an increase in the perceived and measured sound level in the ear canal. If the external air conduction is not considered, the ear canal SPL is in large part due to the vibrations of the cartilaginous walls (as proposed by Bárány (1938)), which are located roughly along the external 2/3 of the ear canal. Occlusion modifies the ear canal acoustic system: it prevents the radiation of sound at the opening, which is typically more efficient at low frequencies, so more low frequency sound remains in the ear canal and is transmitted to the inner ear. A higher SPL in the ear canal and a lower auditory threshold can then be observed at low frequencies. This effect has been presented in Tonndorf (1972) and Tonndorf et al. (1966), who have shown, based on a lumped parameter element model, that the open ear canal behaves like a high-pass filter. Xxxxxxxx et al. (2003) confirmed Xxxxxxxx’x explanation in their 2003 experiments, which showed a positive OE up to 2 kHz. Several authors have also demonstrated that this positive OE is practically eliminated or much lowered when the earplug is deeply inserted, which is attributed to the fact that the cartilaginous part of the ear canal is completely covered by the earplug and only the bony part is radiating, at a much lower level than the cartilaginous part (Xxxxxx and Xxxxxxx, 1983; Xxxxxxxx and Xxxxxxxxx, 2007).
At high frequencies (above about 2 kHz), the OE is only partially explained. The explanations are based on a comparison of the ear canal acoustic system in open and occluded configurations. In experiments where the ear canal is replaced by an equivalent plastic tube, Stenfelt et al. (2002) demonstrated that the open tube has a quarter wavelength resonance at 2.7 kHz, whereas the closed tube has a half wavelength resonance at 5.5 kHz. Xxxxx (1855), Politzer (1907-1913) and Xxxxxxx (1960) (cf. Xxxxxxxx (1972)) proposed the assumption that the OE could be caused by these resonance modifications of the ear canal. Xxxxxxxx et al. (1966), Tonndorf (1972) and Xxxxxxxx et al. (2003) validated this assumption but only for high frequencies. This has also been obtained by Xxxxxx (1998) with a distributed parameter element model of the ear canal which is represented by a constant section tube. Hence, at high frequencies, the OE could be explained by the facts that suppressing the quarter wavelength resonance at about 2.7 kHz would give a negative OE around this frequency and that a positive OE would appear around the half wavelength resonance at about 5.5 kHz (or above since the ear canal length is reduced by the earplug insertion depth).
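As a rough check on the resonance figures quoted above, the sketch below computes the first resonance of an idealized uniform tube: a quarter wavelength mode when one end is open, and a half wavelength mode when both ends are closed. The speed of sound (343 m/s), the 32 mm tube length and the function names are assumptions made for illustration; they are not taken from the cited experiments.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def quarter_wave_resonance_hz(length_m: float) -> float:
    """First resonance of a uniform tube open at one end and closed at the other."""
    return SPEED_OF_SOUND_M_S / (4.0 * length_m)


def half_wave_resonance_hz(length_m: float) -> float:
    """First resonance of a uniform tube closed at both ends."""
    return SPEED_OF_SOUND_M_S / (2.0 * length_m)


tube_length_m = 0.032  # hypothetical tube length, chosen for illustration
open_f = quarter_wave_resonance_hz(tube_length_m)
closed_f = half_wave_resonance_hz(tube_length_m)
print(f"open tube: {open_f:.0f} Hz, closed tube: {closed_f:.0f} Hz")
```

With this assumed length the two modes fall near 2.7 kHz and 5.4 kHz. Shortening the tube, as a deeper earplug insertion would, moves the closed-tube resonance upwards.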
1.3 The internal sound path components involved in the perception of one’s own voice
Voice production is a complex mechanism involving air moving out of the lungs that induces a fluctuating pressure at the entrance of the vocal tract, which filters this pressure signal to produce an external voice signal (Xxxxxxx, 2004). The aforementioned fluctuating pressure can be produced in two ways (Xxxxxxx, 2004): (i) for voiced sounds, which constitute most of the components of speech delivered at a conversation level, the fluctuating pressure at the entrance of the vocal tract is induced mainly through vocal cord vibrations, (ii) for unvoiced sounds, which constitute all the components of whispered speech and some components of usual speech such as some consonant sounds, the fluctuating pressure at the entrance of the vocal tract is not induced by vocal cord vibrations but by turbulences generated by flow restrictions. In other words, there are two voice sources: the vocal cord vibrations and the turbulences due to flow restrictions. The vocal cord vibration source has the particularity of interacting with the air in the larynx and also interacting directly with the bone and associated body structures supporting the vocal cords, so it will be classified in this text as both an airborne source and a structure-borne source. The turbulences, which interact only with air, will be classified as an airborne source.
From the point of view of voice perception, two main paths can be distinguished from the voice sources to the inner ear (Howell, 1985): the external and the internal paths. These two paths have also been identified by Békésy (1949, 1960a) who found that the amount of acoustic energy transmitted by each of them is of the same order. The external voice perception is only due to the airborne source, whereas the internal voice perception depends on both the structure-borne source and the airborne source. Since the present research aims at characterizing the OE, only the internal path will be studied. In order to easily refer to the various components of this internal path, they have been represented schematically with a block diagram in Fig. 1.1.
Figure 1.1 Diagram of the internal sound path components involved in the perception of one’s own voice.
Following Békésy (1949, 1960a) and Xxxxxx (1985) this internal path is separated into two paths according to two possible sources: (i) a structure-borne source (A) due to the vocal cord vibrations that excite directly the skull bones, (ii) an airborne source (B) (vocal cords interaction with air or turbulences in the larynx) that excites the vocal tract cavities.
The upper half of the block diagram contains the first path, the voice body conduction (VBC) path associated with the structure-borne source (A). The structure-borne source (A) is either the vocal cords (voiced sound production) or a bone vibrator (artificial excitation on the skull, most often the forehead or the mastoid bone) used in many bone excitation experiments reported in the literature (Békésy, 1960a; Xxxxx, 0000; Xxxxxx, 1986; Xxxxxxxx et al., 2003; Xxxxxxxx and Xxxxxxxxx, 2007; Xxxxxxxxx et al., 2007). According to the works on BC of Békésy (1960a) and Tonndorf (1972), this VBC path is subdivided into two paths: (i) a direct path to the inner ear (A1), (ii) an indirect path, first to the ear canal (A2a) and second, from the ear canal to the inner ear (A2b). In naming this path as well as the other one in the lower half of the block diagram, the word “body” is used instead of the word “bone” to acknowledge the fact that not only the bones but also other human tissues contribute to solid-borne sound transmission.
The lower half of the block diagram contains the second path, the voice air and body conduction (VABC) path associated with the airborne source (B). The airborne source (B) is either, for real voice production, the vocal cords interaction with air and the air turbulences in the larynx or, for a path identification experiment developed specifically for the present research work, an artificial excitation using a mouth speaker (the details on the experimental setup are given in section 1.4). This VABC path can be subdivided, exactly like the VBC path, into a direct path to the inner ear (B1) and an indirect path, first to the ear canal (B2a) and second, from the ear canal to the inner ear (B2b).
For both the VBC and the VABC paths, two kinds of measurements can be performed, a subjective one for the paths reaching the “inner ear” box which represents the subject hearing perception, and an objective one for the paths reaching the “ear canal” box where a microphone can give the SPL value. In the following paragraphs, the four OEs defined in the introduction are expressed in terms of the measurements (for open and occluded ear) of the associated paths.
The subjective voice OE is the perception difference between open and occluded ear associated with both the VBC and the VABC paths when the source is the subject’s real voice (vocal cords and/or turbulences). Classically, as for the subjective BC OE (Békésy, 1960a; Xxxxxxxxx et al., 2007; Xxxxxxxx and Xxxxxxxxx, 2007), threshold measurements for open and occluded ear are used to quantify the OE. However, for the subjective voice OE, the standardized threshold measurement protocol (ISO 8253-1, 1989; ISO 8253-2, 1992) cannot be used because the subject will always know when a sound is possibly emitted. A possibility could be to use a loudness balance method. However, no account of a loudness balance method for the determination of the subjective voice OE was found in the literature; only qualitative evaluations of this effect have been found (Xxxxxx, 1997).
The objective voice OE is the ear canal SPL difference between open and occluded ear, associated this time with only a fraction of the paths involved in the subjective OE presented in the previous paragraph: only the A2a path from the VBC and the B2a path from the VABC are considered in this case. Some results from the literature (Lundh, 1986; Xxxxxx, 1986; Xxxxxx, 1997, 1998; Xxxxxx, 2000) will be used in section 1.6 for analysis purposes.
The first two OEs presented (subjective and objective voice OEs) include the two kinds of voice sources: the structure-borne source (A) and the airborne source (B). In order to investigate separately the OEs associated with each of these two kinds of sources, the real sources can be replaced by an artificial one: a forehead bone vibrator, which may be considered similar to the structure-borne source part of the vocal cords, or a mouth speaker, which may be considered similar to the turbulences and the airborne source part of the vocal cords.
In the case of the structure-borne source (A), according to Békésy (1949, 1960a), the VBC path can be considered similar to a BC path. The VBC OEs can then be considered equivalent to the BC ones and the literature results can be used. The subjective VBC OE is the hearing threshold difference between open and occluded ear associated with the VBC paths (A1 in parallel with A2a and A2b). The objective VBC OE is the ear canal SPL difference between open and occluded ear associated only with the A2a path from VBC. Some literature results on these two BC OEs (subjective (Xxxxxxxxx et al., 2007; Xxxxxx and Xxxxxxx, 1983; Xxxxxxxx and Xxxxxxxxx, 2007) and objective (Xxxxxxxxx et al., 2007; Xxxxxxxx and Xxxxxxxxx, 2007; Xxxxxx, 1986; Lundh, 1986)) will be used in section 1.6 for analysis purposes.
In the case of the airborne source (B), similarly to the structure-borne source (A), we define two OEs: the subjective VABC OE is the hearing threshold difference between open and occluded ear associated with the VABC paths (B1 in parallel with B2a and B2b) whereas the objective VABC OE is the ear canal SPL difference between open and occluded ear associated only with the B2a path from VABC. As mentioned previously in this section, an experiment is designed using a mouth speaker to simulate the voice airborne source. In this research, a subjective measurement is conducted giving a quantification of the subjective VABC OE. The objective measurement could be considered in future research. The experimental set-up and protocol for this subjective measurement of the VABC OE are presented in the next section.
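The six OEs introduced in this section can be summarized as a small lookup table that maps each OE to the path components of Fig. 1.1 it involves. This is only an illustrative sketch: the dictionary layout and the names `OE_PATHS` and `measurement_endpoint` are hypothetical, and the path lists simply restate the definitions given in the text.

```python
# Path components of Fig. 1.1 involved in each OE, as described in the text.
# Keys are (source type, measurement kind); names are illustrative only.
OE_PATHS = {
    ("voice", "subjective"): ["A1", "A2a", "A2b", "B1", "B2a", "B2b"],
    ("voice", "objective"): ["A2a", "B2a"],
    ("VBC", "subjective"): ["A1", "A2a", "A2b"],
    ("VBC", "objective"): ["A2a"],
    ("VABC", "subjective"): ["B1", "B2a", "B2b"],
    ("VABC", "objective"): ["B2a"],
}


def measurement_endpoint(kind: str) -> str:
    """Box of the diagram reached by the measured paths."""
    if kind == "subjective":
        return "inner ear"   # hearing threshold or perception
    if kind == "objective":
        return "ear canal"   # SPL measured with a microphone
    raise ValueError(f"unknown measurement kind: {kind}")


for (source, kind), paths in OE_PATHS.items():
    print(f"{kind} {source} OE -> {measurement_endpoint(kind)}: {', '.join(paths)}")
```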
1.4 Measurement method
The experimental protocol has been examined and accepted by the IRB (Institutional Review Board) of the École de technologie supérieure.
1.4.1 Subjects
Eleven subjects have been chosen for their otologically normal ears and their good hearing using criteria from the ISO 4869-1 standard (ISO 4869-1, 1990): their minimum audible pressure (MAP) must be below 20 dB-HL for frequencies below 2,000 Hz and below 30 dB-HL for frequencies above 3,000 Hz.
For each subject, a pair of Sonomax v3 S2/M2 (small or medium size) custom earplugs from Sonomax Hearing Healthcare Inc. (Montréal, Québec) is fitted individually by a certified implementor. The average insertion depth of the earplugs is 13 mm. For easier comparison with data in the literature, this insertion depth can be considered equivalent to an average occluded volume of about 0.54 cm3, based on an average ear canal length of 27 mm and an average diameter of 7 mm (Small and Gales, 1998).
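The conversion from insertion depth to occluded volume can be reproduced with a short calculation. The sketch below assumes that the ear canal is modelled as a straight cylinder and that the occluded volume is the part of the canal between the earplug tip and the eardrum; the function name is hypothetical.

```python
import math


def occluded_volume_cm3(canal_length_mm: float,
                        insertion_depth_mm: float,
                        canal_diameter_mm: float) -> float:
    """Volume left between the earplug tip and the eardrum (cylinder model)."""
    remaining_length_mm = canal_length_mm - insertion_depth_mm
    radius_mm = canal_diameter_mm / 2.0
    volume_mm3 = math.pi * radius_mm ** 2 * remaining_length_mm
    return volume_mm3 / 1000.0  # 1 cm3 = 1000 mm3


# Average values quoted in the text: 27 mm canal, 13 mm insertion, 7 mm diameter
volume = occluded_volume_cm3(27.0, 13.0, 7.0)
print(round(volume, 2))  # 0.54
```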
1.4.2 Experimental setup
All tests were performed in an audiometric booth with an Interacoustics AC40 clinical audiometer. The 125, 250, 500, 1,000, 2,000, 4,000, and 8,000 Hz frequencies are tested with a warble tone according to the ISO 8253-1 and ISO 8253-2 standards (ISO 8253-1, 1989; ISO 8253-2, 1992). All tests are performed with open and occluded ears, using SonoCustom v3 S2/M2 earplugs.
The experiments are performed in two experimental phases. In the first one, measurements are made in a diffuse field. In the second one, an acoustic box is used. The flowchart of these experimental phases is represented in Fig. 1.2.
[Figure 1.2: flowchart. Subject welcome → Hearing thresholds with open ears → The subject inserts the earplugs → Earplug attenuation measurement → Hearing thresholds with occluded ears → The subject is thanked]
Figure 1.2 Flowchart of each experimental phase
In a first experimental phase, the hearing thresholds with open and occluded ears were measured in a diffuse field in the audiometric booth. This diffuse field was previously calibrated according to the ISO 389-7 standard (ISO 389-7, 1996). The hearing threshold measurements with open and occluded ears give respectively the minimum audible diffuse field in open ears (MADFop) and in occluded ears (MADFoc). The REAT (Real Ear Attenuation at Threshold: MADFop − MADFoc) is then quantified. This measurement serves to check that the earplug attenuation is within ANSI certification (Industrial Noise Laboratory, 2007).
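The REAT computation is a per-frequency difference of two thresholds. A minimal sketch follows, using the sign convention of the text (MADFop − MADFoc, so an attenuating earplug gives negative values). The threshold values are invented placeholders for illustration only and are not measurements from this study; the function name is hypothetical.

```python
FREQUENCIES_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def reat_db(madf_open_db, madf_occluded_db):
    """REAT per frequency with the sign convention of the text: MADFop - MADFoc."""
    if len(madf_open_db) != len(madf_occluded_db):
        raise ValueError("both threshold lists must cover the same frequencies")
    return [op - oc for op, oc in zip(madf_open_db, madf_occluded_db)]


# Placeholder thresholds in dB, invented for illustration only
madf_open = [20.0, 12.0, 6.0, 4.0, 2.0, 0.0, 12.0]
madf_occluded = [45.0, 38.0, 36.0, 34.0, 36.0, 40.0, 50.0]

reat = reat_db(madf_open, madf_occluded)
for freq, value in zip(FREQUENCIES_HZ, reat):
    print(f"{freq} Hz: {value:+.1f} dB")
```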
In a second experimental phase, an acoustic box is used to transmit a sound field directly and only into the mouth of the subject. The acoustic box was designed to provide an acoustic signal in the subject’s mouth by making the sound of a loudspeaker box converge to a spirometry filter system with a tubular extremity around which the subject closes the mouth. The experimental set-up, including details on the acoustic box elements, is presented in Fig. 1.3.
[Figure 1.3: schematic of the acoustic box with numbered elements (1) to (11)]
Figure 1.3 Setup of the acoustic box: (1) parallelepipedic box in 1 inch plywood, (2) sound barrier, (3) at least 4 inches of sound absorber, (4) truncated pyramid with rectangular base and square node in 1 inch plywood, (5) PVC pipe of 1.75 inches diameter, (6) speaker, (7) fixed half spirometry filter, (8) spirometry filter adapter, (9) individual spirometry filter (bacterial / viral), (10) subject, (11) adjustable seat. (color online)
Since there is no standard available for the calibration of this new measurement procedure, it was chosen, somewhat arbitrarily, to apply the ISO 389-7 standard (ISO 389-7, 1996) for diffuse field to calibrate the sound field at the output of the ergo-filter closed with a microphone adapter. This choice has no effect on the OE measurement, since the OE is a threshold difference; it only affects the hearing threshold values shown in Fig. 1.5 of the next section. The hearing thresholds with open and occluded ears are measured with this acoustic box set-up in the audiometric booth. The minimum audible mouth pressure in open ears (MAMPop) and in occluded ears (MAMPoc) are obtained. They serve to determine the OE associated with the VABC path.
Because each experimental phase is fairly demanding for the subjects, it was chosen to perform the two experimental phases at two different moments. Hence the earplug was removed and reinserted and other factors might affect earplug attenuation. Consequently, in order to check that earplug attenuation had not changed significantly between the two phases, a noise reduction measurement of earplugs was carried out before each experiment with occluded ears. This is presented in the next section.
1.4.3 Noise reduction measurement
The noise reduction provided by earplugs is measured in an 85 dB-SPL free field pink noise by means of two dual microphone probes (produced by Sonomax Hearing Healthcare Inc.) which measure the SPL inside (pmeas) and outside (pref) the earplug for both ears. In Fig. 1.4, a Sonomax v3 S2/M2 custom earplug with an inserted dual microphone probe is drawn schematically.
Figure 1.4 Sonomax v3 S2/M2 custom earplug with an inserted dual microphone probe (color online).
The noise reduction measurement protocol which follows is the second scenario described by Voix (2006) and Voix and Xxxxxxx (2009) with adapted notations. The measured noise reduction NRmeas of the earplugs, obtained while the subject wears them, is defined by equation 1.1.
\[ NR_{meas} = 20 \log_{10} \frac{p_{ref}}{p_{meas}} \tag{1.1} \]
This measure must be corrected by the transfer function ÑR of the dual microphone probe inserted in the earplug, obtained when the earplug is not worn and is submitted to a uniform acoustic pressure field:
\[ \widetilde{NR} = 20 \log_{10} \frac{\tilde{p}_{meas}}{\tilde{p}_{ref}} \tag{1.2} \]
The corrected noise reduction NRc (cf. equation 1.3) is obtained by subtracting this transfer function from the measured noise reduction:
\[ NR_{c} = NR_{meas} - \widetilde{NR} \tag{1.3} \]
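Equations 1.1 and 1.3 can be sketched in a few lines. In this illustration the probe transfer function ÑR of equation 1.2 is passed in directly in dB rather than recomputed from pressures. The function names and the numerical values are hypothetical and do not come from the measurements reported here.

```python
import math


def measured_noise_reduction_db(p_ref: float, p_meas: float) -> float:
    """Equation 1.1: NRmeas = 20 log10(p_ref / p_meas), pressures in the same unit."""
    return 20.0 * math.log10(p_ref / p_meas)


def corrected_noise_reduction_db(nr_meas_db: float, probe_transfer_db: float) -> float:
    """Equation 1.3: NRc = NRmeas minus the probe transfer function (both in dB)."""
    return nr_meas_db - probe_transfer_db


# Hypothetical example: outside pressure ten times the inside pressure,
# and an assumed probe transfer function of 3 dB
nr_meas = measured_noise_reduction_db(p_ref=1.0, p_meas=0.1)
nr_c = corrected_noise_reduction_db(nr_meas, probe_transfer_db=3.0)
print(f"NRmeas = {nr_meas:.1f} dB, NRc = {nr_c:.1f} dB")
```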
A binaural corrected noise reduction NRcb is then obtained by equation 1.4, with MAP the minimum audible pressure under headphone of the subject.
\[
NR_{cb} =
\begin{cases}
NR_{c}(\text{left}) & \text{if } NR_{c}(\text{left}) + MAP(\text{left}) < NR_{c}(\text{right}) + MAP(\text{right}) \\
NR_{c}(\text{right}) & \text{if } NR_{c}(\text{left}) + MAP(\text{left}) > NR_{c}(\text{right}) + MAP(\text{right}) \\
\min\left(NR_{c}(\text{left}), NR_{c}(\text{right})\right) & \text{if } NR_{c}(\text{left}) + MAP(\text{left}) = NR_{c}(\text{right}) + MAP(\text{right})
\end{cases}
\tag{1.4}
\]
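The selection rule of equation 1.4 can be written as a small function. This is a sketch under the reading of the equation given above: the ear with the lower sum NRc + MAP determines the binaural value, and the smaller NRc is taken when the sums are equal. The function and argument names are hypothetical.

```python
def binaural_corrected_nr_db(nr_left: float, nr_right: float,
                             map_left: float, map_right: float) -> float:
    """Equation 1.4: pick NRc from the ear with the lower NRc + MAP sum."""
    left_total = nr_left + map_left
    right_total = nr_right + map_right
    if left_total < right_total:
        return nr_left
    if left_total > right_total:
        return nr_right
    # Equal sums: keep the smaller of the two noise reductions.
    # With measured floating-point data an exact tie is unlikely,
    # so a tolerance could be used here (a suggestion, not from the source).
    return min(nr_left, nr_right)


# Hypothetical values in dB
print(binaural_corrected_nr_db(25.0, 30.0, 5.0, 5.0))
print(binaural_corrected_nr_db(30.0, 25.0, 0.0, 0.0))
print(binaural_corrected_nr_db(20.0, 25.0, 10.0, 5.0))
```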
1.5 Experimental results
Figure 1.5 depicts the main average experimental results: the minimum audible diffuse field in open ears (MADFop) and in occluded ears (MADFoc), and the minimum audible mouth pressure in open ears (MAMPop) and in occluded ears (MAMPoc).
In order to validate our experimental results, the noise attenuation of the earplugs is examined first. The subjective VABC OE is then presented.
[Figure 1.5: plot of sound pressure level [dB-SPL] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.5 Average MADFop (−+), average MADFoc (− ), average MAMPop (− · +), and average MAMPoc (− · ).
1.5.1 Noise attenuation of earplugs
In Fig. 1.6 are plotted the experimental average and standard deviation REAT (real ear attenuation at threshold: MADFop − MADFoc) obtained in diffuse field as well as the certified ANSI REAT (Industrial Noise Laboratory, 2007) (average ± two standard deviations and standard deviation) for the Sonomax v3 S2/M2 custom earplugs. As can be seen in Fig. 1.6, the experimental REAT results are in agreement with the ANSI certification ones.
As mentioned previously, the NR values were obtained before each experimental phase with occluded ears to check that the earplugs’ attenuation had not changed significantly between the two phases. The delta between NRcb obtained in the two experimental phases is presented in Fig. 1.7. At low frequencies there are practically no differences. At high frequencies the delta is slightly higher, but still acceptable, especially because the fit variability of earplugs is high at high frequencies. The small differences obtained confirm that the subjects placed their earplugs in the same manner during the two experimental phases. As the earplugs have been well fitted during the diffuse field experiments, they also have been well fitted during the experiments with the acoustic box.
[Figure 1.6: plot of REAT [dB] and std [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.6 REAT experimental results [average ( ), std ( )] and REAT ANSI results (Industrial Noise Laboratory, 2007) [average ± 2 std (−− +), std (−− ♦)] (color online).
1.5.2 Subjective VABC OE
In Fig. 1.8 are plotted the average and the standard deviation of the subjective VABC OE which can be quantified by the difference between MAMPop and MAMPoc. The subjective VABC OE is positive at low frequencies (below 2,000 Hz) and negative at high frequencies.
[Figure 1.7: plot of the NRcb difference (diffuse field − acoustic box) [dB] and its std [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.7 Delta between NRcb in diffuse field and acoustic box experiments.
[Figure 1.8: plot of MAMPoc − MAMPop [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.8 Subjective VABC OE (MAMPoc − MAMPop) [average (−×), std (−− )]
1.6 Characterization of the voice OE using experimental results from the presented research work and from the literature
For each of the OEs considered here (the subjective and objective BC OE, the objective voice OE from the literature and the subjective VABC OE from the presented experimental work), the analysis will be separated in two frequency domains, each one in a separate subsection: low frequencies (below 1500-2000 Hz) and high frequencies (above 1500-2000 Hz). For each frequency domain, firstly we determine which path dominates in occluded and opened ears. Secondly, we determine how occlusion generates path modifications leading to a positive, negative or null OE. Thirdly, we verify that the obtained results are in agreement with the main physical mechanism, described in section 1.2, which can only be applied to the indirect paths (A2a and B2a). Information on path identification and OE will only be given when a conclusion can be reached.
In order to easily determine the relative weight of each path in every case, for the rest of this article, the name of the path, for example B1, will also be used to denote the amount of acoustic energy that the path brings to the end point. The path B1 thus brings the energy B1 to the inner ear, which receives the sum of the energies B1, B2b, A1, and A2b. At the inner ear, energy should be understood as the energy perceived by the subject, whereas at the ear canal, the energy is the real acoustic energy which can be measured as SPL with a microphone.
Firstly, we analyze the subjective VABC OE we quantified in our experiments. Then we consider successively the OEs which have been already quantified in the literature (subjective and objective BC OEs and objective voice OE). Finally we synthesize the results obtained for all these OEs.
1.6.1 Subjective VABC OE
According to Fig. 1.1, the VABC path corresponds to the paths B1 and B2 (B2a then B2b). The analyses that follow aim at determining the relative contribution of these components in both open and occluded ears according to the results presented in Fig. 1.8.
1.6.1.1 At low frequencies
At low frequencies, the positive OE of the order of +5 to +10 dB means that the energy transmitted by the sum of the B1 and B2 paths is much greater in the occluded ear than in the open ear: B1oc + B2oc ≫ B1op + B2op. As mentioned in section 1.2, the occlusion has an effect on the ear canal vibratory and acoustic system; hence, a reasonable assumption is that occlusion will affect only the B2 path and leave the B1 path unchanged, so that B1oc ≈ B1op. This equation put into the first inequality leads to B2oc ≫ B2op and, furthermore, on the left-hand side of the inequality, B2oc ≫ B1oc. Indeed, if we had B2oc ≪ B1oc, then we would have B1oc ≫ B1oc + B2op, which is false. So the assumption B2oc ≪ B1oc is false. Likewise, if we had B2oc ≈ B1oc, then we would have B1oc · 2 ≫ B1oc + B2op, which is false. So the assumption B2oc ≈ B1oc is false.
In other words, for the VABC perception at low frequencies we can conclude:
• Path identification: in an occluded ear the indirect path B2 transmits more energy than the direct path B1;
• OE: a positive OE results from the fact that occlusion increases the energy transmitted by the B2 path which dominates the B1 path.
As the indirect path B2 dominates, this is in agreement with the low frequency physical mechanisms presented in section 1.2. In the open ear, the low frequency sound radiated by the ear canal walls propagates out of the ear. In the occluded ear, these low frequencies are blocked inside and so the SPL at low frequencies is increased.
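The low frequency argument can be illustrated numerically. The sketch below assumes that the OE in dB can be read as ten times the base-10 logarithm of the ratio of perceived energies, that B1oc = B1op = B1, and that B2op is not negative. Under these assumptions, B1 + B2oc = r (B1 + B2op) ≥ r B1, so B2oc ≥ (r − 1) B1. The function name is hypothetical.

```python
def min_b2oc_over_b1(oe_db: float) -> float:
    """Lower bound on B2oc / B1 implied by a positive OE.

    Assumes B1oc == B1op and B2op >= 0, with the OE read as
    10 log10((B1 + B2oc) / (B1 + B2op)).
    """
    energy_ratio = 10.0 ** (oe_db / 10.0)
    return energy_ratio - 1.0


# The two ends of the range quoted in the text for low frequencies
for oe_db in (5.0, 10.0):
    print(f"OE = +{oe_db:.0f} dB -> B2oc >= {min_b2oc_over_b1(oe_db):.2f} x B1")
```

For an OE of +5 dB the indirect path already carries at least about twice the energy of the direct path in the occluded ear, and at least nine times for +10 dB, which is consistent with the conclusion that B2 dominates B1.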
1.6.1.2 At high frequencies
At high frequencies, the negative OE of the order of −5 to −12 dB means that the energy transmitted by the sum of the B1 and B2 paths is much greater in the open ear than in the occluded ear: B1oc + B2oc ≪ B1op + B2op. As in the low frequency case, the same assumption can be made, namely that occlusion will affect only the B2 path and leave the B1 path unchanged, so that B1oc ≈ B1op. This equation put into the first inequality leads to B2oc ≪ B2op and, furthermore, on the right-hand side of the inequality, B2op ≫ B1op.
In other words, for the VABC perception at high frequencies we can conclude:
• Path identification: in an open ear the indirect path B2 transmits more energy than the direct path B1;
• OE: a negative OE results from the fact that occlusion decreases the energy transmitted by the B2 path which dominates the B1 path.
As the indirect path B2 dominates, this is in agreement with the high frequency physical mechanism described in section 1.2. The canal occlusion causes modifications of its resonances. In the open ear, the first ear canal resonance is at about 2.7 kHz. In the occluded ear, this resonance disappears and is replaced by a higher one (5.5 kHz or above).
1.6.2 Subjective BC OE
According to Fig. 1.1, the subjective BC corresponds to the paths A1 and A2 (A2a then A2b). In Fig. 1.9 seven subjective BC OEs from the literature are presented; our subjective VABC OE is also shown for reference, as it will be on all the following graphs. Three of the seven presented measurements (one from Xxxxxxxxx et al. (2007) and two from Xxxxxxxx and Xxxxxxxxx (2007)) cover the whole frequency range from 125 to 8000 Hz. They have been chosen because they correspond to insertion depths (18 mm, 15 mm and 7 mm) close to the one in this study (13 mm) and because, for these three measurements, the objective BC OE has also been quantified in the same conditions (it will be presented in the next section). The other four measurements cover a more limited frequency range (125 to 2000 Hz), but they have been chosen because they are four measurements from the same authors (Xxxxxx and Xxxxxxx, 1983) that cover a wide range of insertion depths (indicated in terms of occluded ear canal volumes of respectively 0.2, 0.5, 0.6, and 0.8 cm3).
1.6.2.1 At low frequencies
As can be seen in Fig. 1.9, at low frequencies (below 2000 Hz), the subjective BC OE level depends strongly on the earplug insertion and on the kind of earplug. However, the subjective BC OE is globally positive, just as in the case of the subjective VABC OE, so the same relations that have been introduced previously for the VABC case will be valid here, after the letter B is replaced by the letter A: A1oc + A2oc ≫ A1op + A2op and, with the reasonable assumption that occlusion will affect only the A2 path and leave the A1 path unchanged, A1oc ≈ A1op, this will yield A2oc ≫ A2op and A2oc ≫ A1oc.
[Figure 1.9: plot of occlusion effect [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.9 Subjective BC OEs of the literature [this study ( black); Xxxxxxxxx et al. (2007): ( gray) occlusion with 18 mm insertion of E-A-R Classics and bone excitation at the forehead; Xxxxxx and Xxxxxxx (1983) bone excitation at the forehead: (··· + black) E-A-R plug with 0.2 cm3 occluded volume, (··· + dark gray) E-A-R plug with 0.5 cm3 occluded volume, (··· + medium gray) V-51R plug with 0.6 cm3 occluded volume, (··· + light gray) E-A-R plug with 0.8 cm3 occluded volume; Stenfelt and Xxxxxxxxx (2007) bone excitation at the forehead: (−· black) occlusion with 7 mm insertion of foam earplug, (··· black) occlusion with 15 mm insertion of foam earplug].
In other words, for the VBC perception at low frequencies we can conclude:
• Path identification: in an occluded ear the indirect path A2 transmits more energy than the direct path A1;
• OE: a positive OE results from the fact that occlusion increases the energy transmitted by the A2 path which dominates the A1 path.
For the same reasons as in the case of the subjective VABC OE, as the indirect path (A2 instead of B2) dominates, this is in agreement with the low frequency physical mechanisms presented in section 1.2.
1.6.2.2 At high frequencies
At high frequencies, the fact that the OE is close to zero means that the simultaneous conduction through the A1 and A2 paths is approximately the same in the open and occluded ear cases: A1oc + A2oc ≈ A1op + A2op. With the same assumption A1oc ≈ A1op as in the low frequency case, there are two possibilities: either A2oc ≈ A2op, or A1oc ≫ A2oc and A1op ≫ A2op. The choice between them is made in section 1.6.4 using the results from the objective BC OE presented in the next section.
1.6.3 Objective BC OE
According to Fig. 1.1, the objective BC corresponds to the path A2a. Fig. 1.10 presents four objective BC OEs from the literature together with our subjective VABC OE. As mentioned in section 1.6.2, three of the objective BC OE measurements presented here (one from Xxxxxxxxx et al. (2007) and two from Xxxxxxxx and Xxxxxxxxx (2007)) come from the same studies and use the same earplugs as three of the subjective BC OE measurements presented in Fig. 1.9. The fourth one (Lundh, 1986; Xxxxxx, 1986) is included because an objective voice OE was also quantified under the same conditions in that study (it is presented in section 1.6.5).
1.6.3.1 At low frequencies
At low frequencies, the OE is positive, so A2aoc ≫ A2aop. In other words, for the objective VBC at low frequencies we can conclude:
• OE: a positive OE results from the fact that occlusion increases the energy transmitted by the A2a path.
As for the subjective BC OE, the objective one is in agreement with the low frequency physical mechanism described in section 1.2.
[Figure 1.10: occlusion effect [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.10 Objective BC OEs of the literature [this study (black); Xxxxxxxxx et al. (2007): (gray) occlusion with 18 mm insertion of E-A-R Classics and bone excitation at the forehead; Xxxxxxxx and Xxxxxxxxx (2007), bone excitation at the forehead: (black) occlusion with 7 mm insertion of foam earplug, (black) occlusion with 15 mm insertion of foam earplug; Lundh (1986), Xxxxxx (1986): (dark gray) occlusion with full earmould impression and bone excitation at the contra-lateral mastoid].
1.6.3.2 At high frequencies
At high frequencies, the OE is negative, so A2aoc ≪ A2aop. In other words, for the objective VBC at high frequencies we can conclude:
• OE: a negative OE results from the fact that occlusion decreases the energy transmitted by the A2a path.
For the same reasons as for the subjective VABC OE, these results are in agreement with the high frequency physical mechanism described in section 1.2.
1.6.4 BC OE: Integration of physiological noise masking effect and synthesis
In this section, the relations derived in the previous sections on the subjective and objective BC OEs are brought together to characterize as completely as possible the path involved in the BC OE. An analysis of the physiological noise masking effect (PNME) that occurs when the subject is wearing a hearing protector is presented in a first sub-section. This analysis will be used to choose between the two possibilities which have been found for the subjective BC OE at high frequencies (cf. section 1.6.2.2). The two last sub-sections present the general results for BC OE.
1.6.4.1 Physiological noise masking effect
The physiological noise (PN) includes all noises whose source is inside the body (for example breathing and heartbeats). The PNME is the masking due to this PN. This effect is more important in the occluded ear than in the open ear: when the subject is wearing a hearing protector, the low frequency increase in SPL due to the ear canal occlusion also affects the PN. According to Xxxxxx and Xxxxxxx (1983), the maximum level of the PN under various hearing protectors is less than 35 dB SPL around 80 Hz, and it then decreases with frequency to reach 20 dB SPL at 250 Hz. The PN in the open ear is lower; in their experiments, Xxxxxx and Xxxxxxx (1983) found that it was even lower than the instrumentation noise of their measurement system (about 24 dB SPL at 80 Hz). For both open and occluded ears, the PNME can be written as A2 = A2b = A2a + A2PNME, where A2PNME represents the PNME that affects the A2 pathway. The masking effect due to a 250 Hz pure tone at 40 dB SPL is present below 1 kHz and is equivalent to the one due to a narrow band noise (Xxxxxxx, 2004; Ehmer, 1959). Consequently, A2PNME decreases the importance of A2 at low frequencies (slightly in the open ear, A2PNMEop ≤ 0, and more in the occluded ear, A2PNMEoc < 0), whereas it has nearly no effect at high frequencies (A2PNME ≈ 0). We can therefore write A2 < A2a at low frequencies and A2 ≈ A2a at high frequencies.
1.6.4.2 At low frequencies
At low frequencies, from the analyses of both the subjective and the objective BC OEs as well as the physiological masking effect analysis, the following conclusions can be derived:
• Path identification: from the subjective BC OE, it was found that in an occluded ear the indirect path A2 transmits more energy than the direct path A1 (A2oc ≫ A1oc);
• OE: a positive OE results for both the subjective and the objective BC OE because occlusion increases the energy transmitted by the A2a path (A2aoc ≫ A2aop) as well as the energy transmitted by the A2 path (A2oc ≫ A2op), which dominates the A1 path. The PNME analysis showed that the perceived energy (hearing threshold) is lower than the energy measured in the ear canal (SPL), hence A2op ≤ A2aop and A2oc < A2aoc.
As indicated in sections 1.6.2.1 and 1.6.3.1 for the subjective and objective BC OE, these conclusions are in agreement with the low frequency physical mechanism presented in section 1.2.
1.6.4.3 At high frequencies
At high frequencies, the relation A2aoc < A2aop from the objective BC OE and the relation A2 ≈ A2a from the PNME give A2oc < A2op. This invalidates the first of the two possibilities found in section 1.6.2.2, which requires A2oc ≈ A2op, and hence validates the second one: A1oc ≫ A2oc and A1op ≫ A2op. The conclusions for the BC OE at high frequencies are:
• Path identification: in open ear and in occluded ear the direct path A1 is dominant.
• OE: there is no significant subjective OE because the direct path A1 is dominant over the indirect path A2 in both open and occluded ears. However, there is a negative objective OE because the indirect path A2a transmits less energy when the ear is occluded than when the ear is open. Moreover, the indirect path A2 is more important, objectively and subjectively, in the open ear than in the occluded ear.
As explained for the objective BC OE in section 1.6.3.2, these results are in agreement with the high frequency physical mechanism presented in section 1.2.
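The dominance argument used above can be illustrated numerically. The following sketch is only an illustration under stated assumptions: the path levels are hypothetical values in dB (none of them comes from the cited measurements), the paths are combined by energy summation, and A1 is assumed unchanged by occlusion as in section 1.6.2. It compares the subjective OE obtained when the direct path A1 dominates with the one obtained when the indirect path A2 dominates.

```python
import math

def combine_db(*levels_db):
    """Energy sum of several path levels expressed in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def subjective_oe(a1_db, a2_open_db, a2_occluded_db):
    """(A1 + A2oc) minus (A1 + A2op) in dB, with A1 assumed unchanged by occlusion."""
    return combine_db(a1_db, a2_occluded_db) - combine_db(a1_db, a2_open_db)

# Hypothetical levels: occlusion lowers the indirect path A2 by 10 dB in both cases.
# Case 1: the direct path A1 dominates in the open and the occluded ear.
oe_direct_dominant = subjective_oe(a1_db=60.0, a2_open_db=40.0, a2_occluded_db=30.0)
# Case 2: the indirect path A2 dominates in the open and the occluded ear.
oe_indirect_dominant = subjective_oe(a1_db=30.0, a2_open_db=60.0, a2_occluded_db=50.0)

print(f"A1 dominant: subjective OE = {oe_direct_dominant:.2f} dB")
print(f"A2 dominant: subjective OE = {oe_indirect_dominant:.2f} dB")
```

With these assumed levels, the same 10 dB decrease of the indirect path leaves the combined level practically unchanged when A1 dominates, whereas it is almost fully reflected in the combined level when A2 dominates. This is the reasoning that selects the second possibility of section 1.6.2.2.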
1.6.5 Objective voice OE
According to Fig. 1.1, the objective voice corresponds to the sum of the paths A2a and B2a. Fig. 1.11 presents three objective voice OEs from the literature together with our subjective VABC OE. As mentioned in section 1.6.3, the objective voice OE measurement from Lundh (1986) and Xxxxxx (1986) comes from the same article as the objective BC OE shown in Fig. 1.10. Two other measurements have been found in the literature and are presented here.
[Figure 1.11: occlusion effect [dB] versus frequency [Hz], 125 to 8000 Hz]
Figure 1.11 Objective voice OEs of the literature [this study (black); Xxxxx (1986), Xxxxxx (1986): (dark gray) occlusion with earmould impression and bone excitation at the contra-lateral mastoid; Xxxxxx (1996) according to Xxxxxx (1997, 1998): (dark gray) occlusion with full concha earmould; May (1992) according to Xxxxxx (2000) and Xxxxxx (1997, 1998): (··· ♦ dark gray) occlusion with unvented skeleton earmould].
1.6.5.1 At low frequencies
At low frequencies, the OE is positive, so A2aoc + B2aoc > A2aop + B2aop. In other words, for the objective voice at low frequencies we can conclude:
• OE: a positive OE results from the fact that occlusion increases the energy transmitted by the A2a and B2a paths.
As for the aforementioned low frequency OEs, this conclusion is in agreement with the low frequency physical explanations presented in section 1.2.
1.6.5.2 At high frequencies
At high frequencies, the OE is negative, so A2aoc + B2aoc < A2aop + B2aop. In other words, for the objective voice at high frequencies we can conclude:
• OE: a negative OE results from the fact that occlusion decreases the energy transmitted by the A2a and B2a paths.
As for the previously presented high frequency OEs, this result is in agreement with the high frequency physical mechanisms described in section 1.2.
1.6.6 Voice OE: Synthesis
In this section, the previous specific OEs (from the literature and from the present research) are analyzed simultaneously using the diagram presented in Fig. 1.1. The VBC, the VABC and the voice conduction path (which includes the VBC and VABC paths) and their associated OEs are addressed successively, and the conclusions obtained are also presented in Table 1.1, where a question mark is placed in elements for which no answer has been found.
Table 1.1 Main results about path identification and OE for the VBC, the VABC and the voice conduction path
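| Frequency range | Quantity | Case | VBC | VABC | Voice |
|---|---|---|---|---|---|
| Low frequencies | Path | Op. ear | ? | ? | ? |
| Low frequencies | Path | Oc. ear | Indirect ≫ Direct | Indirect ≫ Direct | Indirect ≫ Direct |
| Low frequencies | OE | Obj. | OE > 0 | OE > 0 | OE > 0 |
| Low frequencies | OE | Subj. | OE > 0 | OE > 0 | OE > 0 |
| High frequencies | Path | Op. ear | Direct ≫ Indirect | Indirect ≫ Direct | ? |
| High frequencies | Path | Oc. ear | Direct ≫ Indirect | ? | ? |
| High frequencies | OE | Obj. | OE < 0 | OE < 0 | OE < 0 |
| High frequencies | OE | Subj. | OE ≈ 0 | OE < 0 | ? |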
1.6.6.1 At low frequencies
At low frequencies, all the measured OEs previously presented are found to be positive.
For the VBC path, the subjective measurement leads to the conclusion that, in the case of the occluded ear, the indirect path A2 through the ear canal transmits more energy to the inner ear than the direct path A1. The positive subjective and objective OEs result from the energy increase of the indirect path A2 in the occluded ear case (and, for the subjective measurement, from the previously mentioned fact that A2 dominates A1). Furthermore, the subjective OE is lower than the objective one due to the PNME.
For the VABC path, as for the VBC one, the subjective measurement shows that the indirect path B2 through the ear canal transmits more energy to the inner ear than the direct path B1 in the occluded ear case. The positive subjective OE results from the energy increase of the indirect path B2 and from the aforementioned fact that B2 dominates B1 in the occluded ear case. This subjective VABC OE is fairly constant with frequency and lower than the subjective VBC OE, which is inversely proportional to frequency. This difference cannot be readily explained. Two assumptions are proposed. The first one is the different natures of the VABC path (which starts with an air-solid coupling) and the VBC path (which starts with a solid-solid coupling). The second one is the different kinds of earplug used in the considered studies. As far as the objective VABC OE is concerned, although it was not measured, it should also be positive. Indeed, as the subjective VBC OE is lower than the objective one, the subjective VABC OE should also be lower than the objective one.
For the voice conduction path, the objective voice OE is positive, which is justified by the fact that it includes the objective VBC OE and the objective VABC OE, which are both positive. For an equivalent reason, even though no subjective voice OE measurements have been made, we can deduce that the subjective voice OE should be positive. As far as the path identification is concerned, we previously mentioned that the indirect path dominates the direct one for the VBC (A2 ≫ A1) and VABC (B2 ≫ B1) paths. As the voice conduction path is composed of both the VBC and VABC paths, we can deduce that the indirect path (A2 + B2) should also dominate the direct one (A1 + B1) for the voice conduction path.
1.6.6.2 At high frequencies
At high frequencies, the measured OEs previously presented are found to be either approximately zero or negative.
For the VBC path, the subjective and objective measurements lead to the conclusion that, in both the occluded ear and the open ear, the direct path A1 to the inner ear transmits more energy than the indirect path A2 through the ear canal. This path identification results in practically no subjective VBC OE, as the direct path A1 to the inner ear is hardly affected by occlusion. However, there is a negative objective VBC OE because the SPL measurement in the ear canal involves only the indirect path A2a, whose energy appears to be lowered by occlusion.
For the VABC path, the subjective measurement conducted in this study shows a negative OE instead of a null one from the VBC path. This leads to an inversion of the dominant path: it is now the indirect path B2 which dominates the direct path B1 in open ear. As the indirect path B2, which dominates the direct path B1, appears to be decreased by occlusion, the subjective OE is negative. Although the objective VABC OE has not been measured, we can deduce that it should be negative since the indirect path B2a is decreased by occlusion.
For the voice conduction path, the objective voice OE is negative since the indirect paths A2a and B2a appear to be decreased by occlusion. This is also justified by the fact that the voice OE is a combination of the VBC one and the VABC one, which are both negative.
1.7 Conclusions and Recommendations
In this article, several OE measurements are analyzed in order to characterize the “hollow voice” OE that creates discomfort when people wear hearing aids or hearing protectors. A diagram of the internal sound path components involved in the perception of one’s own voice is proposed in order to identify the different conduction paths of the internal voice. This internal voice path is subdivided into the VBC path due to a structure-borne source and the VABC path due to an airborne source. Both of these paths are subdivided into a direct path to the inner ear and an indirect one through the ear canal. The VBC path is characterized using literature results on objective and subjective BC OE measurements. For the VABC path, as no results were available in the literature, a new kind of subjective OE measurement is proposed and tested in this article: a sound source (loudspeaker) is placed at the entrance of the mouth. The internal voice path is described using the aforementioned characterization of the VBC and VABC paths and the literature results on objective voice OE measurements.
At low frequencies, in the case of the occluded ear, the measurements performed on the VBC and the VABC paths have shown that the indirect path through the ear canal to the inner ear dominates the direct one. It has been concluded from these observations that this is also true for the internal voice path, which is the combination of these two paths as mentioned previously. Hence, the VBC path, the VABC path and the internal voice conduction path have identical behaviors. In the case of the open ear, the measurements used in this paper did not allow any conclusion to be drawn. It has been found that the objective and subjective VBC OE, the subjective VABC OE and the objective internal voice OE are positive. This means that the energy transmitted by the indirect path is increased by the occlusion. Hence, we have been able to conclude that the objective VABC OE and the subjective internal voice OE should also be positive.
At high frequencies, the behaviors of the VBC, the VABC and the internal voice conduction paths differ. In the case of the open ear, the direct path dominates the indirect one for the VBC path, whereas the inverse holds for the VABC path; hence no conclusion could be drawn for their combination, i.e. for the internal voice conduction path. In the case of the occluded ear, the dominating path has been identified for the VBC path only: the direct path dominates the indirect one. The objective OE is found to be negative in the three cases: from the literature, the objective VBC OE and the objective voice OE are negative, which leads to the conclusion that the objective VABC OE should also be negative. The subjective VBC OE is about zero, the subjective VABC OE is negative and the subjective internal voice OE is unknown.
Additional experiments, such as objective OE measurements for the VABC path and, more generally, non-threshold subjective measurement methods, need to be developed in order to further characterize the internal voice OE as well as its associated dominating path (direct versus indirect) in open and occluded ears.
Acknowledgements
The authors wish to express their appreciation to SONOMAX HEARING HEALTHCARE INC. (Montréal, Canada) and to NSERC (Natural Sciences and Engineering Research Council of Canada) for their support in this project.
References
E. Bárány, “A contribution to the physiology of bone conduction”, Acta Oto-Laryngol. 26, 1–223 (1938).
G. v. Békésy, “The structure of the middle ear and the hearing of one’s own voice by bone conduction”, J. Acoust. Soc. Am. 21, 217–232 (1949).
G. v. Békésy, “Bone conduction”, in Experiments in hearing, edited by X. X. Xxxxx (McGraw Hill, 1960), pp. 127–203.
E. H. Xxxxxx and J. E. Xxxxxxx, “Influence of physiological noise and the occlusion effect on the measurement of real-ear attenuation at threshold”, J. Acoust. Soc. Am. 74, 81–94 (1983).
E. H. Xxxxxx, “Hearing protection devices”, in “Noise & Hearing Conservation Manual”, edited by X. X. Xxxxxx, X. X. Xxxx, X. X. Xxxxxxx, and L. H. Royster (American Industrial Hygiene Association, 1986), fourth edition, pp. 319–382.
X. Xxxxxx, “Hearing aid earmolds, earshells and coupling systems”, in “Hearing aids” (Thieme, New York, 2000), pp. 117–158.
R. H. Xxxxx, “Masking patterns of tones”, J. Acoust. Soc. Am. 31, 1115–1120 (1959).
X. X. Xxxxxxx, “Hearing: an introduction to psychological and physiological acoustics” (Marcel Dekker, New York, 2004), 4th edition.
M. O. Xxxxxx, “Occlusion effects: Part 1: Hearing aid users experiences of the occlusion effect compared to the real ear sound level”, Technical Report 71, Technical University of Denmark (1997).
M. O. Xxxxxx, “Occlusion effects: Part 2: A study of the occlusion effect mechanism and the influence of the earmould properties”, Technical Report 73, Technical University of Denmark (1998).
X. Xxxxxx, “Auditory feedback of the voice in singing”, in Musical structure and cognition, edited by X. Xxxxxx, X. Cross, and X. Xxxx (Academic Press, London, 1985), pp. 259–286.
Industrial Noise Laboratory, “Test report hearing protector noise attenuation - ANSI S12.6 - 1997 (b) - reat - subject fit”, Technical Report, Federal University of Santa Catarina (2007).
ISO 389-7, (1996). “Acoustics - reference zero for the calibration of audiometric equipment - part 7: Reference threshold of hearing under free-field and diffuse-field listening conditions”.
ISO 4869-1, (1990). “Acoustics - hearing protectors - part 1: Subjective method for the measurement of sound attenuation”.
ISO 8253-1, (1989). “Acoustics - audiometric test methods - part 1: Basic pure tone air and bone conduction threshold audiometry”.
ISO 8253-2, (1992). “Acoustics - audiometric test methods - part 2: Sound field audiometry with pure tone and narrow-band test signals”.
M. C. Killion, “The hollow voice occlusion effect”, in Proceedings of 13th Danavox Symposium, volume 3, pp. 231–241 (1988).
X. Xxxxx, “Sound pressure in the ear with vented and unvented earmould”, Technical Report 28-8-1, Oticon Electronics A/S (1986).
X. Xxxxxxxxx, X. Xxxxxxxx, X. Good, and X. Xxxxxxxxx, “Examination of bone-conducted transmission from sound field excitation measured by thresholds, ear-canal sound pressure, and skull vibrations”, J. Acoust. Soc. Am. 121, 1576–1587 (2007).
A. M. J. Xxxxx and X. X. Xxxxx, “Hearing characteristics”, in Handbook of acoustical measurements and noise control, edited by C. M. Xxxxxx (Acoustical Society of America, Woodbury, NY, 1998), third edition, pp. 17.1–17.25.
S. Xxxxxxxx, X. Xxxx, and X. X. Xxxxx, “Factors contributing to bone conduction: The middle ear”, J. Acoust. Soc. Am. 111, 947–959 (2002).
S. Xxxxxxxx, X. Wild, N. Xxxx, and X. X. Xxxxx, “Factors contributing to bone conduction: the outer ear”, J. Acoust. Soc. Am. 113, 902–913 (2003).
S. Xxxxxxxx and X. Xxxxxxxxx, “A model of the occlusion effect with bone-conducted stimulation”, Int. J. Audiol. 46, 595–608 (2007).
X. Xxxxxxxx, E. C. Xxxxxxxxxx, and X. X. Xxxxxxx, “The occlusion of the external ear canal: its effect upon bone conduction in cats”, Acta Oto-Laryngol. 61, 80–104 (1966).
X. Xxxxxxxx, “Bone conduction”, in Foundations of Modern Auditory Theory, edited by X. X. Xxxxxx (Academic Press, New York, 1972), volume 2, pp. 195–237.
J. Voix, “Mise au point d’un bouchon d’oreille “intelligent” (Development of a “smart” earplug)”, Ph.D. thesis, École de technologie supérieure, Montréal, Canada (2006).
J. Voix and X. Xxxxxxx, “The objective measurement of individual earplug field performance”, J. Acoust. Soc. Am. 125, 3722–3732 (2009).
X. X. Xxxxxx, “The occlusion effect from earmoulds”, Hearing Instruments 37, 19, 57–58 (1986).
CHAPTER 2
ARTICLE #2
“WAVELET SPEECH ENHANCEMENT FOR INDUSTRIAL NOISE ENVIRONMENTS”
Cécile Le Cocq, Xxxxxxxxx Xxxxxxx, Frédéxxx Xxxxxxx, École de technologie supérieure, Xxxxxxxxxxx xx Xxxxxxx,
Xxxxxxxxx (Xxxxxxx), Xxxxxx, X0X 0X0
This article was submitted to Speech Communication on 20 January 2009.
Résumé (translated from the French)
In a noisy industrial environment, workers often have to wear hearing protectors to preserve their hearing. However, these protectors and high-level sound signals reduce their ability to communicate by impairing the intelligibility of the speech signal. In this article, speech denoising is used to overcome this drawback. A large number of classical wavelet thresholding denoising methods are tested on several speech signals altered by the addition of industrial noises over a wide range of signal to noise ratios. Four selection criteria are used to quantify the results in terms of denoising performance. A large database is thus built, and its analysis required the creation of a selection algorithm. For each selection criterion, this algorithm computes the relative efficiency of each of the denoising methods considered. The algorithm is used several times in the general methodology to determine the method or methods that give the best results for all the tested cases.
Abstract
In industrial noise environments, workers often have to wear hearing protectors in order to prevent hearing loss. However, hearing protectors and loud signals can interfere with speech intelligibility. In this paper, speech denoising is used to overcome this problem. A large number of speech denoising methods based on classical wavelet thresholding are tested on several speech signals that have been altered by the addition of industrial noises over a broad range of signal to noise ratios. Four selection criteria are used to quantify the denoising performance. The resulting database is large, and its analysis required the creation of a selection algorithm. For a given selection criterion, this algorithm determines the relative efficiency of each of the techniques under consideration. It is used several times in the general methodology to determine which specific methods yield the best results for all the tested cases.
2.1 Introduction
Workers in industrial noise environments are exposed to noise levels that could cause severe hearing damage. In 1995, the number of individuals suffering from disabling hearing difficulties was estimated to be 120 million worldwide (WHO, 1999, Ch. 3). To minimize the number of new cases of deafness, the World Health Organization (WHO) issued recommendations, some of which have been adopted in their entirety or in part in legislation (WHO, 2001, Ch. 4). The first recommendation consists in limiting the equivalent noise exposure level over 8 hours per day to 85 dB(A). Wearing hearing protectors is an effective and low cost way to meet this recommendation. Most hearing protectors do not allow the user to differentiate useful signals such as speech or warning sounds from harmful noises; consequently, workers wearing hearing protectors find themselves in a distorted low-level sound environment which no longer enables them to understand one another or to perceive danger (Xxxxxx et al., 2000, Ch. 10). This communication problem could be solved with hearing protectors in which a microphone collects sound outside the protector, a digital signal processor denoises speech and warning signals, and a speaker emits the denoised signal under the protector. This solution requires addressing the issue of speech enhancement in industrial noise environments; this is the object of the research presented in this paper.
Several speech denoising techniques already exist (e.g. Xxxxxxx, 0000; Xxxxxxxxxx et al., 2001; Xxxxxx, 2002); the present study focuses on speech denoising by classical wavelet thresholding. The basis of this method was developed in the 1990s by Xxxxxx and Xxxxxxxxx in their pioneering work (Xxxxxx and Xxxxxxxxx, 1994). This technique, which makes use of the wavelet transform, consists in shrinking the wavelet coefficients below a certain threshold in order to remove noise and enhance speech. They proposed several thresholding rules and different ways of computing the threshold, notably the soft and hard thresholding rules and the universal and SURE (Stein unbiased risk estimator) thresholds (Xxxxxx and Xxxxxxxxx, 1995). The corresponding algorithms were originally designed for speech signals corrupted by white Gaussian noise. Xxxxxxxxx and Xxxxxxxxx (Xxxxxxxxx and Xxxxxxxxx, 1997) generalized these methods by adapting the threshold calculation to coloured noises. Following these developments, several variations of these algorithms were proposed, mainly by modifying the thresholding rule being applied. Xxxxxxxxxxx presented the μ-law thresholding rule (Xxxxxxxxxxx and Xxxxxxxxx, 2001), which fits between the hard and the soft thresholding rules.
A relatively small number of complete comparative studies on these speech denoising methods by classical wavelet thresholding can be found to date in the literature. The examples cited below present some of the more extensive comparative works undertaken in this field. In 2001, Xxxxxxxxxx, Xxxxx and Xxxxxxxxx (Xxxxxxxxxx et al., 2001) compared classical and Bayesian wavelet denoising methods on a bank of theoretical mathematical signals altered by white Gaussian noise with different signal to noise ratios (SNRs). More recently, in 2003, Xxxxx and Xxxxxx (Xxxxx and Xxxxxxxxx, 2003) compared classical wavelet denoising methods applied to images altered by white Gaussian noise with more traditional imaging techniques using spatial filters. Finally, in 2006, Xxxx, Xxxxxxx-Xxxxxxxx and Xxxxxx (Xxxx et al., 2006) experimented with classical wavelet denoising methods on speech signals altered by white Gaussian noise.
Unlike the studies mentioned above (Xxxxxxxxxx et al., 2001; Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003), speech denoising is performed in this paper in industrial noise environments. Our purpose is to find an algorithm that could be used in hearing protectors. An industrial noise environment is defined, in the context of our work, as an environment in which workers are exposed to noise produced by machinery or any other industrial equipment. In this kind of environment, the SNRs span a wide range of values. Hence, workers may have more or less difficulty understanding speech depending on their specific environment, for instance their distance from the noise and sound sources. In this paper, the performance of certain classical wavelet denoising methods is evaluated in an industrial noise environment. While studies in the literature often use white Gaussian noise (Xxxxxxxxxx et al., 2001; Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003), the speech signals considered in this study are altered by industrial noises recorded at factory workstations. Also, our study covers a wide range of SNRs, from -20 dB to +20 dB, while the SNRs studied in the literature are rarely lower than -10 dB (Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003). In order to quantify the performance of the denoising methods considered, four of the most commonly used selection criteria (Xxxxxx and Xxxxxx, 1998; Xxxx et al., 1992) are considered, namely the global and segmented SNRs, the mean square error and the Itakura-Saito distortion measure. To ensure unbiased results given the interdependence of certain parameters, a global study is conducted to compare all the methods without having to set some of these parameters to specified values, as is often done in the literature (Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003). A selection algorithm has been developed to perform this global study.
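To make three of the four selection criteria concrete, the sketch below computes a global SNR, a segmented SNR and a mean square error between a clean signal and its denoised estimate. It relies on common textbook definitions, which are an assumption here: the exact expressions used in this study are those given in section 2.3 and may differ (frame length, frame overlap, handling of silent frames). The function names and the toy signals are hypothetical, and the Itakura-Saito distortion measure is left out because it requires a linear prediction analysis.

```python
import math

def global_snr_db(clean, estimate):
    """Global SNR in dB: clean-signal energy over error energy (error must be non-zero)."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)

def segmented_snr_db(clean, estimate, frame_len):
    """Mean of the per-frame SNRs in dB over non-overlapping frames.

    Assumption: frames are non-overlapping and none of them is silent or
    error-free; practical implementations usually clamp or skip such frames.
    """
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        stop = start + frame_len
        frame_snrs.append(global_snr_db(clean[start:stop], estimate[start:stop]))
    return sum(frame_snrs) / len(frame_snrs)

def mean_square_error(clean, estimate):
    """Average squared difference between the clean signal and its estimate."""
    return sum((s - e) ** 2 for s, e in zip(clean, estimate)) / len(clean)

# Hypothetical toy data: a quiet frame followed by a louder one,
# and an estimate that carries a constant error of 0.1.
clean = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
estimate = [s + 0.1 for s in clean]

snr_global = global_snr_db(clean, estimate)
snr_segmented = segmented_snr_db(clean, estimate, frame_len=4)
mse = mean_square_error(clean, estimate)
print(snr_global, snr_segmented, mse)
```

On this toy example the segmented SNR is lower than the global one, because the quiet frame weighs as much as the loud frame in the average. This is one reason why the two criteria can rank denoising methods differently.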
This paper is organized as follows. Section 2.2 presents the theory of denoising by classical wavelet thresholding with its various parameters: thresholding rule, threshold expression and noise estimate expression. In section 2.3, the signals, methods and selection criteria are presented. Section 2.4 explains the general methodology used to select the denoising methods. In section 2.5, the results are presented. Section 2.6 discusses both the approach and the results. Lastly, section 2.7 presents the conclusions and recommendations.
2.2 Wavelet thresholding theory
If s denotes a pure speech signal and w a noise, the noised speech signal x can be defined as the sum of these two signals:
x = s + w (2.1)
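As an illustration of Eq. (2.1), the following sketch builds a noised signal x = s + g·w in which the gain g is chosen so that the global SNR reaches a prescribed value. The gain g, the function name and the toy signals are assumptions introduced for the example; the way the industrial noises are actually scaled in this study is described in section 2.3.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Return x = s + g*w, with g chosen so that the global SNR equals snr_db."""
    speech_power = sum(v * v for v in speech) / len(speech)
    noise_power = sum(v * v for v in noise) / len(noise)
    # SNR = 10*log10(speech_power / (g**2 * noise_power)), solved for g
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * w for s, w in zip(speech, noise)]

# Hypothetical toy signals (not taken from the recordings used in the study)
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

noisy_0db = mix_at_snr(speech, noise, snr_db=0.0)         # gain of 2
noisy_minus20db = mix_at_snr(speech, noise, snr_db=-20.0)  # gain of 20
print(noisy_0db)
```

Sweeping snr_db from -20 to +20 dB with the same speech and noise excerpts reproduces, in principle, the range of conditions considered in this paper.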
Denoising by wavelets consists in thresholding the wavelet coefficients of the noised signal. By applying the wavelet transform to Eq. (2.1) we obtain the following equation, where the capital letter represents the wavelet transform of the signal represented by the corresponding lower case one:
X = S + W (2.2)
In this study, the considered algorithm is meant to be used for hearing protectors. Several types of wavelet transforms do exist. The wavelet transform to be utilized should meet two criteria: its implementation should be possible using a fast algorithm and it should permit perfect reconstruction of the signal. The dyadic wavelet transform meets these two criteria, as long as the analysis wavelet type is properly chosen.
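The two criteria above (fast algorithm, perfect reconstruction) and the linearity expressed by Eq. (2.2) can be checked on a small example. The sketch below uses the Haar wavelet only because it keeps the code short; this choice is an assumption made for illustration and is not the analysis wavelet retained in this study. The function names are hypothetical, and the signal length is assumed to be a multiple of 2 to the power of the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal, levels):
    """Dyadic Haar decomposition.

    Returns [approximation, detail_coarsest, ..., detail_finest].
    Each level halves the number of samples, so the total cost is O(N).
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append([(a - b) / SQRT2 for a, b in zip(even, odd)])
        approx = [(a + b) / SQRT2 for a, b in zip(even, odd)]
    return [approx] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuilds the signal from coarsest to finest level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = rebuilt
    return approx

# Hypothetical toy signals: s (speech), w (noise) and x = s + w
speech = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
noise = [0.5, -0.5, 0.25, -0.25, 0.1, -0.1, 0.0, 0.2]
noisy = [s + w for s, w in zip(speech, noise)]

S = haar_dwt(speech, levels=3)
W = haar_dwt(noise, levels=3)
X = haar_dwt(noisy, levels=3)
reconstructed = haar_idwt(X)

# Linearity, Eq. (2.2): X = S + W in every subband
linear = all(abs(x - (s + w)) < 1e-12
             for bx, bs, bw in zip(X, S, W)
             for x, s, w in zip(bx, bs, bw))
print(linear, reconstructed)
```

Thresholding is then applied to the detail coefficients of X before calling the inverse transform, which is the operation formalized in the remainder of this section.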
Denoising by wavelet thresholding consists in applying a thresholding rule THR with a threshold T to the wavelet coefficients of the noised signal X to obtain the wavelet coefficients of the denoised signal S̃, which is an estimate of the wavelet coefficients of the pure speech signal S:
S̃ = THR(X, T) (2.3)
The risk r_THR(s, T) associated with a thresholding rule THR with a given threshold T is defined by:
$$r_{\mathrm{THR}}(s, T) = E\left\{ \left\| \tilde{S} - S \right\|^{2} \right\} \tag{2.4}$$
The following sections present the three wavelet denoising parameters: the thresholding rule, the threshold expression and the noise estimate expression.
2.2.1 Thresholding rules
The first two thresholding rules were formulated by Xxxxxx and Xxxxxxxxx (Xxxxxx and Xxxxxxxxx, 1994). They are the soft (THRs) and the hard (THRh) thresholding rules:
$$\mathrm{THR}_s(X, T) = \begin{cases} \operatorname{sgn}(X)\,(|X| - T), & |X| \ge T \\ 0, & |X| < T \end{cases} \tag{2.5}$$
$$\mathrm{THR}_h(X, T) = \begin{cases} X, & |X| \ge T \\ 0, & |X| < T \end{cases} \tag{2.6}$$
Xxxxxxxxxxx (Xxxxxxxxxxx and Xxxxxxxxx, 2001) has proposed the μ-law thresholding rule (THRμ):
$$\mathrm{THR}_\mu(X, T) = \begin{cases} X, & |X| \ge T \\ T\,\dfrac{1}{\mu}\left[(1+\mu)^{|X/T|} - 1\right]\mathrm{sgn}(X), & |X| < T \end{cases} \tag{2.7}$$
This rule is intermediate between the hard and the soft thresholding rules: as the value of the parameter μ increases, the μ-law thresholding rule tends toward the hard thresholding rule. It has three advantages over the hard and soft thresholding rules: first, it is a continuous function; second, no coefficient is set to zero; and third, the coefficients greater than the threshold are not modified. The intelligibility of the speech signal is therefore less affected by it than by the hard or soft thresholding rules (Nordström et al., 1999; Xxxxxxxxxxx and Xxxxxxxxx, 2001).
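The three rules of Eqs. (2.5) to (2.7) can be sketched with NumPy as follows. The function names are hypothetical, T is assumed to be strictly positive, and the clipping of |X/T| to 1 is only a numerical precaution for the branch that is discarded when |X| ≥ T.

```python
import numpy as np

def soft_threshold(X, T):
    """Eq. (2.5): shrink every coefficient toward zero by T."""
    X = np.asarray(X, dtype=float)
    return np.sign(X) * np.maximum(np.abs(X) - T, 0.0)

def hard_threshold(X, T):
    """Eq. (2.6): keep coefficients with |X| >= T, zero the others."""
    X = np.asarray(X, dtype=float)
    return np.where(np.abs(X) >= T, X, 0.0)

def mu_law_threshold(X, T, mu):
    """Eq. (2.7): keep |X| >= T unchanged, compress smaller coefficients."""
    X = np.asarray(X, dtype=float)
    ratio = np.minimum(np.abs(X) / T, 1.0)  # clipped: values above 1 belong to the unused branch
    below = T * ((1.0 + mu) ** ratio - 1.0) / mu * np.sign(X)
    return np.where(np.abs(X) >= T, X, below)
```

For example, with T = 1 and μ = 100 a coefficient of 0.5 is mapped to about 0.09, and the output approaches T as the input approaches T, which illustrates the continuity mentioned above.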
2.2.2 Threshold expressions
Three threshold expressions are considered in this paper: universal, SURE and hybrid SURE. They are presented in the next three sections. The fourth section presents the risk estimator expressions associated with the thresholding rules.
2.2.2.1 Universal threshold
The threshold value T can be determined in different ways. Xxxxxx and Xxxxxxxxx (Xxxxxx and Xxxxxxxxx, 1994) introduced the universal threshold Tu, which they defined for white Gaussian noise. Xxxxxxxxx and Xxxxxxxxx (Xxxxxxxxx and Xxxxxxxxx, 1997) adapted this threshold to coloured noises as Tu(j): the threshold is determined on each analysis level j of the wavelet transform instead of being determined globally:
$$T_u = \tilde{\sigma} \sqrt{2 \log N} \tag{2.8}$$
$$T_u(j) = \tilde{\sigma}_j \sqrt{2 \log N} \tag{2.9}$$
where $\tilde{\sigma}$ and $\tilde{\sigma}_j$ are estimates of the standard deviation of the noise, which will be presented in Section 2.2.3, and N is the number of signal samples that are considered.
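As a small numerical illustration of Eqs. (2.8) and (2.9), the sketch below evaluates the universal threshold for a scalar noise estimate or for one estimate per analysis level. The function name is hypothetical, and the logarithm is taken as the natural logarithm, which is the usual convention for this threshold.

```python
import numpy as np

def universal_threshold(sigma, n_samples):
    """Eq. (2.8) for a scalar sigma, Eq. (2.9) for an array of per-level sigmas."""
    return np.asarray(sigma, dtype=float) * np.sqrt(2.0 * np.log(n_samples))

# Global threshold for unit-variance noise and N = 1024 samples (about 3.72).
t_global = universal_threshold(1.0, 1024)

# Level-dependent thresholds for three hypothetical per-level noise estimates.
t_levels = universal_threshold([0.5, 1.0, 2.0], 1024)
```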
2.2.2.2 SURE threshold
To lower the risk associated with a given thresholding rule used with the universal threshold, especially for low noise levels, Xxxxxx and Johnstone (Xxxxxx and Xxxxxxxxx, 1995) proposed the SURE (Stein unbiased risk estimator) threshold TSURE. Like the universal threshold, it was mainly meant for white Gaussian noise before being generalized by Xxxxxxxxx and Xxxxxxxxx (Xxxxxxxxx and Xxxxxxxxx, 1997) to coloured noises:
$$T_{\mathrm{SURE}} = \tilde{\sigma}\, \mathrm{SURE}(X / \tilde{\sigma}) \tag{2.10}$$
$$T_{\mathrm{SURE}}(j) = \tilde{\sigma}_j\, \mathrm{SURE}(X_j / \tilde{\sigma}_j) \tag{2.11}$$
SURE(X) denotes the threshold value T for which the estimator $\tilde{r}_{\mathrm{THR}}(s, T)$ of the risk associated with the thresholding rule THR is minimal:
$$\mathrm{SURE}(X) = \underset{0 \le T}{\arg\min}\; \tilde{r}_{\mathrm{THR}}(s, T) \tag{2.12}$$
2.2.2.3 Hybrid SURE threshold
Xxxxxx and Xxxxxxxxx (Xxxxxx and Xxxxxxxxx, 1995) showed that the SURE threshold tends to be too low when the noise level is very high. To mitigate this drawback, they proposed the hybrid SURE threshold $T_{\mathrm{SURE\,hybrid}}$, which replaces the SURE threshold by the universal threshold when the noise level is high:
$$T_{\mathrm{SURE\,hybrid}} = \begin{cases} T_{\mathrm{SURE}}, & \|X\|^2 - N\sigma^2 \le \epsilon_N \\ T_u, & \|X\|^2 - N\sigma^2 > \epsilon_N \end{cases} \tag{2.13}$$
$$\text{with } \epsilon_N = \sigma^2 N^{1/2} (\log N)^{3/2} \tag{2.14}$$
2.2.2.4 Risk estimator expression
In Eq. (2.12), an estimator of the risk associated with the thresholding rule THR is used, since the true risk expressed by Eq. (2.4) cannot be determined when the characteristics of the pure speech signal S are unknown. The risk estimator expressions for the soft, $\tilde{r}_{\mathrm{THR}_s}(s, T)$ (Xxxxxx and Xxxxxxxxx, 1995), and hard, $\tilde{r}_{\mathrm{THR}_h}(s, T)$ (Xxxx et al., 1999), thresholding rules are:
$$\tilde{r}_{\mathrm{THR}_s}(s, T) = N - 2\,\#\{n : |X[n]| < T\} + \sum_{n=1}^{N} \left[\min(|X[n]|, T)\right]^2 \tag{2.15}$$
$$\tilde{r}_{\mathrm{THR}_h}(s, T) = N - 2\,\#\{n : |X[n]| < T\} + \sum_{|X[n]| < T} |X[n]|^2 \tag{2.16}$$
in which $\#\{n : |X[n]| < T\}$ denotes the number of different values of n that satisfy the inequality |X[n]| < T.
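To make the use of the soft risk estimator concrete, the sketch below evaluates Eq. (2.15) and searches for the threshold of Eqs. (2.10) and (2.12). It is a minimal sketch, not the implementation used in this study: the function names are hypothetical, and the minimisation is restricted to zero and the observed coefficient magnitudes, which is a common shortcut rather than an exact minimisation over all T ≥ 0. The search costs O(N²) as written.

```python
import numpy as np

def soft_risk_estimate(X, T):
    """Eq. (2.15) for coefficients X already normalised by the noise estimate."""
    mag = np.abs(np.asarray(X, dtype=float))
    return mag.size - 2 * np.count_nonzero(mag < T) + np.sum(np.minimum(mag, T) ** 2)

def sure_threshold(X, sigma):
    """Eq. (2.10): sigma * SURE(X / sigma), with SURE() as in Eq. (2.12)."""
    mag = np.abs(np.asarray(X, dtype=float)) / sigma
    # Assumption: candidate thresholds are 0 and the observed magnitudes only.
    candidates = np.concatenate(([0.0], np.sort(mag)))
    risks = [soft_risk_estimate(mag, T) for T in candidates]
    return sigma * candidates[int(np.argmin(risks))]
```

For the three coefficients 0.1, −0.2 and 5.0 with unit noise estimate, the estimated risk at the candidates 0, 0.1, 0.2 and 5.0 is 3, 3.03, 1.09 and 24.05 respectively, so the sketch returns 0.2.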
The risk estimator associated with the soft thresholding rule is unbiased: $E\{\tilde{r}_{\mathrm{THR}_s}(s, T)\} = r_{\mathrm{THR}_s}(s, T)$ (Xxxxxx and Xxxxxxxxx, 1995). However, the estimator associated with the hard thresholding rule is biased, and its bias can be expressed as follows (Xxxx et al., 1999):
$$\mathrm{Bias}_h = r_{\mathrm{THR}_h}(s, T) - E\{\tilde{r}_{\mathrm{THR}_h}(s, T)\} = 2\,T\,\sigma^2 \sum_{n=1}^{N} \left[\phi(T - S[n]) + \phi(T + S[n])\right] \tag{2.17}$$
in which $\phi(u) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-u^2/(2\sigma^2)}$.
For the μ-law thresholding rule, based on the definition, we have formulated the true associated risk as:
$$r_{\mathrm{THR}_\mu}(s, T) = \sum_{n=1}^{N} E\left\{ \left( S[n] - T\,\frac{1}{\mu}\left[(1+\mu)^{|X[n]/T|} - 1\right]\mathrm{sgn}(X[n]) \right)^2 \right\} \tag{2.18}$$
However, a risk estimator associated with this rule is more complex to obtain because of the presence of the term $(1+\mu)^{|X[n]/T|}$ in Eq. (2.18).
2.2.3 Noise estimate expressions
Three main approaches can be used to estimate the standard deviation of the noise from the noised signal.
In the first approach, the noise added to the speech signal is assumed to have unit variance ($\tilde{\sigma} = 1$). This is a common procedure.
In the second approach, Xxxxxx and Xxxxxxxxx (Xxxxxx and Xxxxxxxxx, 1994) proposed estimating the standard deviation of white Gaussian noise using the median absolute deviation (MAD) calculated on the first analysis level of the wavelet transform of the noised signal:
$$\tilde{\sigma} = \frac{\mathrm{MAD}(X_1)}{0.6745} \quad \text{with } \mathrm{MAD}(X_1) = \mathrm{median}(|X_1|) \tag{2.19}$$
In the third approach, Xxxxxxxxx and Xxxxxxxxx (Xxxxxxxxx and Xxxxxxxxx, 1997) adapted Xxxxxx and Xxxxxxxxx’x estimator to coloured noises by determining an estimate for each analysis level j of the wavelet transform of the noised signal:
$$\tilde{\sigma}_j = \frac{\mathrm{MAD}(X_j)}{0.6745} \quad \text{with } \mathrm{MAD}(X_j) = \mathrm{median}(|X_j|) \tag{2.20}$$
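The second and third noise estimates, Eqs. (2.19) and (2.20), can be sketched as follows. The function names are hypothetical, and `details` is assumed to be a list of detail-coefficient arrays whose first element is the first analysis level.

```python
import numpy as np

def mad_sigma(detail_coeffs):
    """Eqs. (2.19)/(2.20): median of the absolute coefficients divided by 0.6745."""
    return float(np.median(np.abs(np.asarray(detail_coeffs, dtype=float))) / 0.6745)

def noise_estimates(details):
    """Return (sigma from the first level only, list with one sigma per level)."""
    sigma_first = mad_sigma(details[0])
    sigma_levels = [mad_sigma(d) for d in details]
    return sigma_first, sigma_levels
```

The constant 0.6745 is the median of the absolute value of a standard Gaussian variable, so for purely Gaussian coefficients the estimate approaches the true standard deviation.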
2.3 Presentation of the studied cases
The performance of the wavelet thresholding methods presented in Section 2.2 is studied for experimental signals formed from speech signals and industrial noises. The quality of the denoising obtained is quantified using four selection criteria. A database of 42 508 800 values of the different criteria is generated; its composition is given in Table 2.1. Firstly, the experimental signals used in the simulations are presented (see Section 2.3.1): 8 200 signals are tested. Secondly, the parameters of the wavelet denoising methods tested are described in detail (see Section 2.3.2): 1 296 denoising methods are tested. Thirdly, the four selection criteria used to quantify the quality of the denoising achieved are explained (see Section 2.3.3).
2.3.1 The signals
The twenty speech signals used come from the TIMIT database (Xxxxxxxx et al., 1993). Ten English phrases spoken by a woman and the same ten phrases spoken by a man were selected.
For this study, an industrial noise is considered to be any noise to which a worker is exposed in an industrial workplace. Ten industrial noises are used. Two are extracted from the NOISEX database (NOISEX, 1990); they were recorded in a car factory. The eight others were recorded in different locations at the NORANDA CCR copper refinery in Montreal (J. Voix, personal communication, 2000). The ten industrial noises tested can almost be considered stationary, since their frequency components vary only slightly over time. In addition to these recorded noises, white and pink Gaussian noises are tested to compare their behaviour with that of the industrial noises and to identify which of the two better simulates industrial noises in a speech denoising treatment.
Thus, two hundred (20 × 10) speech-noise pairs are formed. Every noised speech signal x is obtained by adding an industrial noise w to a speech signal s according to Eq. (2.21):
$$x = \frac{s}{\mathrm{std}(s)}\, 10^{\mathrm{SNR}/20} + \frac{w}{\mathrm{std}(w)} \tag{2.21}$$
with $\mathrm{std}(s) = \sqrt{\dfrac{1}{N-1} \sum_{n=0}^{N-1} (s[n] - \bar{s})^2}$ denoting the standard deviation of the signal s.
The two signals are first normalized to unit variance. The signal-to-noise ratio SNR ranges from −20 dB to 20 dB in 1 dB increments, for a total of 41 SNR values. Hence, a total of 20 × 10 × 41 = 8 200 noised speech signals x are used to evaluate the denoising methods studied.
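The construction of Eq. (2.21) can be sketched as follows. The function name `mix_at_snr` and the constant `SNR_GRID` are hypothetical; `ddof=1` is used so that the standard deviation is normalised by N − 1.

```python
import numpy as np

SNR_GRID = np.arange(-20, 21)  # the 41 input SNRs, from -20 dB to 20 dB in 1 dB steps

def mix_at_snr(s, w, snr_db):
    """Eq. (2.21): unit-variance speech scaled to the target SNR plus unit-variance noise."""
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    return s / np.std(s, ddof=1) * 10.0 ** (snr_db / 20.0) + w / np.std(w, ddof=1)
```

Since the noise term has unit standard deviation, the speech term has a standard deviation of 10^(SNR/20), which gives the requested ratio in dB.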
Table 2.1 Signals, methods and criteria

| Index | Item | Type | Details | Nb of items |
|---|---|---|---|---|
| 1 | Signals | | | |
| a | Speech | Men | 10 phrases | 20 |
| | | Women | 10 phrases | |
| b | Noise | Noisex | car factory: 2 | 10 |
| | | Noranda | copper refinery: 8 | |
| c | SNR | | [−20 : 1 : 20] dB | 41 |
| | Nb of signals | | 20 × 10 × 41 | 8 200 |
| 2 | Methods | | | |
| a | Wavelet | Daubechies | orders 1, 4, 8 | 8 |
| | | Symlets | orders 4, 8 | |
| | | Coiflets | orders 1, 2, 4 | |
| b | Nb of levels | | 6, 8, 10 | 3 |
| c | Denoising techniques parameters | | | |
| α | Thresholding | hard, soft, μ-law | μ = 10², 10³, 10⁴, 10⁵ | 6 |
| β | Threshold | universal, SURE, hybrid SURE | | 3 |
| γ | Noise estimate | σ̃ = 1, first level, each level | | 3 |
| | Nb of denoising techniques | | 6 × 3 × 3 | 54 |
| | Nb of methods | | 8 × 3 × 54 | 1 296 |
| 3 | Criteria | | | |
| a | SNR | Global, Segmental | | 2 |
| b | MSE | | | 1 |
| c | IS | | | 1 |
| | Nb of criteria | | 2 + 1 + 1 | 4 |
| | Total nb of criteria values | | 8 200 × 1 296 × 4 | 42 508 800 |
2.3.2 The methods
To achieve the inverse wavelet transform of the denoised signal, the analysis wavelet type chosen must ensure perfect reconstruction of the signal. Daubechies, Symlets, Coiflets and biorthogonal wavelets have this property. Unlike the other types, biorthogonal wavelets use two forms of wavelets: one for analysis and one for reconstruction. For a given filter order, this family of wavelets uses twice as much memory as the other families and is therefore not included in this study. We consider only the Daubechies (db), Symlets (sym) and Coiflets (coif) wavelets. The choice of the wavelet order is also based on the amount of available memory: Daubechies and Symlets wavelets of order M require filters of length 2M, while Coiflets wavelets of order M require filters of length 6M. The wavelets tested were chosen based on their current usage in the literature (Xxx and Xxxx, 1998; Xxx and Xia, 2001) and to limit the length of the filters: Daubechies wavelets of order 1, 4 and 8; Symlets of order 4 and 8; and Coiflets of order 1, 2 and 4. To keep the calculation time of the algorithm within acceptable limits while still obtaining sufficient accuracy, the numbers of analysis levels of the wavelet transform used are 6, 8 and 10. Therefore, 8 analysis wavelet types combined with 3 values for the number of analysis levels are considered.
The wavelet thresholding algorithm (see Section 2.2) involves three parameters: the thresholding rule, the threshold expression and the noise estimate expression. The three thresholding rules presented (see Section 2.2.1) are considered here, i.e. the soft, hard and μ-law thresholding rules. The third one is tested using four different values of the parameter μ (100, 1 000, 10 000 and 100 000). The four μ-law thresholding rules thus defined are distributed uniformly over a log scale (see Fig. 2.1) between the hard thresholding rule and a thresholding rule that would not modify the signal (THR(X, T) = X). Therefore, six thresholding rules (soft thresholding, hard thresholding and four μ-law thresholdings) are considered in this study. Fig. 2.1 shows each of the thresholding rules considered. The three threshold expressions presented (see Section 2.2.2) are considered: the universal threshold, the SURE threshold and the hybrid SURE threshold. For the SURE and hybrid SURE thresholds, a risk estimator for the thresholding rule is necessary. For soft thresholding, the estimator is unbiased and can therefore be used to determine the threshold. For the hard thresholding rule, the estimator is biased. Therefore, the SURE threshold is lower than the value it would attain with an unbiased estimator (Xxxx et al., 1999). The thresholding rule then tends to “let through” more coefficients than necessary, and therefore more noise, so the denoising results are poorer than could be expected. For the μ-law thresholding rule, the risk estimator associated with the thresholding rule (see Eq. (2.18)) is not easily obtainable. Since we had no optimal estimator of the risk associated with the hard and μ-law thresholding rules, we used the risk estimator associated with the soft thresholding rule for these two thresholding rules. As for noise estimates, the three expressions presented (see Section 2.2.3) are used.
[Figure 2.1: amplitude of estimated speech versus amplitude of noisy speech, both from −T to T, for the hard, soft and μ-law (μ = 100, 1 000, 10 000 and 100 000) thresholding rules.]
Figure 2.1 Thresholding rules.
A total of 6 × 3 × 3 = 54 denoising techniques are considered. These 54 denoising techniques are indexed in Table 2.2 for later identification. Hence, a total of 8 × 3 × 54 = 1 296 denoising methods are considered in this study.
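Table 2.2 Denoising techniques

| Index | Thresholding | Threshold | Noise estimate |
|---|---|---|---|
| 1 | hard | universal | σ̃ = 1 |
| 2 | hard | universal | first level |
| 3 | hard | universal | each level |
| 4 | hard | SUREsoft | σ̃ = 1 |
| 5 | hard | SUREsoft | first level |
| 6 | hard | SUREsoft | each level |
| 7 | hard | hybrid SUREsoft | σ̃ = 1 |
| 8 | hard | hybrid SUREsoft | first level |
| 9 | hard | hybrid SUREsoft | each level |
| 10 | soft | universal | σ̃ = 1 |
| 11 | soft | universal | first level |
| 12 | soft | universal | each level |
| 13 | soft | SUREsoft | σ̃ = 1 |
| 14 | soft | SUREsoft | first level |
| 15 | soft | SUREsoft | each level |
| 16 | soft | hybrid SUREsoft | σ̃ = 1 |
| 17 | soft | hybrid SUREsoft | first level |
| 18 | soft | hybrid SUREsoft | each level |
| 19 | μ-law, μ = 100 | universal | σ̃ = 1 |
| 20 | μ-law, μ = 100 | universal | first level |
| 21 | μ-law, μ = 100 | universal | each level |
| 22 | μ-law, μ = 100 | SUREsoft | σ̃ = 1 |
| 23 | μ-law, μ = 100 | SUREsoft | first level |
| 24 | μ-law, μ = 100 | SUREsoft | each level |
| 25 | μ-law, μ = 100 | hybrid SUREsoft | σ̃ = 1 |
| 26 | μ-law, μ = 100 | hybrid SUREsoft | first level |
| 27 | μ-law, μ = 100 | hybrid SUREsoft | each level |
| 28 | μ-law, μ = 1 000 | universal | σ̃ = 1 |
| 29 | μ-law, μ = 1 000 | universal | first level |
| 30 | μ-law, μ = 1 000 | universal | each level |
| 31 | μ-law, μ = 1 000 | SUREsoft | σ̃ = 1 |
| 32 | μ-law, μ = 1 000 | SUREsoft | first level |
| 33 | μ-law, μ = 1 000 | SUREsoft | each level |
| 34 | μ-law, μ = 1 000 | hybrid SUREsoft | σ̃ = 1 |
| 35 | μ-law, μ = 1 000 | hybrid SUREsoft | first level |
| 36 | μ-law, μ = 1 000 | hybrid SUREsoft | each level |
| 37 | μ-law, μ = 10 000 | universal | σ̃ = 1 |
| 38 | μ-law, μ = 10 000 | universal | first level |
| 39 | μ-law, μ = 10 000 | universal | each level |
| 40 | μ-law, μ = 10 000 | SUREsoft | σ̃ = 1 |
| 41 | μ-law, μ = 10 000 | SUREsoft | first level |
| 42 | μ-law, μ = 10 000 | SUREsoft | each level |
| 43 | μ-law, μ = 10 000 | hybrid SUREsoft | σ̃ = 1 |
| 44 | μ-law, μ = 10 000 | hybrid SUREsoft | first level |
| 45 | μ-law, μ = 10 000 | hybrid SUREsoft | each level |
| 46 | μ-law, μ = 100 000 | universal | σ̃ = 1 |
| 47 | μ-law, μ = 100 000 | universal | first level |
| 48 | μ-law, μ = 100 000 | universal | each level |
| 49 | μ-law, μ = 100 000 | SUREsoft | σ̃ = 1 |
| 50 | μ-law, μ = 100 000 | SUREsoft | first level |
| 51 | μ-law, μ = 100 000 | SUREsoft | each level |
| 52 | μ-law, μ = 100 000 | hybrid SUREsoft | σ̃ = 1 |
| 53 | μ-law, μ = 100 000 | hybrid SUREsoft | first level |
| 54 | μ-law, μ = 100 000 | hybrid SUREsoft | each level |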
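The indexing of Table 2.2 and the method counts of Table 2.1 can be reproduced with a short enumeration. The label strings and constant names below are hypothetical; only the ordering (noise estimate varying fastest, then threshold, then thresholding rule) is taken from the table.

```python
from itertools import product

THRESHOLDING = ["hard", "soft", "mu-law, mu=100", "mu-law, mu=1000",
                "mu-law, mu=10000", "mu-law, mu=100000"]
THRESHOLDS = ["universal", "SUREsoft", "hybrid SUREsoft"]
NOISE_ESTIMATES = ["sigma=1", "first level", "each level"]
WAVELETS = ["db1", "db4", "db8", "sym4", "sym8", "coif1", "coif2", "coif4"]
LEVELS = [6, 8, 10]
N_SIGNALS = 20 * 10 * 41   # speech signals x noises x input SNRs
N_CRITERIA = 4             # SNRglo, SNRseg, MSE, IS

# product() varies its last argument fastest, which matches the order of Table 2.2.
TECHNIQUES = {
    index: combo
    for index, combo in enumerate(
        product(THRESHOLDING, THRESHOLDS, NOISE_ESTIMATES), start=1)
}

# A denoising method is a (wavelet, number of levels, technique index) triple.
METHODS = list(product(WAVELETS, LEVELS, TECHNIQUES))

N_VALUES = N_SIGNALS * len(METHODS) * N_CRITERIA
```

With this ordering, `TECHNIQUES[27]` is the μ-law rule with μ = 100, the hybrid SUREsoft threshold and the noise estimate on each level, and `N_VALUES` equals 42 508 800.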
2.3.3 The criteria
In order to quantify the objective performance of results obtained for each wavelet denoising method tested, four selection criteria of three different types were used (Xxxxxx and Xxxxxx, 1998; Xxxx et al., 1992):
a. Two criteria quantify the improvement in signal to noise ratio:
α. The global signal to noise ratio is defined as:
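$$\mathrm{SNR}_{glo} = 10 \log_{10} \frac{\mathrm{var}(s)}{\mathrm{var}(s - \tilde{s})} \tag{2.22}$$
with $\mathrm{var}(s) = \dfrac{1}{N-1} \sum_{n=0}^{N-1} (s[n] - \bar{s})^2$ denoting the signal variance and $\bar{s} = \dfrac{1}{N} \sum_{n=0}^{N-1} s[n]$ denoting the average of the signal.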
It is one of the most popular measures for establishing signal denoising quality. It is the measure by which Xxxx, Xxxxxxxx and Xxxxxx (Xxxx et al., 2006) evaluate the quality of their speech enhancement algorithm.
β. The segmental signal to noise ratio is defined as:
$$\mathrm{SNR}_{seg} = \frac{1}{M} \sum_{m=0}^{M-1} 10 \log_{10} \frac{\mathrm{var}(s_m)}{\mathrm{var}(s_m - \tilde{s}_m)} \tag{2.23}$$
with $\{s_m\}_{m=0,\dots,M-1}$ the M frames of the signal s.
Derived from the global SNR, the segmental SNR is the average of the SNRs calculated for each signal frame. Because a speech signal is non-stationary (its frequency composition varies over time), the segmental SNR provides a more accurate quality measure of the denoised speech signal than the global SNR (Xxxxxx and Xxxxxx, 1998).
b. One statistical criterion quantifies the errors introduced by denoising. It is the mean square error defined as:
$$\mathrm{MSE} = \frac{1}{N} \sum_{n=0}^{N-1} (s[n] - \tilde{s}[n])^2 \tag{2.24}$$
The MSE is commonly utilized as a comparison measure between the original signal and the denoised one. It has been used by Xxxxxxxxxx, Xxxxx and Xxxxxxxxx (Xxxxxxxxxx et al., 2001) and by Xxxxx and Xxxxxx (Xxxxx and Xxxxxxxxx, 2003).
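c. One criterion quantifies the distortion of the speech signal: the Itakura-Saito distortion measure, defined as:
$$\mathrm{IS} = \frac{1}{M} \sum_{m=0}^{M-1} \left[ \frac{G^2_{s_m}}{G^2_{\tilde{s}_m}} \, \frac{\vec{a}_{\tilde{s}_m} R_{s_m} \vec{a}^{\,T}_{\tilde{s}_m}}{\vec{a}_{s_m} R_{s_m} \vec{a}^{\,T}_{s_m}} + \log \frac{G^2_{\tilde{s}_m}}{G^2_{s_m}} - 1 \right] \tag{2.25}$$
in which $R_{s_m}$ denotes the autocorrelation matrix of the frame $s_m$, $\vec{a}_{s_m}$ the linear prediction filter coefficients of the frame $s_m$, and $G^2_{s_m}$ is defined by $G^2_{s_m} = R_{s_m}(1,:) \cdot \vec{a}^{\,T}_{s_m}$.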
The IS distortion measure is seldom used in the literature in this context (Xx and Xxxx, 2007). However, it permits an objective quantification of the intelligibility of the denoised speech signal (Xxxxxx and Xxxxxx, 1998).
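The first three criteria can be sketched directly from Eqs. (2.22) to (2.24). The function names are hypothetical, the frames are assumed to be contiguous and non-overlapping with a caller-supplied length (the frame length is not specified here), and the MSE is normalised by the number of samples N, which is an assumption about the normalisation. The IS measure is left out because it additionally requires a linear prediction analysis of each frame.

```python
import numpy as np

def snr_global(s, s_hat):
    """Eq. (2.22): ratio of the signal variance to the error variance, in dB."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return float(10.0 * np.log10(np.var(s, ddof=1) / np.var(s - s_hat, ddof=1)))

def snr_segmental(s, s_hat, frame_len):
    """Eq. (2.23): average of the per-frame SNRs (assumed non-overlapping frames)."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    n_frames = s.size // frame_len  # trailing samples that do not fill a frame are ignored
    values = [
        snr_global(s[m * frame_len:(m + 1) * frame_len],
                   s_hat[m * frame_len:(m + 1) * frame_len])
        for m in range(n_frames)
    ]
    return float(np.mean(values))

def mse(s, s_hat):
    """Eq. (2.24), normalised by N (assumption)."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return float(np.mean((s - s_hat) ** 2))
```

A frame in which the estimate equals the original exactly would give an infinite SNR, so a practical implementation would need to guard against a zero error variance.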
2.4 Methods selection methodology
The database of 42 508 800 values obtained with the four selection criteria considered is very large and hence cannot be easily analyzed. Moreover, the various denoising method parameters are not all independent of one another. The analysis wavelet type and the number of analysis levels are mutually independent and are also independent of the other parameters. On the other hand, the three denoising technique parameters (the thresholding rule, the threshold expression and the noise estimate expression) are not mutually independent: the mathematical expression of the thresholding rule depends on the threshold expression, which in turn depends on the noise estimate expression (see Section 2.2). The influence of these three parameters on the denoising performance is therefore studied globally in this paper: the performance of the 54 denoising techniques is studied simultaneously for each of the four selection criteria considered. For this purpose, we designed a dedicated selection algorithm. The general methodology used to process the database of 42 508 800 values consists of four steps, which are defined below and presented in Fig. 2.2.
[Figure 2.2: flowchart. 1st step: select the wavelet and the number of analysis levels. 2nd step: select, using the selection algorithm, the denoising techniques that preserve intelligibility (Itakura-Saito distortion measure). 3rd step: select, using the selection algorithm, the denoising techniques that separately maximize the global SNR, maximize the segmental SNR and minimize the MSE. 4th step: select the denoising techniques that simultaneously maximize the performances for all three criteria (SNRglo, SNRseg and MSE).]
Figure 2.2 Main flowchart of the general methodology used for methods selection.
In the first step, the wavelet type and the number of analysis levels that give the best results are determined by comparing the average of the best performance results for the various experimental signals according to each selection criterion. In the second step, the denoising techniques that preserve intelligibility are determined based on the IS distortion measure criterion using the selection algorithm. In the third step, the techniques that give the best results separately in terms of SNRglo, SNRseg or MSE are determined using the selection algorithm. In the fourth step, the techniques that give the best results simultaneously for all three selection criteria (SNRglo, SNRseg and MSE) are identified. These four steps are explained in the next four sections. The selection algorithm itself is presented in the fifth section.
2.4.1 First step: Selecting the adequate wavelet type and the number of analysis levels
The first step consists in applying the same algorithm twice to determine, first, the type of analysis wavelet and, second, the number of analysis levels that separately give the best performance according to the four selection criteria considered, namely SNRglo, SNRseg, MSE and IS. The methodology used is presented in Fig. 2.3 and explained below:
a. At this point, the entire database is considered.
b. The 42 508 800 values of the database are divided into 4 sets of 10 627 200 values. Each set contains the values obtained for one of the four selection criteria (SNRglo, SNRseg, MSE and IS).
c. Each of the 4 sets is divided into n sub-sets (8 for the wavelet types and 3 for the number of analysis levels) according to the possible options for the parameter considered. As an example, for wavelet types, a sub-set represents the set of results obtained during the use of Daubechies wavelets of order 1 (db1), another for db4, another for db8, another for Symlet wavelets of order 4 (sym4), etc.
d. Each of these 4 × n sub-sets is divided into 41 groups according to the 41 possible values of the SNR of the noised signal (see Table 2.1).
e. For each of these 4 × n × 41 groups, the best performance result (maximum or minimum, depending on the criterion) obtained for each of the 200 (20 × 10) speech-noise pairs is determined, and the average of these 200 values is calculated.
f. Four figures (see for example Figs. 2.7 and 2.8 which will be presented in section 2.5.1), one per selection criterion, show these averages, according to the SNR of the noised signal for each of the n possible options for the parameter considered.
g. The option for the considered parameter (wavelet type or number of analysis levels) providing the best average of the best performance results for the entire set of SNRs of the noised signal, according to the four selection criteria (SNRglo, SNRseg, MSE and IS), is chosen.
[Figure 2.3: flowchart. The whole database (42 508 800 criteria values) is subdivided according to the 4 criteria (SNRglo, SNRseg, MSE, IS) into 4 sets of 10 627 200 values. Each set is subdivided according to the n possible options of the considered parameter (wavelet or number of analysis levels) into 4 × n subsets of values. Each subset is subdivided according to the 41 SNRs of the noised speech into 4 × n × 41 groups of values. The average of the best performances is determined for each group of values (4 × n × 41 best performances). Four figures (one per criterion) are drawn, each with n plots representing the average of the best performances for each of the n possible options of the considered parameter according to the 41 SNRs of the noised speech. The option that gives the best average of the best performances according to the 4 criteria is selected for the considered parameter.]
Figure 2.3 Flowchart of first step: Selecting the adequate wavelet type and the number of analysis levels.
At the end of this first step, the analysis wavelet type and the number of analysis levels, that enable us to obtain the best average for the best performance results according to the four selection criteria, SNRglo, SNRseg, MSE and IS, are determined.
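Steps c to e of this first step amount to a reduction over a multi-dimensional array of criterion values. The sketch below is one possible way to organise it, under stated assumptions: the values of a single criterion are stored in an array whose first two axes are the speech-noise pair and the input SNR and whose remaining axes are the method parameters (a hypothetical layout), and the function name is hypothetical.

```python
import numpy as np

def best_average_per_option(values, option_axis, maximize=True):
    """Average over the speech-noise pairs of the best result per option and input SNR.

    values: array with axes (pair, snr, param_1, param_2, ...) for one criterion.
    option_axis: index (>= 2) of the parameter under study (e.g. the wavelet type).
    Returns an array of shape (n_options, n_snr).
    """
    values = np.asarray(values, dtype=float)
    # Best result over every method parameter except the one under study.
    other_axes = tuple(ax for ax in range(2, values.ndim) if ax != option_axis)
    best = values.max(axis=other_axes) if maximize else values.min(axis=other_axes)
    # 'best' now has axes (pair, snr, option): average over the pairs.
    return best.mean(axis=0).T
```

Each row of the result corresponds to one curve of the figures described in step f; `maximize=False` would be used for criteria that are minimised, such as the MSE.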
2.4.2 Second step: Selecting the denoising techniques that preserve intelligibility
The second step consists in selecting the techniques that preserve the intelligibility of the denoised signal by using the IS distortion measure. The methodology is summarized in Fig. 2.4 and explained below:
[Figure 2.4: flowchart. The values of the Itakura-Saito distortion measure that correspond to the chosen wavelet and number of analysis levels are passed to the selection algorithm, which selects the techniques that preserve intelligibility.]
Figure 2.4 Flowchart of second step: Selecting the denoising techniques that preserve intelligibility.
a. The values considered here are the IS distortion measures that correspond to the choice made in step one in terms of the analysis wavelet type and of the number of analysis levels.
b. The selection algorithm, which will be explained in Section 2.4.5, is applied to the IS distortion measures considered in a.
c. The techniques that ensure a preset minimum level of intelligibility of the denoised signal are selected; their efficiency must not be zero (the “efficiency” is defined in this article as the percentage of speech-noise pairs for which the technique has been selected).
The sets of parameters (thresholding rule, threshold expression and noise estimate expression) that preserve the intelligibility of the denoised signal are identified from the outcome of this second step.
2.4.3 Third step: Selecting the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
The third step consists in applying the same algorithm three times to determine the techniques that, first, maximize the SNRglo criterion; second, maximize the SNRseg criterion; and third, minimize the MSE criterion. The methodology, presented in Fig. 2.5 and explained below, is applied to each of these three criteria:
[Figure 2.5: flowchart. The values of the SNRglo, SNRseg or MSE of the database that correspond to the chosen wavelet, number of analysis levels and techniques that preserve intelligibility are passed to the selection algorithm, which gives the efficiency of each technique in terms of SNRglo, SNRseg or MSE.]
Figure 2.5 Flowchart of third step: Selecting the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE.
a. The values considered here are those of one of the three criteria SNRglo, SNRseg and MSE that correspond to the choices made in the first two steps in terms of analysis wavelet type, number of analysis levels and denoising techniques that preserve signal intelligibility.
b. The selection algorithm, which has already been used in the second step and which will be explained in Section 2.4.5, is applied to the values considered in a.
c. The efficiency of each of the techniques considered, in terms of SNRglo, SNRseg or MSE, is obtained. For techniques that do not preserve signal intelligibility, the results are set to zero for all three criteria SNRglo, SNRseg and MSE. The results obtained are presented in a histogram (see for example Fig. 2.10, which will be presented in Section 2.5.3) that represents the efficiency of each of the 54 denoising techniques considered.
2.4.4 Fourth step: Selecting the denoising techniques that simultaneously maximize per- formance for all the three criteria SNRglo, SNRseg and MSE
The fourth step consists in grouping together the results obtained in step three for the three selection criteria SNRglo, SNRseg and MSE. For each of these three selection criteria, a histogram (see for example Fig. 2.10, which will be presented in Section 2.5.3) is plotted to show the efficiency of each of the 54 techniques considered. The comparison of these three histograms highlights the most efficient techniques.
2.4.5 Selection algorithm
Fig. 2.6 is a flowchart of the selection algorithm. Its input data are the results obtained for a given criterion (SNRglo, SNRseg, MSE or IS) for 200 × 41 signals (200 speech-noise pairs, each mixed at 41 SNRs) denoised using the techniques considered. The steps performed are as follows:
a. Only the selected database values that correspond to one of the selection criteria (IS for the second step of the general methodology, SNRglo, SNRseg or MSE for its third step as shown in Fig. 2.2), are considered.
b. The set of values to process is divided into 200 sub-sets corresponding to the results obtained for each of the 200 speech-noise pairs.
c. A minimum-quality test is then performed. Each of the 200 sub-sets, i.e. each speech-noise pair, is tested to ensure that a minimum quality of denoising is obtained. For example, if the sub-set of values considered contains the gains in terms of SNRglo for a given speech-noise pair, this sub-set is first divided into 41 groups according to the 41 possible values of the SNR of the noised signal. For each of these 41 groups, the maximum gain in terms of SNR is determined. The average of these 41 maximum gains is then calculated. This average must be greater than or equal to the preset base value (1 dB for example) for the speech-noise pair to be considered valid. This base value is chosen depending on the type of criterion, the noised speech signals, the denoising techniques considered and the minimum quality one subsequently wishes to ensure. The study of the speech-noise pair is continued only if the minimum-quality test is passed. If not, the speech-noise pair considered cannot be properly denoised using the methods considered.
d. Each of the M sub-sets that pass the minimum-quality test is divided into 41 groups according to the 41 possible values of the SNR of the noised signal.
e. For each of the M × 41 groups, the denoising techniques that provide the best performance results are selected according to the following rule: the average of the performance should not be too far from the extrema obtained for the methods retained. Two parameters, chosen according to the type of criterion, the noised speech signals and the denoising techniques considered, enable us to quantify these deviations: an absolute deviation and a relative deviation between the maximum and minimum values. For example, if the group of values considered contains gains in terms of SNRglo, the absolute deviation between the maximum and minimum values obtained for the retained denoising techniques must be lower than or equal to the preset maximum absolute deviation value (4 dB for example). The same procedure is applied for a preset maximum relative deviation (20% for example).
f. For each of the M speech-noise pairs, the results obtained in the previous step for each of the 41 SNRs are examined to identify and retain the techniques that yield the best performance results for the whole set of 41 SNR values of the noised signal.
g. For each of the denoising techniques considered by the algorithm, the percentage of speech-noise pairs for which this technique yields the best performance for all 41 SNRs of the noised signal is calculated. This value is the “efficiency” defined in Section 2.4.2.
The selection algorithm therefore enables us to determine the efficiency of each of the denoising techniques considered. The algorithm is blind: it has no information about the techniques it is testing and thus it cannot favour one over another.
[Figure 2.6: flowchart. The set of database values considered is subdivided according to the 200 speech-noise pairs (200 subsets of values). A minimum-quality test is applied to each subset: on failure, the speech-noise pair cannot be denoised with the selected techniques; on success, the subset is subdivided according to the 41 SNRs of the noised speech (M × 41 groups of values). The techniques that give the best results are selected for each group of values (M × 41 groups of techniques), then the techniques that give the best results for all 41 SNRs are selected (M subsets of techniques). The efficiency of each technique is the percentage of speech-noise pairs for which this technique gives the best performances.]
Figure 2.6 Flowchart of the selection algorithm.
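The selection algorithm can be sketched as an array computation. This is one possible reading of steps b to g, not the implementation used in this study. It assumes that the criterion values are gains for which larger is better (a criterion to be minimised would have to be negated first), that a technique is retained in a group when its deviation from the best value of that group satisfies both the absolute and the relative limits, and that the efficiency is expressed relative to all speech-noise pairs. The function and parameter names are hypothetical; the default values are the examples quoted in the text.

```python
import numpy as np

def selection_efficiency(values, base_value=1.0, max_abs_dev=4.0, max_rel_dev=0.20):
    """Efficiency [%] of each technique for one criterion.

    values: array of shape (n_pairs, n_snr, n_techniques), larger is better.
    """
    values = np.asarray(values, dtype=float)
    n_pairs = values.shape[0]

    # Best value obtained in each (pair, SNR) group.
    best = values.max(axis=2, keepdims=True)

    # Step c: minimum-quality test on the average of the per-SNR best values.
    valid_pair = best.mean(axis=(1, 2)) >= base_value

    # Step e: keep techniques close enough to the best value of their group.
    deviation = best - values
    retained = (deviation <= max_abs_dev) & (deviation <= max_rel_dev * np.abs(best))

    # Step f: a technique must be retained for every input SNR of a valid pair.
    retained_all_snr = retained.all(axis=1) & valid_pair[:, None]

    # Step g: percentage of speech-noise pairs for which the technique is retained.
    return 100.0 * retained_all_snr.sum(axis=0) / n_pairs
```

Since the function only sees an array of values, it has no information about which technique corresponds to which column, in line with the blindness property stated above.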
2.5 Experimental results
2.5.1 First step: Choice of the analysis wavelet type and of the number of analysis levels
2.5.1.1 Choice of the analysis wavelet type
The results obtained for the choice of the analysis wavelet type in step 1 of the general methodology are shown in Fig. 2.7.
[Figure 2.7: four panels plotted against SNRin from −20 to 20 dB, with one curve per analysis wavelet (db1, db4, db8, sym4, sym8, coif1, coif2, coif4).]
Figure 2.7 First step experimental results: Choice of the analysis wavelet type. (a) Gain in SNRglo. (b) Gain in SNRseg. (c) MSE. (d) IS.
If we consider the results in terms of the gain in global SNR, we notice that all the wavelets considered perform similarly over the entire range of SNRs of the noised signal (SNRin), except for the Daubechies wavelet of order 1, which does not perform as well as the others, especially when the SNR of the noised signal is positive. Therefore, among the analysis wavelet types tested, the one that gives the best performance with the smallest calculation time is the Daubechies wavelet of order 4. The results obtained for the three other selection criteria (segmental SNR, MSE and IS distortion measure) are similar.
2.5.1.2 Choice of the number of analysis levels
The results obtained for the choice of the number of analysis levels in step 1 of the general methodology are shown in Fig. 2.8.
[Figure 2.8: four panels plotted against SNRin from −20 to 20 dB, with one curve per number of analysis levels (6, 8 and 10).]
Figure 2.8 First step experimental results: Choice of the number of analysis levels. (a) Gain in SNRglo. (b) Gain in SNRseg. (c) MSE. (d) IS.
If we consider the results in terms of gain in segmental SNR, the lower the SNR of the noised signal (SNRin), the larger the gain obtained by increasing the number of analysis levels of the wavelet transform. Good results are obtained using ten analysis levels. The three other criteria (global SNR, MSE and IS distortion measure) lead to the same conclusion.
2.5.2 Second step: Selection of the denoising techniques that preserve intelligibility
The results obtained in step 2 of the general methodology for selecting the denoising techniques that preserve intelligibility by using the IS distortion measure are shown in Fig. 2.9. The efficiency of the 54 denoising techniques is shown. The histogram in Fig. 2.9 is presented with a logarithmic ordinate in order to properly distinguish techniques with low denoising efficiency from those with zero efficiency.
[Figure 2.9: histogram of the efficiency [%] (logarithmic scale, from 0.7 to 100) of the 54 denoising methods for the IS criterion, grouped by thresholding rule: hard (1 to 9), soft (10 to 18) and μ-law with μ = 100 (19 to 27), μ = 1 000 (28 to 36), μ = 10 000 (37 to 45) and μ = 100 000 (46 to 54).]
Figure 2.9 Second step experimental results: Selection of denoising techniques that preserve intelligibility.
Of the 54 denoising techniques tested, all those that have a nonzero efficiency, that is, all those that preserve intelligibility for at least one speech-noise pair, are considered in step three of the general methodology.
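The selection rule of this step can be sketched as follows. This is only an illustration under stated assumptions: efficiency is taken here to be the percentage of speech-noise pairs for which a technique passes the IS criterion, which may differ from the exact definition used by the selection algorithm of Section 2.4.5, and the validity flags below are made-up values, not results from this study.

```python
def efficiency_percent(valid_flags):
    """Percentage of speech-noise pairs for which a technique is valid."""
    if not valid_flags:
        return 0.0
    return 100.0 * sum(1 for flag in valid_flags if flag) / len(valid_flags)


def select_nonzero_efficiency(validity_by_technique):
    """Keep the techniques that are valid for at least one pair."""
    return sorted(
        technique
        for technique, flags in validity_by_technique.items()
        if efficiency_percent(flags) > 0.0
    )


# Hypothetical data: technique number -> one flag per speech-noise pair.
example_validity = {
    1: [False, False, False, False],
    2: [True, False, False, False],
    27: [True, True, True, False],
}

print(select_nonzero_efficiency(example_validity))
```

With this made-up data, technique 1 is discarded because it never preserves intelligibility, while techniques 2 and 27 move on to the next step regardless of how large their efficiency is.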
2.5.3 Third step: Selection of the denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE
The results obtained in step 3 of the general methodology for selecting the denoising techniques that separately maximize the gain in global SNR, maximize the gain in segmental SNR and minimize the MSE are shown in Fig. 2.10.
[Figure: three histogram panels, labelled "SNRglo valid IS", "SNRseg valid IS" and "MSE valid IS", showing efficiency [%] from 0 to 100 against methods 1 to 54, grouped by thresholding rule: hard, soft, and μ-law with μ = 100, 1000, 10000 and 100000.]
Figure 2.10 Third step experimental results: Selection of denoising techniques that separately optimize each criterion SNRglo, SNRseg and MSE.
2.5.4 Fourth step: Selection of the denoising techniques that simultaneously maximize performances for all the three criteria SNRglo, SNRseg and MSE
One technique appears to outperform all others in terms of efficiency for gain in SNRglo, gain in SNRseg and MSE alike. This is technique No. 27 (defined in Table 2.2), which combines the μ-law thresholding rule with μ = 100, the hybrid SUREsoft threshold and the estimate of the standard deviation of the noise on each analysis level.
For the experimental signals, denoising methods and selection criteria considered, the method that yields the best performance is therefore denoising with the Daubechies wavelet of order 4 and 10 analysis levels, using the μ-law thresholding rule with μ = 100, the hybrid SUREsoft threshold and the estimate of the standard deviation of the noise on each analysis level.
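The general structure of such a method (decompose, estimate the noise on each level, threshold the detail coefficients, reconstruct) can be illustrated with a deliberately simplified sketch. It is not the selected method: it swaps in the Haar wavelet (Daubechies of order 1) instead of order 4, soft thresholding with the universal threshold instead of the μ-law rule with the hybrid SURE threshold, and it assumes a median-based noise estimate on each level, which may differ from the estimate expression used in this study. All function names are illustrative.

```python
import math
import statistics


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; details[0] is the finest level."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    root2 = math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        next_approx = [(approx[2 * i] + approx[2 * i + 1]) / root2
                       for i in range(half)]
        detail = [(approx[2 * i] - approx[2 * i + 1]) / root2
                  for i in range(half)]
        details.append(detail)
        approx = next_approx
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    root2 = math.sqrt(2.0)
    x = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(x, detail):
            rebuilt.append((a + d) / root2)
            rebuilt.append((a - d) / root2)
        x = rebuilt
    return x


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, levels):
    """Level-dependent soft thresholding of the detail coefficients."""
    approx, details = haar_dwt(x, levels)
    cleaned = []
    for detail in details:
        # Assumed noise estimate: median absolute coefficient / 0.6745.
        sigma = statistics.median([abs(c) for c in detail]) / 0.6745
        # Universal threshold, used here in place of the hybrid SURE one.
        threshold = sigma * math.sqrt(2.0 * math.log(len(x)))
        cleaned.append([soft_threshold(c, threshold) for c in detail])
    return haar_idwt(approx, cleaned)
```

Calling `denoise(noisy_samples, levels)` on a list of samples returns a list of the same length; the approximation coefficients are left untouched and only the detail coefficients are shrunk.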
The actual performance in terms of gain in SNRglo, gain in SNRseg, MSE and IS obtained by applying this method is presented in Table 2.3. The first two noises are those extracted from the NOISEX database; the following eight are those recorded at the NORANDA CCR copper refinery. The results are presented for noised signals at -20, 0 and 20 dB SNR. Although the results obtained are not exactly the same for all the noises tested, since some noises are more difficult to treat than others, some general observations can be made. These quantities (gain in SNRglo, gain in SNRseg, MSE and IS) vary with the SNRin of the input signal in more or less the same way for all the noises tested. For all noises, it is easier to obtain a high gain in SNRglo and SNRseg at low SNRin than at high SNRin: at high SNRin the noise level is very low, so it is very difficult to suppress noise without suppressing speech. This is also why the MSE is higher at high SNRin than at low SNRin. On the other hand, speech intelligibility (IS) is more easily degraded at low SNRin than at high SNRin.
Table 2.3 Performance results for the best method for each noise
Criteria    |        IS         |        MSE        |  Gain in SNRglo   |  Gain in SNRseg
SNRin [dB]  |   −20     0    20 |   −20     0    20 |   −20     0    20 |   −20     0    20
Noise 1     |   5.0   1.7   0.1 |   0.2   0.5   0.8 |   8.2   2.9   1.3 |  10.3   3.8   1.8
Noise 2     |   2.4   0.4   0.0 |   0.1   0.2   0.4 |  10.4   6.4   4.4 |  12.1   6.3   4.4
Noise 3     |   1.9   0.2   0.0 |   0.0   0.1   0.2 |  16.9  11.3   6.7 |  17.9  12.5   8.4
Noise 4     |   2.4   0.6   0.0 |   0.0   0.2   0.3 |  15.8   8.1   5.0 |  16.8   8.4   5.9
Noise 5     |   2.2   0.4   0.0 |   0.0   0.1   0.8 |  15.8   9.1   1.3 |  16.6  10.0   3.9
Noise 6     |   2.2   0.3   0.0 |   0.1   0.2   0.3 |   9.2   7.1   5.0 |  12.6   7.7   5.3
Noise 7     |   3.6   1.5   0.1 |   0.0   0.4   0.7 |  16.0   4.4   1.4 |  16.7   6.0   1.8
Noise 8     |   3.0   1.4   0.1 |   0.0   0.4   0.9 |  13.4   3.9   0.6 |  14.8   5.1   1.1
Noise 9     |   1.9   0.3   0.0 |   0.1   0.2   0.5 |  12.8   6.2   3.2 |  14.4   6.9   3.8
Noise 10    |   3.3   1.5   0.1 |   0.1   0.4   0.8 |  12.6   4.2   1.1 |  14.6   5.1   1.6
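For readers who wish to compute comparable quantities, the following minimal sketch gives common textbook forms of the MSE, the global SNR, the segmental SNR and the resulting SNR gain. These forms are assumptions: the exact definitions of Section 2.3.3 (normalisation, frame length, handling of silent frames) may differ, and the IS distortion measure is omitted because it requires a linear-prediction analysis. The function names and the frame length argument are illustrative.

```python
import math


def mse(clean, estimate):
    """Mean squared error between the clean and the estimated signal."""
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


def snr_db(clean, estimate):
    """Global SNR in dB, treating (clean - estimate) as the noise."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_snr_db(clean, estimate, frame_len):
    """Mean of the per-frame SNRs in dB over non-overlapping frames."""
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        signal_energy = sum(v * v for v in c)
        error_energy = sum((a - b) ** 2 for a, b in zip(c, e))
        # Frames with no signal or no error are skipped (an assumption).
        if signal_energy > 0.0 and error_energy > 0.0:
            frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not frame_snrs:
        raise ValueError("no usable frame")
    return sum(frame_snrs) / len(frame_snrs)


def snr_gain_db(clean, noisy, denoised):
    """Gain in global SNR: SNR after denoising minus SNR before."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)
```

A gain computed this way is positive when the denoised signal is closer to the clean speech than the noised signal was.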
2.6 Discussion
In this section, the general methodology developed to obtain our results is discussed first. Secondly, the results are discussed. Thirdly, the independence or interdependence of the parameters is examined. Finally, we attempt to answer the following question: Which theoretical Gaussian noise (white or pink) best simulates industrial noises in a speech denoising treatment?
2.6.1 Methods selection methodology: A positioning among similar studies
We shall begin by positioning our study among the similar studies already presented in the literature, which we mentioned in the introduction. Xxxxxxxxxx et al. (Xxxxxxxxxx et al., 2001) tested 34 wavelet denoising methods on a dozen theoretical signals altered by white Gaussian noise. Their study is mostly qualitative and does not provide a systematic comparison of the results according to the theoretical signals considered. Xxxxx and Xxxxxx (Xxxxx and Xxxxxxxxx, 2003) conducted the same type of study by denoising images altered by white Gaussian noise. They considered 36 wavelet denoising methods and 20 spatial filter techniques. For each set of methods, they presented a global table of results, which they later used to interpret the influence of various parameters. The study by Xxxx et al. (Xxxx et al., 2006) focuses on a large number of wavelet denoising methods tested on speech signals altered by white Gaussian noise. They took into consideration 22 wavelet types, 4 numbers of analysis levels, 3 threshold expressions and 5 thresholding rules. Their study was based on the variation of one parameter at a time.
Instead of the traditional white Gaussian noise used in the literature (Xxxxxxxxxx et al., 2001; Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003), we use industrial noises recorded in factories in order to be as realistic as possible. In addition, the range of SNRs considered is broad, [-20, 20] dB, and subdivided into 1 dB increments, whereas the range of SNRs studied in the literature is seldom lower than -10 dB and often contains fewer than 10 different values (Xxxx et al., 2006; Xxxxx and Xxxxxxxxx, 2003). As shown in Table 2.1, 1296 denoising methods are considered for this study; hence a global analysis such as the one conducted by Xxxxx (Xxxxx and Xxxxxxxxx, 2003) on 36 techniques would be difficult and lengthy to perform. A parameter-by-parameter study, such as the one conducted by Xxxx et al. (Xxxx et al., 2006), might have been possible; however, we chose to proceed otherwise to avoid affecting our results with the mathematical dependence of certain parameters on others (see Section 2.2). We were able to study the influence of the analysis wavelet type and of the number of analysis levels independently because they are independent of the other parameters (see Sections 2.5.1.1 and 2.5.1.2). However, the thresholding rule, the threshold expression and the noise estimate expression are not independent parameters, so they were studied simultaneously (see Sections 2.5.2, 2.5.3 and 2.5.4) using our selection algorithm (see Section 2.4.5). Among the four selection criteria considered, SNRglo, SNRseg, MSE and IS (see Section 2.3.3), we have chosen to favour speech intelligibility, which is quantified by the IS distortion measure, because a method that gave a very good gain in terms of SNR but produced an incomprehensible denoised signal would not be useful for our purposes.
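The SNR range described above (from -20 to 20 dB in 1 dB steps) can be reproduced by rescaling a recorded noise before adding it to the speech. The sketch below shows one common way of doing so; it is an assumption about the procedure rather than the exact one followed in this study, and the function names are illustrative.

```python
import math


def scale_noise_for_snr(speech, noise, snr_db):
    """Rescale `noise` so that the speech-to-noise energy ratio is snr_db."""
    speech_energy = sum(s * s for s in speech)
    noise_energy = sum(n * n for n in noise)
    gain = math.sqrt(speech_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [gain * n for n in noise]


def mix_at_snr(speech, noise, snr_db):
    """Noised signal: speech plus noise rescaled to the target SNRin."""
    scaled = scale_noise_for_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


# 41 target values: -20, -19, ..., 19, 20 dB.
SNR_GRID_DB = list(range(-20, 21))
```

Looping over `SNR_GRID_DB` and calling `mix_at_snr` for each speech-noise pair yields one noised signal per target SNRin.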
2.6.2 Analysis of experimental results
First, we discuss the choice of the analysis wavelet type. Then, we discuss the number of analysis levels chosen. Finally, we examine the choices made in terms of the parameters used in the denoising techniques (thresholding rule, threshold expression and noise estimate expression).
2.6.2.1 Choice of the analysis wavelet type
Our study shows (see Section 2.5.1.1) that the Daubechies wavelet of order 4 provides good denoising results with a reasonably short processing time, while the Daubechies wavelet of order 1 gives slightly poorer results. We attempt below to explain these observations on the basis of the correspondence between the analysis level and the central frequency of the wavelet analysis bands. These central frequencies are shown in Fig. 2.11 for each tested wavelet type. On a logarithmic scale, excluding the Daubechies wavelet of order 1, all wavelets have almost equal central frequencies, which differ by roughly 1 percent. The central frequencies for the Daubechies wavelet of order 1 are shifted by more than