
WO2006109268A1 - Automated speech disorder detection method and apparatus - Google Patents


Info

Publication number
WO2006109268A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
person
language analysis
analysis
oral response
Application number
PCT/IB2006/051144
Other languages
French (fr)
Inventor
Andreas Brauers
Andreas Kellner
Gerd Lanfermann
Jurgen Te Vrugt
Original Assignee
Koninklijke Philips Electronics N.V.
Philips Intellectual Property & Standards GmbH
Application filed by Koninklijke Philips Electronics N.V. and Philips Intellectual Property & Standards GmbH
Publication of WO2006109268A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/48 Other medical applications
    • A61B 5/4803 Speech analysis specially adapted for diagnostic purposes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 2015/025 Phonemes, fenemes or fenones being the recognition units

Definitions

  • Additional information from the analysis process can be extracted from the speech input SI for the candidate under consideration.
  • This might include information on the phoneme sequence used to build up the candidate and a relation to the time slots of the speech input SI that were used to recognize a certain phoneme.
  • The temporal information can then be used to determine the speaking rate SRA of the patient, i.e. the speed of his/her speech (which might be normalized for each phoneme).
  • The result is preferably used for comparison with a speaking rate of the patient obtained in previous utterances, such as in a healthy state of the patient, or in another known state of the patient.
  • Hereby, an adaptation (personalization) of the system towards the patient over time can be obtained.
  • Statistics on properties of the phonemes might be used to find indications of non-regular speech, e.g. the mean length of certain phonemes, or the distribution of how often each phoneme is used by the patient. These analyses can be compared to results from an unrelated comparison group or to results collected from the patient; in the latter case the adaptation can be continued over time. Phonemes extracted from the input might be compared to phonemes obtained in previous interactions with the user to detect changes over time.
  • Based on these comparisons, a classification CL is carried out, and a speech disorder result SDR is produced.
  • In the classification CL, the above-mentioned parameters are compared to previous analysis results PAR, and from this comparison a result is derived.
  • If a person's previously recorded phoneme lengths are much shorter than the present ones, this indicates that the person's speech has changed significantly, and if it is known that the person has a high risk of stroke, such significantly prolonged phoneme lengths may be a strong indicator of a stroke.
  • A fundamental classification CL would be to distinguish between distorted and non-distorted speech. In the case of distorted speech, further classifications might lead to a more detailed reason for the distortion. The comparison may be based on similarity measures.
  • Fig. 4 shows the processing steps LA, CL and SDR as explained in relation to Fig. 3; in order to enhance the precision of the classification mechanism, however, additional steps are included in Fig. 4 before step LA in order to exclude speech inputs SI that do not comply with the predefined requested speech.
  • An initial speech recognition step SR has been included to test the speech input SI.
  • This speech recognition step SR may comprise a forced alignment such as described in connection with step FA of Fig. 3, together with an appropriate classifier.
  • The result of the speech recognition SR may be a single text sequence or a set of alternative candidates (e.g. represented in an n-best list or graph). These candidates might again contain further information extracted during the recognition, e.g. ratings or temporal information on phonemes with respect to the speech input SI.
  • In step EI it is checked whether the SR outcome is in compliance with the expected input, i.e. whether further processing is reasonable and required (irregularities in speech are detected) or whether the input cannot be processed further.
  • The reason for stopping the processing can be the lack of dysfunctions, or indications that the acoustic input is not reliable, in which case the patient might be prompted to re-read the given sequence.
  • A confidence check CC can be performed to evaluate whether the input was given with clear speech (but wrong words spoken) or with highly perturbed pronunciation. This can be used to decide whether re-input of the utterance is required; a minimal sketch of this gating logic follows the list.
  • The forced alignment FA step of the language analysis LA of Fig. 3 may be omitted, since such forced alignment might already have been performed in the SR step. If not, a forced alignment FA might be performed to support further analysis.
  • Cases A, B and C can to some extent be handled by adding an additional dialog, possibly using another type of interfacing.
  • The stored speech disorder result data for the person may be adapted to new speech disorder result data obtained, such as when the test has been performed and no stroke has been detected.
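A minimal sketch of this gating logic (steps SR, EI and CC), with hypothetical boolean inputs standing in for the actual recognition and confidence results, is given below.

    # Decide, from the SR outcome, whether to stop, re-prompt, or analyse further.
    def gate(matches_expected: bool, irregularities: bool, confident: bool) -> str:
        if matches_expected and not irregularities:
            return "stop: no dysfunction indicated"
        if not confident:
            # Clear mismatch but unreliable input: ask the person to re-read.
            return "re-prompt: acoustic input not reliable"
        return "continue: run language analysis LA and classification CL"

    print(gate(matches_expected=False, irregularities=True, confident=True))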

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Speech disorder detection method that can be implemented as an automated dialog with a person suspected of suffering from a change in speech, e.g. Aphasia, Dysarthria or Apraxia. Preferably, a stroke analysis is performed based on the change in speech. The method comprises performing a language analysis on predefined spoken language received from a person under test. Preferably, the language analysis comprises statistical analysis of phoneme lengths and speaking rate. These language analysis results are compared with results of previous language analyses obtained for the person under test, such as results obtained when the person was in a healthy or in another known health state. The previous analysis results may also include non-individual data, such as results obtained for a group of persons. If personal historical data are used, a development in a change of speech, e.g. during rehabilitation, can be tracked. The change in speech is then detected based on the comparison of the present analysis results with the previous results. Stored data may be adapted to new data if no stroke was detected. The method is suited for implementation using low-cost equipment and is thus suited for home use, e.g. by persons during rehabilitation. A preferred apparatus can be implemented on a stand-alone PC such as a Laptop computer, a set-top box connected to a TV set, a dedicated stand-alone device, or a PC controlled from a server via an Internet connection. Preferably, such an apparatus can perform a dialog using a microphone, a loudspeaker and a display, and adapt the test sequence to the response from the person.

Description

Automated speech disorder detection method and apparatus
The invention relates to the field of medical detection systems, especially to the field of automated medical systems. More particularly, the invention comprises a method and an apparatus for automated detection of a change in speech of a patient based on an evaluation of his/her speech performance. This change in speech may form part of an automated stroke detection system.
Correct and fast medical treatment of strokes or cerebrovascular accidents, i.e. vascular occlusion or bleeding of a brain artery, is crucial, since parts of the brain are insufficiently supplied with oxygen and will thus be permanently damaged if not properly treated. Fast detection of a stroke state is crucial since the amount of brain damage can be reduced if therapy is initiated quickly: "time is brain". However, a patient may not be aware of a stroke, since such a state may imply only deficiencies that are less noticeable and thus not recognized by the patient him/herself; in fact, it is a strong indication of a stroke that the patient suffers from severe inabilities without being aware of it.
Since a patient can suffer from a stroke without being aware of it, it is important that people suspected of having a stroke, such as patients with a prior stroke, undergo regular observation with respect to their health state. Tests for motor and speech disorders are often used by professionals to detect stroke. The most prominent stroke symptoms are speech disorders such as Aphasia (an acquired language disorder caused by a stroke or trauma), Apraxia (an acquired articulation disorder caused by a stroke or trauma) or Dysarthria (a motor speech disorder).
Patients living alone are in a particular risk group since there are no bystanders around to detect disorders indicating a stroke, and therefore such patients may well have a stroke without seeking therapy. In order to detect a stroke before severe brain damage has occurred, the patient is preferably evaluated for stroke symptoms and warned. Evaluation may be done either at regular intervals or on demand. In the latter case, the patient suspects an abnormal condition, but is not able to attribute the pathology to a stroke event. For patients during rehabilitation, or for patients suspected of having a stroke, there is therefore a need for an automated system for home use that enables the patient to run a self-test on a regular basis and thus detect a possible stroke in due time to seek medical therapy. Such a system must be easy to operate, it must provide reliable results, and it should preferably be possible to implement it with low-cost components.
Paper "Automatic recognition of Dutch Dysarthric speech: a pilot study" by E. Sanders, M. Ruiter, L. Beijer and H. Strik, Proceedings of the 7th Int. Conf. on Spoken Language Processing (CSLP) pp. 661-664, September 2002, describes a speech recognition system with the purpose of recognizing Dutch Dysarthric speech.
Patent application WO 02/39423 A1 describes an automated, computer-based speech disorder therapy system based on a dialog with the patient. The system is especially suited to train patients with a known speech disorder, such as stuttering. The speech received from the patient is fed back to the patient in order to enhance his speech performance.
Paper "Computer aided methods for diagnosis and therapy of speech breathing disorders", IEEE Engineering in Medicine & Biology Society 11th Annual Int. Conf, pp. 663-664 describes a computer aided method for diagnosis and therapy of speech breathing disorders caused by brain damage. This method includes recording corresponding speech and breathing data.
Patent application US 2004/0044273 A1 proposes a system with a bilateral interface, which examines the user for motor deficits by a variety of tests. The system may also analyze visual recognition of objects, and the analysis of speech is also mentioned.
However, there is no description of a systematic approach to automatically identify speech disorders, like Aphasia, Apraxia or Dysarthria, such as for use in stroke assessment.
It may be seen as an object of the present invention to provide a method and an apparatus capable of providing an easy and reliable detection of a change in speech that can be performed by the patient himself. Preferably, the method and apparatus can also be used in evaluating progress and relapse for patients under rehabilitation.
According to a first aspect, the invention provides a method of identifying a change in speech of a person, the method comprising the steps of requesting a predefined speech input from the person, receiving an oral response from the person, performing a language analysis on the received oral response, comparing a result of the language analysis with a result of a corresponding previously obtained language analysis for the person, and detecting a change in speech from the person based on the step of comparing.
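By way of illustration only, the five claimed steps could be organized as in the following minimal Python sketch; all function and class names, as well as the threshold values, are hypothetical stand-ins rather than part of the claimed method, and the stubs would in a real system be replaced by a microphone front-end and an actual analysis.

    # Minimal sketch of the five claimed steps; names and values are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class AnalysisResult:
        speaking_rate: float          # e.g. phonemes per second
        mean_phoneme_length: float    # seconds

    def request_speech(prompt: str) -> None:
        # Step 1: present the predefined sentence (display and/or loudspeaker).
        print(f"Please say: {prompt}")

    def receive_oral_response() -> bytes:
        # Step 2: record the oral response (stub; would capture microphone audio).
        return b""

    def language_analysis(audio: bytes) -> AnalysisResult:
        # Step 3: align to the expected words and extract features (stub values).
        return AnalysisResult(speaking_rate=11.0, mean_phoneme_length=0.09)

    def detect_change(now: AnalysisResult, before: AnalysisResult) -> bool:
        # Steps 4 and 5: compare with a previously obtained result for the person.
        return (abs(now.mean_phoneme_length - before.mean_phoneme_length) > 0.03
                or abs(now.speaking_rate - before.speaking_rate) > 2.0)

    baseline = AnalysisResult(speaking_rate=12.0, mean_phoneme_length=0.08)
    request_speech("Please enumerate the months of the year.")
    changed = detect_change(language_analysis(receive_oral_response()), baseline)
    print("change in speech detected" if changed else "no change in speech detected")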
By comparing the language analysis result with previous language analysis results, it is possible to provide a simple measure of a possible disorder, and it is possible to compare with language analysis results obtained for persons or groups of persons that suffer from specific speech disorders. In general, the previous language analysis results used for the comparison may comprise one or more of the following:
a) language analysis results obtained for the person under test, such as the results of the most recently performed language analysis test, which enables precise tracking of a development for the person rather than just detecting the presence or absence of a speech disorder. The method can thus also function in a rehabilitation situation where the task is not only to detect a speech disorder but rather to detect whether a person with a known disorder has made progress or has relapsed. Hereby, a reliable detection of a possible new stroke can be obtained also for persons with known speech disorders, since it is possible to adjust the speech disorder analysis in response to such a known disorder;
b) language analysis results obtained for the person under test while at a healthy state or at a state with a known and well-defined speech disorder. As in a), this improves the ability to track a development in a disorder;
c) language analysis results obtained for a group of healthy persons, i.e. non-personalized data;
d) language analysis results obtained for a group of persons with a known disorder, such as a library with non-personalized data related to groups of persons with specific speech disorders; and
e) static threshold values for certain parameters related to results of the language analysis, these threshold values being determined by a professional skilled in speech disorders.
In general, it may be preferred to use a combination of one or more of a)-e) in order to evaluate different parameters of the performed language analysis.
By language analysis is meant any type of analysis performed in order to characterize specific aspects of the person's speech. It is not necessary to include a step of performing speech recognition, since a predefined speech input is requested and thus the words expected to be received are known in advance. Rather, the analysis refers to the process of identifying specific parts of the received oral response that are essential in relation to detecting speech disorders. Speech recognition may be used, for example, in order to test the received oral input for compliance with the expected, predefined speech.
The detected change in speech may be a simple yes/no indication of changed speech. The result may also be a graduated classification of the expected disorder, and it may be a classification pointing out an expected specific type of disorder, e.g. Aphasia, Apraxia or Dysarthria; Parkinson episodes and other motor-neurological disorders may also be derived. The detected change in speech may also comprise a scalar value indicating the severity of the detected speech disorder that can be used to evaluate a possible progress or relapse in a known speech disorder. In case the comparison with previous language analysis results involves a history of a known change in speech, i.e. referring to a) in the above description, the speech change may comprise an indicator of whether the change has become more severe or whether the person has improved his speech performance, such as may be used in a rehabilitation situation.
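As an illustration of the result variants just described, a detected change could be represented as follows; the field and type names are hypothetical choices, not terminology from the claims.

    # Hypothetical container for a detected change in speech, covering a yes/no
    # flag, a suspected disorder type, a severity scalar and a trend indicator.
    from dataclasses import dataclass
    from enum import Enum

    class Disorder(Enum):
        NONE = "none"
        APHASIA = "aphasia"
        APRAXIA = "apraxia"
        DYSARTHRIA = "dysarthria"

    @dataclass
    class SpeechChange:
        changed: bool        # simple yes/no changed speech
        disorder: Disorder   # expected specific type of disorder, if any
        severity: float      # scalar severity, e.g. 0.0 (none) to 1.0 (severe)
        trend: int           # -1 improved, 0 unchanged, +1 more severe

    result = SpeechChange(changed=True, disorder=Disorder.DYSARTHRIA,
                          severity=0.4, trend=+1)
    print(result)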
The step of requesting may comprise an acoustic request, such as a voice presented via a loudspeaker asking the patient to pronounce a predefined sentence. It may also comprise a visual request, such as a written sentence presented on a display. Such visual request may also be accompanied by drawings, photos and/or symbols. The step of requesting may comprise a combination of both an acoustic and a visual request.
In case the detected change in speech comprises a classification of expected different types of disorders, such as Aphasia, Dysarthria and Apraxia, a subsequent stroke analysis may be based on an overall evaluation of the different types of expected disorders, or the stroke analysis may be based on only one of these disorders.
Preferably, the step of performing comprises aligning the received oral response to a pattern of the predefined speech, i.e. the expected words, the aligning process comprising, for example, aligning phonemes. Preferably, it comprises extracting statistics related to phonemes in the received oral response. Preferably, it comprises extracting a speaking rate for the received oral response. Preferably, it comprises all three of the aforementioned steps.
Preferably, the step of performing comprises an initial step of checking whether the received oral response provides a suitable match with the predefined speech, i.e. whether the speech output from the person comprises the expected words in the predefined speech. If a poor match is detected, the steps of requesting and receiving may be repeated if there are indications of patient non-compliance due to inattention, e.g. if the patient has responded to a similar request earlier in the same session. However, if a poor match is detected and it can be derived from the oral response that the patient complied with the request, it can be assumed that the person possibly suffers from Aphasia.
The initial step may comprise extracting a length of the received oral response and comparing the length to an expected predefined maximal length. For a given requested speech input, a maximal length can be defined corresponding to an expected maximum length of the oral response. From an extracted length of the oral response, it is possible to determine whether the oral response does not comply with the requested input, i.e. whether the extracted length of the oral response exceeds the predefined maximal length. This provides a simple test for compliance of the oral response with respect to the expected speech input. Thus, patient compliance may be measured by comparing the expected duration of an acoustic response to the duration of the actual response. If the patient starts to articulate upon request but continues past a certain time interval, which depends on the request, Aphasia may be assumed. Other characteristics which indicate Aphasia are long pauses in the oral articulation. Such evaluation does not require the recognition of words.
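A minimal sketch of this length-based compliance test might look as follows; the thresholds are illustrative assumptions, and the pause lengths are assumed to have been extracted from the recording by some preceding step.

    # Length/pause compliance test: the response must not exceed a predefined
    # maximal length, and long pauses are flagged as a possible Aphasia indicator.
    def check_compliance(response_duration_s: float,
                         pause_lengths_s: list[float],
                         max_expected_s: float,
                         max_pause_s: float = 2.0) -> str:
        if response_duration_s > max_expected_s:
            # The person kept articulating past the expected interval.
            return "non-compliant length: possible Aphasia"
        if any(p > max_pause_s for p in pause_lengths_s):
            # Long pauses in the oral articulation also indicate Aphasia.
            return "compliant length, but long pauses: possible Aphasia"
        return "compliant"

    print(check_compliance(5.5, [0.4, 2.8], max_expected_s=6.0))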
In a preferred method, the step of performing comprises aligning the received oral response to a pattern of the expected predefined speech, extracting statistics related to phonemes in the received oral response, and extracting a speaking rate for the received oral response, together with an initial step of checking whether the received oral response provides a suitable match with the predefined speech.
The detected change in speech may be communicated to the person by acoustic means, such as a voice presenting the change, or the result may be presented by visual means, such as in a written sentence and/or using one or more symbols and/or using a visual scale or meter indicating the severity of changed speech.
The method can be implemented as an automated dialog with a person using a computer device with known input means such as a microphone and with known output means such as a display and/or a loudspeaker. Thus, the method can be implemented with existing low-cost equipment, e.g. a Personal Computer, a Laptop Computer, or a Personal Digital Assistant (PDA) with appropriate software. The method may also be included in diagnosis equipment that can also be implemented with low-cost hardware.
The method may be started by pressing a button, or it may be voice activated.
According to a second aspect, the invention provides an apparatus comprising: requesting means adapted to request a predefined speech input from a person, acoustic receiving means adapted to receive an oral response from the person, signal processing means adapted to perform a language analysis on the received oral response, to compare a result of the language analysis with a result of a corresponding previously obtained language analysis for the person, and to detect a change in speech based on said comparison.
The same advantages and optional features as described for the method according to the first aspect apply for the apparatus according to the second aspect.
The apparatus may further comprise communication means adapted to communicate the detected change in speech via a communication channel selected from the group consisting of: telephone net, Internet, mobile telephone net, Local Area Network, and near field communication techniques (e.g. Bluetooth). Thereby, it is possible to request medical assistance in case an expected severe disorder and/or a stroke has been derived by the apparatus.
Preferably, the apparatus comprises stroke analysis means adapted to perform a stroke analysis based on the detected change in speech. Optionally, the stroke analysis may be based on additional parameters, such as inputs regarding the person's motor performance, e.g. vision data from a camera monitoring the person.
The apparatus may be a dedicated speech analysis device. The apparatus may also comprise further analysis means. Such further analysis means may comprise a camera so as to be able to film the person performing a motor task.
The apparatus is preferably adapted to perform a stroke analysis based on the detected change in speech. In embodiments comprising further means adapted to produce further results, a stroke analysis may be based on an overall evaluation of the changed speech and at least one of the further results.
Preferably, the apparatus comprises storing means for storing the previously obtained language analysis results, i.e. data according to the above description, a)-d). These historical analysis results may be stored in a memory or on a hard disk, or the apparatus may be adapted to retrieve such data from an external storing device, such as via a communication link, such as a telephone line, an Internet connection etc.
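As an illustrative sketch of such storing means, the previously obtained analysis results could be kept in a simple local file; the file name and record layout are assumptions, and a real apparatus might instead retrieve the data over one of the communication links mentioned above.

    # Store and retrieve previously obtained language analysis results locally.
    import json
    from pathlib import Path

    HISTORY = Path("speech_history.json")  # hypothetical local store

    def load_history() -> list[dict]:
        return json.loads(HISTORY.read_text()) if HISTORY.exists() else []

    def append_result(result: dict) -> None:
        history = load_history()
        history.append(result)
        HISTORY.write_text(json.dumps(history, indent=2))

    append_result({"speaking_rate": 11.0, "mean_phoneme_length": 0.09})
    print(len(load_history()), "stored result(s)")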
The apparatus may comprise means for automatically performing a test in a dialog with the person under test. Preferably, the apparatus comprises automatic adaptation means that automatically adapt the test to individual abilities or disabilities of the person. E.g. the person may not be able to read, and the apparatus then adapts to this and communicates with the person by speech messages via a loudspeaker instead of written messages on a display. The knowledge about the disabilities or impairments of a user may either be supplied by a medical professional, by means of database access or the like, or the knowledge may be built up during a session. In this case, the disability may be caused by stroke. The sequence of test methods and the input/output modalities used are then changed accordingly to ensure that the patient is able to perceive information.
The apparatus may comprise means for adapting the test sequence to previous test results, i.e. intelligent adaptation, e.g. the test sequence may be adapted to the results of the last test performed.
In a third aspect, the invention provides a computer system adapted to perform the method according to the first aspect. The computer system may comprise a Personal Computer, e.g. a Laptop computer, e.g. a Laptop computer with built-in microphone, display and loudspeaker. An alternative implementation is a set-top box connected to a TV set, where the person can respond using a remote control.
In a fourth aspect, the invention provides a computer executable program code adapted to perform the method according to the first aspect.
In a fifth aspect, the invention provides a computer readable storage medium comprising a computer executable program code according to the fourth aspect. The storage medium may be a hard disk, a floppy disk, a CD, a DVD, an SD card, a memory stick, a memory chip etc.
In the following, the invention is described in more detail with reference to the accompanying Figures, of which
Fig. 1 shows a sketch of the principle of an automated dialog with a person,
Fig. 2 shows a block diagram of essential parts of a preferred apparatus according to the invention, and
Fig. 3 shows a block diagram of steps in a preferred embodiment of the method, and
Fig. 4 shows a block diagram of steps in another preferred embodiment of the method.
While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Fig. 1 shows a block diagram illustrating the basic principle of a preferred apparatus DA according to the invention, based on an automated dialog with a person P. Person P is to undergo a speech disorder test with the purpose of obtaining a speech disorder result SDR. A test may be initiated by pressing a button, or the person P may orally request a test. The test may also be initiated by remote control, such as controlled by medical personnel via the telephone net (PSTN), the Internet or the like, in case the person P and the apparatus DA are located e.g. in the person's home and thus far away from medical personnel. As still another possibility, the apparatus DA may be programmed to start a test at a predefined time, such as at regular intervals in the course of a day, and the person P may then be informed by an alarm signal that it is time for a test.
The core of the speech disorder test is that the apparatus DA communicates a request 1 to the person P to produce a predefined speech, and the oral response 2 from the person P is then received and processed by the apparatus DA. Preferably, the request 1 may be presented visually on a display and/or acoustically via a loudspeaker. Preferably, a test comprises a number of requests 1 and oral responses 2, and preferably at least some of the oral responses 2 comprise sentences of several words spoken by the person P in order to contain an appropriate amount of test samples to allow a reliable speech disorder result SDR.
In this scenario, typical tests performed by medical professionals in order to identify possible stroke-related speech disorders could be performed as well. Thus, examples of possible requests 1 are (a minimal prompt-selection sketch follows the list):
Show a picture and ask the person P what is displayed in the picture
Let the person P read a given sentence (e.g. displayed on a display connected to the apparatus DA)
Let the person P speak a previously spoken sentence (e.g. the spoken sentence being presented via a loudspeaker connected to the apparatus DA)
Let the patient produce speech (e.g. ask the patient to enumerate the months of the year)
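The following minimal sketch shows how such a bank of request types could be drawn from in a session; the prompt texts paraphrase the examples above, and the function name and session size are hypothetical.

    # Hypothetical prompt bank built from the request examples listed above.
    import random

    PROMPTS = [
        ("picture", "What is displayed in this picture?"),
        ("read",    "Please read the sentence shown on the display."),
        ("repeat",  "Please repeat the sentence you just heard."),
        ("produce", "Please enumerate the months of the year."),
    ]

    def pick_session_prompts(n: int = 3) -> list[tuple[str, str]]:
        # Sample without replacement so one session mixes request types.
        return random.sample(PROMPTS, k=min(n, len(PROMPTS)))

    for kind, text in pick_session_prompts():
        print(kind, "->", text)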
Fig. 2 shows, in block diagram form, a preferred embodiment of the apparatus DA of Fig. 1, an embodiment that can be implemented with low-cost components and thus serve as home-use equipment. A processing unit PU is connected with input and output means: a microphone 10 that can receive oral responses from the person P, and a display device 11 and a loudspeaker 12, which can both serve to present messages to the person, e.g. requests for oral responses and the final diagnosis.
Further input and output means connected to the processing unit PU may be means for performing external communication via a telephone line (PSTN), via a mobile telephone net such as GSM, or via a connection to the Internet. Such an external communication line may be used to deliver speech disorder results and/or a stroke analysis to a remotely located receiver. The processing unit PU may also be controlled via such a communication line, i.e. it may be programmed, and start/stop of a test may be controlled. The communication line may also be used for a dialog between the person P and e.g. medical personnel.
The person P may use a remote control, a game-pad device, a keyboard, a joystick, a Bluetooth-connected mobile phone, etc. connected to the processing unit PU to communicate with the apparatus DA.
A camera may also be connected to the processing unit PU so as to allow the apparatus DA to receive a visual representation of the person P. The apparatus DA can then supplement the speech disorder test with a test of other motor abilities of the person P, such as by requesting the person P to perform movements of arms and legs etc. A more precise stroke analysis may be obtained with such enhancement.
The processing unit PU controls the dialog with the person P using the input and output means 10, 11, 12, and the processing unit PU processes the speech received from the person P via the microphone 10. Based on the processing of the speech input, the processing unit PU produces a speech disorder result SDR and optionally a stroke analysis based thereon. One of or both of these are then presented either on the display 11 or presented in spoken language via the loudspeaker 12.
Preferably, the processing unit PU comprises computer means, i.e. means for performing signal processing on the speech input. The processing unit PU may be formed as a dedicated speech disorder/stroke device, e.g. with the necessary software permanently stored in a chip, or it may be formed by a general-purpose computer.
In a preferred embodiment, the apparatus DA is implemented as an automated dialog system using a Personal Computer (PC) with microphone 10, display 11 and loudspeaker 12 connected thereto. It may be a Laptop PC in which microphone 10, display 11 and loudspeaker 12 are integrated. The dialog control and speech signal processing method are implemented in software. The speech signal processing method is described in the following. The apparatus DA may as well be embedded in a dedicated stroke test device, which also performs other tests on the person P, e.g. motor tests using a camera.
In principle, the method may be run and controlled from a remotely based server connected via the Internet, via a telephone line or via a mobile telephone line, to a device, such as a PC, that merely handles data to and from the input/output means 10, 11, 12.
In case of a strong indication of a stroke, it may be preferred that an automatic call for medical help is provided. Such a call may be achieved by the apparatus DA performing a predefined alarm call to medical personnel via a telephone net or sending a message via the Internet.
Fig. 3 shows a block diagram with steps of a preferred speech signal processing method according to the invention. A test as described in the foregoing is intended to produce a speech input SI of known content, i.e. a known sequence of words. Thus, without dedicated speech recognition, this speech input SI can be aligned to the expected speech pattern to identify words and phonemes. The compliance of the speech input by the user with the given known content might be quantified. Based on the alignment, statistics on phoneme length can be used for evaluation of speech disorders when compared to previous results. This can be particularly helpful when the acquired statistics can be compared to data from the same patient in an earlier (healthy) state. This scenario makes the device useful during rehabilitation, as well as in a home setting for regular testing, e.g. of patients after rehabilitation.
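A minimal sketch of such a phoneme-length evaluation follows; the 25% relative threshold is an illustrative assumption, not a value taken from the patent, and the alignment that yields the per-phoneme lengths is presumed to have been done already.

    # Compare per-phoneme mean lengths against the same patient's earlier
    # (e.g. healthy) statistics and report phonemes that deviate noticeably.
    from statistics import mean

    def phoneme_length_changes(current: dict[str, list[float]],
                               baseline: dict[str, list[float]],
                               rel_threshold: float = 0.25) -> list[str]:
        deviating = []
        for phoneme, lengths in current.items():
            if not baseline.get(phoneme):
                continue  # no earlier data for this phoneme
            now, ref = mean(lengths), mean(baseline[phoneme])
            if abs(now - ref) / ref > rel_threshold:
                deviating.append(phoneme)
        return deviating

    healthy = {"a": [0.08, 0.09], "s": [0.11, 0.10]}
    today = {"a": [0.13, 0.12], "s": [0.11, 0.12]}
    print(phoneme_length_changes(today, healthy))  # ['a']: prolonged vowel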
Instead of phoneme statistics, other changes of the speech output in the time or frequency domain can be checked as well. A combination of different analysis methods may also be useful.
In Fig. 3 the speech input SI is subject to a language analysis LA, including speech recognition, which results in a speech disorder classification CL based on previous analysis results PAR; this classification CL can be used to produce a speech disorder result SDR, and thus optionally a stroke analysis. It is in the classification CL process that the previous language analysis results are taken into account. The language analysis LA process will be described in detail in the following. Throughout, it is understood that the speech input SI referred to is a representation of the oral response received from the person P, i.e. a data representation of an acoustic signal, e.g. an amplitude-versus-time digital representation or a representation based on a set of features computed from the speech input at certain points in time, usually at a fixed interval rate (e.g. every 10 ms). The language analysis LA preferably comprises a forced alignment FA performed on the speech input SI. Since the expected input is known in advance (the person has been requested to pronounce specific words), a rating for the "closeness" between the speech input SI from the person P and the expected input is relevant for the analysis process. Two preferred variants of the forced alignment FA step will be described: a first one performed during analysis, and a second one performed after analysis.
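Before turning to the two variants, the fixed-interval feature representation mentioned above (one feature computed e.g. every 10 ms) can be sketched as follows. The log-energy feature is only a stand-in for whatever acoustic features a real recognizer front-end computes, and the frame and step sizes are conventional assumptions rather than values prescribed by the invention.

```python
import math

def frame_features(samples, sample_rate=16000, frame_ms=25, step_ms=10):
    """Cut an amplitude-versus-time signal into overlapping frames, one
    every step_ms, and compute one feature per frame (log energy here,
    as a placeholder for a real acoustic feature vector)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, step):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        features.append(math.log(energy + 1e-10))
    return features
```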
In the first FA variant, performed during the analysis process, the speech input SI signal is processed sequentially in temporal order. At each time unit (e.g. each computed frame of the speech input SI) the internal model of the speech analysis is used to compute whether a certain phoneme (one might also use other basic units) is detected, i.e. "recognized". However, state-of-the-art speech analysis or recognition systems use stochastic models, so not only one variant is considered; rather, a wide range of candidate phonemes is obtained in parallel, each generally carrying a quality rating (e.g. some probability). Starting from each of these candidates, the next unit of the speech input SI is evaluated. This process builds up a large tree-like structure containing a variety of alternative recognition results for the speech input SI. A (probability) model is used to rate all the different paths from the root to the leaves of the tree.
The known text input can now be used to restrict the search space (the "tree"). Instead of allowing all possible phonemes to occur at every time, only phoneme sequences that comply with the expected word sequence are allowed; even if the real speech input SI does not match the expected word sequence, it will be mapped onto phoneme sequences according to the expected input. Again, the underlying (probability) models are used to compute a rating for the acoustic input given the expected input. A high rating indicates a near-perfect match; a low rating indicates a mismatch between the speech input SI and the expected input.
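A minimal sketch of such a restricted forced alignment is given below. It assumes per-frame log-likelihood scores for each phoneme are already available (here as plain dictionaries); a real system would obtain them from its stochastic acoustic models. The dynamic program only allows staying in the current expected phoneme or advancing to the next one, exactly the restriction described above.

```python
import math

def forced_align(frame_scores, expected):
    """Align frames to the *expected* phoneme sequence with a Viterbi-style
    dynamic program. frame_scores[t][p] is a (hypothetical) log-likelihood
    of phoneme p at frame t; assumes at least as many frames as expected
    phonemes, since every expected phoneme must be visited."""
    T, N = len(frame_scores), len(expected)
    NEG = -math.inf
    best = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    best[0][0] = frame_scores[0][expected[0]]   # alignment starts in phoneme 0
    for t in range(1, T):
        for i in range(N):
            stay = best[t - 1][i]                       # remain in phoneme i
            enter = best[t - 1][i - 1] if i > 0 else NEG  # advance from i-1
            if max(stay, enter) > NEG:
                best[t][i] = max(stay, enter) + frame_scores[t][expected[i]]
                back[t][i] = i if stay >= enter else i - 1
    path = [N - 1]                                  # must end in last phoneme
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[T - 1][N - 1], path  # overall rating and frame-to-phoneme map
```

The returned rating plays the role of the "closeness" measure: a high value indicates a near-perfect match, a low value a mismatch between input and expected text.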
In the second FA variant, assume that a speech analysis has produced a set of candidates, each representing a recognition result for the speech input SI and generally rated with some score. For illustration, each candidate is a sequence of words. Each candidate can now be compared to the expected input, using e.g. the number of matching words, the number of differing words, additionally inserted words, or missing words. From this comparison (together with the rating from the recognition, if available) the "optimal" candidate compared to the expected input can be chosen, or all candidates may be rejected. From the candidates computed in the forced alignment step FA, the best candidate according to the underlying rating is selected. A format conversion FC may be used to prepare the selected candidate for further processing by the next steps, i.e. to obtain "features" from the selected recognition result. The result after format conversion FC is applied to a speaking rate step SRA and a phoneme statistics step PS.
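The word-level comparison of this second variant can be sketched with a standard edit-distance computation; the rejection threshold used below is a hypothetical placeholder, not a value from the patent.

```python
def word_edit_counts(candidate, expected):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the candidate word sequence into the expected one
    (the standard Levenshtein recurrence, applied at word level)."""
    c, e = candidate.split(), expected.split()
    d = [[0] * (len(e) + 1) for _ in range(len(c) + 1)]
    for i in range(len(c) + 1):
        d[i][0] = i
    for j in range(len(e) + 1):
        d[0][j] = j
    for i in range(1, len(c) + 1):
        for j in range(1, len(e) + 1):
            cost = 0 if c[i - 1] == e[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # word deleted
                          d[i][j - 1] + 1,         # word inserted
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(c)][len(e)]

def pick_best(candidates, expected, reject_above=3):
    """Select the candidate closest to the expected input, or reject all."""
    best = min(candidates, key=lambda s: word_edit_counts(s, expected))
    return best if word_edit_counts(best, expected) <= reject_above else None
```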
Together with the word sequence, additional information from the analysis process can be extracted from the speech input SI for the candidate under consideration. This might include information on the phoneme sequence used to build up the candidate and its relation to the time slots of the speech input SI that were used to recognize each phoneme. The temporal information can then be used to determine the speaking rate SRA of the patient, i.e. the speed of his/her speech (which might be normalized for each phoneme). The result is preferably used for comparison with a speaking rate of the patient obtained in previous utterances, such as in a healthy state of the patient or in another known state. Thus, an adaptation (personalization) of the system towards the patient over time can be obtained.
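A sketch of the speaking-rate computation from such temporal information might look as follows, assuming the frame-to-phoneme path produced by a forced alignment and a 10 ms frame step; the baseline rate is a hypothetical stored value from an earlier state of the patient.

```python
def speaking_rate(path, frame_step_s=0.01):
    """Phonemes per second, given the frame-to-phoneme-index path produced
    by a forced alignment and the frame step (here 10 ms)."""
    duration_s = len(path) * frame_step_s
    return len(set(path)) / duration_s if duration_s else 0.0

# Hypothetical alignment: 5 phonemes spread over 50 frames (0.5 s).
path = [0] * 12 + [1] * 8 + [2] * 15 + [3] * 10 + [4] * 5
baseline_rate = 12.0  # hypothetical phonemes/second from a healthy state
current_rate = speaking_rate(path)                          # 10.0
slowdown = (baseline_rate - current_rate) / baseline_rate   # about 0.17
```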
Statistics on properties of the phonemes might be used to find indications of non-regular speech, e.g. the mean length of certain phonemes, or the distribution of how often each phoneme is used by the patient. Again, these analyses can be compared to results from a non-related comparison group or to results collected from the patient; in the latter case the adaptation can be continued over time. Phonemes extracted from the input might be compared to phonemes obtained in previous interactions with the user to detect changes over time.
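The phoneme-usage statistics could, for instance, be compared as relative-frequency distributions; the total-variation distance used in this sketch is merely one possible similarity measure, not one prescribed by the invention.

```python
from collections import Counter

def phoneme_distribution(phonemes):
    """Relative frequency of each phoneme label in an utterance."""
    counts = Counter(phonemes)
    if not counts:
        return {}
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}

def distribution_distance(current, previous):
    """Total-variation distance between two phoneme distributions:
    0 means identical usage, values near 1 mean very different usage."""
    labels = set(current) | set(previous)
    return 0.5 * sum(abs(current.get(p, 0.0) - previous.get(p, 0.0))
                     for p in labels)
```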
To make the results of the analysis steps PS and SRA comparable to previous results obtained by the patient, a normalization NM of the computed outcome may be necessary.
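One plausible normalization, sketched below, is a z-score against the patient's own previous results, so that measures on different scales (durations, rates, distribution distances) become comparable; the choice of z-scores is an assumption for illustration only.

```python
from statistics import mean, stdev

def z_normalize(value, history):
    """Normalize one measurement against the mean and spread of the
    patient's own previous results, so that measures on different
    scales become comparable. Needs at least two previous results."""
    if len(history) < 2:
        return 0.0
    m, s = mean(history), stdev(history)
    return (value - m) / s if s > 0 else 0.0
```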
Based on the normalized analysis results a classification CL is carried out, and a speech disorder result SDR is produced. The above-mentioned parameters are compared to previous analysis results PAR, and from this comparison a result is derived. As a simple example, if a person's previously recorded phoneme lengths are much shorter than the present ones, this indicates that the person's speech has changed significantly; and if it is known that the person has a high risk of stroke, such significantly prolonged phoneme lengths may be a strong indicator of a stroke.
A fundamental classification CL would be to distinguish between distorted and non-distorted speech. In the case of distorted speech, further classifications might lead to a more detailed reason for the distortion. The comparison may be based on similarity measures.
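Such a fundamental two-way classification could be sketched as a simple threshold on the normalized deviations; the threshold value is hypothetical, and a practical system would tune it or use a trained classifier instead.

```python
def classify(normalized_features, threshold=2.0):
    """Two-way classification into distorted / non-distorted speech: flag
    the input as distorted if any normalized feature deviates from the
    patient's history by more than `threshold` (hypothetical cut-off)."""
    deviations = {name: abs(v) for name, v in normalized_features.items()}
    if deviations and max(deviations.values()) > threshold:
        worst = max(deviations, key=deviations.get)
        return "distorted", worst   # worst feature hints at the reason
    return "non-distorted", None

# Example: strongly prolonged phonemes, normal speaking rate.
print(classify({"phoneme_length": 3.1, "speaking_rate": -0.4}))
# -> ('distorted', 'phoneme_length')
```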
Fig. 4 shows the processing steps LA, CL, and SDR as explained in relation to Fig. 3. In Fig. 4, however, additional steps are included before step LA in order to enhance the precision of the classification mechanism by excluding speech inputs SI that do not comply with the predefined requested speech.
An initial speech recognition step SR has been included to test the speech input SI. This speech recognition step SR may comprise a forced alignment such as described in connection with step FA of Fig. 3, together with an appropriate classifier. The result of the speech recognition SR may be a single text sequence or a set of alternative candidates (e.g. represented in an n-best list or graph). These candidates might again contain further information extracted during the recognition, e.g. ratings or temporal information on phonemes with respect to the speech input SI.
In step EI it is checked whether the SR outcome complies with an expected input, i.e. whether further processing is reasonable and required because irregularities in speech are detected, or whether the input cannot be processed further. Reasons for stopping the processing can be the absence of dysfunctions, or indications that the acoustic input is not reliable, in which case the patient might be prompted to re-read the given sequence.
If the answer to the question "Expected input?" is 'no' N, there are indications that further processing might not lead to meaningful results. If, due to such input, a final classification is not useful, a confidence check CC can be performed to evaluate whether the input was given with clear speech (but with the wrong words spoken) or with highly perturbed pronunciation. This can be used to decide whether re-input of the utterance is required.
If the answer to the question "Expected input?" is 'yes' Y, the input by the user is reasonable, shows dysfunctions, and requires more analysis. If further analysis is decided upon, the forced alignment FA step of the language analysis LA of Fig. 3 may be omitted, since such a forced alignment might already have been performed in the SR step. If not, a forced alignment FA might be performed to support the further analysis.
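The gating logic of steps SR, EI and CC might be organized along the following lines; the edit-count, rating and confidence thresholds are all hypothetical placeholders, and a real system would derive them from its own models.

```python
def gate_input(edit_count, alignment_rating, confidence,
               max_edits=3, good_rating=-50.0, min_conf=0.5):
    """Route a speech input after the initial recognition step SR.
    All threshold values are hypothetical placeholders."""
    if edit_count <= max_edits:                 # complies with the request
        if alignment_rating >= good_rating:     # and sounds regular
            return "stop: no indication of dysfunction"
        return "analyze: compliant input with irregular speech"
    # Strong deviation from the expected text: perform the confidence check.
    if confidence >= min_conf:
        return "re-prompt: clearly spoken, but not the requested words"
    return "re-prompt: acoustic input not reliable"
```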
There may be different reasons for a speech input SI being different from expected:
A. Non-compliant user
B. Mal-operation
C. Background noise
D. Patient with Aphasia
E. Different ways to respond to dialog
A, B and C can to some extent be handled by adding an additional dialog, possibly using another type of interface. An example: "Did you repeat my sentence? If yes, please press 'Enter'; if no, please try now."
D - persons suffering from Aphasia will produce the same type of "no sense" input every time, depending on the type of Aphasia (not being able to produce or reproduce language).
E - different ways to respond to dialog should be treated by refining the forced alignment FA to the different possible valid answers, for example "What does this image show?" with valid answers "Animal", "Cat", "Black Cat", etc., as in the sketch below.
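A toy version of this refined matching against several valid answers follows; the word-set mismatch count is a deliberately crude stand-in for a proper forced alignment against each alternative.

```python
def best_valid_answer(response, valid_answers):
    """Pick the acceptable answer closest to the spoken response."""
    def mismatch(a, b):
        # Count words present in one phrase but not the other.
        return len(set(a.lower().split()) ^ set(b.lower().split()))
    return min(valid_answers, key=lambda ans: mismatch(response, ans))

print(best_valid_answer("a black cat", ["Animal", "Cat", "Black Cat"]))
# -> 'Black Cat'
```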
In case a test has been run and no stroke has been detected, the stored speech disorder result data for the person may be adapted to the newly obtained speech disorder result data.
It is appreciated that various known speech recognition and analysis methods and techniques can be applied instead of, or used to supplement, the described preferred embodiments.
Reference signs in the claims merely serve to increase readability. These reference signs should not in any way be construed as limiting the scope of the claims.

CLAIMS:
1. Method of identifying a change in speech of a person, the method comprising the steps of:
requesting a predefined speech input from the person (P),
receiving an oral response (SI) from the person,
performing a language analysis (LA) on the received oral response (SI),
comparing a result of the language analysis (LA) with a result of a corresponding previously obtained language analysis (PAR), and
detecting a change in speech from the person based on the step of comparing.
2. Method according to claim 1, wherein the step of performing comprises aligning (FA) the received oral response (SI) to a pattern of the expected predefined speech.
3. Method according to claim 1, wherein the step of performing comprises extracting statistics related to phonemes (PS) in the received oral response (SI).
4. Method according to claim 1, wherein the step of performing comprises extracting a speaking rate (SRA) for the received oral response (SI).
5. Method according to claim 1, wherein the step of performing comprises an initial step (SR, EI, CC) of checking if the received oral response (SI) provides a suitable match with the predefined speech.
6. Method according to claim 5, wherein the initial step comprises extracting a length of the received oral response (SI) and comparing the length to an expected predefined maximal length.
7. Method according to claim 1, wherein the corresponding previously obtained language analysis (PAR) comprises previously obtained language analysis results (PAR) obtained by the person (P).
8. Apparatus (DA) comprising:
requesting means (11, 12) adapted to request a predefined speech input from a person,
acoustic receiving means (10) adapted to receive an oral response from the person (P),
signal processing means (PU) adapted to perform a language analysis on the received oral response, to compare a result of the language analysis with a result of a corresponding previously obtained language analysis, and to detect a change in speech based on said comparison.
9. Apparatus (DA) according to claim 8, further comprising communication means adapted to communicate the detected change in speech via a communication channel selected from the group consisting of: display (11), loudspeaker (12), telephone net, Internet, mobile telephone net, Local Area Network, and near field communication techniques.
10. Computer system adapted to perform the method according to any of the claims 1 to 7.
11. Computer executable program code adapted to perform the method according to any of the claims 1 to 7.
12. Computer readable storage medium comprising a computer executable program code according to claim 11.
PCT/IB2006/051144 2005-04-13 2006-04-13 Automated speech disorder detection method and apparatus WO2006109268A1 (en)

Applications Claiming Priority (2)

EP05102909.8
EP05102909

Publications (1)

WO2006109268A1 (en)

Family ID: 36616823

Family Applications (1)

PCT/IB2006/051144 (WO2006109268A1), priority date 2005-04-13, filing date 2006-04-13: Automated speech disorder detection method and apparatus

Country Status (1)

WO: WO2006109268A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1089246A2 (en) * 1999-10-01 2001-04-04 Siemens Aktiengesellschaft Method and apparatus for speech impediment therapy
WO2002059856A2 (en) * 2001-01-25 2002-08-01 The Psychological Corporation Speech transcription, therapy, and analysis system and method
US20040044273A1 (en) * 2002-08-31 2004-03-04 Keith Peter Trexler Stroke symptom recognition devices and methods
WO2004034355A2 (en) * 2002-10-07 2004-04-22 Carnegie Mellon University System and methods for comparing speech elements
US20040230430A1 (en) * 2003-05-14 2004-11-18 Gupta Sunil K. Automatic assessment of phonological processes

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100298649A1 (en) * 2007-11-02 2010-11-25 Siegbert Warkentin System and methods for assessment of the aging brain and its brain disease induced brain dysfunctions by speech analysis
WO2014042878A1 (en) * 2012-09-12 2014-03-20 Lingraphicare America Incorporated Method, system, and apparatus for treating a communication disorder
US10010288B2 (en) 2012-10-16 2018-07-03 Board Of Trustees Of Michigan State University Screening for neurological disease using speech articulation characteristics
WO2014062441A1 * 2012-10-16 2014-04-24 University Of Florida Research Foundation, Inc. Screening for neurological disease using speech articulation characteristics
US9579056B2 (en) 2012-10-16 2017-02-28 University Of Florida Research Foundation, Incorporated Screening for neurological disease using speech articulation characteristics
WO2014188408A1 (en) * 2013-05-20 2014-11-27 Beyond Verbal Communication Ltd Method and system for determining a pre-multisystem failure condition using time integrated voice analysis
CN107111672A (en) * 2014-11-17 2017-08-29 埃尔瓦有限公司 Carry out monitoring treatment compliance using the speech pattern passively captured from patient environmental
US10430557B2 (en) 2014-11-17 2019-10-01 Elwha Llc Monitoring treatment compliance using patient activity patterns
EP3221839A4 (en) * 2014-11-17 2018-05-16 Elwha LLC Monitoring treatment compliance using speech patterns passively captured from a patient environment
FR3051280A1 (en) * 2016-05-12 2017-11-17 Paris Sciences Et Lettres - Quartier Latin DEVICE FOR RATING ACQUIRED LANGUAGE DISORDERS AND METHOD OF IMPLEMENTING SAID DEVICE
CN107456208A (en) * 2016-06-02 2017-12-12 深圳先进技术研究院 The verbal language dysfunction assessment system and method for Multimodal interaction
US10796715B1 (en) 2016-09-01 2020-10-06 Arizona Board Of Regents On Behalf Of Arizona State University Speech analysis algorithmic system and method for objective evaluation and/or disease detection
WO2018102579A1 (en) * 2016-12-02 2018-06-07 Cardiac Pacemakers, Inc. Multi-sensor stroke detection
US11139079B2 (en) 2017-03-06 2021-10-05 International Business Machines Corporation Cognitive stroke detection and notification
CN110720124A (en) * 2017-05-31 2020-01-21 国际商业机器公司 Monitoring patient speech usage to identify potential speech and associated neurological disorders
CN110720124B (en) * 2017-05-31 2023-08-11 国际商业机器公司 Monitoring the use of patient language to identify potential speech and related neurological disorders
US20190221317A1 (en) * 2018-01-12 2019-07-18 Koninklijke Philips N.V. System and method for providing model-based treatment recommendation via individual-specific machine learning models
US10896763B2 (en) * 2018-01-12 2021-01-19 Koninklijke Philips N.V. System and method for providing model-based treatment recommendation via individual-specific machine learning models
CN110415783A (en) * 2018-04-26 2019-11-05 北京新海樱科技有限公司 A kind of Functional Activities of OT method of rehabilitation based on body-sensing
CN111276130A (en) * 2020-01-21 2020-06-12 河南优德医疗设备股份有限公司 MFCC cepstrum coefficient calculation method for computer language knowledge education system

Similar Documents

Publication Publication Date Title
WO2006109268A1 (en) Automated speech disorder detection method and apparatus
US10010288B2 (en) Screening for neurological disease using speech articulation characteristics
US10478111B2 (en) Systems for speech-based assessment of a patient's state-of-mind
JP4002401B2 (en) Subject ability measurement system and subject ability measurement method
US8200494B2 (en) Speaker intent analysis system
McKechnie et al. Automated speech analysis tools for children’s speech production: A systematic literature review
US9508268B2 (en) System and method of training a dysarthric speaker
Bunnell et al. STAR: articulation training for young children.
TWI665657B (en) Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method, and recording medium
EP3899938B1 (en) Automatic detection of neurocognitive impairment based on a speech sample
TW201913648A (en) Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method and program
Bone et al. Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.
KR20220048381A (en) Device, method and program for speech impairment evaluation
JP7022921B2 (en) Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method and program
Gong et al. Towards an Automated Screening Tool for Developmental Speech and Language Impairments.
WO2002071390A1 (en) A system for measuring intelligibility of spoken language
JP4631464B2 (en) Physical condition determination device and program thereof
JP7307507B2 (en) Pathological condition analysis system, pathological condition analyzer, pathological condition analysis method, and pathological condition analysis program
CN110338747B (en) Auxiliary method, storage medium, intelligent terminal and auxiliary device for visual inspection
Middag et al. DIA: a tool for objective intelligibility assessment of pathological speech.
JP7479013B2 (en) Method, program, and system for assessing cognitive function
McKechnie Exploring the use of technology for assessment and intensive treatment of childhood apraxia of speech
CN116705070B (en) Method and system for correcting speech pronunciation and nasal sound after cleft lip and palate operation
CN116189668B (en) Voice classification and cognitive disorder detection method, device, equipment and medium
Pompili et al. Speech and language technologies for the automatic monitoring and training of cognitive functions

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP: non-entry into the national phase (ref country code: DE)
WWW: Wipo information: withdrawn in national office (country of ref document: DE)
NENP: non-entry into the national phase (ref country code: RU)
WWW: Wipo information: withdrawn in national office (country of ref document: RU)
122 Ep: pct application non-entry in european phase (ref document number: 06727912; country of ref document: EP; kind code of ref document: A1)