
CN109994211B - Modeling method for chronic kidney disease worsening risk based on EHR data - Google Patents

Publication number
CN109994211B
Authority
CN
China
Prior art keywords
data
information
task
binary
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910263178.8A
Other languages
Chinese (zh)
Other versions
CN109994211A (en)
Inventor
莫毓昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910263178.8A
Publication of CN109994211A
Application granted
Publication of CN109994211B
Current legal status: Active
Anticipated expiration

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271: Specific aspects of physiological measurement analysis
    • A61B5/7275: Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Physiology (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Veterinary Medicine (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for modeling the risk of chronic kidney disease worsening based on EHR data, and relates to the technical field of medical treatment. Its advantages are that the patient's EHR data (data set) can be incorporated into a logistic regression model, and that the ability of EHR data to track chronic kidney disease over different periods is exploited to obtain a better-optimized logistic regression model.

Description

Modeling method for chronic kidney disease worsening risk based on EHR data
Technical Field
The invention relates to the technical field of medical treatment, in particular to a modeling method of chronic kidney disease worsening risk based on EHR data.
Background
By tracking repeated measurements of patient status over time, EHR data contain important information about disease progression. Patient data are recorded only when a physical examination is performed or when the patient is in hospital for regular medical care, which results in irregular sampling of the data. Another characteristic of EHR data is that patients are tracked over different periods.
Chronic kidney disease is generally defined by loss of kidney function, as indicated by the estimated glomerular filtration rate (eGFR), which is calculated from serum creatinine. Chronic kidney disease is divided into five stages, with stage 3b being the turning point for progression to end-stage renal disease (ESRD) and adverse cardiovascular outcomes. Current research aims at developing predictive models of progression to ESRD or death, and many past studies that predict progression use data from carefully controlled prospective studies. It is therefore important to use the longitudinal medical history available in the EHR to predict the risk of chronic kidney disease worsening within the next year for patients with mild to moderate impairment of kidney function.
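For orientation, the staging mentioned above can be expressed as a lookup on eGFR. The following is a minimal sketch that assumes the commonly used eGFR cut-offs in mL/min/1.73 m² (90, 60, 45, 30, 15), with stage 3 split into 3a and 3b; the function name `ckd_stage` and the thresholds are illustrative assumptions and are not stated in the patent text.

```python
def ckd_stage(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m^2) to a CKD stage label.

    Thresholds are the commonly used cut-offs (an assumption here, not
    taken from the patent). Stages 1 and 2 additionally require other
    evidence of kidney damage, which this sketch does not check.
    """
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    if egfr >= 90:
        return "1"
    if egfr >= 60:
        return "2"
    if egfr >= 45:
        return "3a"
    if egfr >= 30:
        return "3b"
    if egfr >= 15:
        return "4"
    return "5"
```

Under these assumed cut-offs, an eGFR of 40 falls in stage 3b, the turning point described in the background.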
Disclosure of Invention
The invention aims to provide a modeling method for chronic kidney disease worsening risk based on EHR data, so as to solve the problems in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a modeling method of chronic kidney disease exacerbation risk based on EHR data, comprising the steps of:
S1, extracting predictors from the patient's EHR data recorded before time t_0, mapping each predictor into binary variables, and combining the binary variables into a set to obtain a first data set;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and, based on the most recent time window t_0, establishing a logistic regression model by a multitask-temporal inclusion method.
Preferably, the predictors include diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information for the patient; the diagnostic information and the prescription drug information are categorical predictors, whose predictor value is set to zero; the laboratory measurement information, vital sign information and statistical data information are numerical predictors, whose predictor value is the value in the most recent time window.
Preferably, each categorical predictor is expressed as a binary variable, and each numerical predictor is discretized into binary form according to its quartiles.
Preferably, the first data set is represented as follows,
Figure BDA0002015944930000021
wherein D is the first data set, x_i is the i-th example, F is the dimension of the feature space, and N is the number of examples.
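The equation image for the first data set is not reproduced in this text. A form consistent with the symbols defined above would be the following, where the binary label y_i (whether kidney function worsens for example i) and its coding as -1 or 1 are assumptions, since the text defines only D, x_i, F and N:

```latex
\[
D = \bigl\{ (x_i, y_i) \;\bigm|\; x_i \in \mathbb{R}^{F},\; y_i \in \{-1, 1\} \bigr\}_{i=1}^{N}
\]
```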
Preferably, the multitask-temporal inclusion method comprises the steps of:
treating the prediction of the outcome from each time window t as a separate task, wherein t = 1, 2, ..., T;
for each task t, acquiring a second data set corresponding to that task;
jointly learning all tasks t by using a multitask formula;
obtaining a logistic regression model.
Preferably, the second data set is represented as follows,
Figure BDA0002015944930000022
wherein D_t is the second data set, x_it denotes the binary variables extracted from the t-th window, F1 is the number of binary variables, and N_t is the number of examples for task t; the relationship between the number of binary variables and the dimension of the feature space is F = F1 × T;
the multitask formula is as follows,
Figure BDA0002015944930000023
wherein U_t is the weight of the t-th task, and λ_1 and λ_2 are tuning parameters.
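The equation images for the second data set and the multitask formula are not reproduced in this text. One formulation consistent with the symbols above is sketched below, purely as an assumption: it takes the common regularized multitask form in which each task weight U_t is a shared component w_0 plus a task-specific offset v_t, with λ_1 penalizing the offsets and λ_2 the shared part. The symbols w_0, v_t, c_t and the label y_i are hypothetical and do not appear in the source.

```latex
\[
D_t = \bigl\{ (x_{it}, y_i) \;\bigm|\; x_{it} \in \mathbb{R}^{F_1},\; y_i \in \{-1, 1\} \bigr\}_{i=1}^{N_t}
\]
\[
\min_{w_0,\, v_1, \dots, v_T}\;
\sum_{t=1}^{T} \sum_{i=1}^{N_t} \log\!\left(1 + e^{-y_i f_t(x_{it})}\right)
+ \frac{\lambda_1}{T} \sum_{t=1}^{T} \lVert v_t \rVert^2
+ \lambda_2 \lVert w_0 \rVert^2,
\qquad
f_t(x) = U_t^{\top} x + c_t,\quad U_t = w_0 + v_t
\]
```

In such a form, a large λ_1 pulls all task weights toward the shared w_0, while a small λ_1 lets each time window learn nearly independent weights.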
Preferably, the logistic regression model is expressed as follows,
Figure BDA0002015944930000024
wherein w ∈ R^F is the feature weight vector and c is the intercept.
The beneficial effects of the invention are as follows: predictors are extracted from the patient's EHR and mapped into binary variables to form a data set, and the patient's EHR data (data set) are incorporated into a logistic regression model by a multitask-temporal inclusion method, which yields a better-optimized logistic regression model than the conventional non-temporal inclusion method.
Drawings
FIG. 1 is a flow chart of a modeling method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a multi-task temporal inclusion process for including EHR data into a logistic regression model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a non-temporal inclusion process for the inclusion of EHR data into a logistic regression model according to an embodiment of the present invention;
FIG. 4 shows the average performance of the two methods with a preset threshold of 10% in an embodiment of the present invention;
FIG. 5 shows the average performance of the two methods with a preset threshold of 20% in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
Example 1
As shown in fig. 1, the invention provides a modeling method for chronic kidney disease exacerbation risk based on EHR data, which comprises the following steps:
S1, extracting predictors from the patient's EHR data recorded before time t_0, mapping each predictor into binary variables, and combining the binary variables into a set to obtain a first data set;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and, based on the most recent time window t_0, establishing a logistic regression model by a multitask-temporal inclusion method.
Example two
As shown in FIGS. 1 to 5, in this embodiment the example data comprise electronic health records of patients of Mount Sinai Hospital and the Mount Sinai Faculty Practice Associates in New York City, extracted for patients with impaired renal function who were diagnosed with hypertension, diabetes, or both. As shown in FIG. 2, the patient's EHR (electronic health record) includes diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information, all of which are extracted as predictors. The diagnostic information and the prescription drug information are categorical predictors, whose predictor value is set to zero; the laboratory measurement information, vital sign information and statistical data information are numerical predictors, whose predictor value is the value in the most recent time window.
In this embodiment, each categorical predictor is expressed as a binary variable, and each numerical predictor is discretized into binary form according to its quartiles, so that all predictors are represented by binary variables. For each numerical predictor, the mean, median, minimum, maximum and standard deviation within a specified time window are calculated. In addition, the linear slope of the numerical predictor within the time window is calculated: a line is fitted to the data by the least squares method, and the fitted slope is used as a feature. The diagnostic information and the prescription drug information are each represented as a binary variable indicating whether the patient associated with the example was assigned a given ICD-9 code or prescribed a given drug within a specified past time window.
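The per-window summaries and the quartile discretization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the population standard deviation is an arbitrary choice, and encoding a value as four one-hot quartile indicators is an assumption about how "discretized according to its quartiles" is realized.

```python
from statistics import mean, median, pstdev, quantiles


def window_summary(times, values):
    """Summarize one numerical predictor inside one time window.

    times and values are parallel lists (e.g. day offsets and eGFR values).
    The slope is the ordinary least-squares slope of values against times.
    """
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if only one time point
    return {
        "mean": v_bar,
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "std": pstdev(values),  # population std; an assumption
        "slope": slope,
    }


def quartile_indicators(value, population):
    """One-hot encode a value by the quartile bin it falls in (assumed scheme)."""
    q1, q2, q3 = quantiles(population, n=4)  # three cut points
    bins = [value <= q1, q1 < value <= q2, q2 < value <= q3, value > q3]
    return [int(b) for b in bins]
```

For example, four eGFR readings of 60, 58, 56 and 54 taken at times 0, 1, 2 and 3 give a fitted slope of -2 per time unit, which would then itself be discretized like any other numerical feature.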
In this embodiment, all binary variables are grouped into a set, forming a first data set, the first data set being represented as follows,
Figure BDA0002015944930000041
wherein D is the first data set, x_i is the i-th example, F is the dimension of the feature space, and N is the number of examples. Then, given the granularity level L of the time windows, the patient history is divided into T non-overlapping time periods, and, based on the most recent time period t_0, a logistic regression model f: R^F → R is established.
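The division of a patient history into T non-overlapping windows of length L can be sketched as below. The representation of events as (day, payload) pairs, the function name, and the convention that index 0 is the window immediately before t_0 are assumptions made for illustration.

```python
import math


def split_into_windows(events, t0, window_days, num_windows):
    """Assign events recorded before t0 to non-overlapping time windows.

    events: iterable of (day, payload) pairs.
    Window k covers days in [t0 - (k + 1) * window_days, t0 - k * window_days),
    so index 0 is the most recent window (an assumed convention).
    Events at or after t0, or older than the last window, are dropped.
    """
    windows = [[] for _ in range(num_windows)]
    for day, payload in events:
        age = t0 - day
        if age <= 0:
            continue  # only history strictly before t0 is used
        idx = math.ceil(age / window_days) - 1
        if idx < num_windows:
            windows[idx].append(payload)
    return windows
```

With `window_days=30` and `num_windows=3`, an event 5 days before t_0 lands in window 0 and an event 90 days before t_0 lands in window 2.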
In this embodiment, as shown in FIG. 2, the multitask-temporal inclusion method comprises the steps of: treating the prediction of the outcome from each time window t as a separate task, wherein t = 1, 2, ..., T; for each task t, acquiring a second data set corresponding to that task; jointly learning all tasks t by using a multitask formula; and obtaining a logistic regression model. The second data set is represented as follows,
Figure BDA0002015944930000042
wherein D_t is the second data set, x_it denotes the binary variables extracted from the t-th window, F1 is the number of binary variables, and N_t is the number of examples for task t; the relationship between the number of binary variables and the dimension of the feature space is F = F1 × T; the multitask formula is as follows,
Figure BDA0002015944930000043
wherein U_t is the weight of the t-th task, and λ_1 and λ_2 are tuning parameters. Once the logistic regression models are obtained, for a new example x_i an intermediate prediction is obtained from each f_t(x_it):
Figure BDA0002015944930000044
The predictions obtained from the T time windows are then averaged to generate a single prediction:
Figure BDA0002015944930000045
In this embodiment, the multitask method does not impute values for time windows in which patient information is sparse. When learning f_t, only examples with at least five medical events within time window t are used. When generating a prediction for a new example, only the time windows with at least five medical events produce intermediate predictions, and these are averaged into a single prediction.
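The prediction step for a new example can be sketched as follows. This is an illustrative sketch under assumptions: each per-window model is represented as a hypothetical (weights, intercept) pair, the intermediate prediction is taken to be the logistic probability, and the function returns None when no window has enough events.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_example(window_features, window_event_counts, models, min_events=5):
    """Average the per-window predictions f_t(x_it) over sufficiently dense windows.

    window_features[t]: binary feature vector for window t.
    window_event_counts[t]: number of medical events in window t.
    models[t]: (weights, intercept) of the model learned for task t (assumed layout).
    """
    predictions = []
    for x_t, n_events, (w_t, c_t) in zip(window_features, window_event_counts, models):
        if n_events < min_events:
            continue  # sparse window: skipped, nothing is imputed
        score = sum(w * x for w, x in zip(w_t, x_t)) + c_t
        predictions.append(sigmoid(score))
    if not predictions:
        return None  # no window had enough events
    return sum(predictions) / len(predictions)
```

Skipping sparse windows rather than imputing them means the number of averaged predictions can differ from patient to patient.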
In this embodiment, the logistic regression model obtained in step S2 is represented as follows,
Figure BDA0002015944930000051
wherein w ∈ R^F is the feature weight vector and c is the intercept.
In this embodiment, the logistic regression model is optimized with L2 regularization, according to the following formula:
Figure BDA0002015944930000052
wherein λ_1 is a tuning parameter; L2 regularization reduces overfitting.
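Since the formula image is not reproduced here, the following sketch shows the standard L2-regularized logistic regression objective that the description appears to refer to. The label coding y ∈ {-1, +1}, the unpenalized intercept, and the function names are assumptions rather than details taken from the patent.

```python
import math


def log1p_exp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def l2_logistic_objective(w, c, examples, lam):
    """Logistic loss plus an L2 penalty on the weights.

    examples: list of (x, y) with x a feature vector and y in {-1, +1}.
    lam: tuning parameter (lambda_1 in the description).
    The intercept c is left unpenalized (an assumption).
    """
    data_term = 0.0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + c
        data_term += log1p_exp(-y * score)
    penalty = lam * sum(wi * wi for wi in w)
    return data_term + penalty
```

Minimizing this objective over w and c with any gradient-based optimizer gives the fitted model; a larger `lam` shrinks the weights and trades training fit for stability.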
In this embodiment, longitudinal EHR data are used for the clinical task of predicting deterioration of kidney function in chronic kidney disease patients over the next year. Given a time-stamped sequence of a patient's outpatient eGFR values, multiple examples are generated for each patient. An example is a tuple consisting of a patient P, a timestamp t_0, and an eGFR measurement. For a given patient, only examples meeting the following criteria are considered:
1. patient P has at least two eGFR measurements in the 1-year window after t_0 and at least two eGFR measurements in the 1-year window before t_0;
2. the previous example of patient P is at least one year earlier than the current example.
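The two criteria above can be sketched as a filter over a patient's measurement days. The sketch makes several assumptions that the text does not settle: days are integers, one year is 365 days, the measurement at t_0 itself counts toward neither window, and candidate timestamps are scanned in increasing order. The function name is hypothetical.

```python
def eligible_timestamps(egfr_days, min_count=2, window_days=365, min_gap_days=365):
    """Select timestamps t0 that satisfy both example criteria.

    egfr_days: days on which the patient has an outpatient eGFR measurement.
    Returns the chosen t0 values in increasing order.
    """
    chosen = []
    for t0 in sorted(egfr_days):
        before = sum(1 for d in egfr_days if t0 - window_days <= d < t0)
        after = sum(1 for d in egfr_days if t0 < d <= t0 + window_days)
        if before < min_count or after < min_count:
            continue  # criterion 1: enough measurements on both sides of t0
        if chosen and t0 - chosen[-1] < min_gap_days:
            continue  # criterion 2: at least one year after the previous example
        chosen.append(t0)
    return chosen
```

With a measurement every 100 days from day 0 to day 1000, this keeps day 200 and day 600 as example timestamps.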
In this embodiment, the performance of the logistic regression model obtained by the multitask-temporal method is compared with that obtained by the conventional non-temporal method. As shown in FIG. 3, the conventional non-temporal method comprises the steps of: summarizing the measurements from all medical events in each of the T time windows; representing each example by an F-dimensional vector; and obtaining a logistic regression model. The non-temporal method considers only variables that have a significant univariate correlation (p < 0.05) with y in the training set, whereas the multitask-temporal method retains any variable that has a significant univariate correlation (p < 0.05) with y in at least one time window t of the training set.
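The retention rule for the multitask-temporal method can be sketched as below. The text does not say which univariate test produces the p-values, so the sketch simply takes a hypothetical precomputed table `p_values[t][j]` (window t, feature j) as input.

```python
def retained_features(p_values, alpha=0.05):
    """Indices of features significant (p < alpha) in at least one window.

    p_values: list of rows, one per time window, each holding one p-value
    per feature. Passing a single row reproduces the non-temporal rule.
    """
    if not p_values:
        return []
    num_features = len(p_values[0])
    return [
        j for j in range(num_features)
        if any(row[j] < alpha for row in p_values)
    ]
```

A feature that is significant only in an older window is therefore kept by the multitask-temporal rule even though a single-window selection on the most recent window alone would discard it.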
In this embodiment, FIG. 4 and FIG. 5 show the experimental results. For both preset thresholds and for patient histories of different lengths, the logistic regression performance of the multitask-temporal method is always better than that of the non-temporal method. As the length of the patient's medical history increases, the performance of the non-temporal logistic regression eventually declines, whereas the performance of the multitask-temporal method improves and then remains stable.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
According to the modeling method for chronic kidney disease worsening risk based on EHR data, predictors are extracted from the patient's EHR and mapped into binary variables to form a data set, and a logistic regression model is built by a multitask-temporal inclusion method that incorporates the patient's EHR data (data set) into the model. The ability of EHR data to track chronic kidney disease over different periods is thereby exploited to obtain a better-optimized logistic regression model.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which is also intended to be covered by the present invention.

Claims (2)

1. A modeling method of chronic kidney disease exacerbation risk based on EHR data, comprising the steps of:
S1, extracting predictors from the patient's EHR data recorded before time t_0, mapping each predictor into binary variables, and combining the binary variables into a set to obtain a first data set;
the first data set is represented as follows,
Figure FDA0004041873250000011
wherein D is the first data set, x_i is the i-th binary variable, F is the dimension of the feature space, and N is the number of binary variables;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and, based on the most recent time window t_0, establishing a logistic regression model by a multitask-temporal inclusion method;
the multitask-temporal inclusion method comprises the steps of:
treating the prediction result of each time window t as an individual task t, wherein t = 1, 2, ..., T;
for each task t, acquiring a second data set corresponding to each task t;
jointly learning all tasks t by using a multitasking formula;
acquiring T logistic regression models corresponding respectively to the tasks t, wherein t = 1, 2, ..., T;
the second data set is represented as follows,
Figure FDA0004041873250000012
wherein D_t is the second data set,
Figure FDA0004041873250000015
is the i-th binary variable extracted from the t-th time window, F1 is the total number of binary variables, and N_t is the number of binary variables of task t; the relationship between the total number of binary variables and the dimension of the feature space is F = F1 × T;
the multi-tasking formula is shown below,
Figure FDA0004041873250000013
wherein w_t is the weight of the t-th task, λ_1 is the tuning parameter corresponding to
Figure FDA0004041873250000016
and λ_2 is the tuning parameter corresponding to
Figure FDA0004041873250000014;
the logistic regression model is represented as follows,
Figure FDA0004041873250000021
the predictors include diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information for the patient; the diagnostic information and the prescription drug information are categorical predictors, whose predictor value is set to zero; the laboratory measurement information, vital sign information and statistical data information are numerical predictors, whose predictor value is the value in the most recent time window.
2. The modeling method of chronic kidney disease exacerbation risk based on EHR data of claim 1, wherein: each categorical predictor is expressed as a binary variable, and each numerical predictor is discretized into binary form according to its quartiles.
CN201910263178.8A 2019-04-02 2019-04-02 Modeling method for chronic kidney disease worsening risk based on EHR data Active CN109994211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910263178.8A CN109994211B (en) 2019-04-02 2019-04-02 Modeling method for chronic kidney disease worsening risk based on EHR data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910263178.8A CN109994211B (en) 2019-04-02 2019-04-02 Modeling method for chronic kidney disease worsening risk based on EHR data

Publications (2)

Publication Number Publication Date
CN109994211A CN109994211A (en) 2019-07-09
CN109994211B true CN109994211B (en) 2023-05-02

Family

ID=67131296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910263178.8A Active CN109994211B (en) 2019-04-02 2019-04-02 Modeling method for chronic kidney disease worsening risk based on EHR data

Country Status (1)

Country Link
CN (1) CN109994211B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2747159C1 (en) * 2020-06-03 2021-04-28 Федеральное государственное бюджетное образовательное учреждение высшего образования "Смоленский государственный медицинский университет" министерства здравоохранения Российской Федерации Method for assessing clinically significant chronic kidney disease
CN113057586B (en) * 2021-03-17 2024-03-12 上海电气集团股份有限公司 Disease early warning method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091397A (en) * 2018-01-24 2018-05-29 浙江大学 A kind of bleeding episode Forecasting Methodology for the Ischemic Heart Disease analyzed based on promotion-resampling and feature association
AU2018100759A4 (en) * 2018-06-06 2018-07-12 Macau University Of Science And Technology Computer implemented method and system for analysing data in a dataset
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180315507A1 (en) * 2017-04-27 2018-11-01 Yale-New Haven Health Services Corporation Prediction of adverse events in patients undergoing major cardiovascular procedures

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091397A (en) * 2018-01-24 2018-05-29 浙江大学 A kind of bleeding episode Forecasting Methodology for the Ischemic Heart Disease analyzed based on promotion-resampling and feature association
AU2018100759A4 (en) * 2018-06-06 2018-07-12 Macau University Of Science And Technology Computer implemented method and system for analysing data in a dataset
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
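Cox Regression with Correlation based Regularization for Electronic Health Records; Bhanukiran Vinzamuri et al.; 2013 IEEE 13th International Conference on Data Mining; 20140203; pp. 757-756 *
Fast classification learning algorithm for multi-task domains based on logistic regression; 顾鑫 et al.; Computer Engineering and Applications (《计算机工程与应用》); 20160818; pp. 47-56 *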

Also Published As

Publication number Publication date
CN109994211A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
US8990135B2 (en) Personalized health risk assessment for critical care
US10638965B2 (en) Method and system for monitoring stress conditions
US20220044148A1 (en) Adapting prediction models
JP2018524137A (en) Method and system for assessing psychological state
JP2012221508A (en) System and computer readable medium for predicting patient outcomes
CN107430645B (en) System for laboratory value automated analysis and risk notification in intensive care units
US20160125159A1 (en) System for management of health resources
Zheng et al. Capturing feature-level irregularity in disease progression modeling
Chen et al. Performance metrics for online seizure prediction
CN109994211B (en) Modeling method for chronic kidney disease worsening risk based on EHR data
CN113782192A (en) Grouping model construction method based on causal inference and medical data processing method
KR20210113042A (en) Device, method and program for predict hospital stay period based on patient information
CN116844725A (en) Health information generation method, device, medium and equipment
CN116230222A (en) Method for predicting death probability of coronary heart disease inpatient
Toma et al. Discovery and integration of univariate patterns from daily individual organ-failure scores for intensive care mortality prediction
CN113053530B (en) Medical time series data comprehensive information extraction method
van der Woerd et al. Studying sleep: towards the identification of hypnogram features that drive expert interpretation
Rajmohan et al. G-Sep: A deep learning algorithm for detection of long-term sepsis using bidirectional gated recurrent unit
US11081217B2 (en) Systems and methods for optimal health assessment and optimal preventive program development in population health management
Shroff et al. GlucoseAssist: Personalized Blood Glucose Level Predictions and Early Dysglycemia Detection
Hansen et al. Individual health indices via register-based health records and machine learning
US20240013917A1 (en) Systems and methods for end of life analysis
CN118471457B (en) Hyperuricemia case management system based on ultrasonic detection
EP4278953A1 (en) Assessing gum recession
CN115762761A (en) Automatic sleep staging method and device based on single expert annotation data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant