CN109994211B - Modeling method for chronic kidney disease worsening risk based on EHR data
- Publication number
- CN109994211B
- Authority
- CN
- China
- Prior art keywords
- data
- information
- task
- binary
- logistic regression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7271—Specific aspects of physiological measurement analysis
- A61B5/7275—Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Pathology (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Psychiatry (AREA)
- Heart & Thoracic Surgery (AREA)
- Physiology (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Veterinary Medicine (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method for modeling the risk of chronic kidney disease worsening based on EHR data, and relates to the technical field of medical treatment. Its advantages are that the patient's EHR data (the data set) can be incorporated into the logistic regression model, and that the ability of EHR data to track chronic kidney disease over different periods is exploited to obtain a better-optimized logistic regression model.
Description
Technical Field
The invention relates to the technical field of medical treatment, in particular to a modeling method of chronic kidney disease worsening risk based on EHR data.
Background
By tracking repeated measurements of patient status over time, EHR data contains important information about disease progression. Patient data is recorded only when a physical examination is performed or when the patient is in hospital for regular medical care, so the data is irregularly sampled. Another feature of EHR data is that patients are tracked over different periods.
Chronic kidney disease is generally defined by loss of kidney function, as indicated by the estimated glomerular filtration rate (eGFR), which is calculated from serum creatinine. Chronic kidney disease is divided into five stages, with stage 3b being the turning point for progression to end-stage renal disease (ESRD) and adverse cardiovascular outcomes. Current research aims to develop predictive models of progression to ESRD or death, and many past studies of progression use data from carefully controlled prospective studies. It is therefore important to use the longitudinal medical history available in the EHR to predict the risk that chronic kidney disease worsens in the next year, considering patients with mild to moderate kidney function impairment.
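As an illustration of the staging mentioned above, the following minimal sketch maps an eGFR value to a stage. The cut-off values are the widely used KDIGO GFR categories and are an assumption here, since the patent text does not list them; the function name `ckd_stage` is hypothetical. Clinical staging of stages 1 and 2 also requires other evidence of kidney damage, which this sketch ignores.

```python
def ckd_stage(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m^2) to a GFR-based CKD stage label.

    Thresholds follow the commonly cited KDIGO categories (assumed, not
    taken from the patent): >=90, 60-89, 45-59, 30-44, 15-29, <15.
    """
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    # (lower bound, label), checked from the highest category downwards.
    thresholds = [(90, "1"), (60, "2"), (45, "3a"), (30, "3b"), (15, "4")]
    for lower, label in thresholds:
        if egfr >= lower:
            return label
    return "5"


for value in (95.0, 52.0, 38.0, 12.0):
    print(value, "-> stage", ckd_stage(value))
```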
Disclosure of Invention
The invention aims to provide a modeling method for chronic kidney disease worsening risk based on EHR data, so as to solve the problems in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a modeling method of chronic kidney disease exacerbation risk based on EHR data, comprising the steps of:
S1, extracting predictors from the patient's EHR data before $t_0$, mapping each predictor to binary variables, and combining the binary variables into a set to obtain a first data set;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and establishing a logistic regression model from the most recent time window $T_0$ by a multitask-temporal inclusion method.
Preferably, the predictors include the patient's diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information. The diagnostic information and the prescription drug information are categorical predictors, whose predictor value is set to zero; the laboratory measurement information, vital sign information, and statistical data information are numerical predictors, whose predictor value is the value in the most recent time window.
Preferably, each categorical predictor is expressed as one binary variable, and each numerical predictor is discretized into binary form according to its quartiles.
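The mapping of predictors to binary variables could be sketched as follows. This is one possible reading, not the patent's stated implementation: it assumes a numerical predictor becomes four indicator variables (one per quartile bin, with cut points estimated from training values) and a categorical predictor becomes a single presence indicator. The function names are hypothetical.

```python
import statistics


def quartile_indicators(value, training_values):
    """Return four 0/1 indicators marking the quartile bin that `value` falls in.

    The three cut points are estimated from `training_values`.
    """
    cuts = statistics.quantiles(training_values, n=4)  # [Q1, Q2, Q3]
    bin_index = sum(value > cut for cut in cuts)       # 0, 1, 2 or 3
    return [1 if i == bin_index else 0 for i in range(4)]


def categorical_indicator(codes_in_window, code):
    """Return 1 if `code` (a diagnosis code or a drug) occurs in the window, else 0."""
    return 1 if code in codes_in_window else 0


training = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartile_indicators(5, training))
print(categorical_indicator({"250.00", "401.9"}, "401.9"))
```

The codes `250.00` and `401.9` are example ICD-9 codes used only for illustration.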
Preferably, the first data set is represented as follows,
wherein D is the first data set, $x_i$ is the i-th example, F is the dimension of the feature space, and N is the number of examples.
Preferably, the multitask-temporal method comprises the steps of:
treating the prediction task for each time window t as a separate task, where t = 1, 2, ..., T;
for each task t, acquiring a second data set corresponding to each task t;
jointly learning all tasks t by using a multitasking formula;
a logistic regression model is obtained.
Preferably, the second data set is represented as follows,
wherein $D_t$ is the second data set, $x_{it}$ is the binary variables extracted from the t-th window, $F_1$ is the number of binary variables, and $N_t$ is the number of examples for task t; the number of binary variables and the dimension of the feature space are related by $F = F_1 \times T$;
the multitask formula is shown below,
wherein $U_t$ is the weight of the t-th task, and $\lambda_1$ and $\lambda_2$ are tuning parameters.
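The relation $F = F_1 \times T$ can be illustrated with a short sketch that slices each F-dimensional example into T per-window blocks and collects one data set per task. It assumes, purely for illustration, that the features are laid out window by window and that every example shares its label across tasks; the function names are hypothetical.

```python
def split_by_window(x, num_windows):
    """Slice an F-dimensional vector into T blocks of F1 = F / T entries each."""
    if len(x) % num_windows != 0:
        raise ValueError("F must be a multiple of the number of windows T")
    f1 = len(x) // num_windows
    return [x[t * f1:(t + 1) * f1] for t in range(num_windows)]


def build_task_datasets(examples, labels, num_windows):
    """Return one list of (x_it, y_i) pairs per window, i.e. one data set per task."""
    datasets = [[] for _ in range(num_windows)]
    for x, y in zip(examples, labels):
        for t, block in enumerate(split_by_window(x, num_windows)):
            datasets[t].append((block, y))
    return datasets


tasks = build_task_datasets([[1, 0, 0, 1], [0, 0, 1, 1]], [1, -1], num_windows=2)
for t, d_t in enumerate(tasks, start=1):
    print("task", t, d_t)
```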
Preferably, the logistic regression model is expressed as follows,
$$f(x_i) = W^\top x_i + c$$
wherein $W \in \mathbb{R}^F$ is the feature weight and c is the intercept.
The beneficial effects of the invention are as follows: predictors are extracted from the patient's EHR and mapped to binary variables to form a data set, and the patient's EHR data (the data set) is incorporated into a logistic regression model by a multitask-temporal approach, yielding a logistic regression model that is better optimized than one built with the conventional non-temporal inclusion method.
Drawings
FIG. 1 is a flow chart of a modeling method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a multi-task temporal inclusion process for including EHR data into a logistic regression model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a non-temporal inclusion process for the inclusion of EHR data into a logistic regression model according to an embodiment of the present invention;
FIG. 4 shows the average performance of the two methods with a preset threshold of 10% in an embodiment of the present invention;
FIG. 5 shows the average performance of the two methods with a preset threshold of 20% in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
Example 1
As shown in fig. 1, the invention provides a modeling method for chronic kidney disease exacerbation risk based on EHR data, which comprises the following steps:
S1, extracting predictors from the patient's EHR data before $t_0$, mapping each predictor to binary variables, and combining the binary variables into a set to obtain a first data set;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and establishing a logistic regression model from the most recent time window $T_0$ by a multitask-temporal inclusion method.
Example two
As shown in FIGS. 1 to 5, in this embodiment the example data comprises electronic health records of patients of the Mount Sinai Hospital and the Mount Sinai Faculty Practice Associates in New York City, and data is extracted from patients with impaired renal function who were diagnosed with hypertension, diabetes, or both. The EHR (electronic health record) of a patient includes diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information, which are extracted as predictors as shown in FIG. 2. The diagnostic information and the prescription drug information are categorical predictors, whose predictor value is set to zero; the laboratory measurement information, vital sign information, and statistical data information are numerical predictors, whose predictor value is the value in the most recent time window.
In this embodiment, each categorical predictor is expressed as one binary variable, and each numerical predictor is discretized into binary form according to its quartiles. That is, all predictors are represented by binary variables. For a numerical predictor, the mean, median, minimum, maximum, and standard deviation within a specified time window are calculated. In addition, the linear slope of the numerical predictor within the time window is calculated: a line is fitted to the data by the least squares method, and the slope of the fitted line is used as a feature. The diagnostic information and prescription drug information are each represented as a binary variable indicating whether the patient associated with the example was assigned a given ICD-9 code or prescribed a given drug within a specified past time window.
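The per-window summary of a numerical predictor described above might look like the following sketch. The function name `window_features` is hypothetical, time is assumed to be given as plain numbers (for example, days since the start of the window), and the population standard deviation is an arbitrary choice because the text does not say which variant is used.

```python
import statistics


def window_features(times, values):
    """Summarize one numerical predictor within one time window.

    Returns mean, median, min, max, standard deviation, and the slope of the
    ordinary least-squares line fitted to (time, value) pairs.
    """
    if not values or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equally long")
    features = {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.pstdev(values),  # population std: an assumption
    }
    t_mean = statistics.fmean(times)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - features["mean"]) for t, v in zip(times, values))
    # With a single time point the slope is undefined; 0.0 is used as a placeholder.
    features["slope"] = sxy / sxx if sxx > 0 else 0.0
    return features


print(window_features([0, 1, 2, 3], [60.0, 58.0, 56.0, 54.0]))
```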
In this embodiment, all binary variables are grouped into a set, forming a first data set, the first data set being represented as follows,
wherein D is the first data set, $x_i$ is the i-th example, F is the dimension of the feature space, and N is the number of examples. Then, given the granularity level L of the time window, the patient history is divided into T non-overlapping time periods, and the most recent time period $T_0$ is used to establish a logistic regression model $f: \mathbb{R}^F \to \mathbb{R}$.
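Dividing a patient history into T non-overlapping windows of a given granularity can be sketched as below. The sketch assumes the granularity level is expressed as a window length in days and numbers the windows from 0 (most recent) backwards, whereas the text indexes tasks from 1 to T; `assign_windows` and its parameters are hypothetical names.

```python
from datetime import date


def assign_windows(event_dates, t0, window_days, num_windows):
    """Group event dates that precede t0 into non-overlapping time windows.

    Window 0 is the most recent one and covers [t0 - window_days, t0).
    Events on or after t0, or older than the last window, are dropped.
    """
    windows = [[] for _ in range(num_windows)]
    for d in event_dates:
        age_days = (t0 - d).days
        if age_days <= 0:
            continue  # not strictly before t0
        index = (age_days - 1) // window_days
        if index < num_windows:
            windows[index].append(d)
    return windows


history = [date(2019, 3, 1), date(2018, 12, 1), date(2019, 4, 2), date(2017, 1, 1)]
for i, w in enumerate(assign_windows(history, date(2019, 4, 2), 90, 4)):
    print("window", i, w)
```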
In this embodiment, as shown in FIG. 2, the multitask-temporal method comprises the steps of: treating the prediction task for each time window t as a separate task, where t = 1, 2, ..., T; acquiring, for each task t, the corresponding second data set; jointly learning all tasks t using a multitask formula; and obtaining a logistic regression model. The second data set is represented as follows,
wherein $D_t$ is the second data set, $x_{it}$ is the binary variables extracted from the t-th window, $F_1$ is the number of binary variables, and $N_t$ is the number of examples for task t; the number of binary variables and the dimension of the feature space are related by $F = F_1 \times T$. The multitask formula is shown below,
wherein $U_t$ is the weight of the t-th task, and $\lambda_1$ and $\lambda_2$ are tuning parameters. Once the logistic regression model is obtained, an intermediate prediction for a new example $x_i$ is obtained from each $f_t(x_{it})$; the predictions obtained from the T time windows are then averaged to generate a single prediction.
In this embodiment, the multitask method does not estimate values for time windows with little patient information. When learning $f_t$, only examples with at least five medical events within time window t are used. When generating a prediction for a new example, only the time windows containing at least five medical events are used, and their predictions are averaged into a single prediction.
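The prediction step described above, averaging the per-window predictions while skipping sparse windows, could be sketched as follows. It assumes each per-window model is a (weights, intercept) pair and that the average is taken over predicted probabilities; the text does not say whether probabilities or raw scores are averaged. All names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_window(weights, intercept, x):
    """Probability from one per-window logistic regression model f_t."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + intercept)


def multitask_predict(models, window_features, event_counts, min_events=5):
    """Average f_t(x_it) over windows that hold at least `min_events` medical events.

    Returns None when no window has enough events.
    """
    predictions = [
        predict_window(w, c, x)
        for (w, c), x, n in zip(models, window_features, event_counts)
        if n >= min_events
    ]
    if not predictions:
        return None
    return sum(predictions) / len(predictions)


models = [([0.0, 0.0], 0.0), ([1.0, 0.0], 0.0)]
print(multitask_predict(models, [[1, 1], [0, 1]], event_counts=[5, 7]))
```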
In this embodiment, the logistic regression model obtained in step S2 is represented as follows,
$$f(x_i) = W^\top x_i + c$$
wherein $W \in \mathbb{R}^F$ is the feature weight and c is the intercept.
In this embodiment, the logistic regression model is optimized with L2 regularization, using the following formula:
wherein $\lambda_1$ is a tuning parameter; L2 regularization reduces overfitting.
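Because the formula itself is not reproduced in this text, the following sketch uses a common form of the L2-regularized logistic regression objective as an assumption: $\frac{\lambda_1}{2}\lVert W \rVert^2 + \sum_i \log(1 + e^{-y_i f(x_i)})$ with labels $y_i \in \{-1, +1\}$. It is minimized here by plain gradient descent; the learning rate, step count, and function names are illustrative choices, not values from the patent.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss_term(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def objective(w, c, xs, ys, lam):
    """Assumed objective: (lam / 2) * ||w||^2 + sum of logistic losses."""
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    loss = sum(
        log_loss_term(y * (sum(wj * xj for wj, xj in zip(w, x)) + c))
        for x, y in zip(xs, ys)
    )
    return reg + loss


def fit(xs, ys, lam, lr=0.1, steps=500):
    """Minimize the objective by gradient descent; the intercept is not regularized."""
    w = [0.0] * len(xs[0])
    c = 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]
        grad_c = 0.0
        for x, y in zip(xs, ys):
            score = sum(wj * xj for wj, xj in zip(w, x)) + c
            # d/dscore log(1 + exp(-y * score)) = -y * sigmoid(-y * score)
            coeff = -y * sigmoid(-y * score)
            for j, xj in enumerate(x):
                grad_w[j] += coeff * xj
            grad_c += coeff
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        c -= lr * grad_c
    return w, c


xs = [[1, 0], [1, 0], [0, 1], [0, 1]]
ys = [1, 1, -1, -1]
print(fit(xs, ys, lam=0.1))
print(fit(xs, ys, lam=10.0))
```

Comparing the two printed weight vectors shows the shrinkage effect: the larger tuning parameter yields weights with a smaller norm.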
In this example, longitudinal EHR data is used for the clinical task of predicting kidney function impairment in chronic kidney disease patients over the next year. Given a time-stamped sequence of a patient's outpatient eGFR values, multiple examples are generated for each patient. An example is associated with a tuple consisting of a patient P, a timestamp $t_0$, and an eGFR measurement. For a given patient, only examples that meet the following criteria are considered:
1. Patient P has at least two eGFR measurements in the 1-year window after $t_0$ and at least two eGFR measurements in the 1-year window before $t_0$;
2. The previous example of patient P is at least one year earlier than the current example.
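The two criteria above can be sketched as a filter over a patient's eGFR timeline. This is a minimal interpretation under stated assumptions: candidate timestamps are the measurement dates themselves, a year is taken as 365 days, the measurement at $t_0$ is not counted in either window, and examples are chosen greedily from the earliest date. The function name is hypothetical.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)


def generate_examples(measurements):
    """Return the timestamps t0 that qualify as examples for one patient.

    `measurements` is a list of (date, egfr) pairs sorted by date.
    """
    dates = [d for d, _ in measurements]
    examples = []
    for t0 in dates:
        before = sum(1 for d in dates if t0 - ONE_YEAR <= d < t0)
        after = sum(1 for d in dates if t0 < d <= t0 + ONE_YEAR)
        if before < 2 or after < 2:
            continue  # criterion 1 not met
        if examples and t0 - examples[-1] < ONE_YEAR:
            continue  # criterion 2 not met
        examples.append(t0)
    return examples


quarterly = [(date(y, m, 1), 60.0) for y in (2015, 2016, 2017) for m in (1, 4, 7, 10)]
print(generate_examples(quarterly))
```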
In this embodiment, the performance of the logistic regression models obtained with the multitask-temporal method and with the conventional non-temporal method is compared. As shown in FIG. 3, the conventional non-temporal method comprises the steps of: aggregating the measurements from all medical events across the T time windows; representing each example by an F-dimensional vector; and obtaining a logistic regression model. The non-temporal method considers only variables that have a significant univariate correlation (p < 0.05) with y in the training set, whereas the multitask-temporal method retains a variable if it has a significant univariate correlation (p < 0.05) with y in at least one time window of the training set.
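The "significant in at least one window" selection rule could be sketched as below. The text does not name the significance test, so a Pearson chi-square test on the 2x2 table of a binary feature against a binary label (coded 0/1 here) is assumed; for one degree of freedom its p-value equals `erfc(sqrt(chi2 / 2))`. Function names are hypothetical.

```python
import math


def chi2_p_value(feature, labels):
    """p-value of the Pearson chi-square test (1 d.o.f.) for two 0/1 sequences."""
    n = len(feature)
    a = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 1)
    b = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 0)
    c = sum(1 for f, y in zip(feature, labels) if f == 0 and y == 1)
    d = n - a - b - c
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 1.0  # a constant feature or label carries no evidence
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    return math.erfc(math.sqrt(chi2 / 2.0))


def select_multitask(features_by_window, labels_by_window, alpha=0.05):
    """Keep feature index j if it is significant in at least one time window."""
    num_features = len(features_by_window[0][0])
    keep = []
    for j in range(num_features):
        for xs, ys in zip(features_by_window, labels_by_window):
            if chi2_p_value([x[j] for x in xs], ys) < alpha:
                keep.append(j)
                break
    return keep


labels = [1] * 10 + [0] * 10
xs = [[labels[i], i % 2] for i in range(20)]  # feature 0 tracks y, feature 1 does not
print(select_multitask([xs], [labels]))
```

Passing a single window that holds the aggregated features gives the non-temporal variant of the same filter.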
In this example, FIGS. 4 and 5 show the experimental results. For both preset thresholds and for patient histories of different lengths, the logistic regression performance of the multitask-temporal method is always better than that of the non-temporal method. As the length of the patient's medical history increases, the performance of the non-temporal logistic regression eventually declines, whereas the performance of the multitask-temporal method improves and then remains stable.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
according to the modeling method of the chronic kidney disease worsening risk based on the EHR data, the prediction factors are extracted from the EHR of the patient, the prediction factors are mapped into binary variables to form a data set, a logistic regression model is built by adopting a multitask-temporal inclusion method, the EHR data (data set) of the patient is included in the logistic regression model, and the characteristics that the EHR data can track the chronic kidney disease in different periods are utilized to obtain a more optimized logistic regression model.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which is also intended to be covered by the present invention.
Claims (2)
1. A modeling method of chronic kidney disease exacerbation risk based on EHR data, comprising the steps of:
S1, extracting predictors from the patient's EHR data before $t_0$, mapping each predictor to binary variables, and combining the binary variables into a set to obtain a first data set;
the first data set is represented as follows,
wherein D is the first data set, $x_i$ is the i-th binary variable, F is the dimension of the feature space, and N is the number of binary variables;
S2, setting the granularity level of the time windows, dividing the first data set into T non-overlapping time windows according to the granularity level, and establishing a logistic regression model from the most recent time window $T_0$ by a multitask-temporal inclusion method;
the multitasking-temporal inclusion method includes the steps of,
treating the prediction task of each time window $t_x$ as an individual task t, wherein $t_x$ = 1, 2, ..., T;
for each task t, acquiring a second data set corresponding to each task t;
jointly learning all tasks t by using a multitasking formula;
acquiring T logistic regression models corresponding to each task T respectively, wherein T=T;
the second data set is represented as follows,
wherein $D_t$ is the second data set, $x_{it}$ is the i-th binary variable extracted from the t-th time window, $F_1$ is the total number of binary variables, and $N_t$ is the number of binary variables for task t; the total number of binary variables and the dimension of the feature space are related by $F = F_1 \times T$;
the multi-tasking formula is shown below,
wherein w is t Is the weight of the t-th task, lambda 1 Is corresponding toTuning parameters lambda 2 Is corresponding to->Is used for adjusting the optimal parameters;
the logistic regression model is represented as follows,
the predictors include diagnostic information, laboratory measurement information, vital sign information, prescription drug information, and statistical data information for the patient; the diagnosis information and the prescription drug information belong to classification variable type prediction factors, and the prediction factor value is set to be zero; the laboratory measurement information, vital sign information and statistical data information belong to a numerical predictor, the predictor value of which is the value of the nearest time window.
2. The modeling method of chronic kidney disease exacerbation risk based on EHR data of claim 1, wherein: the classified variable type predictors are expressed as a binary variable, and each numerical type predictor is respectively discretized into a binary variable according to the quartiles of each numerical type predictor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910263178.8A CN109994211B (en) | 2019-04-02 | 2019-04-02 | Modeling method for chronic kidney disease worsening risk based on EHR data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910263178.8A CN109994211B (en) | 2019-04-02 | 2019-04-02 | Modeling method for chronic kidney disease worsening risk based on EHR data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109994211A CN109994211A (en) | 2019-07-09 |
CN109994211B true CN109994211B (en) | 2023-05-02 |
Family
ID=67131296
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910263178.8A Active CN109994211B (en) | 2019-04-02 | 2019-04-02 | Modeling method for chronic kidney disease worsening risk based on EHR data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109994211B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2747159C1 (en) * | 2020-06-03 | 2021-04-28 | Федеральное государственное бюджетное образовательное учреждение высшего образования "Смоленский государственный медицинский университет" министерства здравоохранения Российской Федерации | Method for assessing clinically significant chronic kidney disease |
CN113057586B (en) * | 2021-03-17 | 2024-03-12 | 上海电气集团股份有限公司 | Disease early warning method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108091397A (en) * | 2018-01-24 | 2018-05-29 | 浙江大学 | A kind of bleeding episode Forecasting Methodology for the Ischemic Heart Disease analyzed based on promotion-resampling and feature association |
AU2018100759A4 (en) * | 2018-06-06 | 2018-07-12 | Macau University Of Science And Technology | Computer implemented method and system for analysing data in a dataset |
CN109036553A (en) * | 2018-08-01 | 2018-12-18 | 北京理工大学 | A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180315507A1 (en) * | 2017-04-27 | 2018-11-01 | Yale-New Haven Health Services Corporation | Prediction of adverse events in patients undergoing major cardiovascular procedures |
-
2019
- 2019-04-02 CN CN201910263178.8A patent/CN109994211B/en active Active
Non-Patent Citations (2)
Title |
---|
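Cox Regression with Correlation based Regularization for Electronic Health Records; Bhanukiran Vinzamuri et al.; 2013 IEEE 13th International Conference on Data Mining; 20140203; pp. 757-756 * |
Fast classification learning algorithm for multi-task domains based on logistic regression (基于逻辑回归的多任务域快速分类学习算法); Gu Xin et al.; Computer Engineering and Applications (计算机工程与应用); 20160818; pp. 47-56 * |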
Also Published As
Publication number | Publication date |
---|---|
CN109994211A (en) | 2019-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8990135B2 (en) | Personalized health risk assessment for critical care | |
US10638965B2 (en) | Method and system for monitoring stress conditions | |
US20220044148A1 (en) | Adapting prediction models | |
JP2018524137A (en) | Method and system for assessing psychological state | |
JP2012221508A (en) | System and computer readable medium for predicting patient outcomes | |
CN107430645B (en) | System for laboratory value automated analysis and risk notification in intensive care units | |
US20160125159A1 (en) | System for management of health resources | |
Zheng et al. | Capturing feature-level irregularity in disease progression modeling | |
Chen et al. | Performance metrics for online seizure prediction | |
CN109994211B (en) | Modeling method for chronic kidney disease worsening risk based on EHR data | |
CN113782192A (en) | Grouping model construction method based on causal inference and medical data processing method | |
KR20210113042A (en) | Device, method and program for predict hospital stay period based on patient information | |
CN116844725A (en) | Health information generation method, device, medium and equipment | |
CN116230222A (en) | Method for predicting death probability of coronary heart disease inpatient | |
Toma et al. | Discovery and integration of univariate patterns from daily individual organ-failure scores for intensive care mortality prediction | |
CN113053530B (en) | Medical time series data comprehensive information extraction method | |
van der Woerd et al. | Studying sleep: towards the identification of hypnogram features that drive expert interpretation | |
Rajmohan et al. | G-Sep: A deep learning algorithm for detection of long-term sepsis using bidirectional gated recurrent unit | |
US11081217B2 (en) | Systems and methods for optimal health assessment and optimal preventive program development in population health management | |
Shroff et al. | GlucoseAssist: Personalized Blood Glucose Level Predictions and Early Dysglycemia Detection | |
Hansen et al. | Individual health indices via register-based health records and machine learning | |
US20240013917A1 (en) | Systems and methods for end of life analysis | |
CN118471457B (en) | Hyperuricemia case management system based on ultrasonic detection | |
EP4278953A1 (en) | Assessing gum recession | |
CN115762761A (en) | Automatic sleep staging method and device based on single expert annotation data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||