

Deep reinforcement learning-based lung cancer patient medication prediction method and device

Info

Publication number
CN117275661A
Authority
CN
China
Prior art keywords: patient, medication, data, lung cancer, state
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311567874.0A
Other languages
Chinese (zh)
Other versions
CN117275661B (en)
Inventor
常云青
冯秀芳
董云云
白玉洁
张源榕
杨炳乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Application filed by Taiyuan University of Technology
Priority to CN202311567874.0A
Publication of CN117275661A
Application granted
Publication of CN117275661B
Legal status: Active

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/10 - ICT specially adapted for therapies or health-improving plans relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/092 - Reinforcement learning
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/50 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining for simulation or modelling of medical disorders
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a method and a device for predicting the medication of a lung cancer patient based on deep reinforcement learning, belonging to the technical field of lung cancer patient medication prediction. The technical problem to be solved is to provide such a deep reinforcement learning-based medication prediction method and device for lung cancer patients. The technical scheme adopted is as follows: collecting lung cancer patient data information, extracting the vital signs and related medical history of a lung cancer patient within a period of time and preprocessing them to construct a patient data set; constructing a patient-based environment model from the collected data, the model being used to simulate the reward mechanism of drug effects on the patient's body and comprising a patient state, a medication action space, a reward function, a transition model and an initial state; and constructing a network model comprising an online network and a target network for calculating the adjustment value of each possible drug regimen for the current state of the patient. The method is applied to medication prediction for lung cancer patients.

Description

Deep reinforcement learning-based lung cancer patient medication prediction method and device
Technical Field
The invention provides a method and a device for predicting the medication of a lung cancer patient based on deep reinforcement learning, and belongs to the technical field of lung cancer patient medication prediction.
Background
Deep reinforcement learning is a technique that combines deep learning with reinforcement learning; by simulating and learning from behaviors and their outcomes in an environment, it can optimize an agent's decision process. In the field of personalized medicine it can be applied to medical diagnosis, treatment scheme design, health management and the like, providing more accurate and effective medical services for patients.
Personalized medicine is a medical model that starts from the individual differences of patients and is based on each patient's unique genetic, physiological and psychological characteristics. Traditional medical models, by contrast, usually consider only general rules and ignore the individual differences between patients, so their prediction effect is poor: they can neither discover a patient's disease pattern nor predict the optimal medicine adjustment scheme. For lung cancer patients in particular, the medication situation must be predicted by analyzing a large amount of medical data together with the individual patient's lung cancer characteristics, and the traditional medical model currently in use cannot meet this prediction requirement.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and solves the following technical problem: providing a method and a device for predicting the medication of a lung cancer patient based on deep reinforcement learning.
In order to solve the technical problems, the invention adopts the following technical scheme: a lung cancer patient medication prediction method based on deep reinforcement learning comprises the following medication prediction steps:
step S1: collecting lung cancer patient data information;
step S2: extracting vital signs and related medical histories of a patient suffering from lung cancer for a period of time and preprocessing the vital signs and the related medical histories to construct a patient data set;
step S3: constructing a patient-based environment model using the collected data for simulating the reward mechanism of drug effects on the patient's body, comprising: a patient state, a medication action space, a reward function, a transition model, and an initial state;
step S4: setting up a network model comprising an online network and a target network, and calculating an adjustment value of each medication scheme under the current state of a patient;
step S5: taking the collected patient history treatment data as input, and outputting a predicted drug adjustment scheme;
step S6: updating the network parameters by using a stochastic gradient descent method;
step S7: through constant interaction with the environment, training and learning are carried out multiple times to achieve the goal of reward maximization, and the medication type and dosage adjustment scheme suitable for the patient is predicted and output.
The specific method for collecting the data information of the lung cancer patient in the step S1 comprises the following steps:
step S11: collecting personal basic information, medical history, physiological data and drug treatment scheme data of a lung cancer patient;
step S12: collecting data on the relationship between the medication type, dose and treatment effect of the lung cancer patient, comprising: data on the changes in the patient's lung tumor size when the patient takes different drug types and different doses.
The specific method for constructing the patient data set in the step S2 is as follows:
step S21: screening lung cancer patient data of a set age group;
step S22: preprocessing the screened data, including: removing repeated data, processing missing values and processing abnormal values;
step S23: storing the data obtained in the previous step and then dividing it into a training set and a testing set at a ratio of 8:2.
The specific method for constructing the environment model based on the patient in the step S3 is as follows:
step S31: determining a patient state space S, including tumor size, pathological stage and physiological index data;
features defining the patient's visit status include: demographics, medical history, disease risk, historical medication, laboratory data and physical measurements; a state space is established from this information to obtain a multidimensional state vector;
step S32: determining a medicine action space A, including adjustment of medicine types and dosages thereof;
according to the type and dose of the patient's historical medication, the medication adjustment scheme is determined, wherein the action space A comprises a four-dimensional vector: 0 no prescription change, 1 increase the drug dose, 2 decrease the drug dose, 3 replace the drug;
step S33: determining a reward function R: a reasonable reward function is designed according to the patient's condition, drug dose and treatment-effect factors, feeding back the changes of the body's various indexes when the patient takes different drugs and doses so as to mark tumor improvement or deterioration;
step S34: establishing, based on the patient's historical medication data, a transition probability model P(s_{t+1} | s_t, a_t) for each medication action in the current patient state, i.e. calculating the probability P of transitioning to the next state s_{t+1} after taking medication strategy a_t in the current patient state s_t, and using an ε-greedy strategy to balance exploitation and exploration, maximizing the expected benefit at the current moment;
step S35: for the state of the patient at the beginning of the treatment, an initial state is determined based on the patient's basic condition and medical history.
The specific method for constructing the network model to calculate the adjustment value of the medication scheme of the patient in the step S4 is as follows:
step S41: setting up an online network for calculating the adjustment value of each personalized medication scheme under the current physical state of the patient, and updating the optimal scheme according to the adjustment value;
the parameter weight of the online network is updated in the process of each iteration to minimize the difference between the predicted value and the target value in the current state;
step S42: calculating a target value according to the action in the current state and the maximum adjustment value in the next state;
step S43: constructing a neural network model to calculate an action-value function, denoted Q(s, a), which estimates the cumulative reward of each adjustment scheme in the current treatment state, i.e. the expected return value Q caused by the change of the patient's physical indexes after taking medication strategy a in state s;
step S44: training the DQN model using two structurally identical neural networks, an online network Q(s, a; θ) and a target network Q(s, a; θ⁻), the online network being used for obtaining the optimal dosing action decision a* = argmax_a Q(s, a; θ) and being trained with the loss function L;
wherein s is the current state of the patient, a is the patient's current medication strategy, θ is the online network parameter, argmax_a selects the action a with the maximum Q value in the given state s, Q(s, a; θ) is the expected return value of the patient's physical-index change in the online network, Q(s, a; θ⁻) is the expected return value of the patient's physical-index change in the target network, and L is the loss function computing the difference between the two for network training;
the expected action value y = r + γ·max_{a'} Q(s', a'; θ⁻) is estimated using the target network to calculate the loss function L = (y - Q(s, a; θ))², and the target network parameter θ⁻ is updated by slowly tracking the online network parameter θ in each training iteration, finally obtaining the optimal personalized medicine adjustment scheme suitable for the patient.
The specific method for outputting the predicted drug adjustment scheme in the step S5 is as follows:
step S51: forming a high-dimensional vector from five kinds of information (the patient's condition, physiological indexes, laboratory examination results, imaging examination results and medication status) as the input data of the model;
step S52: the output of the model is a drug adjustment regimen based on the patient's condition, covering four options: 0 indicates no prescription change, 1 indicates increasing the drug dose, 2 indicates decreasing the drug dose, and 3 indicates replacing the drug.
The device for realizing the lung cancer patient medication prediction method based on deep reinforcement learning comprises an acquisition computer for collecting lung cancer patient data information, a data server for collecting and storing the data information, and a prediction server for building a network model, training learning and outputting a prediction scheme.
Compared with the prior art, the invention has the following beneficial effects: the invention provides a prediction method and device for personalized medication of lung cancer patients, based mainly on deep reinforcement learning with a policy optimization algorithm; a personalized environment model is built by collecting the patients' historical medication data, and the agent is trained to learn interactively with the environment, so that the future optimal medication scheme of a lung cancer patient is obtained by adjustment and prediction. The deep reinforcement learning method adopted by the invention has the perception capability of deep learning and the decision capability of reinforcement learning, can integrate perception, learning and decision into the same framework, and is suited to solving high-dimensional, time-series-based decision problems, so the agent can be trained to learn an optimal drug adjustment scheme from a cancer patient's treatment history, providing more accurate and effective medical services for lung cancer patients.
It should be noted that, all actions for acquiring signals, information or data in the present application are performed under the condition of conforming to the corresponding data protection rule policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a structural diagram of the drug prediction model employing the policy optimization algorithm;
FIG. 2 is a flow chart of the steps of the medication prediction method of the present invention.
Detailed Description
As shown in FIG. 1 and FIG. 2, the method for predicting the medication of a lung cancer patient based on deep reinforcement learning and a policy optimization algorithm adopted by the invention specifically comprises the following steps:
step S1: collecting lung cancer patient data information;
step S2: extracting vital signs and related medical histories of a patient suffering from lung cancer within 6 months, and preprocessing the vital signs and the related medical histories to construct a patient data set;
step S3: constructing a patient-based environment model using the collected data, the model being used to simulate the reward mechanism of drug effects on the patient's body and comprising a patient state, a medication action space, a reward function, a transition model and an initial state;
step S4: constructing a network model comprising an online network and a target network, the network model being used to calculate the Q value of each possible drug scheme adjustment under the current state of the patient;
step S5: taking the collected historical treatment data of the patient as input, and outputting a predicted drug adjustment scheme;
step S6: updating the network parameters by using a stochastic gradient descent method;
step S7: the agent performs training and learning multiple times and outputs the optimal personalized medicine adjustment scheme.
Further, the collecting lung cancer patient data information in step S1 includes:
step S11: collecting personal basic information, medical history, physiological data, drug treatment scheme and other relevant data of a lung cancer patient;
step S12: and collecting the relation data between the medicine types, the dosages and the treatment effects of the lung cancer patients.
Further, the personal basic information described in step S11 includes the patient's age, sex, race, smoking status and the like; the medical history includes complications, cancer complications, hospitalization history, emergency treatment history and the like; the physiological data include systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate, weight, height, BMI and the like; and the drug treatment regimens include chemotherapy, targeted therapy, immunotherapy and anti-vascular therapy.
Further, the relationship between the drug type, the dose and the therapeutic effect in step S12 includes the change of the physical index of the patient when the patient takes different drug types and doses, which is specifically represented by the change of the lung tumor size of the lung cancer patient.
Further, extracting and preprocessing vital signs of a lung cancer patient within 6 months as described in step S2 includes:
step S21: screening the data of lung cancer patients above 18 and below 75 years old;
step S22: preprocessing the screened data, with operations including removing duplicate data, processing missing values (for a missing physical measurement, the missing value is replaced with the value of the nearest data point of the same patient; if it is still missing, it is estimated using the median of that variable's observations over all patients without missing data) and processing outliers, so as to ensure the integrity and accuracy of the data;
step S23: storing the data obtained in the previous step and dividing it into a training set and a testing set at a ratio of 8:2 for training and evaluating the effectiveness of the drug prediction model.
Further, the constructing the patient-based environment model using the collected data in step S3 includes:
step S31: determining the patient state space S: the state space refers to the pathological condition of the patient, including tumor size, pathological stage, physiological indexes and the like. In the present invention, the features defining the patient's visit status include demographics, medical history, disease risk, historical medication, laboratory data and physical measurements; a state space is established from this information, yielding a 20-dimensional state vector;
step S32: determining the medication action space A: the action space refers to the actions that the agent, acting as an intelligent doctor, can take, i.e. adjustments of the drug type and its dose. According to the type and dose of the patient's historical medication, the invention determines the medication adjustment scheme, consisting of a four-dimensional vector: 0 no prescription change, 1 increase the drug dose, 2 decrease the drug dose, 3 replace the drug;
step S33: determining the reward function R: the reward function refers to the feedback reward obtained after the agent takes the corresponding action in the current state in the reinforcement learning algorithm. According to factors such as the patient's condition, the drug dose and the treatment effect, a reasonable reward function is designed to feed back the changes of the body's indexes when the patient takes different drugs and doses, marking tumor improvement or deterioration and assisting doctors in providing drug adjustment schemes for the patient;
step S34: establishing a transition model: the transition model refers to the transition probability between the state space and the action space, i.e. the probability of transitioning to the next state after taking a certain action in the current state. The invention establishes a probability model P(s_{t+1} | s_t, a_t) for transitioning under each medication action in the current patient state according to the patient's historical medication data, and uses an ε-greedy strategy to trade off exploitation and exploration, maximizing the expected benefit at the current moment. For example, if at time step t the patient has k possible medication selection strategies a_1, ..., a_k, the ε-greedy policy can be expressed as:

a_t = argmax_a Q_t(a) with probability 1 - ε, or a uniformly random medication strategy with probability ε;

wherein a_t denotes the medication strategy selected at time t, Q_t(a) denotes the cumulative reward caused by the change of the patient's physical indexes under a given strategy a, argmax_a picks the medication strategy with the maximum action value, and ε is the exploration-rate parameter: with probability ε a random medication strategy is selected, while with probability 1 - ε the greedy choice, i.e. the medication strategy with the maximum action value, is taken.
Step S35: determining an initial state: the initial state refers to a state of the patient at the time of starting treatment, and is determined according to the basic condition of the patient and the medical history.
Further, the network model in step S4 is a deep reinforcement learning model based on a policy optimization algorithm, specifically used to solve decision problems in high-dimensional spaces; the specific construction steps are as follows:
step S41: an online network is built for calculating the Q value of each personalized medicine adjustment scheme under the current physical state of the patient, and the optimal scheme is updated according to the Q value. Parameters (weights) of the online network are updated in the process of each iteration to minimize the gap between the predicted Q value and the target Q value in the current state;
step S42: the target network is used to estimate the target Q value, i.e. the target Q value is calculated from the action in the current state and the maximum Q value in the next state. The parameters of the target network are updated slowly compared with those of the online network in order to keep the target Q value stable: during each iteration they track the online network rather than being trained directly.
A neural network model is constructed to calculate the action-value function Q(s, a), which estimates the cumulative reward of each adjustment scheme at the current visit status. To train the model, two structurally identical neural networks are used: an online network Q(s, a; θ) and a target network Q(s, a; θ⁻). The online network is used to obtain the optimal drug administration action decision a* = argmax_a Q(s, a; θ) and is trained accordingly. The expected action value y = r + γ·max_{a'} Q(s', a'; θ⁻) is estimated using the target network to calculate the loss function L, and the target network parameters θ⁻ are updated by slowly tracking the online network parameters θ, finally obtaining the optimal personalized medicine adjustment scheme suitable for the patient.
Further, the step S5 takes the collected treatment data of the patient as input, and outputs a predicted drug adjustment scheme, which specifically includes:
step S51: forming a high-dimensional vector from five kinds of information (the patient's condition, physiological indexes, laboratory examination results, imaging examination results and medication status) as the input data of the model;
step S52: the output of the model is a drug adjustment scheme based on the patient's condition, specifically comprising the four options 0 no prescription change, 1 increase the drug dose, 2 decrease the drug dose and 3 replace the drug.
Further, step S6 updates the network parameters using stochastic gradient descent. In addition, to alleviate the problems of correlated data and non-stationary distributions, an experience replay mechanism is introduced: the transition sample (s_t, a_t, r_t, s_{t+1}) obtained from the agent's interaction with the environment at each time step is stored in a buffer and randomly sampled from it; in this way differences in data distribution can be mitigated, smoothing the training distribution over many past behaviors.
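For illustration, a minimal sketch of such an experience replay buffer follows (Python is an assumption; the patent discloses no implementation, and the class and field names are hypothetical):

```python
import random
from collections import deque, namedtuple

# One transition sample (s_t, a_t, r_t, s_{t+1}) as described above.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

class ReplayMemory:
    """Fixed-size buffer D storing transitions for random sampling."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random draws break the temporal correlation between
        # consecutive visits and smooth the training distribution.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```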
Further, in step S7, the agent continuously interacts with the environment and, by learning the policy, achieves the goal of maximizing the reward and predicts the optimal medicine and dose adjustment scheme suitable for the patient, so that doctors can make more intelligent medication adjustments for the patient and the survival rate and quality of life of lung cancer patients can be improved.
In order to make the technical problems to be solved, the technical solutions and the beneficial effects clearer, the invention is further described in detail below through exemplary embodiments of the application with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. The technical scheme of the invention is described in detail below with reference to the examples and drawings, but the scope of protection is not limited thereto.
This embodiment is based on the drug prediction model structure shown in FIG. 1; through the step flow of the technical scheme shown in FIG. 2, personalized drug adjustment can be performed for a lung cancer patient according to the patient's actual situation. The processing steps include:
step S1: collecting lung cancer patient data information;
the data used in the experiment was collected from a hospital in a region covering 6124 lung cancer patients, including 632565 outpatient visits over a period of time, and the personal basic information, medical history, physiological data, drug treatment regimen and other relevant data of the patients visited during the period and the relationship data between the type of drug taken by the patients, the dosage and the treatment effect were recorded.
Step S2: vital signs and related medical history of lung cancer patients within 6 months are extracted and preprocessed to construct a patient dataset:
The collected lung cancer patient data are screened (e.g. patients aged 18 to 75), and the screened data are preprocessed, including: removing duplicate data; processing missing values (for a missing physical measurement, the missing value is replaced with the value of the nearest data point of the same patient, and if it is still missing it is estimated using the median of that variable's observations over all patients without missing data); and processing outliers, so as to ensure the integrity and accuracy of the data. Vital sign data are normalized so that all features lie in the same scale range, avoiding the influence of overly large weight differences between features on model performance. The preprocessed data are stored and then divided into a training set and a testing set at a ratio of 8:2 for training and evaluating the effectiveness of the drug prediction model.
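A non-limiting sketch of this preprocessing pipeline is given below (Python/pandas is an assumption; the column names patient_id, visit_time, age and the vital-sign fields are hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def build_dataset(df: pd.DataFrame):
    """Screen, clean, normalize and split the raw visit records."""
    # Screen patients aged 18 to 75.
    df = df[(df["age"] >= 18) & (df["age"] <= 75)]
    # Remove duplicate records.
    df = df.drop_duplicates()
    # Missing physical measurements: nearest data point of the same
    # patient first, then fall back to the cohort median.
    df = df.sort_values(["patient_id", "visit_time"])
    vitals = ["sbp", "dbp", "heart_rate", "weight", "height", "bmi"]  # assumed columns
    df[vitals] = df.groupby("patient_id")[vitals].transform(lambda s: s.ffill().bfill())
    df[vitals] = df[vitals].fillna(df[vitals].median())
    # Normalize vital signs to a common scale.
    df[vitals] = (df[vitals] - df[vitals].min()) / (df[vitals].max() - df[vitals].min())
    # Divide into training and test sets at a ratio of 8:2.
    train, test = train_test_split(df, test_size=0.2, random_state=0)
    return train, test
```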
Step S3: constructing a patient-based environment model by using the collected data, wherein the model is used for simulating a reward mechanism of a drug effect on a patient body and comprises a patient state, a drug action space, a reward function, a transfer model and an initial state:
step S31: determining the patient state space S: the state is the machine's perception of the environment, i.e. the specific environment in which the agent is located, and all possible states form the state space. In this embodiment the features defining the patient's visit status include: demographics, medical history, disease risk, historical medication, laboratory data and physical measurements; continuous variables are normalized to a common scale, binary variables are expressed as 0 or 1, and other categorical variables are converted into multiple binary variables using one-hot encoding; finally, a state space is established from this information, giving a 20-dimensional state vector;
step S32: determining the medication action space A: the action space refers to the specific actions the agent can take in the current environment; in this embodiment it consists of a four-dimensional vector: 0 no prescription change, 1 increase the drug dose, 2 decrease the drug dose, 3 replace the drug, where no prescription change means using the same medication and dose as the previous prescription, and increasing or decreasing the drug dose means adjusting the dose of the medication taken by the patient in the currently input state;
step S33: determining the reward function R: the reward function in reinforcement learning is used to evaluate the feedback reward obtained by the agent after executing an action; it maps environment feedback to a scalar value, and the goal of reinforcement learning is for the agent to learn an optimal policy through interaction with the environment so as to maximize the accumulated reward;
in the present embodiment, the reward function is set as follows:
r_t = R(s_t, a_t, s_{t+1}) = w_1·s_r + w_2·tox_t + w_3·cost_t

wherein s_t is the patient state at the current time t, a_t is the dosing action executed by the agent at time t, s_{t+1} is the patient state at time t+1, and r_t is the reward the agent obtains for transitioning from state s_t to state s_{t+1} after executing dosing action a_t; s_r represents the patient's survival in the current state; tox_t represents the toxic side effects of the drug on the patient in the current state, guiding the agent to avoid selecting drugs harmful to the patient; cost_t represents the cost of the selected drug in the current state, guiding the agent to avoid selecting overly expensive drugs; and w_1, w_2, w_3 are weight coefficients, set to 1, -0.5 and -0.5 respectively (a code sketch of this reward follows step S35 below). The DQN model is trained to optimize the cumulative reward, which equals the current reward plus the expected cumulative reward of the next visit multiplied by the discount factor γ, i.e. G_t = r_t + γ·G_{t+1}, so that the model can estimate the impact of current actions on both short-term and long-term results;
step S34: establishing the transition model:
the transition model refers to the transition probability between states and actions, i.e. the probability of transitioning to the next state after taking a certain action in the current state; an ε-greedy strategy is used to balance exploitation and exploration: exploitation maximizes the expected benefit at the current moment, while exploration may bring about the maximization of total benefit in the long term. ε-greedy is a common strategy in reinforcement learning, meaning that with a very small positive probability ε an action is selected at random from the available actions, while with the remaining probability 1 - ε the action with the greatest action value is selected. For example, at time step t the agent has k possible actions, denoted a_1, ..., a_k; letting Q_t(a) denote the action value of action a, the ε-greedy policy can be expressed as (see also the sketch after step S35):

a_t = argmax_a Q_t(a) with probability 1 - ε, or an action selected uniformly at random with probability ε;
step S35: determining the initial state: the initial state refers to the state of the patient at the time of starting treatment, and is determined according to the patient's basic condition and medical history.
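To make steps S33 and S34 concrete, a minimal sketch of the reward computation and the ε-greedy selection follows (Python is an assumption; how survival s_r, toxicity and cost are scored from raw records is not disclosed, so they enter as plain inputs):

```python
import numpy as np

# Weight coefficients from this embodiment: w1 = 1, w2 = -0.5, w3 = -0.5.
W1, W2, W3 = 1.0, -0.5, -0.5

def reward(s_r, tox, cost):
    """r_t = w1*s_r + w2*tox + w3*cost: survival is rewarded, while
    toxic side effects and drug cost are penalized."""
    return W1 * s_r + W2 * tox + W3 * cost

def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon (exploration),
    otherwise the action with the largest Q value (exploitation)."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Example: Q values over the four actions (0 keep, 1 increase, 2 decrease, 3 replace).
action = epsilon_greedy(np.array([0.1, 0.4, -0.2, 0.0]), epsilon=0.1)
```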
Step S4: constructing a network model comprising an online network and a target network, wherein the network model is used for calculating the Q value adjusted by each possible drug scheme under the current state of a patient;
A fully connected neural network with 2 hidden layers is built, each hidden layer containing 64 neurons, with batch normalization and the Leaky-ReLU activation function; the input layer has 20 dimensions and the output layer has 4 dimensions, corresponding to the state vector and the action space respectively; the learning rate α is set to 0.001, the batch size to 256, and the target network update parameter τ to 0.01; to control the stability of the model, the discount factor is set to γ = 0.5; the model is trained with the Adam optimizer for at most 100,000 iterations; the online network and the target network have the same structure but different parameter values; an experience replay mechanism is used to store all experience and randomly draw samples from it for training, reducing estimation-error and high-variance problems; and the target network parameters are updated from the online network parameters at regular intervals to improve stability and speed up model convergence.
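A sketch of this network in PyTorch (the framework is an assumption; the sizes follow the embodiment: 20-dimensional input, two hidden layers of 64 units with batch normalization and Leaky-ReLU, 4-dimensional output, Adam with learning rate 0.001):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q network: 20-d state in, one Q value per action out."""

    def __init__(self, state_dim=20, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.BatchNorm1d(64), nn.LeakyReLU(),
            nn.Linear(64, 64), nn.BatchNorm1d(64), nn.LeakyReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

online_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(online_net.state_dict())  # start from identical parameters
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-3)  # learning rate 0.001
```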
Step S5: the extracted clinical information of the patient is input into a network as a state space, and the network is subjected to continuous iterative updating to finally output actions, wherein the final output actions comprise four options of 0 no-prescription change, 1 increase of medicine dosage, 2 decrease of medicine dosage and 3 replacement of medicine.
Step S6: the random gradient descent method is used for updating network parameters, and the specific steps include:
step S61: calculating the return of each state-action pair, generating a transition sampleWherein->Is in state->Execution of action down->The obtained prize value forms an empirical replay memory of size N>
Step S62: initializing online network parametersAnd target network parameters->
Step S63: playback of memory from experienceExtracting a batch of history samples;
step S64: selecting an optimal action for each state transition process
Step S65: calculating expected action value from target network
Step S66: calculating action value of current drug adjustment by online network
Step S67: calculating a Q loss value L, and repeating S64-S67 for each sample in each batch of samples;
step S68: updating parameter values by a loss value L training network
Step S69: updating parameter values
Step S7: the intelligent agent continuously optimizes during the period of training and learning for a plurality of times, adjusts the related weight parameters, gradually increases the accumulated rewards, finally predicts the future drug adjustment scheme of the patient according to different conditions of different patients, changes the drug or adjusts the drug dosage, and assists doctors in adjusting the drug scheme for the patient.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the invention, not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (7)

1. A lung cancer patient medication prediction method based on deep reinforcement learning is characterized in that: the method comprises the following medicine use prediction steps:
step S1: collecting lung cancer patient data information;
step S2: extracting vital signs and related medical histories of a patient suffering from lung cancer for a period of time and preprocessing the vital signs and the related medical histories to construct a patient data set;
step S3: constructing a patient-based environment model using the collected data for simulating the reward mechanism of drug effects on the patient's body, comprising: a patient state, a medication action space, a reward function, a transition model, and an initial state;
step S4: setting up a network model comprising an online network and a target network, and calculating an adjustment value of each medication scheme under the current state of a patient;
step S5: taking the collected patient history treatment data as input, and outputting a predicted drug adjustment scheme;
step S6: updating the network parameters by using a stochastic gradient descent method;
step S7: through constant interaction with the environment, training and learning are carried out multiple times to achieve the goal of reward maximization, and the medication type and dosage adjustment scheme suitable for the patient is predicted and output.
2. The method for predicting medication for a lung cancer patient based on deep reinforcement learning of claim 1, wherein the method comprises the steps of: the specific method for collecting the data information of the lung cancer patient in the step S1 comprises the following steps:
step S11: collecting personal basic information, medical history, physiological data and drug treatment scheme data of a lung cancer patient;
step S12: collecting data on the relationship between the medication type, dose and treatment effect of the lung cancer patient, comprising: data on the changes in the patient's lung tumor size when the patient takes different drug types and different doses.
3. The method for predicting medication for a lung cancer patient based on deep reinforcement learning according to claim 2, wherein the method comprises the following steps: the specific method for constructing the patient data set in the step S2 is as follows:
step S21: screening lung cancer patient data of a set age group;
step S22: preprocessing the screened data, including: removing repeated data, processing missing values and processing abnormal values;
step S23: storing the data obtained in the previous step and then dividing it into a training set and a testing set at a ratio of 8:2.
4. The method for predicting medication for a lung cancer patient based on deep reinforcement learning according to claim 3, wherein the method comprises the steps of: the specific method for constructing the environment model based on the patient in the step S3 is as follows:
step S31: determining a patient state space S, including tumor size, pathological stage and physiological index data;
features defining the patient's visit status include: demographics, medical history, disease risk, historical medication, laboratory data and physical measurements; a state space is established from this information to obtain a multidimensional state vector;
step S32: determining a medicine action space A, including adjustment of medicine types and dosages thereof;
according to the type and dose of the patient's historical medication, the medication adjustment scheme is determined, wherein the action space A comprises a four-dimensional vector: 0 no prescription change, 1 increase the drug dose, 2 decrease the drug dose, 3 replace the drug;
step S33: determining a reward function R: a reasonable reward function is designed according to the patient's condition, drug dose and treatment-effect factors, feeding back the changes of the body's various indexes when the patient takes different drugs and doses so as to mark tumor improvement or deterioration;
step S34: establishing, based on the patient's historical medication data, a transition probability model P(s_{t+1} | s_t, a_t) for each medication action in the current patient state, i.e. calculating the probability P of transitioning to the next state s_{t+1} after taking medication strategy a_t in the current patient state s_t, and using an ε-greedy strategy to balance exploitation and exploration, maximizing the expected benefit at the current moment;
step S35: for the state of the patient at the beginning of the treatment, an initial state is determined based on the patient's basic condition and medical history.
5. The method for predicting medication for a lung cancer patient based on deep reinforcement learning of claim 4, wherein the method comprises the steps of: the specific method for constructing the network model to calculate the adjustment value of the medication scheme of the patient in the step S4 is as follows:
step S41: setting up an online network for calculating the adjustment value of each personalized medication scheme under the current physical state of the patient, and updating the optimal scheme according to the adjustment value;
the parameter weight of the online network is updated in the process of each iteration to minimize the difference between the predicted value and the target value in the current state;
step S42: calculating a target value according to the action in the current state and the maximum adjustment value in the next state;
step S43: constructing a neural network model to calculate an action-value function, denoted Q(s, a), which estimates the cumulative reward of each adjustment scheme in the current treatment state, i.e. the expected return value Q caused by the change of the patient's physical indexes after taking medication strategy a in state s;
step S44: training the DQN model using two structurally identical neural networks, an online network Q(s, a; θ) and a target network Q(s, a; θ⁻), the online network being used for obtaining the optimal dosing action decision a* = argmax_a Q(s, a; θ) and being trained with the loss function L;
wherein s is the current state of the patient, a is the patient's current medication strategy, θ is the online network parameter, argmax_a selects the action a with the maximum Q value in the given state s, Q(s, a; θ) is the expected return value of the patient's physical-index change in the online network, Q(s, a; θ⁻) is the expected return value of the patient's physical-index change in the target network, and L is the loss function computing the difference between the two for network training;
the expected action value y = r + γ·max_{a'} Q(s', a'; θ⁻) is estimated using the target network to calculate the loss function L = (y - Q(s, a; θ))², and the target network parameter θ⁻ is updated by slowly tracking the online network parameter θ in each training iteration, finally obtaining the optimal personalized medicine adjustment scheme suitable for the patient.
6. The method for predicting medication for a lung cancer patient based on deep reinforcement learning of claim 5, wherein the method comprises the steps of: the specific method for outputting the predicted drug adjustment scheme in the step S5 is as follows:
step S51: forming a high-dimensional vector from five kinds of information (the patient's condition, physiological indexes, laboratory examination results, imaging examination results and medication status) as the input data of the model;
step S52: the output of the model is a drug adjustment regimen based on the patient's condition, covering four options: 0 indicates no prescription change, 1 indicates increasing the drug dose, 2 indicates decreasing the drug dose, and 3 indicates replacing the drug.
7. A device for use in implementing the deep reinforcement learning-based lung cancer patient medication prediction method of claim 1, characterized in that: the system comprises a collecting computer for collecting lung cancer patient data information, a data server for collecting and storing the data information, and a prediction server for building a network model, training and learning and outputting a prediction scheme.
CN202311567874.0A 2023-11-23 2023-11-23 Deep reinforcement learning-based lung cancer patient medication prediction method and device Active CN117275661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311567874.0A CN117275661B (en) 2023-11-23 2023-11-23 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311567874.0A CN117275661B (en) 2023-11-23 2023-11-23 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Publications (2)

Publication Number Publication Date
CN117275661A (en) 2023-12-22
CN117275661B CN117275661B (en) 2024-02-09

Family

ID=89220067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311567874.0A Active CN117275661B (en) 2023-11-23 2023-11-23 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Country Status (1)

Country Link
CN (1) CN117275661B (en)



Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112071388A (en) * 2019-06-10 2020-12-11 郑州大学第一附属医院 Intelligent medicine dispensing and preparing method based on deep learning
WO2021226064A1 (en) * 2020-05-04 2021-11-11 University Of Louisville Research Foundation, Inc. Artificial intelligence-based systems and methods for dosing of pharmacologic agents
WO2022067189A1 (en) * 2020-09-25 2022-03-31 Linus Health, Inc. Systems and methods for machine-learning-assisted cognitive evaluation and treatment
CN112420154A (en) * 2020-11-25 2021-02-26 深圳市华嘉生物智能科技有限公司 New coronary medication suggestion method based on deep learning neural network
CN113257416A (en) * 2020-12-09 2021-08-13 浙江大学 COPD patient personalized management and tuning method, device and equipment based on deep learning
CN113255735A (en) * 2021-04-29 2021-08-13 平安科技(深圳)有限公司 Method and device for determining medication scheme of patient
CN113270189A (en) * 2021-05-19 2021-08-17 复旦大学附属肿瘤医院 Tumor treatment aid decision-making method based on reinforcement learning
CN114388095A (en) * 2021-12-22 2022-04-22 中山大学 Sepsis treatment strategy optimization method, system, computer device and storage medium
CN114330566A (en) * 2021-12-30 2022-04-12 中山大学 Method and device for learning sepsis treatment strategy
CN114783571A (en) * 2022-04-06 2022-07-22 北京交通大学 Traditional Chinese medicine dynamic diagnosis and treatment scheme optimization method and system based on deep reinforcement learning
CN115050451A (en) * 2022-08-17 2022-09-13 合肥工业大学 Automatic generation system for clinical sepsis medication scheme
CN115985514A (en) * 2023-01-09 2023-04-18 重庆大学 Septicemia treatment system based on dual-channel reinforcement learning
CN115831340A (en) * 2023-02-22 2023-03-21 安徽省立医院(中国科学技术大学附属第一医院) ICU (intensive care unit) breathing machine and sedative management method and medium based on inverse reinforcement learning
CN116453706A (en) * 2023-06-14 2023-07-18 之江实验室 Hemodialysis scheme making method and system based on reinforcement learning
CN117010476A (en) * 2023-08-11 2023-11-07 电子科技大学长三角研究院(衢州) Multi-agent autonomous decision-making method based on deep reinforcement learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PANAGIOTIS SYMEONIDIS等: "Deep Reinforcement Learning for Medicine Recommendation", 《2022 IEEE 22ND INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE)》, pages 85 - 90 *
傅群: "Research on individualized administration of tacrolimus in kidney transplant recipients based on machine learning models", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 02, pp. 1-75 *
吴青 et al.: "Dose recommendation of levetiracetam for BECT treatment based on a deep Q network", Chinese Journal of Modern Applied Pharmacy, vol. 39, no. 12, pp. 1585-1590 *
董云云: "Research on auxiliary diagnosis methods for lung cancer based on medical images and gene data", China Doctoral Dissertations Full-text Database, Medicine and Health Sciences, no. 1, pp. 072-136 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117637188A (en) * 2024-01-26 2024-03-01 四川省肿瘤医院 Tumor chemotherapy response monitoring method, medium and system based on digital platform
CN117637188B (en) * 2024-01-26 2024-04-09 四川省肿瘤医院 Tumor chemotherapy response monitoring method, medium and system based on digital platform
CN118280512A (en) * 2024-04-05 2024-07-02 泰昊乐生物科技有限公司 Personalized treatment scheme recommendation method and system based on artificial intelligence
CN118039062A (en) * 2024-04-12 2024-05-14 四川省肿瘤医院 Individualized chemotherapy dose remote control method based on big data analysis

Also Published As

Publication number Publication date
CN117275661B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN117275661B (en) Deep reinforcement learning-based lung cancer patient medication prediction method and device
CN109599177B (en) Method for predicting medical treatment track through deep learning based on medical history
CN110880362B (en) Large-scale medical data knowledge mining and treatment scheme recommending system
US9370689B2 (en) System and methods for providing dynamic integrated wellness assessment
CN109087706B (en) Human health assessment method and system based on sleep big data
JP7019127B2 (en) Insulin assessment based on reinforcement learning
CN111105860A (en) Intelligent prediction, analysis and optimization system for accurate motion big data for chronic disease rehabilitation
CN111798954A (en) Drug combination recommendation method based on time attention mechanism and graph convolution network
Javad et al. A reinforcement learning–based method for management of type 1 diabetes: exploratory study
CN116453706B (en) Hemodialysis scheme making method and system based on reinforcement learning
US20200203020A1 (en) Digital twin of a person
US20210089965A1 (en) Data Conversion/Symptom Scoring
JP6962854B2 (en) Water prescription system and water prescription program
CN114732402B (en) Diabetes digital health management system based on big data
Wang et al. Prediction models for glaucoma in a multicenter electronic health records consortium: the sight outcomes research collaborative
US11887736B1 (en) Methods for evaluating clinical comparative efficacy using real-world health data and artificial intelligence
Oroojeni Mohammad Javad et al. Reinforcement learning algorithm for blood glucose control in diabetic patients
CN116525117B (en) Data distribution drift detection and self-adaption oriented clinical risk prediction system
Dogaru et al. Big Data and Machine Learning Framework in Healthcare
US20230386656A1 (en) Computerized system for the repeated determination of a set of at least one control parameters of a medical device
Mohanty et al. A classification model based on an adaptive neuro-fuzzy inference system for disease prediction
CN118588226B (en) Training method, optimizing method and device of antiepileptic medicinal strategy optimizing model
Rad et al. Optimizing Blood Glucose Control through Reward Shaping in Reinforcement Learning
Rodriguez Leon et al. Prediction of Blood Glucose Levels in Patients with Type 1 Diabetes via LSTM Neural Networks
Ranganathan et al. Intelligent Inhalation Therapy for Cystic Fibrosis Using IoT and Machine Learning Solutions

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant