Summary of the invention
The purpose of the present invention is to solve the above problems by providing a disease automatic coding method and system. Based on the diagnosis entered by the doctor, and combined with word segmentation and semantic understanding of the medical record, the method automatically assigns codes by reference to the standard diagnosis library ICD-10. It can also assess the likelihood that the coding result is correct.
To achieve this purpose, the present invention adopts the following technical scheme:
The disease automatic coding method comprises the following steps:
Step (1): Receive input data. The input data include raw diagnostic data and patient file data.
Step (2): Preprocess the input raw diagnostic data and patient file data.
Step (3): Look up the preprocessed result from step (2) in GB/T 14396-2016 "Classification and codes of diseases" and in the International Classification of Diseases ICD-10. If a result is found, output the coding result directly; if not, go to step (4).
Step (4): Perform word segmentation, association conversion, and matching-tree search on the preprocessed raw diagnostic data, then screen the matching-tree results for an optimal result. If an optimal result is obtained, go to step (7). If not, judge whether the current word segmentation result contains two or more disease names; if it does, go to step (5), otherwise go to step (6).
Step (5): Split the result of step (4) into several single disease names. For each single disease name, perform word segmentation, association conversion, and matching-tree search, then screen the matching-tree results for the optimal result and go to step (7).
Step (6): Perform word segmentation on the patient file data input in step (1), convert the unstructured data into structured data, perform association conversion and matching-tree search, then screen the matching-tree results for the optimal result and go to step (7).
Step (7): Assess the coding accuracy of the result, and output the coding result together with the accuracy assessment.
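The branching among steps (2) to (7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: every callable passed in (`preprocess`, `lookup`, `match`, `split_diseases`, `supplement`, `assess`) is a hypothetical stand-in for the processing described in the corresponding step, and the codes used with it are placeholders.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def encode_diagnosis(
    raw: str,
    record: str,
    preprocess: Callable[[str], str],
    lookup: Callable[[str], Optional[str]],
    match: Callable[[str], Optional[str]],
    split_diseases: Callable[[str], Sequence[str]],
    supplement: Callable[[str], str],
    assess: Callable[[str, Sequence[str]], float],
) -> Tuple[List[str], Optional[float]]:
    """Control flow of steps (2) to (7); all callables are hypothetical."""
    text = preprocess(raw)                       # step (2)
    code = lookup(text)                          # step (3): direct lookup
    if code is not None:
        return [code], None                      # direct hit, output as is
    best = match(text)                           # step (4): segment, convert, search
    if best is not None:
        codes = [best]
    else:
        parts = split_diseases(text)
        if len(parts) >= 2:                      # step (5): several disease names
            codes = [c for c in map(match, parts) if c is not None]
        else:                                    # step (6): fall back to the patient file
            extra = match(supplement(preprocess(record)))
            codes = [extra] if extra is not None else []
    return codes, assess(text, codes)            # step (7)
```

Passing the processing stages in as parameters keeps the sketch independent of any particular segmenter or matching forest.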
The preprocessing in step (2) includes: removing punctuation marks, converting variant Chinese characters to their standard forms, and converting full-width characters to half-width characters.
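The three preprocessing operations can be sketched with the Python standard library. This is only an illustration: the variant-character table `VARIANT_TO_STANDARD` is a hypothetical one-entry example, and using Unicode NFKC normalization for the full-width to half-width conversion is an assumption rather than something the text specifies.

```python
import unicodedata

# Hypothetical variant-to-standard table; a real table would be much larger.
VARIANT_TO_STANDARD = {"髙": "高"}


def preprocess(text: str) -> str:
    # NFKC folds full-width letters, digits, and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace variant Chinese characters with their standard forms.
    text = "".join(VARIANT_TO_STANDARD.get(ch, ch) for ch in text)
    # Drop punctuation, i.e. every character whose Unicode category starts with "P".
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
```

Normalizing before removing punctuation means full-width and half-width punctuation are handled by the same category test.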
The word segmentation processing means segmenting a sentence into several diagnosis keywords. The diagnosis keywords include qualifiers, which limit or modify, and primary keywords, which denote the disease. A qualifier is a word describing a property, a body site, or a degree. A primary keyword is a word describing a disease, abnormal tissue, an abnormal body, or an abnormal symptom.
The association conversion processing means the following. The qualifiers and primary keywords obtained by word segmentation are marked on the medical semantic network, and association conversion is applied separately to the qualifiers and to the primary keywords using that network. The original primary keywords, together with the new primary keywords obtained from them by association conversion, are then combined in all permutations with the original qualifiers, together with the new qualifiers obtained from them by association conversion. This finally yields every combination of primary keyword and qualifier for the raw diagnostic data.
For example, "basal ganglia infarct" is segmented into "basal ganglia" and "infarct". Through the semantic network, "basal ganglia" is converted by association into basal ganglia, brain stem, and brain, and "infarct" is converted into infarct and infarction. Combining the two keywords with their converted results gives: combination 1: basal ganglia_infarction; combination 2: basal ganglia_infarct; combination 3: brain stem_infarction; combination 4: brain stem_infarct; combination 5: brain_infarction; combination 6: brain_infarct. Combinations 1 to 6 are all of the combinations.
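The enumeration in the basal ganglia example is a Cartesian product over the associated forms of each keyword. The sketch below reproduces it; the table `ASSOCIATIONS` is a hypothetical stand-in for a lookup in the medical semantic network and contains only the entries from the example.

```python
from itertools import product

# Hypothetical association table holding only the entries from the example.
ASSOCIATIONS = {
    "basal ganglia": ["basal ganglia", "brain stem", "brain"],
    "infarct": ["infarction", "infarct"],
}


def expand(keywords):
    """Return every combination of the keywords and their associated forms."""
    # A keyword with no entry in the table is kept unchanged.
    options = [ASSOCIATIONS.get(k, [k]) for k in keywords]
    return ["_".join(combo) for combo in product(*options)]


combos = expand(["basal ganglia", "infarct"])  # six combinations, as in the example
```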
The matching-tree search means the following. For each of the combinations of primary keyword and qualifier obtained by association conversion, the matching forest is searched for the matching trees whose leaves are completely covered by that combination. The result is one matching tree, several matching trees, or no result.
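The "completely covered" condition can be read as a subset test between the leaves of a tree and the keywords of a combination. The sketch below assumes that reading; the forest `FOREST`, its placeholder codes, and the flat leaf sets are hypothetical simplifications of the matching trees described later.

```python
# Hypothetical matching forest: placeholder code -> set of leaf keywords.
FOREST = {
    "CODE_A": {"brain", "infarction"},
    "CODE_B": {"brain stem", "infarction", "acute"},
}


def search(combinations, forest=FOREST):
    """Return the codes of trees whose leaves are all covered by some combination."""
    hits = []
    for code, leaves in forest.items():
        # A tree matches when every one of its leaves appears in a combination.
        if any(leaves <= set(combo) for combo in combinations):
            hits.append(code)
    return hits
```

An empty list corresponds to the "no result" outcome, and a list with several entries must go through the screening described next.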
Screening the optimal result from the matching-tree results means the following:
Step (a1): Sort the matched trees in descending order of the number of keywords on each tree and compare them. If the first and second places are tied, or several trees tie for first place, sort the tied trees in ascending order of the ratio of the number of segmented keywords to the number of keywords on the matching tree and compare again. If the first and second places are still tied, or several trees still tie for first place, go to step (a2).
Step (a2): Sum the conversion distances of the qualifiers and primary keywords in the medical semantic network, sort the sums in descending order, and compare. If the first and second places are tied, or several trees tie for first place, go to step (a3).
Step (a3): Calculate the matching degree of each matching tree, which equals the ratio of the number of primary keywords obtained by segmentation to the number of primary keywords on the matching tree. Sort the ratios in descending order and compare. If the first and second places are tied, or several trees tie for first place, the screening ends.
In steps (a1) to (a3), if only one result ranks first (the second and all later places differ from the first), the current matching tree is the optimal matching tree.
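Steps (a1) to (a3) amount to a chain of tie-breakers that stops as soon as one candidate stands alone. The sketch below is one possible reading of that chain; the `Candidate` fields are hypothetical names for the quantities the steps mention, and keyword counts are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    code: str
    tree_keywords: int    # keywords on the matched tree
    input_keywords: int   # keywords obtained by segmentation
    distance: float       # summed conversion distance used in step (a2)
    input_primary: int    # primary keywords obtained by segmentation
    tree_primary: int     # primary keywords on the matched tree


def pick_optimal(cands: List[Candidate]) -> Optional[Candidate]:
    """Apply (a1) to (a3) in order; return None if no unique best remains."""
    if not cands:
        return None
    criteria = [
        lambda c: -c.tree_keywords,                    # (a1) keyword count, descending
        lambda c: c.input_keywords / c.tree_keywords,  # (a1) ratio, ascending
        lambda c: -c.distance,                         # (a2) distance sum, descending
        lambda c: -c.input_primary / c.tree_primary,   # (a3) matching degree, descending
    ]
    tied = list(cands)
    for key in criteria:
        best = min(key(c) for c in tied)
        tied = [c for c in tied if key(c) == best]
        if len(tied) == 1:
            return tied[0]
    return None  # the tie persists after step (a3)
```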
Whether the current word segmentation result contains two or more disease names is judged by whether a conjunction is present between the words. If a conjunction is present, the current word segmentation result contains two or more disease names; if no conjunction is present, it does not.
In step (6), the medical record text is unstructured. After word segmentation of the patient file data, the segmented unstructured data are converted into structured data, and the structured data are stored by category. The categories include: person, organ, time, place, frequency, symptom, operation, medicine, and medical history. Information related to the diagnosis is extracted from the corresponding categories of the structured data as supplementary keywords. Association conversion and matching-tree search are then applied to the supplementary keywords, the optimal result is screened from the matching-tree results, and the method goes to step (7). The information related to the diagnosis includes: family history, genetic history, disease property, and pregnancy period.
The coding accuracy of the result is assessed from three angles: the matching degree between the result and the raw diagnostic data; the conversion distance of the diagnosis keywords in the medical semantic network; and the difference between the order of the diagnosis keywords in the raw diagnostic data and their order in the standard diagnosis.
The matching degree between the result and the raw diagnostic data is determined as follows. For every matching tree that was matched, first calculate the ratio of the total number of segmented qualifiers and primary keywords in each group to the total number of qualifiers and primary keywords contained in the matching tree; this is the first ratio. Next, calculate the ratio of the number of segmented primary keywords to the number of primary keywords of the matching tree; this is the second ratio. The second ratio and the first ratio together give the matching degree between the result and the raw diagnostic data.
The conversion distance of the diagnosis keywords in the medical semantic network is determined as follows. For each diagnosis keyword, the length of the path traversed in the medical semantic network to reach the corresponding diagnosis keyword of the matching tree is recorded as a conversion coefficient. The sum of the natural logarithms of the conversion coefficients of all matched diagnosis keywords is the conversion distance of the diagnosis keywords in the medical semantic network.
Taking basal ganglia infarct as an example, matched to cerebral infarct: "basal ganglia" is converted to "brain" with a conversion weight of 0.3, and "infarct" is converted to "infarct" with a conversion weight of 1. The weighted superposition result, computed as a sum of natural logarithms, is ln(0.3) + ln(1) = -1.204.
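The arithmetic of the example can be checked directly. The function name `conversion_distance` is a hypothetical label; the weights 0.3 and 1 are the ones given in the example.

```python
import math


def conversion_distance(weights):
    """Sum of the natural logarithms of the per-keyword conversion weights."""
    return sum(math.log(w) for w in weights)


# "basal ganglia" -> "brain" (0.3), "infarct" -> "infarct" (1)
distance = conversion_distance([0.3, 1.0])
```

Because every weight is at most 1, each term is zero or negative, so a value closer to zero indicates fewer or cheaper conversions.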
The difference between the order of the diagnosis keywords in the raw diagnostic data and their order in ICD-10 is determined as follows. First, for each diagnosis keyword, calculate the difference between its position in the raw diagnostic data and its position in ICD-10. Then calculate the sum of the absolute values of these position differences over all diagnosis keywords.
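A minimal sketch of this order difference follows. It assumes the raw keywords have already been converted to the forms used in the standard diagnosis, and it skips keywords that do not occur there; both choices are assumptions, since the text does not say how unmatched keywords are treated.

```python
def order_difference(raw_keywords, standard_keywords):
    """Sum of absolute position differences of keywords shared by both lists."""
    position = {kw: i for i, kw in enumerate(standard_keywords)}
    return sum(
        abs(i - position[kw])
        for i, kw in enumerate(raw_keywords)
        if kw in position  # assumption: keywords absent from the standard are ignored
    )
```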
The coding accuracy of the result is evaluated with the formula:
$y = w^{T}X + b$
where $y$ is the estimated accuracy and $X$ is the vector $(x_1, x_2, x_3)$, in which $x_1$ is the matching degree between the result and the raw diagnostic data, $x_2$ is the conversion distance of the diagnosis keywords in the medical semantic network, and $x_3$ is the difference between the order of the diagnosis keywords in the raw diagnostic data and their order in ICD-10. These are three items of process data from the matching process. $w^{T}$ is the transpose of the vector $w = (w_1, w_2, w_3)$, and $w_1$, $w_2$, $w_3$, and $b$ are constants.
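The formula is an ordinary linear combination and can be sketched as below. The text gives no values for the constants, so the weights and bias in the usage line are purely hypothetical and chosen only to show the arithmetic.

```python
def estimate_accuracy(x, w, b):
    """Evaluate y = w^T x + b for equal-length sequences x and w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical constants; x = (matching degree, conversion distance, order difference).
y = estimate_accuracy((1.0, -1.204, 2.0), (0.5, 0.1, -0.05), 0.4)
```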
In step (1):
The raw diagnostic data cover: the diagnoses in the patient file, the discharge diagnosis on the front page of the medical record, the pathological diagnosis, and the external-cause diagnosis of injury and poisoning.
The patient file data come from: the front page of the medical record, the admission and discharge records, the progress notes, the operation record, the pathology report or examination and test reports, and the supplementary information.
The supplementary information includes: age, sex, lesion site, disease property, perioperative period, hospital-acquired infection, purpose of the current visit, principal diagnosis information, examination, pathology, imaging information, familial, hereditary, old, sequelae, congenital disease, and operation or mode of delivery.
Qualifiers include: site, disease property, orientation, disease type, degree, and so on; for example, left, right, acute, primary, icteric, and upper lobe of the lung.
Diagnosis keywords include: disease, abnormal tissue component, and so on; for example, pneumonia, deformity, and wandering kidney.
Structures formed between diseases that are used for splitting include, for example, "disease A with disease B" (parallel structure), "disease A causing disease B" (modifying and limiting structure), and "disease A (disease B)" (progressive structure).
The word segmentation processing means performing full segmentation of the cleaned raw diagnostic data according to GB/T 14396-2016 "Classification and codes of diseases" and the International Classification of Diseases ICD-10. Each word in the segmentation result is a diagnosis keyword. Diagnosis keywords include qualifiers, which limit or modify, and primary keywords, which denote the disease.
A semantic network is a structured way of representing knowledge as a graph. In a semantic network, information is expressed as a set of nodes connected to one another by a set of labeled directed edges, which represent the relations between the nodes.
The medical semantic network is a semantic network for the medical field. Its principal nodes are medical domain concepts. Each medical domain concept node is connected with other medical domain concept nodes and also with the nodes for its own forms of expression. Each medical domain concept node is further connected with property concept nodes, degree concept nodes, site concept nodes, or body concept nodes. The relations between the nodes of the medical semantic network are the relations between medical domain concepts.
Medical domain concepts include: physiological anatomical site, body tissue, component, disease or abnormality, bacteria and viruses, pathology, and disease property.
Relations between medical domain concepts include: correlation, conversion relation, correlation weight, conversion relation weight, and the relation from a concept to its specific expression.
Correlations between medical domain concepts include: contains, belongs to, abstraction, and specific expression.
Conversion relations between medical domain concepts include: concepts that are similar or identical.
Through the medical semantic network, association and conversion between concepts are realized, which widens both the search scope and the association scope of a concept. The specific expressions corresponding to each concept are varied: they include not only the formal written name but also the colloquial names used in practice. This prevents incompatibility and conflict between the normative terms of the official standard diagnoses and the terms used in practice.
When the raw diagnostic data contain more than one disease, the diagnosis segmentation result can be divided into two or more parts, each comprising a primary keyword and its qualifiers. During matching, the parts are input as one group to search the matching trees. Some standard diagnoses contain several diseases, so their matching leaves consist of several parts; each part has qualifiers and keywords, and the relations between the parts are also included. The relations between the parts include: with, concurrent with, causing, without, excluding, and so on.
The matching forest comprises several matching trees. Each matching tree comprises a root, a trunk, branches, and leaves. The root of the matching tree represents the diagnosis concept and is expressed as the ICD code. The trunk represents the diagnosis names under which the diagnosis concept is expressed. The branches represent the component parts of a diagnosis name (a standard diagnosis usually contains only one disease, in which case there is one branch; when a standard diagnosis contains several diseases, there is a corresponding number of branches). The leaves represent the qualifiers and primary keywords of each component part of the diagnosis name.
The matching forest is formed as follows. A single standard diagnosis is a concept, and a concept has several forms of expression. Each form of expression has its own structure, its own contained concept entities, and its own relations between those concept entities. The concept represented by each standard diagnosis, together with its structure, its contained concept entities, and the relations between them, is represented as a tree structure and defined as a matching tree. The matching trees of all standard diagnoses form the matching forest. In addition, following the ICD standard guidelines, priority and inclusion relations exist between matching trees within the matching forest.
A form of expression is, for example, a name.
Each form of expression has its own structure: parallel, progressive explanation, cause and effect, and so on.
Each form of expression has its own contained concept entities: symptom, disease, operation, and so on.
Each form of expression has its own relations between the contained concept entities: keyword and qualifier, limitation and modification, and so on.
The root of the matching tree represents the diagnosis concept, which is expressed as the ICD code. For example, the concept expressed by Meniere disease is an idiopathic inner-ear disease whose pathological change is labyrinthine hydrops and whose clinical manifestations are recurrent rotatory vertigo, fluctuating hearing loss, tinnitus, and a sensation of fullness in the ear.
Because a concept is abstract, it needs a unique identifier to stand for it. A name is only one expression of a concept, whereas the standard diagnosis code (ICD) is precisely the identifier assigned to each disease, so it naturally serves as the unique identifier of each concept, that is, as the expression of the disease concept.
For the concept of Meniere disease described above, the ICD code in the standard diagnoses is H81.000. The ICD code H81.000 is therefore the expression of the concept of Meniere disease and also the root of the matching tree for Meniere disease.
The trunk of the matching tree represents the diagnosis names under which the diagnosis concept is expressed. For example, the names for the concept of Meniere disease are Meniere disease, auditory vertigo, and labyrinthine hydrops.
The branches of the matching tree represent the component parts of a diagnosis name. For example, "mitral stenosis with tricuspid insufficiency" has two branches: mitral stenosis and tricuspid insufficiency.
The leaves of the matching tree represent the qualifiers and primary keywords of each component part of the diagnosis name. For example, for auditory vertigo, the primary keyword is "vertigo" and the qualifier is "auditory".
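The root, trunk, branch, and leaf layers map naturally onto nested records. The sketch below is one hypothetical representation, with class names chosen for illustration. Only the code H81.000, the three names, and the split of "auditory vertigo" into "vertigo" and "auditory" come from the text; the leaf splits for the other two names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Branch:
    """One component part (disease) of a diagnosis name."""
    primary: str
    qualifiers: List[str] = field(default_factory=list)

    def leaves(self) -> Set[str]:
        return {self.primary, *self.qualifiers}


@dataclass
class Name:
    """One diagnosis name on the trunk, with its branches."""
    text: str
    branches: List[Branch]


@dataclass
class MatchingTree:
    icd_code: str       # root: the ICD code standing for the concept
    names: List[Name]   # trunk: names under which the concept is expressed


meniere = MatchingTree("H81.000", [
    Name("Meniere disease", [Branch("Meniere disease")]),                # assumed split
    Name("auditory vertigo", [Branch("vertigo", ["auditory"])]),         # from the text
    Name("labyrinthine hydrops", [Branch("hydrops", ["labyrinthine"])])  # assumed split
])
```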
If a conjunction is present, then when the result of step (4) is split into several single disease names, the words are cut at the conjunction.
For example, structures containing several diseases are:
A) qualifier (zero or more)_primary keyword_conjunction_qualifier (zero or more)_primary keyword
B) qualifier (zero or more)_primary keyword_qualifier (zero or more)_primary keyword
C) qualifier (zero or more)_primary keyword_qualifier (zero or more)_primary keyword
Further structures of the form "qualifier (zero or more)_primary keyword" may be appended after these.
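Cutting at conjunctions, as in structure A), can be sketched on an already segmented token list. The conjunction set `CONJUNCTIONS` is a hypothetical English stand-in; the actual conjunction vocabulary is not given in the text.

```python
# Hypothetical conjunction vocabulary used only for illustration.
CONJUNCTIONS = {"with", "and"}


def split_at_conjunctions(tokens):
    """Split a token list into one group per disease name, dropping conjunctions."""
    parts, current = [], []
    for tok in tokens:
        if tok in CONJUNCTIONS:
            if current:
                parts.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        parts.append(current)
    return parts
```

A result with two or more groups corresponds to the "two or more disease names" branch of step (4).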
Word segmentation is applied to the medical record data using the natural language processing program ansj_seg, and the result after processing is structured data. The segmented words are stored by category, for example person, organ, time, place, frequency, symptom, operation, medicine, and medical history, as structured data. From the structured data, information related to the diagnosis is extracted, for example: perinatal-period information such as pregnancy, childbirth, and puerperium; causes such as bacteria and fungi; family diseases and genetic diseases; disease properties such as congenital or acquired; external causes of injury and poisoning; and the cytomorphological classification of cancer.
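Storing segmented words by category and then pulling out diagnosis-related ones can be sketched as a simple grouping step. The sketch does not call ansj_seg; it assumes the segmenter has already produced `(word, category)` pairs, and both the category labels and the set `DIAGNOSIS_RELATED` are hypothetical choices.

```python
from collections import defaultdict

# Hypothetical choice of categories treated as diagnosis-related.
DIAGNOSIS_RELATED = {"symptom", "medical history"}


def structure(tagged_tokens):
    """Group (word, category) pairs into a category -> words mapping."""
    record = defaultdict(list)
    for word, category in tagged_tokens:
        record[category].append(word)
    return dict(record)


def supplementary_keywords(record, wanted=DIAGNOSIS_RELATED):
    """Collect words from the wanted categories, in sorted category order."""
    return [w for cat in sorted(wanted) for w in record.get(cat, [])]
```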
The matching-tree search has three possible outcomes.
In the first, there is no result: the output is empty, and the prompt states that the reason for the absence of a matching result is insufficient raw diagnostic information.
In the second, there are results and the final ranking selects exactly one optimal result: that optimal result is output as the final matching result.
In the third, there are results but several optimal results are selected: the output is empty, the prompt states that the reason for the absence of a matching result is that several results have the same matching degree, and the several optimal results are output as part of the prompt. The diagnosis must then be re-entered with more detailed information added to the original.
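The three outcomes can be expressed as one small reporting function. The dictionary layout and the wording of the reasons are hypothetical; only the three-way distinction comes from the text.

```python
def report(optimal):
    """Build the output for a list of top-ranked codes (possibly empty)."""
    if not optimal:
        # Outcome 1: nothing matched.
        return {"code": None,
                "reason": "insufficient raw diagnostic information",
                "candidates": []}
    if len(optimal) == 1:
        # Outcome 2: a unique optimal result.
        return {"code": optimal[0], "reason": None, "candidates": []}
    # Outcome 3: several equally good results are returned as part of the prompt.
    return {"code": None,
            "reason": "several results with the same matching degree",
            "candidates": list(optimal)}
```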
A disease automatic coding system comprises a memory, a processor, and computer instructions stored in the memory and run on the processor. When the computer instructions are executed by the processor, they carry out steps (1) to (7) of the disease automatic coding method set out above.
A computer-readable storage medium has computer instructions stored on it. When the computer instructions are executed by a processor, they carry out steps (1) to (7) of the disease automatic coding method set out above.
Beneficial effects of the present invention:
1. It solves the problem that mapping a doctor's raw diagnosis to a standard diagnosis could only be done manually, relying mainly on the coder's own medical knowledge and knowledge of the coding rules. It partly overcomes the difficulty that this work requires language understanding and reasoning with medical knowledge. It also solves the problem that, because the terms doctors use when entering a diagnosis are unconstrained and there is no medical vocabulary standard to refer to, the same diagnosis concept has several diagnosis names and a large number of different written forms, which makes it hard to map to standard diagnoses.
2. It effectively solves the problem that the standard diagnosis codes used by different medical institutions are not unified. With automatic coding, raw diagnoses are mapped to the same set of standard diagnosis codes under a unified classification standard, which keeps the standard consistent when medical institutions exchange data.
3. The classification standard is stable. This solves the problem that a coder's classification is unstable because it is influenced by the wording of the raw diagnosis, so that the same diagnosis receives inconsistent codes on different occasions.
4. Automatic coding by computer program not only saves a very large amount of human resources but also greatly improves efficiency; accuracy is improved compared with manual coding, and the classification standard is unified. In theory, the diagnoses generated in one province (for example Shandong Province) can be coded within a few hours.
5. Automatic diagnosis coding helps ensure the accuracy of data retrieval for medical care, teaching, and research, and supports the development of diagnosis-related groups (DRGs). Automatic coding also contributes to building population health information platforms and unified, authoritative health and medical data standards.
6. Because automatic coding is fast and its classification is stable, large batches of original medical records can be coded and classified in a short time, so data can be prepared and organized quickly for big-data applications and artificial intelligence in the medical field. In this foundational role it is irreplaceable.
7. Coding is fine-grained. During coding, diagnoses that carry a conjunction are taken into account and split into several single diseases, which further ensures the fineness of the disease coding results and provides a sound guarantee for the use of research data.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the disease automatic coding method comprises steps (1) to (7) as set out in the summary above.
As shown in Fig. 2, the medical semantic network consists of nodes and directed relations between nodes. Nodes are divided into concept entities and specific expressions. Concept entities include the categories of the medical field, such as disease, anatomical site, body tissue, component, and disease property. The directed relations between nodes include: contains, belongs to, specific expression, abstraction, near-synonym, and so on. For example, in the figure, uveitis is a concept entity. It has a containment relation with concept entities such as posterior uveitis and anterior uveitis; it has a near-synonym or synonym relation with "ocular pigment-membrane inflammation" and "vascular-membrane inflammation"; and uveitis also belongs to eye disease.
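A network of nodes joined by labeled directed relations can be held in an adjacency list. The sketch below is a hypothetical minimal structure populated with the uveitis relations from the example; the class name, relation labels, and the default weight of 1.0 are assumptions.

```python
from collections import defaultdict


class SemanticNetwork:
    """Labeled directed graph: source -> list of (relation, target, weight)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, source, relation, target, weight=1.0):
        self._edges[source].append((relation, target, weight))

    def related(self, source, relation):
        """Targets reachable from source over edges with the given label."""
        return [t for r, t, _ in self._edges.get(source, []) if r == relation]


net = SemanticNetwork()
net.add("uveitis", "contains", "anterior uveitis")
net.add("uveitis", "contains", "posterior uveitis")
net.add("uveitis", "belongs_to", "eye disease")
```

Keeping a weight on every edge leaves room for the correlation and conversion weights that the text lists among the relations between concepts.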
As shown in Fig. 3, the matching-tree structure of each standard diagnosis includes the following. The diagnosis concept, whose specific form of expression is the standard diagnosis code, can be pictured as the root. The names under which the diagnosis concept is expressed, of which there may be one or several, can be pictured as the trunk. Each name may contain several diseases, also called parts; each disease or part can be pictured as a branch, and the relations between diseases, or between parts, are described as relations between branches. The keywords of each disease or part can be described as leaves. When a combination of keywords completely covers the keywords (leaves) of a disease (branch), that disease is matched. If, in addition, the relations between the matched diseases conform to the relations between the diseases or parts in the standard diagnosis, the name of this disease is matched, and what is thereby expressed is the content of this disease concept or a subdivided subclass of it.
The system and method comprise the following modules and algorithms:
1. Feature matching network of the standard diagnosis library:
A single standard diagnosis is in essence a concept. A concept has several forms of expression. Each form of expression subdivides into concept entities and a structure, with interconnections between the subdivided concept entities, and each subdivided concept entity in turn has identical or similar concept entities in the medical semantic network with which it can be associated and converted. The concept represented by each standard diagnosis, its structure, and the concepts it contains can therefore be represented as a tree, forming a matching tree, and the matching trees of all standard diagnoses form the matching forest.
The matching forest combined with the medical semantic network constitutes a new feature matching network. Association and conversion are realized through the semantic network, which widens the search and association scope, while feature matching completes the matching to standard diagnoses.
2. Diagnosis splitting and concept entity recognition module:
Natural language processing is applied to the raw diagnosis. After character preprocessing, medical concept entities are recognized and passed to the standard diagnosis feature matching network to be marked on the matching network.
In the recognition process, the medical concept dictionary is refined from actual business data, so within the medical field it is more specialized and goes deeper than a general dictionary. While medical concept entities are being recognized, the structure of the diagnosis is parsed and the reasonableness and normativeness of the diagnosis are judged, which is used to improve accuracy during matching.
3. Diagnosis matching algorithm:
The keywords and structure obtained by splitting the diagnosis are projected onto the matching network.
Through association and conversion on the semantic network, the conceptual entities that may be expressed
are marked in the matching network; these marked conceptual entities are then followed through the
matching trees to find the standard diagnoses that satisfy the conditions. Among the standard diagnoses
that satisfy the conditions, the most suitable matching diagnosis is selected according to the degree to
which the standard diagnosis covers the information and structure of the original diagnosis, the length of
the association and conversion path on the semantic network, and the priority and subordination
relationships of the standard diagnoses.
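The selection among candidate standard diagnoses can be illustrated as a ranking over the three criteria named above. The Candidate fields and, in particular, the lexicographic order (coverage first, then shorter conversion path, then priority) are assumptions for the sketch; the text lists the criteria but does not fix how they are combined.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    code: str          # standard diagnosis code
    coverage: float    # share of the original concepts and structure covered, 0..1
    path_length: int   # association/conversion steps taken on the semantic network
    priority: int      # larger value = preferred standard diagnosis (assumed scale)

def select_best(candidates: list[Candidate]) -> Candidate | None:
    """Pick the candidate with the highest coverage; break ties by the shortest
    conversion path, then by the highest priority (assumed ordering)."""
    if not candidates:
        return None
    return max(candidates,
               key=lambda c: (c.coverage, -c.path_length, c.priority))
```

A weighted sum would be an equally plausible way to combine the criteria; the lexicographic form is used here only because it is easy to verify by hand.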
4. Medical record supplementary information extraction module:
Key information and supplementary information needed to code the diagnosis are extracted from the patient
file: for example, age group, sex, nature of the disease and perioperative period; the purpose of the
current visit, principal diagnosis information, examination, pathology and imaging information; and
information such as surgery and mode of delivery. When the raw diagnosis is ambiguous or incomplete, all of
this serves as supplementary information for further clarifying the diagnosis.
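As a rough illustration of supplementary information extraction, the sketch below pulls age, sex and a perioperative flag out of free text with regular expressions. The patterns, the English phrasing they expect, and the output keys are all assumptions; a real system would work on the structured patient file produced by word segmentation rather than on raw strings.

```python
import re

# Assumed surface patterns for an English-language record.
AGE_RE = re.compile(r"\b(\d{1,3})\s*(?:years?\s*old|y/?o)\b", re.IGNORECASE)
SEX_RE = re.compile(r"\b(male|female)\b", re.IGNORECASE)
PERIOP_RE = re.compile(r"\bpost-?operative\b|\bafter surgery\b", re.IGNORECASE)

def extract_supplementary(record_text: str) -> dict[str, object]:
    """Return whichever of age, sex and perioperative status can be found."""
    info: dict[str, object] = {}
    age = AGE_RE.search(record_text)
    if age:
        info["age"] = int(age.group(1))
    sex = SEX_RE.search(record_text)
    if sex:
        info["sex"] = sex.group(1).lower()
    if PERIOP_RE.search(record_text):
        info["perioperative"] = True
    return info
```

Fields that cannot be found are simply omitted, so downstream matching can tell "not stated" apart from a negative finding.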
5. Coding accuracy assessment module:
During matching of a diagnosis, the optimal matching result, its matching path, and the degree of
information coverage and degree of similarity between the raw diagnosis and the matched standard
diagnosis are recorded. These factors are combined with different weights to calculate a confidence
value, which serves as the basis for assessing the correctness of this match.
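A weighted confidence value of this kind could be computed as in the following sketch. The weights in WEIGHTS and the mapping of path length to a score of 1 / (1 + length) are illustrative assumptions; the text says only that the factors are combined with different weights.

```python
# Assumed weights; they sum to 1 so the confidence stays within 0..1.
WEIGHTS = {"coverage": 0.5, "similarity": 0.3, "path": 0.2}

def confidence(coverage: float, similarity: float, path_length: int) -> float:
    """Combine coverage, similarity and path length into one confidence value.

    coverage and similarity are expected in 0..1; a longer matching path
    lowers the score through the assumed mapping 1 / (1 + path_length).
    """
    path_score = 1.0 / (1 + path_length)
    score = (WEIGHTS["coverage"] * coverage
             + WEIGHTS["similarity"] * similarity
             + WEIGHTS["path"] * path_score)
    return round(score, 4)
```

A result below some threshold could then be flagged for manual review by a coder, although the choice of threshold is outside what the text specifies.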
6. Diagnosis and medical record input and result output module:
The diagnosis input module obtains the raw diagnosis from the interactive interface, or directly from the
admission or discharge records in the electronic medical record.
To obtain supplementary diagnostic information from the medical record, the unstructured patient file
must be segmented into words and converted into a structured patient file, from which the necessary
information is extracted.
The result output module outputs the result to the interactive interface, to a specified file, or to a
database.
The steps of the automatic coding of the present invention are as follows:
1a. Obtain the input diagnosis from the interface.
1b. Obtain the diagnosis from the database, together with the corresponding diagnosis and treatment
records and patient file. If a patient file exists, it is handed to the word segmentation program for
processing.
2. The raw diagnosis is passed through the diagnosis analysis and conceptual entity recognition module for
natural language processing. Based on the medical semantic network, all possible ways of segmenting the
diagnosis and identifying conceptual entities are listed, and the unreasonable or incomplete segmentation
and recognition results are pruned. Each conceptual entity in the reasonable segmentation and recognition
results is then analyzed, and the syntactic structure of the diagnosis is checked against the structure
formed by the conceptual entities to judge whether it is a reasonable structure, which in turn verifies
the reasonableness of the segmented and recognized conceptual entities.
According to the structure after segmentation, a suitable matching scheme is selected and handed to the
standard diagnosis matching module below for matching.
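Listing every possible segmentation and pruning the unreasonable ones can be sketched with a small recursive enumeration. Here a branch is pruned as soon as a span is not a dictionary term, and the final choice prefers the segmentation with the fewest pieces; both rules, the function names and the sample dictionary are assumptions for illustration.

```python
def all_segmentations(tokens: list[str], dictionary: set[str],
                      max_words: int = 3) -> list[list[str]]:
    """Enumerate every way to cut `tokens` into dictionary terms.

    A branch is abandoned (pruned) when its next span is not in the
    dictionary, so incomplete segmentations never reach the result.
    """
    if not tokens:
        return [[]]
    results: list[list[str]] = []
    for n in range(1, min(max_words, len(tokens)) + 1):
        term = " ".join(tokens[:n])
        if term in dictionary:
            for rest in all_segmentations(tokens[n:], dictionary, max_words):
                results.append([term] + rest)
    return results

def pick_segmentation(segmentations: list[list[str]]) -> list[str]:
    """Assumed preference: the fewest, and therefore longest, entities win."""
    return min(segmentations, key=len) if segmentations else []

# Hypothetical dictionary of conceptual entities.
TERMS = {"chronic", "kidney", "disease", "kidney disease",
         "chronic kidney disease"}
```

With the tokens of "chronic kidney disease", three segmentations survive pruning, and the single-entity reading is preferred.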
3. The standard diagnosis matching module takes the segmentation and recognition results and the
structural information of the raw diagnosis and searches the matching network of standard diagnoses
according to the matching algorithm. During the search, each diagnostic concept and each modifying or
qualifying concept successively follows the search and conversion path from specific expression to
concept, from concept to associated, approximate and containing concepts, from concepts and concept
combinations to the specific expression of a standard diagnosis, and from the specific expression of a
standard diagnosis to the standard diagnosis concept.
Meanwhile, the search and conversion process records the degree of containment of the converted concepts,
the search path length, and the degree of agreement and coverage between the concepts split out of the raw
diagnosis and the concepts contained in the final result.
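The recording of conversion paths and their lengths during the search can be illustrated with a breadth-first traversal. The edge table EDGES, its entries, and the step limit max_steps are hypothetical; the sketch covers only the path bookkeeping, not the containment or coverage measures.

```python
from collections import deque

# Hypothetical conversion edges: expression -> concept -> related concept.
EDGES = {
    "heart attack": ["myocardial infarction"],
    "myocardial infarction": ["acute myocardial infarction"],
}

def conversion_paths(start: str, max_steps: int = 3) -> dict[str, list[str]]:
    """Breadth-first search from `start`.

    Returns, for every concept reachable within `max_steps` conversions,
    the shortest path to it; the path length is len(path) - 1.
    """
    paths: dict[str, list[str]] = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) - 1 >= max_steps:
            continue  # do not expand beyond the allowed number of steps
        for nxt in EDGES.get(node, []):
            if nxt not in paths:
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return paths
```

Because breadth-first search reaches each concept by a shortest path first, the stored length can be used directly as the path-length factor when candidates are ranked or assessed.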
4. If the matching module fails to reach a result because necessary information is missing from the raw
diagnosis, or if it yields several diagnoses that have the same level of coverage but differ considerably
in concept, supplementary information must be extracted from the patient file. The patient file is
segmented by the word segmentation program and converted into a structured document, from which the
necessary diagnosis-related information is extracted, added to the concepts split out of the raw
diagnosis, and searched again in the matching network.
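The fallback to the patient file can be sketched as a retry around the search. The functions needs_supplement, code_with_fallback and toy_search, and the two-entry index inside toy_search, are hypothetical; the tie test on top coverage is one possible reading of "same level of coverage but conceptually different".

```python
from typing import Callable

Hits = list[tuple[str, float]]  # (standard diagnosis code, coverage)

def needs_supplement(results: Hits) -> bool:
    """True if the search found nothing or several codes tie on top coverage."""
    if not results:
        return True
    top = max(cov for _, cov in results)
    return sum(1 for _, cov in results if cov == top) > 1

def code_with_fallback(concepts: set[str],
                       search: Callable[[set[str]], Hits],
                       record_concepts: set[str]) -> Hits:
    """Search once; if needed, add concepts from the patient file and retry."""
    results = search(concepts)
    if needs_supplement(results):
        results = search(concepts | record_concepts)
    return results

def toy_search(concepts: set[str]) -> Hits:
    """Hypothetical index: the concepts each code requires."""
    index = {"O80": {"delivery", "spontaneous"},
             "O82": {"delivery", "caesarean"}}
    hits: Hits = []
    for code, required in index.items():
        coverage = len(required & concepts) / len(required)
        if coverage > 0:
            hits.append((code, coverage))
    return hits
```

In this toy case a raw diagnosis that says only "delivery" ties between the two codes, and the mode of delivery taken from the patient file resolves the tie.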
5. The accuracy assessment module takes the association and conversion paths and the search and matching
paths from the matching process, the degree of concept matching between the concepts split out of the raw
diagnosis and the standard diagnosis, the reasonableness of the raw diagnosis structure, and its degree of
similarity to the standard diagnosis structure, combines them with different weights, and assesses the
accuracy of the code according to the result of the calculation.
Although the specific embodiments of the present invention have been described above with reference to the
accompanying drawings, they do not limit the protection scope of the present invention. Those of ordinary
skill in the art should understand that various modifications or variations that can be made on the basis
of the technical solution of the present invention without creative effort still fall within the
protection scope of the present invention.