1 Introduction

Appendicitis is an inflammation of the appendix [24]. It usually develops acutely within a few hours and usually requires prompt surgery [32]. Chronic or subacute appendicitis can also be treated with antibiotics. The appendix lies in the right lower abdomen [31]. It is a blind-ended, worm-shaped pouch, 6 to 12 cm long, attached to the beginning of the large intestine [25, 51]. Food residues, faeces, and bacteria can accumulate in this vermiform appendix [45], which favours the development of inflammation [52]. As a result, appendicitis is one of the most common inflammatory conditions of the intestine. In principle, it can occur at any age, with a peak in frequency in adolescence and young adulthood [6]. Usually, no specific cause can be identified. It is suspected that food or stool particles block the appendix [35]. Occasionally, fruit stones such as cherry stones are held responsible for blocking the appendix [49]. Acute appendicitis usually begins with discomfort and diffuse abdominal pain [53]. The pain typically moves quickly to the right lower abdomen and can become very severe. Coughing or tensing the abdominal wall intensifies the pain. In addition to the pain, nausea, constipation, or fever can occur. Acute appendicitis is diagnosed on the basis of the characteristic symptoms [7]. A blood test and an ultrasound scan can provide additional information. If the probability of appendicitis is sufficiently high, surgery is usually carried out quickly [41]. If surgery is delayed too long, there is a risk that the inflamed appendix perforates and life-threatening peritonitis develops.

Diagnosis follows the usual clinical routine: the doctor first asks the patient about their symptoms and then, during the physical examination, checks typical pressure and pain points for signs of appendicitis. An ultrasound examination (sonography) can be helpful in the diagnosis [1, 42]. It should be noted, however, that normal sonographic findings do not rule out appendicitis. In case of doubt, a reliable diagnosis can only be achieved by laparoscopy. If it confirms the inflammation, an operation can take place immediately. This conventional process may be inaccurate in determining the cause of the illness and in deciding whether immediate surgery is needed or medication alone is sufficient. Physicians therefore increasingly turn to artificial intelligence techniques, which can be more reliable than traditional methods in determining the type of disease and in supporting accurate decisions about the patient. Figure 1 shows the difference between a normal and an inflamed appendix [48].

Fig. 1 (a) Normal appendix, (b) inflamed appendix

Given the above issues, clinicians struggle to identify appendicitis quickly and accurately, to distinguish the acute from the subacute type, and to determine the cause of the illness. Although there are experienced specialists in this field, computer support has become necessary to determine the type of illness and to help them make accurate decisions.

In recent times, artificial intelligence (AI) has become popular in medical studies because of its ability to detect cases of illness quickly and with high accuracy [2, 17, 38, 39]. AI has the potential to improve medicine substantially: it could spare millions of people suffering and make health systems worldwide more equitable, more humane, more efficient, and safer [9]. Thanks to its wide spectrum of applications, machine learning is one of the fastest-growing fields of AI [37, 50]. The goal of this study is to apply machine learning techniques as a simple, quick, and accurate estimation method for the early diagnosis of acute appendicitis in young people and children, so that uncomplicated appendicitis can be managed effectively and complicated forms can be operated on early.

Only a few studies have addressed computer-based prediction of appendicitis with data mining. In [8], Alvarado proposes a clinical scoring system based on laboratory findings, symptoms, and signs from 305 patients. Many clinical evaluation systems have since been derived from Alvarado's clinical scoring system (ACSS) by improving and modifying it. Nevertheless, some scholars have noted that these clinical evaluation systems lack diagnostic performance. Image analysis techniques [21], such as those based on ultrasound and tomography, perform significantly better than other diagnostic techniques but have some limitations. The quality of a tomographic image depends heavily on radiation exposure, while the diagnostic performance of ultrasound depends heavily on the operator, and ultrasound is often unavailable outside regular working hours. In addition, image analysis sometimes yields only a limited diagnosis of acute appendicitis. Park et al. [44] describe an AI-based system for diagnosing acute appendicitis with a support vector machine (SVM), using data from 760 patients. The system is compared with ACSS and MLNN and outperforms both, with an accuracy higher than 99%. In [54], machine learning prediction systems (multilayer perceptron, Bayesian networks, and radial basis function) are introduced to support doctors in deciding whether or not to operate on a patient; this study obtains an accuracy of 95%. Reismann et al. [46] apply AI and machine learning techniques to distinguish inflammation and behaviour of the appendix in 590 German citizens aged 0 to 17 years. Their results show that the accuracy of the biomarker signature for the diagnosis of appendicitis is 90%, while the accuracy in correctly identifying complicated inflammation is 51% on validation data. The closest study to the current one is that of Akmese et al. [3], in which machine learning techniques are applied to 595 medical records to predict appendicitis and to determine whether or not surgery is needed. The accuracy in that study is 95.31%, obtained with the gradient boosting algorithm. Marcinkevics et al. [33] analyse blood samples of children and adolescents with appendicitis using machine learning techniques (logistic regression, random forests, and gradient boosting machines). Their database consists of 430 cases aged 0 to 18 years. The best accuracy reached in that study is 94%, with the random forest classifier.

This study evaluates the need for surgery by applying machine learning techniques to the blood specimen data of patients, based on clinical examination, laboratory parameters, and abdominal ultrasonography. The aim is to test the accuracy of the diagnosis, to reduce resource consumption, and to contribute to a more precise use of specialised medical care. For most patients with suspected acute appendicitis, the diagnostic process relies on blood values. This paper explores the relationship between acute appendicitis and statistical methods.

The rest of the article is structured as follows. Section 2 discusses the materials and methods. Section 3 presents the experimental outcomes of the ML techniques. The last section discusses the conclusions and future work.

2 Materials and methods

Machine learning [15, 26] is the area of computer science that concentrates on the analysis and understanding of patterns and data structures that make learning, reasoning, and decision-making possible without user interaction. Machine learning allows the user to feed a computer algorithm with a large amount of data, which the computer analyses in order to make decisions and recommendations based solely on the data entered. When corrections are identified, the algorithm can incorporate that information to improve future decision-making. Data mining, on the other hand, concentrates on searching large databases to obtain valuable information for decision-making [18, 40]. Data mining mechanisms are employed by several scholars for prediction schemes [30]. The main benefit of data mining is that it supports data analysis in a large number of scenarios, e.g., medical data, where it is applied as follows. Prediction: predicting the nature of the patient's disease. Probability: determining the best treatment for the patient, either surgery or medication alone. Sequence analysis: analysing the results of the surgery or treatment that the patient has received.

Data mining techniques use both evaluation and classification to facilitate data training. Classification is a classic data mining task with roots in machine learning; it assigns each element in a data collection to one of a predefined set of categories or classes, and its performance can be measured. The data mining process is carried out in five phases, as shown in Fig. 2.

Fig. 2 Data mining phases

The five phases are as follows:

  1. Objective and data collection: The first step is to decide on the type of data to be obtained.

  2. Data processing and management: After collection, the data are put to work. This is perhaps the most complex part. It requires selecting a representative sample on which to carry out the analysis. Once the sample has been taken, it must be examined to decide which variables and which type of model (e.g., regression) to use.

  3. Model selection: In this phase, a model is designed that gives the best possible result, together with a comprehensive analysis of the variables to be incorporated into it. This task is complicated because it depends on the type of data to be analysed. Hence, data miners test various algorithms, e.g., linear regression, decision trees, time series, neural networks, etc.

  4. Analysis and review of results: All results recorded with the created model are analysed.

  5. Updating the model: This last phase is the most valuable, because the data have to be updated regularly and the latest information has to be retrieved and saved in the database.

In this paper, the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology is applied to data mining. It is considered vital in data science management for ensuring the highest possible quality of the data collected. It is a hierarchical process made up of many tasks that gives organizations the structure they need to achieve better and quicker results in data mining. The methodology includes six principal stages, as shown in Fig. 3 [12]: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. It serves as a guide for data mining specialists and is a general model that can be adapted to the needs of a particular company or sector. The main merits of CRISP-DM are long-term strategy, functional templates, and flexibility. Its main drawback is the uncertainty of some of the extracted results, which then have to be re-derived or discarded.

This study uses data from clinical records obtained from different laboratories between 2016 and 2019. For each patient with suspected appendicitis, the records contain gender, laboratory indicators, length of hospital stay, and whether or not surgery was performed. The data are collected from different medical laboratories and stored in a single Excel file. In addition, normal values are obtained from Google data. The collected data are examined to estimate whether or not the patient needs surgery. Data mining models fall into two principal categories: predictive and descriptive. Table 1 shows the difference between these two types of model.

Fig. 3 CRISP-DM stages

Table 1 The difference between predictive and descriptive model

In the design, after the dependent variable is defined, the data are split into two sets: 70% as training data and 30% as testing data. The nature of the data has a significant impact on the result of the estimation, which is why the pre-processing stage plays an important role in the performance of the above models. This stage is employed to improve the quality of the data. Table 2 presents the name, type, and description of each attribute. The weights in Table 2 are determined with the chi-square weighting technique; the higher the weight of an attribute, the greater its significance. Figure 4 shows traditional appendicitis surgery on a 27-year-old patient.
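As a rough illustration of how a chi-square weight for a single categorical attribute can be obtained, the sketch below computes the Pearson chi-square statistic for gender against the surgery outcome, using the counts reported for Fig. 6 in this study. The function name `chi_square_statistic` is hypothetical, and the exact weighting and normalisation procedure behind Table 2 is not specified in the text, so this is only one plausible reading rather than the procedure actually used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of attribute and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: female, male; columns: surgery, no surgery (counts reported for Fig. 6).
gender_vs_surgery = [[150, 104], [184, 159]]
gender_weight = chi_square_statistic(gender_vs_surgery)
print(f"chi-square statistic for gender: {gender_weight:.2f}")
```

A larger statistic indicates a stronger association between the attribute and the surgery outcome. Numeric attributes such as blood values would first have to be binned, which is an additional assumption not covered here.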

Table 2 Dataset (Attributes) with their Summary and weighting
Fig. 4 Appendicitis surgery (images downloaded from Google Images; free to use, modify, and share)

Blood carries oxygen and nutrients to all parts of the body so that they can continue to function [22]. It also transports carbon dioxide and other waste materials to the lungs, digestive system, and kidneys, which are responsible for removing them from the body [16]. In addition, it fights infection and carries hormones throughout the body. Consequently, the examination of blood cells is important in helping to diagnose diseases. All cells in the blood result from the differentiation and maturation of stem cells, also called hematopoietic precursors [36]. Figure 5 shows the blood cells in the body.

Fig. 5 Blood cells (images downloaded from Google Images)

The mean age of the patients is 21.15 ± 0.23 years.

Table 3 shows the blood test results for all parameters (HGB, NEU, LYM, MCV, MPV, HTC, DVT, PLT, CRP, and WBC) with their normal values, the mean and standard deviation over all patients in this study, and the p value. These parameters are described in Table 2. In addition, Fig. 6 presents the number of females who needed surgery (150) and the number of males who needed surgery (184); 104 females and 159 males did not require surgery. The mean age of patients who needed surgery is 21 ± 0.34, while that of patients who did not is 21 ± 0.31 (p = 0.320). An a priori power analysis for the independent-samples t-test is carried out with a target power of 95%, an error level of 0.05, and a Cohen's effect size of 0.35.
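The sample size implied by such a power analysis can be approximated in a few lines. The sketch below uses the standard normal approximation for a two-sample comparison, $n \approx 2(z_{1-\alpha/2}+z_{1-\beta})^2/d^2$ per group, and assumes a two-sided test; the text does not say whether the test is one- or two-sided, and the function name `n_per_group` is hypothetical. The exact t-based calculation performed by statistical software typically gives a slightly larger value.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.95):
    """Approximate per-group sample size for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the error level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Cohen's effect size of 0.35, 95% power, error level 0.05 (values from the text).
print(n_per_group(0.35))  # → 213
```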

Table 3 Blood test result
Fig. 6 The number of males and females who need surgery and who do not

3 Experimental outcomes

This section covers the outcomes of the current study; all specimens are collected from laboratories. The work is carried out in Python, with SPSS 22.0 for statistics, on the Windows 10 operating system. The hardware environment is an Intel(R) Core(TM) i5 2430M CPU @3.4 GHz, 16 GB DDR4 3200 RAM, and an NVIDIA 1080ti 11G graphics card, and the models are trained with 70% of the data and tested with the remaining 30%. Among the models applied, Random Forest (RF) [5, 10] achieves the highest accuracy. It is a flexible, easy-to-use ML algorithm that usually delivers good results even without hyper-parameter tuning. A great advantage of RF is that it can be applied to both classification and regression problems, which make up the majority of ML systems today. It is a supervised learning method that combines the results of a large number of different decision trees to make its decisions or predictions. The model is essentially based on training on the dataset with the bootstrap aggregating (bagging) technique.
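To make the idea of bootstrap aggregating concrete, the following is a minimal, simplified sketch and not the implementation used in this study. It bags one-split decision trees (stumps) instead of full trees and omits the random feature selection that a real random forest adds. The data are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-split tree: returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagged(rows, labels, n_trees=25, rng=None):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = rng or random.Random(0)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_bagged(forest, row):
    """Aggregate the individual predictions by majority vote."""
    return majority([predict_stump(s, row) for s in forest])


# Synthetic data: two features in [0, 1]; label 1 (e.g. "surgery") if the first exceeds 0.5.
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

split = len(rows) * 7 // 10  # 70% training, 30% testing
forest = fit_bagged(rows[:split], labels[:split], rng=rng)
test_rows, test_labels = rows[split:], labels[split:]
accuracy = sum(predict_bagged(forest, r) == y
               for r, y in zip(test_rows, test_labels)) / len(test_rows)
print(f"test accuracy: {accuracy:.2f}")
```

Because each stump sees a different bootstrap sample, the individual thresholds vary slightly, and the majority vote averages out this variation.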


Accuracy (Acc.): the ratio of correctly evaluated specimens to the total number of specimens, i.e., the overall rate of correct diagnosis. It is calculated with Eq. (1). Precision (Pre.): the ratio of correctly predicted positive specimens to the number of specimens assigned to the positive category. It is calculated with Eq. (2). Specificity (Spe.): the ability of the diagnostic test to identify healthy specimens among those that are actually healthy. It is calculated with Eq. (3). Sensitivity (Sen.): the ratio of correctly predicted positive specimens to the number of specimens that are actually positive, i.e., the ability of the diagnostic test to detect health problems among specimens that actually have them. It is calculated with Eq. (4).

$$Acc.=\frac{TP+TN}{TP+FP+TN+FN} \tag{1}$$
$$Pre.=\frac{TP}{TP+FP} \tag{2}$$
$$Spe.=\frac{TN}{TN+FP} \tag{3}$$
$$Sen.=\frac{TP}{TP+FN} \tag{4}$$

A true positive (TP) is a person who actually suffers from the disease and is identified as a patient by the diagnostic test (positive class). A true negative (TN) is a person who is actually healthy and is identified as healthy by the diagnostic test (negative class). A false positive (FP) occurs when the test result is positive although the person examined does not actually suffer from the disease being tested for. A false negative (FN) occurs when the test result is negative although the person examined does suffer from the disease being tested for. Table 4 shows a confusion matrix for the diagnostic testing of appendicitis.
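The four measures can be computed directly from the confusion-matrix counts, as in the short sketch below. Specificity is computed here with the standard definition TN/(TN+FP), the share of actually healthy specimens that the test identifies as healthy. The function name `diagnostic_metrics` and the example counts are hypothetical and are not results of this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, precision, specificity and sensitivity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }


# Hypothetical counts, chosen only for illustration.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```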

Table 4 Confusion matrix for diagnostic testing appendicitis

To evaluate the techniques consistently, a decision threshold of 0.1460 is applied. Table 5 reports the performance of all techniques, and Table 6 shows the confusion matrix of the best technique.
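The sketch below shows one way such a threshold can turn model scores into class predictions and confusion-matrix counts. It assumes that each score is the predicted probability of the positive class and that a score at or above the threshold counts as positive; neither detail is stated in the text. The function name `confusion_counts` and the example labels and scores are hypothetical.

```python
def confusion_counts(y_true, scores, threshold=0.1460):
    """Count TP, FP, TN and FN after applying a decision threshold to the scores."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0  # assumed decision rule
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["TN" if actual == 0 else "FN"] += 1
    return counts


# Hypothetical labels (1 = positive class) and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.80, 0.20, 0.10, 0.30, 0.12, 0.05]
counts = confusion_counts(y_true, scores)
print(counts)
```

With a threshold this low, more specimens are labelled positive than with the common default of 0.5, which trades additional false positives for fewer missed cases.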

Table 5 Performance of the techniques with threshold of 0.1460
Table 6 Results of Random Forest Analysis (Confusion matrix)

From Table 6, we note the following:

  • The correct estimate for people suffering from appendicitis is 88.26%, with an error of 11.74%.

  • The correct estimate for non-surgical patients is 79.25%, with an error of 20.75%.

  • In addition, the results show that the percentage of healthy people identified as patients (Class: no.1_error) is 11.74%, while the percentage of patients identified as healthy (Class: no.2_error) is 20.75%.

Figure 7 presents the accuracy results for all machine learning techniques applied to distinguish appendicitis in patients aged 10 to 30 years. The figure clearly shows that the proposed random forest achieves the highest accuracy among the compared methods.

Fig. 7 Accuracy percentages of applied ML techniques

Table 7 compares the results of this work with other works that applied the random forest technique. The main difference between this study and reference [3] lies in the normal values of the blood specimens and the weights of each specimen.

Table 7 Comparison between current study and previous studies

4 Conclusions and future work

Today, machine learning is a rapidly growing technology in medical research. This work applies machine learning techniques to predict the need for appendix surgery. The study includes 625 cases and delivers results at high speed; random forest performs best with an accuracy of 83.75%, while the generalized linear model performs worst with 64.74%. Table 8 gives the performance and the execution time (in seconds) that each technique required for the data analysis.

Table 8 Execution time for each technique

In other words, the current study improves the results of machine learning techniques in determining the presence of appendicitis in people of both genders from blood specimens and in deciding whether surgery is needed or medication alone is sufficient. It also helps specialists and doctors in clinics and hospitals to make the right decision in each case. In the future, these techniques will be used to analyse a collection of images of people with appendicitis. Additionally, the system could be configured to detect infection with the coronavirus (COVID-19) from blood specimens or images.