Classification of Diseases Using a Hybrid Binary Bat Algorithm with Mutual Information Technique

Many features can distort the results of computations and degrade their accuracy, especially when the dataset is large. Evolutionary algorithms such as the bat algorithm (BA) are used to find the fastest and best way to perform these computations, by reducing the dimensionality of the search space after converting it from continuous to discrete. In this paper, we propose a hybrid method that selects the best and most influential features and discards those with a negative impact on the results. It works in three stages. First, the feature columns are ranked by importance, starting from the most important, using the mutual information (MI) technique. Second, the ranked list is cut at a certain limit so that only the most important features are kept, and the computations are performed with the Naïve Bayes (NB) classifier. Finally, the binary bat algorithm (BBA) is applied. The proposed algorithm is fast, efficient, and accurate, producing high-precision results from fewer features. © 2019 Firas Ahmed Yonis AL_Taie. Hosting by Science Repository. All rights reserved.


Introduction
It is often preferable to identify and extract the best and simplest features that describe a problem when searching for patterns. Selecting the best features is a very important step, and how it is done depends on the nature of the problem [1]. Several algorithms inspired by the behavior of physical or biological systems in nature have been proposed as powerful methods for solving global optimization problems. Kennedy and Eberhart proposed a binary version of the well-known Particle Swarm Optimization (PSO), called BPSO, in which the conventional PSO algorithm was modified to address binary optimization problems. Rashedi et al. proposed a binary version of the Gravitational Search Algorithm (GSA), called BGSA, for feature selection, and Ramos et al. presented their version of Harmony Search (HS) for the same purpose in the context of theft detection in power distribution systems [2]. The work in this research proceeds in three stages. It starts with the mutual information (MI) technique, which reduces the number of features, and is very useful for large data, by ranking the features and then cutting the ranking at a certain point. The next stage uses the binary bat algorithm (BBA), which randomly selects subsets of the features through a binary (0, 1) vector. In the final stage, the internal classification is performed on the selected feature subsets by the Naïve Bayes (NB) classifier [1,3].

Bat Algorithm
Bats are fascinating creatures that have attracted many scientists and researchers because of their ability to echolocate, which can be regarded as a natural form of sonar [4,5]. Bats, especially microbats, emit loud but very short sound pulses and then wait for the echoes that return after the pulses hit an object or a creature. From these echoes the bat calculates its distance to the object, and this remarkable guidance technique lets it recognize obstacles and prey and hunt at night and in darkness [1]. The ability is precise enough to distinguish between stationary and moving targets, and this is done in a very short time [6,7].
Yang, the author of this algorithm, regarded the behavior of bats as an ideal model for selection and path optimization, and developed it to mimic a group of bats searching for food or prey using echolocation [8]. The algorithm idealizes this behavior with the following rules:
1. All bats use echolocation to estimate distance, and they can also distinguish between obstacles and prey in a remarkable way.
2. Bats fly randomly with velocity $v_i$ at position $x_i$ with a fixed frequency $f_{min}$, varying wavelength $\lambda$ and loudness $A_0$. Bats can adjust the wavelength (or frequency) of the pulses they emit and the pulse emission rate $r \in [0, 1]$ depending on the proximity of the prey.
3. The loudness of bats can vary in many ways, but Yang assumes that it varies from a large positive value $A_0$ to a constant minimum value $A_{min}$.
Each bat $i$ is initialized with a velocity $v_i$, a position $x_i$ and a frequency $f_i$. At each time step $t$, up to the maximum number of iterations $T$, the movement of the bat is obtained by updating its velocity and position with the following model:

$$v_i^j(t) = v_i^j(t-1) + \left[\hat{x}^j - x_i^j(t-1)\right] f_i \tag{1}$$
$$x_i^j(t) = x_i^j(t-1) + v_i^j(t) \tag{2}$$
$$f_i = f_{min} + (f_{max} - f_{min})\,\beta \tag{3}$$

where $v_i^j(t)$ is the new velocity of bat $i$ in dimension $j$, $x_i^j(t)$ is its new position, $\hat{x}^j$ is the current global best solution for decision variable $j$, and $\beta$ is a random number in $[0, 1]$. To improve the diversity of the candidate solutions, Yang proposed a local random walk: one of the current best solutions is selected, and a new solution is generated around it for each bat in the group:

$$x_{new} = x_{old} + \varepsilon\, \bar{A}(t) \tag{4}$$

where $\varepsilon$ is a random number in $(-1, 1)$ and $\bar{A}(t)$ is the average loudness of all bats at this step. The search is balanced by adjusting the loudness and the pulse emission rate:

$$A_i(t+1) = \alpha\, A_i(t) \tag{5}$$
$$r_i(t+1) = r_i(0)\left[1 - \exp(-\gamma t)\right] \tag{6}$$

where $\alpha$ and $\gamma$ are constants of the algorithm. In the first step, the emission rate $r_i(0)$ and the loudness $A_i(0)$ are chosen randomly, with $r_i(0) \in [0, 1]$ and $A_i(0) \in (0, 1)$. These equations show that the bats follow the variation of the frequency, which accelerates their convergence toward the best solution.
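To make the velocity, position and frequency updates, the local random walk, and the loudness and pulse-rate updates concrete, here is a minimal Python sketch of the continuous bat algorithm for a minimization problem. It is an illustration under stated assumptions, not the paper's implementation: the function name `bat_algorithm`, the default parameter values (`f_min`, `f_max`, `alpha`, `gamma`, bounds), the clamping to bounds, and the acceptance rule (a candidate replaces bat $i$ only if it improves on it and a random draw falls below its loudness $A_i$) are our assumptions.

```python
import math
import random


def bat_algorithm(fitness, dim, n_bats=20, n_iter=100, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, lower=-5.0, upper=5.0, seed=0):
    """Minimal continuous bat algorithm (minimization), a sketch.

    Parameter defaults, bound clamping and the acceptance rule are assumptions.
    Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    fit = [fitness(xi) for xi in x]
    loud = [rng.random() for _ in range(n_bats)]   # initial loudness A_i(0), from [0, 1)
    r0 = [rng.random() for _ in range(n_bats)]     # initial pulse rate r_i(0), from [0, 1)
    r = list(r0)

    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = x[b][:], fit[b]

    for t in range(1, n_iter + 1):
        avg_loud = sum(loud) / n_bats              # average loudness used by the random walk
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()        # frequency update
            for j in range(dim):
                v[i][j] += (best[j] - x[i][j]) * f_i            # velocity update
            cand = [x[i][j] + v[i][j] for j in range(dim)]      # position update
            if rng.random() > r[i]:
                # local random walk around the current best solution
                cand = [bj + rng.uniform(-1.0, 1.0) * avg_loud for bj in best]
            cand = [min(max(c, lower), upper) for c in cand]    # stay inside bounds
            cand_fit = fitness(cand)
            if rng.random() < loud[i] and cand_fit < fit[i]:
                x[i], fit[i] = cand, cand_fit
                loud[i] *= alpha                                 # loudness decreases
                r[i] = r0[i] * (1.0 - math.exp(-gamma * t))      # pulse rate increases
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


if __name__ == "__main__":
    best, value = bat_algorithm(lambda p: sum(c * c for c in p), dim=3)
    print(best, value)
```

Because the global best is only replaced by a strictly better candidate, the returned fitness never exceeds the best fitness of the initial population.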

Binary Bat Algorithm
A binary search space is a discrete space in which particles move between the corners of a hypercube by taking only the values 0 or 1 [9, 10]. The binary bat algorithm (BBA) is therefore similar to the basic algorithm (BA); the difference lies in the search space: BA operates in a continuous space, whereas BBA searches a space restricted to 0 and 1. This requires a function that maps values to the range between 0 and 1, called the transfer function [11]. The chosen function must satisfy the following conditions:
1- Its range must be bounded in [0, 1], as it represents the probability that a particle changes its position.
2- It should give a high probability of changing position for a large absolute velocity. Particles with a large absolute velocity are probably far from the optimal solution and should therefore switch their positions in the following iterations.
3- Conversely, it should give a small probability of changing position when the absolute velocity is small.
The efficiency of the transfer function depends on the value it returns increasing as the velocity of the particles increases [12,13]. In other words, a particle moving away from the best solution should be more likely to change its position toward the right direction, and this probability should decrease as its velocity decreases [14]. In this paper, we use the sigmoid function as the transfer function that converts the positions of all particles into binary values:

$$S\left(v_i^j(t)\right) = \frac{1}{1 + e^{-v_i^j(t)}} \tag{7}$$

Accordingly, Eq. (2) is replaced with Eq. (8):

$$x_i^j(t) = \begin{cases} 1 & \text{if } S\left(v_i^j(t)\right) > \sigma \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

where $\sigma \sim U(0, 1)$. Eq. (8) thus yields binary values for the position of every bat in the group.
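The sigmoid transfer function and the binary position rule of Eq. (8) can be sketched in a few lines of Python. The helper names `sigmoid`, `binary_position` and `bba_move` are hypothetical; the sigmoid is written in a numerically safe form (an implementation choice of ours) so that very large negative velocities do not overflow `math.exp`.

```python
import math
import random


def sigmoid(v):
    """Sigmoid transfer function S(v) = 1 / (1 + e^(-v)), computed without overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def binary_position(velocity, rng=random):
    """Eq. (8): bit j is 1 if S(v_j) > sigma, with a fresh sigma ~ U(0, 1) per bit."""
    return [1 if sigmoid(vj) > rng.random() else 0 for vj in velocity]


def bba_move(position, velocity, best, f_i, rng=random):
    """One BBA move for a single bat: the usual velocity update toward the
    best solution, then Eq. (8) in place of the continuous position update."""
    new_v = [vj + (bj - xj) * f_i for vj, bj, xj in zip(velocity, best, position)]
    return binary_position(new_v, rng), new_v
```

For example, `bba_move([0, 1, 1, 0], [0.0] * 4, [1, 1, 0, 0], f_i=1.5, rng=random.Random(0))` pushes the first bit toward 1 and the third toward 0, while bits that already agree with the best solution keep a zero velocity and are set to 1 with probability 0.5.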

Mutual Information (MI)
Mutual information quantifies how much information two related random variables share:

$$I(X;Y) = H(X) - H(X \mid Y) \tag{9}$$

where $H(X)$ is the entropy of the discrete random variable $X$ and $H(X \mid Y)$ is the entropy of $X$ conditioned on the discrete random variable $Y$ [15].
$I(X;Y)$ therefore measures the information about $X$ gained by observing $Y$. Conversely, $I(X;Y) = 0$ if $X$ and $Y$ are independent. Mutual information has mainly been applied in filter-based feature selection to measure the relevance between individual features and the class labels [16], and it is a classical component of several feature-ranking criteria [17].
One of the most important applications of this theory is feature selection: MI evaluates the importance of each feature and ranks the features from the most to the least important according to their impact on the final results. This reduces the number of features included in the computations, which speeds up the program [18]. For a feature $F$ and the class variable $G$, the MI is:

$$I(F;G) = \iint p(f, g) \log \frac{p(f, g)}{p(f)\, p(g)} \, df \, dg \tag{10}$$

Evaluating the MI between a single feature and the class label poses no problem [14]; the difficulty arises when the method must evaluate an entire feature subset [14]. A subset has to be evaluated in a multivariate way because of the interactions or correlations between features: two features may individually provide too little information about the class, yet together they may provide sufficient information. For $N$ variables $X_1, X_2, \ldots, X_N$ and the variable $Y$, the chain rule is:

$$I(X_1, X_2, \ldots, X_N; Y) = \sum_{i=1}^{N} I(X_i; Y \mid X_{i-1}, \ldots, X_1) \tag{11}$$

MI can thus be regarded as a measure of the reduction in uncertainty about the class labels, and a fitness function can be defined to maximize it.
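The ranking-and-cutting stage can be sketched as follows. The paper does not say how MI is estimated, so this sketch swaps in a simple plug-in (counting) estimator of $I(X;Y) = H(X) - H(X \mid Y)$ for discrete features; continuous features would first need to be discretized (for example, binned), which is an assumption on our part. The function names and the cut-off `k` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y) = sum over y of P(Y = y) * H(X | Y = y)."""
    n = len(y)
    total = 0.0
    for y_val, count in Counter(y).items():
        x_given_y = [xi for xi, yi in zip(x, y) if yi == y_val]
        total += (count / n) * entropy(x_given_y)
    return total


def mutual_information(x, y):
    """I(X; Y) = H(X) - H(X | Y)."""
    return entropy(x) - conditional_entropy(x, y)


def rank_and_cut(columns, labels, k):
    """Rank feature columns by I(feature; class), most informative first,
    and keep the indices of the top k (the 'cut' stage)."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:k], scores
```

A feature identical to a balanced binary class scores 1 bit, and a feature independent of the class scores 0, so it falls to the end of the ranking and is the first to be cut.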

Naïve Bayes Classifier (NB)
Naïve Bayes is one of the most widely used classification models. It calculates the probability that a sample, described by its features, belongs to a particular class, assuming that the features are conditionally independent given the class [19]. Naïve Bayes often performs well, and it is simple to construct and easy to implement. Given a feature vector $(x_1, x_2, \ldots, x_n)$, we look for the class $G$ that maximizes the posterior probability:

$$P(G \mid x_1, \ldots, x_n) = \frac{P(G)\, P(x_1, \ldots, x_n \mid G)}{P(x_1, \ldots, x_n)}$$

Naïve Bayes uses the conditional independence between features to express this conditional probability as:

$$P(x_1, \ldots, x_n \mid G) = \prod_{i=1}^{n} P(x_i \mid G)$$

so that the predicted class is $\hat{G} = \arg\max_{G} P(G) \prod_{i=1}^{n} P(x_i \mid G)$.
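The decision rule above can be sketched as a small from-scratch classifier. The paper does not state which Naïve Bayes variant it uses; this sketch assumes Gaussian class-conditional densities $P(x_i \mid G)$, which is a common choice for continuous features. The class name `GaussianNaiveBayes` and the `var_smoothing` term (added to each variance so constant features do not cause division by zero) are our assumptions. Scores are summed in log space to avoid underflow when many features are multiplied.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: each P(x_i | G) is a normal density per class."""

    def fit(self, X, y, var_smoothing=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.classes_ = sorted(groups)
        n = len(y)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for g in self.classes_:
            rows = groups[g]
            self.priors_[g] = len(rows) / n                 # P(G)
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            self.means_[g] = means
            self.vars_[g] = [sum((v - m) ** 2 for v in c) / len(c) + var_smoothing
                             for c, m in zip(cols, means)]
        return self

    def _log_joint(self, row, g):
        # log P(G) + sum_i log P(x_i | G): the conditional-independence factorization
        total = math.log(self.priors_[g])
        for x, m, var in zip(row, self.means_[g], self.vars_[g]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # arg max over classes of the (unnormalized) log posterior
        return [max(self.classes_, key=lambda g: self._log_joint(row, g)) for row in X]
```

The evidence term $P(x_1, \ldots, x_n)$ is the same for every class, so it is dropped from the arg max.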

Proposed Algorithm
Researchers consistently look for the fastest methods while maintaining accurate results. The method proposed in this research, MI_BBA_NB, consists of three parts. In the first part, the MI technique reduces the size of large data by ranking the features by importance in descending order, starting with the features that most influence the output and ending with those that influence it least. At this stage, the most important features are selected by cutting the ranked list after a certain number of features, which discards a large part of the features that negatively affect the accuracy of the results. In the second stage, the binary bat algorithm is applied, in which a subset of features is selected randomly through a vector of 0 and 1 values. This vector is a string of binary values representing a feature subset: features corresponding to 0 are ignored, and features corresponding to 1 are chosen. In the last stage, the NB classifier performs the classification on the features selected via BBA.
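The link between the stages can be sketched as follows: the BBA bit vector indexes the MI-ranked, cut feature list, and each bit vector is scored by training a classifier on the selected columns. All names here are hypothetical, and `train_and_score` is a placeholder for a callable that, for example, fits a Naïve Bayes classifier on the training subset and returns its accuracy on a validation subset. Scoring an empty subset as 0 is our assumption; the paper does not say how it is handled.

```python
def select_by_mask(ranked_features, mask):
    """Keep the ranked (and cut) feature indices whose bit in the BBA vector is 1."""
    if len(mask) != len(ranked_features):
        raise ValueError("mask length must equal the number of retained features")
    return [feat for feat, bit in zip(ranked_features, mask) if bit == 1]


def subset_columns(rows, columns):
    """Project each sample onto the selected original column indices."""
    return [[row[j] for j in columns] for row in rows]


def make_fitness(X_train, y_train, X_val, y_val, ranked_features, train_and_score):
    """Build fitness(mask) for BBA. `train_and_score` is a hypothetical callable,
    e.g. one that fits a Naive Bayes classifier on the training subset and
    returns its accuracy on the validation subset."""
    def fitness(mask):
        cols = select_by_mask(ranked_features, mask)
        if not cols:  # an empty feature subset cannot be classified; score it 0
            return 0.0
        return train_and_score(subset_columns(X_train, cols), y_train,
                               subset_columns(X_val, cols), y_val)
    return fitness
```

For instance, if MI ranking and cutting keep the original columns `[7, 2, 9, 4]`, the bit vector `[1, 0, 1, 0]` selects columns 7 and 9. BBA then searches for the bit vector that maximizes this fitness.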

Experimental Results
The proposed MI_BBA_NB algorithm is evaluated and compared with other evolutionary algorithms.

I Datasets
To verify its efficiency in solving classification problems, the proposed algorithm is applied to several large datasets. Table 1 shows the datasets, obtained from the UCI repository [20], to which the algorithm is applied. The target variable is binary, representing a negative or sick state (0) and a positive or healthy state (1).

II Evaluation Criteria
Classification performance is measured by specificity (SP), the Matthews correlation coefficient (MCC), classification accuracy (AC), and sensitivity (SE), defined as follows:

$$SE = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}$$
$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix, respectively. Higher values of these criteria indicate better classification performance.
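The four criteria translate directly into code. In this sketch the function name is hypothetical, and returning an MCC of 0 when its denominator is zero (a row or column of the confusion matrix is empty) is a common convention that we assume here; SE and SP are left undefined (they raise `ZeroDivisionError`) when there are no positive or no negative samples.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    from the counts of a binary confusion matrix."""
    se = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    ac = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention for 0/0
    return {"SE": se, "SP": sp, "AC": ac, "MCC": mcc}
```

For example, with TP = 40, TN = 45, FP = 5 and FN = 10, SE = 0.8, SP = 0.9 and AC = 0.85, while MCC = 1750 / sqrt(45 · 50 · 50 · 55) ≈ 0.70.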

III Discussion and Analysis
The algorithm proposed in this paper is compared with both the binary genetic algorithm (BGA) and the original BBA. In our experiments, 30% of each dataset is used as the test set and the remaining 70% as training data, and a 20-fold procedure is used to obtain a reliable rating. Table 2 compares the evaluation criteria with those of other evolutionary algorithms. It shows that the proposed algorithm selects fewer features than the total number available, and it compares the classification accuracy of the proposed algorithm with that of the other algorithms. Table 2 also shows that the proposed algorithm achieved the best classification results on both the training and test sets. For example, on the leukemia dataset, the test-set accuracy (AC) was 95% for the proposed MI_BBA_NB, 90% for BBA, and 92% for BGA. Comparing BBA and BGA with the proposed MI_BBA_NB on four large datasets shows that the proposed algorithm achieves higher classification accuracy and efficiency.
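The 70/30 hold-out split described above can be sketched as below. The text does not say how the 20 folds are organized (for example, repeated random splits or cross-validation), so this sketch covers only a single random, non-stratified split; the function name `holdout_split` and the `seed` parameter are hypothetical.

```python
import random


def holdout_split(X, y, test_fraction=0.3, seed=None):
    """Random hold-out split: `test_fraction` of the samples for testing,
    the remainder for training (not stratified by class)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = round(test_fraction * len(idx))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)
```

Calling it with different seeds gives repeated independent splits; for small, imbalanced medical datasets a stratified split that preserves the class ratio in both parts may be preferable.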

Conclusion
The MI_BBA_NB method is proposed in this paper to improve classification performance on large data. It selects the important features with the MI method, then applies BBA to the remaining features to select subsets randomly, which are then passed to the NB classifier. The results of the proposed MI_BBA_NB method were compared with those of BBA and BGA in Table 2. The experimental results in Table 2 indicate that the proposed MI_BBA_NB method selects fewer features and achieves higher classification performance than both BBA and BGA.