Complexity and Non-Linearity of Cardiovascular Risk Factors in Older Patients With Multimorbidity and Reduced Renal Function

The aim of this study was to reveal the complex pathophysiological network that may underlie the associations between chronic kidney disease (CKD), defined as mildly to moderately decreased renal function, and increased cardiovascular (CV) risk. For this purpose, we used a set of parameters indicating biochemical disorders known to be associated with CKD. This set of parameters was taken from a larger dataset in which older patients with multimorbidity (the coexistence of two or more chronic diseases in the same person) and mildly to moderately decreased renal function were described with multiple clinical parameters. On the selected set of parameters, we applied Machine Learning (ML) methods to demonstrate relationships between the parameters. We used the SMOreg algorithm to develop a regression model. First, we applied the SMOreg algorithm to the dataset to predict C-reactive protein (CRP), and then we used the same algorithm to discover pairwise nonlinear relationships between variables such as Age-fglu, Chol-HTC, HB-FE, Age-Homcis, Clear-Homcis, Homcis-TG and CRP-TG. The assessment of non-linear relationships among multiple parameters indicating confounding factors of renal function decline in older people with multimorbidity revealed close associations between insulin resistance and serum albumin and homocysteine levels. Several hypotheses arise from this study with the potential to facilitate research on the concept of reverse epidemiology. Although this analytical approach is far from sufficient to provide a full understanding of the relationships between cardiovascular risk factors and decreased renal function in the older population, by means of reverse epidemiology, this study emphasizes the need for more integrated and dynamic approaches when assessing these factors. © 2020 Ljiljana Trtica Majnarić. Published by Science Repository. All rights reserved.


Introduction
Classical risk factors for atherosclerotic cardiovascular disease (CVD), namely older age, obesity, hypertension, smoking, elevated serum cholesterol, diabetes and low physical activity, have been known for a long time [1]. Many of these factors are related to unhealthy lifestyles and can be modified, and they are widely used as the basis for cardiovascular (CV) risk assessment. In recent decades, a number of new CV risk factors have emerged [2]. Some of them indicate low-grade inflammation, chronic latent infections, hypercoagulability, impaired fibrinolysis, elevated serum homocysteine (a sulfur-containing amino acid) and dyslipidemia characterised by increased serum triglycerides and decreased HDL-cholesterol [3,4]. Because knowledge of the mechanisms and factors underlying CVD is growing rapidly, CV risk estimation scores are becoming increasingly difficult to construct [5].
Moreover, there is increasing awareness that CV risk factors tend to appear in clusters, that is, some factors appear in the same person more frequently than expected from their frequency in the population. An example is metabolic syndrome (MS), a cluster of several CV risk factors including abdominal obesity, hypertension, diabetes/glucose intolerance and dyslipidemia characterised by increased serum triglycerides and decreased HDL-cholesterol. Insulin resistance (impaired cellular utilisation of insulin) has been proposed as the mechanism underlying this syndrome [6]. The prevalence of MS is increasing worldwide due to population ageing and the epidemic spread of obesity, both factors known to be associated with urbanisation and modern lifestyles [7].
Another major public health concern whose prevalence is increasing due to population aging is chronic kidney disease (CKD) [8]. CKD is one of the major factors associated with unsuccessful aging, that is, aging burdened with comorbidities, polypharmacy and geriatric syndromes, coexisting with disability and frailty [9]. Reduced kidney function is known as a multiple-risk medical condition with a disproportionately high risk of developing CVD [10,11]. Importantly, CV risk increases progressively as renal function declines and is already elevated at early stages of renal function impairment [12]. For this reason, the CV risk assessment scores in many guidelines have been modified to account for the impact of renal function decline [13,14].
Despite the strong association between CKD and CVD, the exact mechanisms and the nature of the relationships between the factors that link these conditions are not well understood. For example, hypertension, MS and/or diabetes are known to be the leading causes of CVD [5,15], but taken alone, these conditions are not sufficient to explain the excess CV risk in patients with CKD [14,16]. Contrary to what would be expected from the known impact of CV risk factors on the development of CVD, some studies have shown that decreased, rather than increased, values of these factors may explain the associations between CKD and CVD [17].
These paradoxical relationships can be explained by weight loss and muscle wasting due to protein-energy malnutrition and chronic inflammation, two shared pathophysiological disorders that occur in the majority of patients with CKD [18,19]. Many causes of this complex condition have been identified in patients with CKD, including comorbid illnesses, increased oxidative stress, anorexia and low nutrient intake, together with decreased clearance of toxic metabolites and inflammatory cytokines [20]. In this regard, evidence suggests that, in patients with CKD, a variety of confounding factors may influence variations in some CV risk factors, such as serum lipids, lipoproteins and homocysteine levels. Some of these confounding factors have been identified, including chronic inflammation, hypoalbuminemia, vitamin B12 and folic acid deficiency, low residual renal function, disturbed BMI measures and comorbid disorders, in particular diabetes and CVD [18,21].
More is known about CV risk factors in end-stage chronic renal failure than in earlier stages of renal function decline, characterised by mildly to moderately decreased renal function. The aim of this study was to reveal the complex pathophysiological network that may underlie the associations between CKD, defined as mildly to moderately decreased renal function, and increased CV risk. For this purpose, we used a set of parameters indicating biochemical disorders known to be associated with CKD. This set of parameters was taken from a larger dataset in which older patients with multimorbidity (the coexistence of two or more chronic diseases in the same person) and mildly to moderately decreased renal function were described with multiple clinical parameters. On this selected set of parameters, we applied Machine Learning (ML) methods to demonstrate relationships between the parameters.

I Data Source
Data for this analysis were taken from a multicomponent dataset that was assembled specifically for practicing ML methods. The dataset was collected over a period of three years (2007-2010). Data were taken from 93 participants (35 M/58 F) aged 50-89 years (median 69 years). The majority of patients had a diagnosis of hypertension, and one third had a diagnosis of diabetes or impaired fasting blood glucose (Table 1).
Patients were recruited from several general medicine practices in the town of Osijek (about 80 000 inhabitants), eastern Croatia, a region whose burden of CV disease exceeds the Croatian average. Only participants who gave signed informed consent were included in the study. The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Faculty of Medicine, University of Zagreb (04-76/2006-396).

II Data Description
Data were largely drawn from primary care (PC) electronic health records (eHRs). For specific biochemical and haematological tests, patients were referred to the Central Laboratory of the Clinical Hospital Osijek for venipuncture. All laboratory tests were performed according to standard procedures.
Creatinine clearance and serum homocysteine were used as measures of renal function decline [22]. Increased serum homocysteine (hyperhomocysteinemia) is an established CV risk factor whose mechanisms of action include increased inflammation and oxidative stress. This condition is associated with impaired folic acid and vitamin B12 metabolism. Other studies have found associations between low serum homocysteine concentrations and increased CV risk, which may reflect protein-energy malnutrition and increased inflammation [23]. Fasting insulin, glycosylated haemoglobin (HbA1c) and fasting glucose were used as measures of insulin resistance and glucose intolerance [24]. The level of inflammation was indicated by C-reactive protein (CRP) [25].

III Machine Learning Algorithms
In machine learning, the aim is to predict output values from input values. If the output values are class labels, the problem is called a classification problem; if the output values are numeric, it is called a regression problem. In both cases, a prediction model is constructed from the training set, and its performance is evaluated on the test set. In this study, the outputs of our dataset are numeric, so we considered Support Vector Regression [26].
The regression approximation addresses the problem of estimating a function $\hat{y} = f(x)$ based on a given dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$ is the input vector, $y_i$ is the real value, $\hat{y}$ is the estimated value, $f$ is the estimation function, $n$ is the number of observations and $m$ is the number of features [26].

IV Support Vector Machines (SVMs)
Support Vector Machines (SVMs) are learning machines based on statistical learning theory [27]. SVMs are supervised learning methods that have been widely and successfully used for pattern recognition in different areas [28]. Especially in recent years, SVMs with linear and nonlinear kernels have become one of the most promising learning algorithms for classification as well as regression [29].
One of the main reasons for the popularity of SVMs is their ability to model complex nonlinear relationships through the choice of a suitable kernel function. Briefly, a kernel function implicitly maps the input space into a high-dimensional feature space, where nonlinear relationships can be represented in linear form. Popular kernels include the linear, polynomial and Gaussian (radial basis function, RBF) kernels [30].
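To make the three kernel families named above concrete, the following minimal Python sketch writes each one as a plain function. The parameter names and defaults (`degree=2`, `coef0=1.0`, `gamma=0.5`) are illustrative assumptions and are not the settings of the Weka kernels used in this study.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Linear kernel: K(x, z) = x . z (plain dot product in input space)."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel (assumed form): K(x, z) = (x . z + coef0) ** degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))      # 11.0
print(polynomial_kernel(x, z))  # (11 + 1) ** 2 = 144.0
print(rbf_kernel(x, x))         # identical points -> 1.0
```

The RBF value decays towards 0 as two points move apart, which is why it is often read as a similarity measure.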
SVMs offer a novel approach to classification-based research. The problem that SVMs try to solve is to find an optimal hyperplane that correctly classifies data points by separating the points of the two classes as much as possible [31,26]. Let $x_i$ (for $1 \le i \le N_x$) be the input vectors in input space, with corresponding binary labels $y_i \in \{-1, 1\}$. Let $\vec{x}_i = \varphi(x_i)$ be the corresponding vectors in feature space, where $\varphi(\cdot)$ is the implicit kernel mapping, and let $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ be the kernel function, implying a dot product in the feature space [32]. $K(x_i, x_j)$ represents the desired notion of similarity between data $x_i$ and $x_j$.
$K(x_i, x_j)$ needs to satisfy Mercer's condition in order for $\varphi$ to exist [31].
A number of kernel functions have been found to provide good generalization capabilities [33]. The kernel function used in this SVM is the linear function, $K(x_i, x_j) = x_i \cdot x_j$. The optimization problem for a soft-margin SVM is

$$\min_{\vec{w},\, b,\, \xi} \; \frac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i=1}^{N_x} \xi_i,$$

subject to the constraints $y_i(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, where $\vec{w}$ is the normal vector of the separating hyperplane in feature space and $C > 0$ is a regularization parameter controlling the penalty for misclassification. This formulation is referred to as the primal problem.
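As an illustration of the soft-margin objective, the sketch below evaluates it for a fixed, hand-chosen hyperplane with the linear kernel. For fixed w and b, the smallest slack that satisfies both constraints is the hinge value max(0, 1 - y_i(w·x_i + b)), which is what the code uses. The toy points and the choice C = 1 are hypothetical and do not come from the study data.

```python
from typing import List, Sequence, Tuple


def soft_margin_primal(w: Sequence[float], b: float,
                       X: Sequence[Sequence[float]], y: Sequence[int],
                       C: float) -> Tuple[float, List[float]]:
    """Evaluate 0.5*||w||^2 + C*sum(xi_i) for a fixed (w, b), linear kernel.

    For fixed (w, b), the smallest slack satisfying y_i (w . x_i + b) >= 1 - xi_i
    and xi_i >= 0 is the hinge value xi_i = max(0, 1 - y_i (w . x_i + b)).
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(w_j * w_j for w_j in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical toy data: the third point lies on the wrong side of the hyperplane x1 = 0.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
objective, slacks = soft_margin_primal(w=[1.0, 0.0], b=0.0, X=X, y=y, C=1.0)
print(slacks)     # [0.0, 0.0, 1.5]
print(objective)  # 0.5 * 1 + 1.0 * 1.5 = 2.0
```

A larger C makes the 1.5 slack of the misclassified point weigh more heavily against the margin term, which is the trade-off C controls.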
From the Lagrangian form of the primal problem, we derive the dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{N_x} \alpha_i - \frac{1}{2} \sum_{i=1}^{N_x} \sum_{j=1}^{N_x} \alpha_i \alpha_j y_i y_j K(x_i, x_j),$$

subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N_x} \alpha_i y_i = 0$. This is a quadratic optimization problem that can be solved efficiently using algorithms such as Sequential Minimal Optimization (SMO) [34]. Typically, many $\alpha_i$ go to zero during optimization, and the remaining $x_i$, corresponding to those $\alpha_i > 0$, are called support vectors. To simplify notation, from here on we assume that all non-support vectors have been removed, so that $N_x$ is now the number of support vectors and $\alpha_i > 0$ for all $i$. With this formulation, the normal vector of the separating hyperplane is calculated as

$$\vec{w} = \sum_{i=1}^{N_x} \alpha_i y_i \vec{x}_i.$$

Note that because $\vec{x}_i = \varphi(x_i)$ is defined implicitly, $\vec{w}$ exists only in feature space and cannot be computed directly. Instead, the classification $c(q)$ of a new query vector $q$ can only be determined by computing the kernel function of $q$ with every support vector:

$$c(q) = \operatorname{sign}\left( \sum_{i=1}^{N_x} \alpha_i y_i K(x_i, q) + b \right),$$

where the bias term $b$ is the offset of the hyperplane along its normal vector, determined during SVM training [26,32].
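The following sketch shows how a trained SVM classifies a query using only the support vectors, their multipliers and the bias. The two-point example (support vectors (1, 1) and (-1, -1), multipliers 0.25, b = 0) is a hypothetical case whose maximum-margin solution is known in closed form, so the results can be checked by hand; with a linear kernel, w can also be formed explicitly, which is not possible for general kernels.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def decision_value(q: Vector, support_vectors: Sequence[Vector], labels: Sequence[int],
                   alphas: Sequence[float], b: float, kernel: Kernel = linear_kernel) -> float:
    """sum_i alpha_i * y_i * K(x_i, q) + b, evaluated over the support vectors only."""
    return sum(a * y_i * kernel(x_i, q)
               for x_i, y_i, a in zip(support_vectors, labels, alphas)) + b


def classify(q: Vector, support_vectors: Sequence[Vector], labels: Sequence[int],
             alphas: Sequence[float], b: float, kernel: Kernel = linear_kernel) -> int:
    # sign(.) of the decision value; exactly 0 is assigned to +1 by convention here
    return 1 if decision_value(q, support_vectors, labels, alphas, b, kernel) >= 0 else -1


# Hypothetical example: (1, 1) -> +1 and (-1, -1) -> -1 with alpha = 0.25 each, b = 0,
# so w = sum_i alpha_i y_i x_i = (0.5, 0.5) and sum_i alpha_i y_i = 0.
svs = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
b = 0.0
w = [sum(a * y_i * x_i[d] for x_i, y_i, a in zip(svs, labels, alphas)) for d in range(2)]
print(w)                                              # [0.5, 0.5]
print(classify([2.0, 0.0], svs, labels, alphas, b))   # 1
print(classify([-3.0, 1.0], svs, labels, alphas, b))  # -1
```

For the linear kernel, the decision value equals w · q + b; the kernel form is what generalises to nonlinear kernels, where w is never built.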

V Support Vector Regression (SVR)
SVMs have been extended to regression as Support Vector Regression (SVR). SVR is carried out in two steps: first, it maps the samples from the low-dimensional input space into a much higher-dimensional feature space with a kernel function, and then it searches for the global optimum of the corresponding problem using quadratic programming [35].
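The kernel trick behind the first step can be checked numerically. For the degree-2 homogeneous polynomial kernel on 2-D inputs, one explicit map into a 3-D feature space is φ(x) = (x1², √2·x1·x2, x2²), and the dot product of the mapped vectors equals (x·z)². The sketch below, a hypothetical example independent of the study data, verifies this identity; for general kernels the mapping is never computed explicitly.

```python
import math
from typing import Sequence, Tuple


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # dot product after mapping 2-D -> 3-D
implicit = dot(x, z) ** 2       # kernel trick: K(x, z) = (x . z) ** 2, no mapping needed
print(math.isclose(explicit, implicit))  # True (both equal 121 up to rounding)
```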
SVR attempts to minimize a bound on the generalization error in order to achieve good generalization performance. The idea of SVR is based on computing a linear regression function in a high-dimensional feature space into which the input data are mapped via a nonlinear function. The model generated by SVR depends only on a subset of the training data, because the cost function used to build the model ignores training points whose targets lie close to (within a margin ε of) the model prediction. SVR has been successfully applied in various fields such as bioinformatics, engineering and financial research [36].
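The insensitivity of SVR to training points close to the prediction comes from the ε-insensitive loss, which is zero inside a tube of half-width ε around the prediction and grows linearly outside it. The minimal sketch below assumes ε = 0.1 and uses made-up values; it is not tied to the settings of Weka's SMOreg.

```python
from typing import List, Sequence


def epsilon_insensitive_loss(t: float, y_hat: float, epsilon: float = 0.1) -> float:
    """Zero inside the epsilon-tube |t - y_hat| <= epsilon, linear outside it."""
    return max(0.0, abs(t - y_hat) - epsilon)


def points_outside_tube(measured: Sequence[float], predicted: Sequence[float],
                        epsilon: float = 0.1) -> List[int]:
    """Indices of points that incur non-zero loss, i.e. lie outside the epsilon-tube."""
    return [i for i, (t, p) in enumerate(zip(measured, predicted))
            if epsilon_insensitive_loss(t, p, epsilon) > 0.0]


measured = [1.0, 2.0, 3.0]
predicted = [1.05, 2.5, 2.0]
print([round(epsilon_insensitive_loss(t, p), 6) for t, p in zip(measured, predicted)])  # [0.0, 0.4, 0.9]
print(points_outside_tube(measured, predicted))  # [1, 2]
```

The first point falls inside the tube and contributes nothing to the cost, which is why such points do not shape the fitted model.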

VI Sequential Minimal Optimization (SMO) Algorithm
The Sequential Minimal Optimization (SMO) algorithm efficiently solves the quadratic optimization problem that arises during the training of support vector machines. It was invented by John Platt in 1998 at Microsoft Research. SMO is iterative: it breaks the problem into a series of the smallest possible sub-problems, which are then solved analytically [37]. In this study, we used the SMOreg algorithm to develop a regression model. The SMOreg module of Weka implements support vector machine regression with arbitrary kernel functions. The SMOreg algorithm transforms nominal attributes into binary ones and replaces all missing values globally. This algorithm has a number of useful features, including fast learning and better scaling properties [38-40].
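To illustrate the analytic two-multiplier step that SMO repeats, the sketch below implements a simplified SMO for the soft-margin classification dual described in Section IV, with a linear kernel. It is a teaching version under stated assumptions, not Weka's SMOreg: SMOreg solves the regression problem, and Platt's full algorithm uses more elaborate heuristics to choose the second multiplier, whereas here the first index j that yields progress is taken. The function names, tolerances and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def smo_train(X: Sequence[Vector], y: Sequence[int], C: float = 1.0, tol: float = 1e-3,
              max_passes: int = 5, max_sweeps: int = 1000) -> Tuple[List[float], float]:
    """Simplified SMO for the soft-margin SVM dual with a linear kernel (teaching sketch)."""
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0

    def f(k: int) -> float:
        # Current decision value on training point k: sum_i alpha_i y_i K(x_i, x_k) + b
        return sum(alpha[i] * y[i] * K[i][k] for i in range(n)) + b

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Skip i unless it violates the KKT conditions by more than tol
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            for offset in range(1, n):
                j = (i + offset) % n
                E_j = f(j) - y[j]
                a_i_old, a_j_old = alpha[i], alpha[j]
                # Bounds keep both multipliers in [0, C] while sum(alpha * y) stays constant
                if y[i] != y[j]:
                    L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if L >= H or eta >= 0:
                    continue
                # Analytic optimum for alpha_j along the constraint line, clipped to [L, H]
                a_j = min(H, max(L, a_j_old - y[j] * (E_i - E_j) / eta))
                if abs(a_j - a_j_old) < 1e-5:
                    continue
                a_i = a_i_old + y[i] * y[j] * (a_j_old - a_j)
                alpha[i], alpha[j] = a_i, a_j
                # Update the bias from whichever new multiplier lies strictly inside (0, C)
                b1 = b - E_i - y[i] * (a_i - a_i_old) * K[i][i] - y[j] * (a_j - a_j_old) * K[i][j]
                b2 = b - E_j - y[i] * (a_i - a_i_old) * K[i][j] - y[j] * (a_j - a_j_old) * K[j][j]
                if 0 < a_i < C:
                    b = b1
                elif 0 < a_j < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
                break  # one successful pair per violating i in this sweep
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def smo_predict(x: Vector, X: Sequence[Vector], y: Sequence[int],
                alpha: Sequence[float], b: float) -> int:
    s = sum(a * y_i * linear_kernel(x_i, x) for x_i, y_i, a in zip(X, y, alpha)) + b
    return 1 if s >= 0 else -1


# Hypothetical, linearly separable toy data
X = [[2.0, 2.0], [2.0, 3.0], [3.0, 3.0], [-2.0, -2.0], [-2.0, -3.0], [-3.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
alpha, b = smo_train(X, y, C=1.0)
print([round(a, 4) for a in alpha], round(b, 4))
print([smo_predict(x, X, y, alpha, b) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Each update changes exactly two multipliers, because that is the smallest change that can keep the equality constraint sum(alpha_i * y_i) = 0 satisfied; this is what lets SMO solve every sub-problem in closed form.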

Results
Our study consists of two stages. First, we applied the SMOreg algorithm to the dataset to predict C-reactive protein (CRP), and then we used the same algorithm to discover pairwise nonlinear relationships between variables such as Age-fglu, Chol-HTC, HB-FE, Age-Homcis, Clear-Homcis, Homcis-TG and CRP-TG.

WEKA 3.6.8 software was used for the analysis. WEKA is an open-source collection of machine learning algorithms for data mining tasks. The software contains tools for data pre-processing, classification, regression, clustering, association rules and visualization [41]. We selected the Poly Kernel of the SMOreg algorithm, with 10-fold cross-validation, to predict C-reactive protein (CRP).

In this study, the comparison of the results obtained with several kernel functions is shown in Table 6. The comparison parameters are the correlation coefficient (R²-value), the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE), which can be described as follows:

$$R^2 = 1 - \frac{\sum_{m=1}^{n} (t_{m,m} - y_{p,m})^2}{\sum_{m=1}^{n} (t_{m,m} - \bar{t}_{m})^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{m=1}^{n} \lvert y_{p,m} - t_{m,m} \rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{m=1}^{n} (y_{p,m} - t_{m,m})^2},$$

where $n$ is the number of data patterns, $y_{p,m}$ is the predicted and $t_{m,m}$ the measured value of data point $m$, and $\bar{t}_{m}$ is the mean of all measured data points [43]. As can be seen from Table 5, the Poly Kernel generated good results. This function obtained a correlation coefficient of 0.1834 and the lowest error rates among the compared algorithms, with a mean absolute error of 2.0214 and a root mean square error of 3.9527.
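The three comparison measures can be written as short functions. The sketch below assumes the R² definition 1 − SS_res/SS_tot around the mean of the measured values; note that Weka's evaluation output reports a "Correlation coefficient" between predicted and actual values, which is in general a different quantity, so a Pearson correlation is included for comparison. The fold splitter is a simplified, unshuffled 10-fold scheme (Weka randomizes the data before splitting), and the numbers in the usage example are made up.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def mae(measured: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(p - t) for t, p in zip(measured, predicted)) / len(measured)


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(measured, predicted)) / len(measured))


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of the measured values."""
    mean_t = sum(measured) / len(measured)
    ss_res = sum((t - p) ** 2 for t, p in zip(measured, predicted))
    ss_tot = sum((t - mean_t) ** 2 for t in measured)
    return 1.0 - ss_res / ss_tot


def pearson_r(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values."""
    n = len(measured)
    mt, mp = sum(measured) / n, sum(predicted) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(measured, predicted))
    st = math.sqrt(sum((t - mt) ** 2 for t in measured))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (st * sp)


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds (no shuffling)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(mae(measured, predicted))   # 0.25
print(rmse(measured, predicted))  # 0.5
print(r_squared(measured, predicted), pearson_r(measured, predicted))

folds = list(k_fold_indices(93, k=10))  # 93 participants, as in this study
print([len(test) for _, test in folds])  # [10, 10, 10, 9, 9, 9, 9, 9, 9, 9]
```

RMSE weights the single large error more heavily than MAE does, which is why the two can rank kernels differently.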

Discussion
Patients with multiple chronic medical conditions and mildly to moderately decreased renal function are frequent in the older population. In this study, we built non-linear regression models on a set of biochemical parameters that indicate the pathophysiological disorders of such patients. This analysis enabled some interesting observations about the relationships between parameters in the dataset. First, the majority of pairwise relationships showed some degree of correlation (correlation coefficients different from zero), although the strength of these relationships was low (Tables 2-4). These results indicate a network of associations between the parameters. Second, graphical presentations of some pairwise correlations revealed the non-linear nature of the relationships (Figure 1).
The patterns of these relationships can be described as follows: 1) in the reverse association, the independent variable takes a limited value range and its distribution is skewed towards the lower value range of the dependent variable (the HOMCIS-Clear variable pair); 2) the independent variable is grouped tightly around its median value, which lies in the lower part of the value range, and the dependent variable takes very close values in the lower part of its value range (the CRP-TG variable pair); 3) the independent and dependent variables are uniformly distributed, but within a limited value range (the HOMCIS-Age variable pair); and 4) the distributions of both the independent and dependent variables are skewed towards the lower parts of their value ranges, this tendency being more marked for the independent than for the dependent variable (the HOMCIS-TG variable pair).
These non-linear relationships indicate that both the type of data distribution and the position of the distribution within the value range should be taken into account, in addition to the strength and direction of the correlations, when describing non-linear relationships between paired parameters. Third, some pairwise relationships showed stronger correlations than others (Tables 2-4). These stronger associations may indicate parameters that are linked to each other within the common network, forming compact subgroups or clusters.
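One way to quantify the two additional descriptors proposed here, the type of distribution and its position within the value range, is sketched below: a moment-based skewness and the position of the median relative to the observed minimum and maximum. These particular measures are an assumption of this sketch, not the procedure used in the study, and the input lists are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def distribution_shape(values: Sequence[float]) -> Dict[str, float]:
    """Moment-based skewness and relative position of the median in the observed range."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3) if sd > 0 else 0.0
    lo, hi = min(values), max(values)
    # 0 = median at the bottom of the observed range, 1 = at the top
    median_position = (statistics.median(values) - lo) / (hi - lo) if hi > lo else 0.5
    return {"skewness": skewness, "median_position": median_position}


print(distribution_shape([1, 2, 3, 4, 5]))   # symmetric: skewness 0.0, median_position 0.5
print(distribution_shape([1, 1, 1, 2, 10]))  # right-skewed, median at the bottom of the range
```

Two variable pairs with similar correlation coefficients can differ sharply on both descriptors, which is the point made in the paragraph above.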
We used a correlation strength ≥0.30 as the criterion for parameter clustering. In this way, we could recognise several clusters. The first one includes parameters indicating components of MS: fasting blood glucose (Fglu), HbA1c (a measure of average blood glucose concentrations in the recent past), triglycerides (TG), HDL-cholesterol (HDL) and fasting insulin (INS) (Tables 2-4). These components are related to each other (e.g. Fglu showed associations with HbA1c and TG; HDL showed an association with HbA1c and reverse associations with TG and INS; TG showed an association with INS), which indicates the existence of MS in older patients with decreased renal function. There is also evidence supporting the coexistence of MS and CKD, and these two interrelated disorders are known to act synergistically in the development of CVD [15].
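The clustering criterion can be formalised, for example, as connected components of a graph whose edges join parameter pairs with an absolute correlation of at least 0.30; taking the absolute value keeps reverse associations such as HDL-TG. This is one possible reading of the criterion, not necessarily the authors' procedure, and the coefficients in the example are hypothetical placeholders rather than the values from Tables 2-4.

```python
from typing import Dict, List, Set, Tuple


def threshold_clusters(corr: Dict[Tuple[str, str], float],
                       threshold: float = 0.30) -> List[Set[str]]:
    """Connected components of the graph with an edge wherever |r| >= threshold."""
    adjacency: Dict[str, Set[str]] = {}
    for (a, b), r in corr.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if abs(r) >= threshold:  # reverse (negative) associations count as well
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical coefficients for illustration only (not the study's values).
example_corr = {
    ("Fglu", "HbA1c"): 0.45,
    ("Fglu", "TG"): 0.35,
    ("HDL", "TG"): -0.40,
    ("HOMCIS", "Clear"): -0.38,
    ("HOMCIS", "VITB12"): -0.33,
    ("CRP", "ALB"): -0.12,
}
for cluster in threshold_clusters(example_corr):
    print(sorted(cluster))
```

Because components are transitive, two parameters can end up in the same cluster without being directly correlated above the threshold, which is worth keeping in mind when interpreting such groupings.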
As suggested by our results, the link between these two disorders goes via impaired glucose metabolism and a weakened nutritional status (as indicated by associations between the parameters ALB, PROT and Clear and the parameters indicating components of MS, Fglu and HbA1c) (Table 2). Although inflammation is a known mechanism of weakened nutritional status, in this analysis the parameter CRP, a measure of chronic inflammation, showed only weak correlations with the parameters indicating nutritional status, ALB and PROT, as well as with all other parameters. These results suggest that, in older people with partially reduced renal function, a decrease in serum albumin level is the dominant expression of inflammation-malnutrition, with low-level inflammation having only a confounding (mediating) role. This conclusion differs somewhat from the general view of the inflammation-malnutrition syndrome in CKD, but is in line with our previous results on the role of inflammation in generating age-related comorbidities [19,44].
This assumption of a mediating role of chronic low-level inflammation in the development of malnutrition in older people with partially reduced renal function is further supported by another potential cluster identified in this analysis. This cluster includes increased serum homocysteine and decreased serum vitamin B12 and folic acid concentrations in the context of reduced renal function (as indicated by reverse associations between the parameters VITB12, FOLNA and Clear and the parameter HOMCIS) (Table 4). According to current knowledge, homocysteine is a sulfur-containing amino acid generated from methionine, an intermediate metabolite of protein degradation, and is also converted back to methionine in a parallel cycle that involves folic acid and vitamin B12 [22]. This metabolic pathway, which is vitally important in DNA methylation and cell regeneration, is emphasized by our results, as is its connection with protein degradation and muscle wasting, which usually occur in older people with reduced renal function.
The mechanisms of serum homocysteine elevation in people with CKD are not completely understood. Disturbances have been proposed in different compartments of homocysteine metabolism, including: 1) cellular homocysteine metabolism, 2) the circulation, due to changes in the ratio of free to protein-bound homocysteine or of homocysteine to other sulfur-containing amino acids, and 3) the renal mechanisms of homocysteine disposal [22,45,46]. Our results add a piece to this puzzle, suggesting a network response that is likely to exist in older people with CKD, including variations in serum albumin and homocysteine concentrations along with increases in insulin resistance and the level of inflammation.
This assumption is supported by two groups of correlated parameters. The first includes positive correlations between the independent parameters Fglu, INS and ALB and the dependent parameter Clear, and reverse correlations between the independent parameters CRP and HOMCIS and the dependent parameter Clear (Table 4). The second includes reverse correlations between the independent parameters Fglu, INS and Clear and the dependent parameter HOMCIS (Table 4). A hypothesis arising from these results is that changes in serum albumin and glucose concentrations may reflect derangement in homocysteine metabolism and/or disposal when renal function is reduced. Consequently, all these parameters, indicating serum albumin and homocysteine levels, insulin resistance and inflammation, as well as residual renal function, should be analysed together using a multivariate analytical approach.
Our results further suggest that changes in serum glucose and insulin levels, rather than changes in lipids (triglycerides and HDL-cholesterol), are the components of MS that may explain variations in serum homocysteine levels in older people with CKD. We hypothesize that the mechanisms that orchestrate insulin and glucose metabolism are more sensitive than those of lipid metabolism to the confounding factors that, in older people with CKD and MS, might underlie changes in serum homocysteine concentrations. More precisely, as based on the reverse correlations between the independent parameters INS and Fglu and the dependent parameter HOMCIS, and the positive correlations between the parameters ALB, Fglu and INS and the dependent parameter Clear [...]

Previous studies reported correlations between serum homocysteine and albumin concentrations in patients with end-stage renal disease, but unlike in our results, these correlations were positive [46]. In many later studies, significant correlations were found between serum homocysteine and insulin concentrations and insulin resistance, but the results were inconsistent, with positive, negative and absent correlations all reported [47]. These discrepancies have been explained by differences in study design and population selection procedures [23]. Our study goes a step further, showing that a better understanding of these relationships is possible by exploring the non-linear correlations between multiple parameter pairs. Up to a certain degree of renal function decline, serum homocysteine levels may increase linearly, depending mostly on cell-related or circulation-related confounding factors of homocysteine metabolism. When the conditions of homocysteine transit through the kidneys begin to change, due to increased renal permeability, serum homocysteine levels start to decline rather than continue to rise.
In our results, this assumption is suggested by the skewed (rather than linearly dependent) pattern of the Clear-HOMCIS parameter pair (Figure 1). Existing knowledge also supports this hypothesis. Homocysteine disposal by the kidneys starts to increase when any of the following mechanisms becomes disturbed: the integrity of the glomerular filtration membrane (resulting in albumin leakage from the circulation into the urine), renal plasma flow and tubular reabsorption mechanisms [22,23,48].
One of these mechanisms is suggested by our results. Positive correlations between the independent parameters HB, Fe, ALB, VITB12, INS and Clear and the dependent parameter HTC (haematocrit, an indicator of blood viscosity) (Table 3), together with reverse correlations between the parameters HOMCIS and Clear (Table 4), suggest the hypothesis that, owing to a decrease in the level of insulin resistance (as indicated by decreased serum insulin levels) in older people with reduced renal function (and increased serum homocysteine concentrations), blood viscosity (haematocrit) is also likely to decrease, thus influencing blood flow [22].
The results of this study may be put into the wider context of the concept of reverse epidemiology [49]. Low serum cholesterol levels are considered a hallmark of this state, which is characterised by muscle wasting and frailty. According to our results, the parameter Chol (indicating serum cholesterol levels) is positively correlated with a range of parameters, including TG, HDL, ALB, PROT and FOLNA (Table 3).
Our results also suggest that decreased renal function (the parameter Clear), followed by increased serum homocysteine levels (the parameter HOMCIS), is associated with decreased serum albumin concentrations (the parameter ALB).
Taken together, these results imply that the parameter Chol is negatively correlated with the parameter ALB, but positively with some others, including TG, HDL and FOLNA. A hypothesis arises that the state of reverse epidemiology is a dynamic system whose patterns may vary depending on the grade of renal function decline. As suggested by our results, a decrease in the level of insulin resistance, which is likely to occur to some degree in older patients with reduced renal function and increased serum homocysteine levels, may modify the classical presentation of MS. Accordingly, in our results, serum triglyceride values remain low (instead of increasing), regardless of changes in some confounding factors, such as increases in serum CRP or homocysteine levels (Figure 1, the correlation patterns of the parameter pairs CRP-TG and HOMCIS-TG).

Conclusions
The assessment of non-linear relationships among multiple parameters indicating confounding factors of renal function decline in older people with multimorbidity revealed close associations between insulin resistance and serum albumin and homocysteine levels. Several hypotheses arise from this study with the potential to facilitate research on the concept of reverse epidemiology. Although this analytical approach is far from sufficient to provide a full understanding of the relationships between cardiovascular risk factors and decreased renal function in the older population, by means of reverse epidemiology, this study emphasizes the need for more integrated and dynamic approaches when assessing these factors.