Global Gene Expression Profiling of Body-Mass Index in Middle-Aged Danish Twins

Objective: The body mass index (BMI) measured as weight in relation to height is an important monitor for obesity and diabetes, with individual variation under control by genetic and environmental factors. In transcriptome-wide association studies on BMI, the genetic contribution calls for controlling of genetic confounding that affects both BMI and gene expression. We performed a global gene expression profiling of BMI on peripheral blood cells using monozygotic twins for efficient handling of genetic make-ups. Methods: We applied a generalized association method to genome-wide gene expression data on 229 pairs of monozygotic twins (age 56-80 years) for detecting diverse patterns of correlation between BMI and gene


Introduction
The body mass index (BMI) quantifies the amount of tissue mass including muscle, fat and bone in an individual. It is highly associated with cardiovascular disease and diabetes and has profound influences on life quality and mortality [1][2][3]. In clinical application, BMI is a simple and widely used metric for defining overweight (25≤BMI<30, kg/m²) and obesity (BMI≥30 kg/m²), which are conditions tightly linked to the metabolic syndrome (MetS). Many epidemiological and molecular studies have been conducted to find the genetic and non-genetic (environmental) mechanisms underlying individual BMI variation in order to identify potential causes of MetS and eventually to find strategies for mitigating its burden to public health. Multiple genetic variants have been reported to affect BMI in genome-wide association studies, albeit with the proportion of BMI variation accounted for remaining far from the overall genetic contribution estimated in twin studies, although recent effort based on whole genome sequencing has largely increased the contribution by genomic sequence variations [4,5]. Instead of focusing on the genetic polymorphisms which are static across life span, analysis of gene expression profiles can directly depict the dynamic activity of functional genes in regulating the variation of BMI, especially when controlling for genetics.
In transcriptome-wide association studies (TWAS), genetic variations can confound the relation between MetS related health conditions and gene expression (through functioning as cis-or trans-eQTLs). This is particularly crucial for BMI giving the high genetic contribution [4]. Compared to the ordinary case-control design using unrelated individuals, the use of twins has been proven a valuable approach in controlling genetic background, due to the shared genetic makeups in twins leveraging enriched statistical power [6,7]. Moreover, current TWAS assumes linear relationship between gene expression and traits of interest, thus ignoring the diverse patterns of association in biology. By focusing on twins, we have investigated gene expression profiles in peripheral blood cells in association with BMI by introducing a generalized correlation method to capture different patterns (both linear and nonlinear) of association between gene expression and BMI while controlling for genetic confounding.

I Samples
This study is based on a twin cohort, the Middle Aged Danish Twins (MADT), from the Danish Twin Register [8]. There are in total 229 complete monozygotic twin pairs included in the analysis, 254 male and 204 female twins, with age at sampling ranging from 56 to 80 and BMI from 15.77 to 38.15. Anthropometric measures and whole blood samples were taken over the period from 2008 to 2011. Blood cell counts of 449 individuals were available and the blood cell counts of the other 9 individuals were imputed by the estimate CellCounts function in the R package minfi using DNA methylation data collected on the same blood samples [9,10].

II Global Gene Expression Analysis
Whole blood samples were collected in PAXgene Blood RNA Tubes (PreAnalytiX GmbH, Hombrechtikon, Switzerland) and total RNA extracted using the PAXgene Blood miRNA kit (QIAGEN) according to the manufacturer's protocol. Concentration of the extracted RNA was determined using a NanoDrop spectrophotometer ND-8000 (NanoDrop Technologies), and the quality was assessed by the Agilent 2100 Bioanalyzer (Agilent Technologies).
Gene-expression analysis was performed using the Agilent SurePrint G3 Human GE 8×60K Microarray (Agilent Technologies), a dual-colour high-definition array containing 60K high quality probes of 60-mer. Sample labeling and array hybridization were carried out according to the 'Two-Colour Microarray-Based Gene Expression Analysis -Low Input Quick Amp Labeling'-protocol (Agilent Technologies). Samples were labeled with Cy5 and the reference consisting of a pool of 16 samples was labeled with Cy3. Hybridization, washing, scanning, and quantification were performed according to the array manufacturer's recommendations.

III Data Pre-Processing
The raw intensity data was background-corrected using the NormExp method and was then within-array normalized by Loess normalization method and between-array normalized by quantile normalization [11].
The normalized values were used to calculate log2-transformed Cy5/Cy3 ratios. Missing expression values were imputed by k-nearest neighbors averaging, and replicate probes were collapsed calculating the median. Data pre-processing was performed using the R packages limma [12]. All the probes on the Agilent SurePrint G3 array were re-annotated using GENCODE v.25 gene annotation database (Link).

IV Statistical Analysis
After normalization, we adjusted for covariates including age, sex, cell composition and first two PCs from the PCA (principal component analysis) on the gene expression data. We then applied a generalized measure of association, the generalized correlation coefficient (GCC), to investigate the association between intra-pair difference of BMI and intra-pair difference of expression to control the genetic and shared environmental effects, as proposed by Tan et al. [13]. GCC was computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution using the R package matie [14]. The mRNA probes with p<0.05 were used for gene set enrichment analysis (GSEA) to detect biological pathways over-represented by the listed probes for functional interpretation [15].

Discussion
The high genetic contribution to BMI variation as estimated in twins, and in family studies, suggests the need to study the relation between transcriptional activity and BMI while controlling for genetics, given the fact that genetic variations could affect both BMI and gene expression [4,[17][18][19]. Instead of using unrelated samples, focusing on genetically identical individuals (e.g. MZ twins) can control for genetic variations in the study samples. However, traditional statistical testing on dependent samples requires modeling the relatedness in sample clusters using, for example, the mixed effects modeling assigning sample correlation as random effect variables with increased model complexity and reduced power in statistical testing. The application of GCC on twin data provides a simple but efficient way for handling related samples in TWAS. Most importantly, generalized correlation is a non-parametric method by nature meaning that its assessment of correlation between gene expression and BMI was done without imposing any assumption such as the linear relationship in regression modeling. The latter is sensitive to outlier observations in model fitting. As a result, our reported gene expression markers detected by the model-free approach are biologically meaningful and significantly enriched in functional pathways closely implicated in metabolism, although, many other pathways e.g. related to blood cell functioning (immune biology and hemostasis) and neuro function were also enriched.
In (Figure 1), the top probes (red coloured) show upward deviation from p values under the null hypothesis suggesting their non-random association with BMI. In accordance, functional annotation and published literature confirm their biological relevance. In (Table 1), the most significant probe (A_33_P3289204, p=2.83e-06) is from AAK1 gene. The gene has been shown as a positive regulator of the Notch pathway which is a novel regulator of metabolism [20,21]. Both AAK1 and Notch signaling are implicated in neurological impairments, which have been shown to relate to metabolic disorders [22][23][24]. The number 2 probe (A_23_P79094, p=7.83e-06) belongs to LILRA3 for which a SNP (rs367070) has been reported to associate with HDL-C [25]. The human LILRA gene family has diverse functions characterized by regulation of inflammation and immune tolerance [26]. The LILRA3 regulated immunity alteration could be involved in obesity, a condition characterized by chronic low-grade inflammation [27]. For the third gene in (Table 1), PAX8, differential DNA methylation was identified to associate with gestational famine exposure and metabolic traits [28].
Other interesting genes linked to top probes of ( Table 1) include PPP1R3A whose SNP variation has been associated with risk of type 2 diabetes and CYP4F12 which belongs to the cytochrome P450 4 (CYP4) family implicated in various biological functions including inflammation, cardiovascular health, and cancer [29,30].
Biological pathway analysis was performed using KEGG and Reactome databases with both revealing significantly enriched functional pathways implicated in metabolic health together with other biological functions. For example, the top pathways in (Table 2) include pathways in cancer and insulin signaling. In the literature, development of insulin resistance and hyperinsulinemia has been shown as a clear link between adipose tissue expansion and etiology of diseases like obesity, type-2 diabetes and cancer [31]. Likewise, the implication of MAPK signaling in obesity-related immune paralysis and cancer has been reviewed very recently [32]. Another top significant pathway, JAK/STAT signaling pathway in (Table 2) is a highly conserved functional pathway required for normal homeostasis which, when dysregulated, contributes to the development of obesity and diabetes [33].
Like the KEGG pathways (Table 2), the importance of immunity is also reflected in the top significant Reactome pathways shown in (Table 3) (immune system with FDR 6.34e-09 and adaptive immune system with FDR 9.40e-06). The phenomenon can be explained by the target tissue used in this study, i.e. whole blood, which comprises of immune cells. Moreover, results in (Table 3) emphasize high implication of GPCR signaling in BMI variation. This is highly interesting as the melanocortin-4 receptor (MC4R), a GPCR embedded in the membranes of nerve cells in the brain's appetite control center, has been shown to provide clues to obesity treatment [34]. Another study has found that pathway of GPCR was overrepresented and it is associated with pediatric obesity [35]. As a potential therapeutic target for intervention in cognitive deficits, the association of GPCR signaling pathway with BMI could reflect the intrinsic connection between cognition and obesity [36,37]. Table 3 also contains significant pathways directly related to metabolism of lipids, lipoproteins and carbohydrates, etc. which are closely related to obesity development as well as pathways overlapping with ( Table 2) including axon guidance reportedly to implicate in earlyonset obesity [38], and olfactory signaling/transduction shown to regulate lipid metabolism through neuroendocrine signaling in Caenorhabditis elegans [38,39]. Methylation of olfactory pathway genes has been associated with dietary intake and obesity features [40].
In summary, by introducing generalized correlation coefficient for assumption-free association analysis and using monozygotic twins to control genetic confounding, this transcriptome-wide association study on BMI using peripheral blood identified differentially expressed genes and their enriched biological pathways implicated in multiple biological functions including immune biology, hemostasis, neural function, cancer, and metabolism. Findings from this study merit replications using independent samples to verify expression markers and functional pathways for characterizing and determining non-genetic etiology of obesity and health conditions.

Statement of Ethics
The survey was approved by The Regional Scientific Ethical Committees for Southern Denmark (S-VF-19980072) and conducted in accordance with the Helsinki II declaration, with informed consent to participate in the survey obtained from all participants.

Funding
This study was jointly supported by the Lundbeck Foundation