Integrated bioinformatics approach reveals methylation-regulated differentially expressed genes in obesity

ABSTRACT Objective: To identify DNA methylation and gene expression profiles involved in obesity by implementing an integrated bioinformatics approach. Materials and methods: Gene expression (GSE94752, GSE55200, and GSE48964) and DNA methylation (GSE67024 and GSE111632) datasets were obtained from the GEO database. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) in subcutaneous adipose tissue of patients with obesity were identified using GEO2R. Methylation-regulated DEGs (MeDEGs) were identified by overlapping DEGs and DMGs. The protein–protein interaction (PPI) network was constructed with the STRING database and analyzed using Cytoscape. Functional modules and hub-bottleneck genes were identified by using MCODE and CytoHubba plugins. Functional enrichment analyses were performed based on Gene Ontology terms and KEGG pathways. To prioritize and identify candidate genes for obesity, MeDEGs were compared with obesity-related genes available at the DisGeNET database. Results: A total of 54 MeDEGs were identified after overlapping the lists of significant 274 DEGs and 11,556 DMGs. Of these, 25 were hypermethylated-low expression genes and 29 were hypomethylated-high expression genes. The PPI network showed three hub-bottleneck genes (PTGS2, TNFAIP3, and FBXL20) and one functional module. The 54 MeDEGs were mainly involved in the regulation of fibroblast growth factor production, the molecular function of arachidonic acid, and ubiquitin-protein transferase activity. Data collected from DisGeNET showed that 11 of the 54 MeDEGs were involved in obesity. Conclusion: This study identifies new MeDEGs involved in obesity and assessed their related pathways and functions. These results data may provide a deeper understanding of methylation-mediated regulatory mechanisms of obesity.


INTRODUCTION
O besity is a chronic metabolic disorder characterized by an excessive accumulation of body fat resulting from an imbalance between energy intake and expenditure (1,2).Despite nutritional intervention and physical education programs, the prevalence of obesity is increasing in most countries (3).This disease has been considered the 21st century epidemic (1) as it affects more than 650 million adults and 124 million children worldwide (2).Obesity is associated with a number of comorbidities, such as type 2 diabetes mellitus (T2DM), cardiovascular diseases, non-alcoholic fatty liver disease, and some types of cancers, which reduce the quality of life and life expectancy of the affected individuals (4,5).
Obesity is the result of complex and not completely understood pathological processes (6,7).Adipose tissue produces several endocrine factors, cytokines, and chemokines that regulate physiological processes and immune system functions.Moreover, individuals with obesity have increased pro-inflammatory adipokines and chemokines, which contributes to a systemic low-grade chronic inflammation and the development of obesityrelated comorbidities (8).Several agents influence the development of obesity, including environmental factors (especially diet quality and physical activity), gut microbiota composition, endocrine disruptors, and drugs.Moreover, the development of obesity is influenced by a crosstalk between environmental factors, genetic susceptibility, and epigenetic mechanisms (9,10).
Large-scale genome-wide association studies (GWAS) showed more than 500 loci associated with obesity and other related traits (11).DNA methylation -the transfer of a methyl group to the 5-carbon of cytosine in CpG dinucleotides -is a major epigenetic mechanism that regulates gene expression according to the influence of different environmental factors and hormones (6,12,13).Evidence for the role of DNA methylation in obesity has come mostly from animal models (1,13,14).Zhang and cols.( 14) identified 178 differentially methylated genes (DMGs) in the liver of mice with high-fat diet (HFD)-induced obesity, showing that HFD changes the epigenetics of hepatocytes and, thus, contributes to the pathophysiology of obesity.In humans, an epigenomewide DNA methylation association study identified 278 CpG islands associated with variation in body mass index (BMI) in 5,387 individuals (15).
In the last years, gene expression and epigenome profiling arrays have been used to study thousands of genes for a given disease at the same time.Moreover, integrative bioinformatics analysis of the profiling arrays, using systems biology methodology, emerged as a promising approach to identify and classify differentially expressed genes (DEGs) and DMGs (16,17).Therefore, an integrative analysis using gene expression and DNA methylation datasets of patients with obesity, documented in previous studies, could facilitate the identification of new and potential molecular pathological pathways related to this disease.
Separate DEG and DMG analyses have been used to evaluate genes associated with diseases at different regulation levels.However, by identifying methylationregulated DEGs (MeDEGs), an integrated network analysis of DNA methylation and gene expression profiling data may provide a deeper understanding of obesity than individual disconnected analyses (18,19).Thus, in this study, we used an integrative analysis approach to identify MeDEGs in the subcutaneous adipose tissue (SAT) from subjects with obesity and assess in which biological functions and pathways they are involved.We implemented a systems biology approach to analyze the protein-protein interaction (PPI) network.Our analysis may provide new ways to understand the mechanisms and pathways underlying obesity.

Microarray data
Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo) is an international public data repository of microarrays, next-generation sequencing, and other forms of high-throughput functional genomics.The search strategy adopted in GEO was: "obesity" [MeSH terms] AND "Homo sapiens" [Organism] AND ("DNA methylation" OR "Expression profiling" [Filter]).Figure 1 shows that five datasets were retrieved from GEO and included in this study: three of gene expression (GSE94752, GSE55200, and GSE48964) and two of DNA methylation (GSE67024 and GSE111632) analyses in SAT of cases with obesity and lean individuals.Regarding gene expression datasets, GSE94752 (GPL11532; Affymetrix Human Gene 1.1 ST Array) included isolated adipocytes from abdominal SAT samples from nine lean controls and 21 subjects with obesity (20).GSE55200 (GPL17692; Affymetrix Human Gene 2.1 ST Array) included SAT samples from seven lean subjects and 16 metabolic healthy patients with obesity (21).GSE48964 (GPL6244; Affymetrix Human Gene 1.0 ST Array) analyzed gene expression in adipose-derived stem cells (hASCs) from three patients with severe obesity and three individuals without obesity (22).

GEO2R
(http://www.ncbi.nlm.nih.gov/geo/ geo2r/) is an interactive web tool that allows users to compare two or more groups of samples in a GEO Series.This tool was used to analyze the selected datasets and identify DMGs and DEGs in SAT of individuals with obesity compared with lean controls.When a gene symbol corresponded to multiple probe IDs, the average value of these probes was estimated as the representative expression level of this gene.Gene identifiers were mapped according to the HUGO Gene Nomenclature Committee (HGNC) (25) and only valid identifiers were maintained.
DEGs were defined based on an absolute log2-Fold Change (log FC) > 2.0 and P < 0.05.To identify DMGs, |t|>2 and P < 0.05 were used as cutoff points.Using the Excel LOOKUP function (VLOOKUP), datasets were overlapped according to the expression/ methylation profile.Hypomethylated-high expression genes were identified after overlapping upregulated and hypomethylated genes and hypermethylatedlow expression genes were identified after overlapping downregulated and hypermethylated genes (Figure 1).Hypomethylated-high expression genes and hypermethylated-low expression genes were identified as MeDEGs.Results were presented in Venn diagrams, which were constructed in the InteractiveVenn website (26).

Protein-protein interaction network
The PPI network of obesity-related MeDEGs was constructed using the free web-available Search Tool for the Retrieval of Interacting Genes (STRING version 11.0; http://string-db.org/)database (27).An interaction score > 0.4 and P < 0.05 were considered as statistically significant.Results from STRING were imported into Cytoscape 3.8.1 (28) for the network analysis and visualization.

Systems biology approach to analyze the PPI network based on MeDEGs
Complex biological systems may be represented and analyzed as computable networks.Nodes and edges are the basic components of a network.Nodes are connected by edges (which are also called links or lines) and edges show the relationships between nodes (29).In this study, the nodes were the genes and the edges were lines showing the interaction force between them, thus forming a PPI network.
The relevance of each node for the network was assessed by two centrality measures: degree and betweenness.Degree quantifies the number of connections from a given node (30).Highly connected nodes are called hubs and tend to be important control points in the network.Betweenness is the number of minimum non-redundant paths between two nodes that cross a given node (31).Nodes with high betweenness are called bottlenecks and tend to act as major intersections between modules in networks (27,32).In this study, hubs and bottlenecks were defined as nodes in the top 10% of degree and betweenness distributions (33), respectively, with a minimum of two interactions.Hub and bottleneck genes were identified using the CytoHubba plugin version 0.1 (34) for Cytoscape.
Cluster analysis to identify modules within the PPI network was performed using the Molecular Complex Detection (MCODE) version 2.0.0 plugin (35) and considering a number of nodes > 3.0 as a cutoff point.This algorithm identifies in the PPI network densely connected regions that are likely to represent functional interaction complexes, since proteins in the same cluster tend to enrich common biological functions (36).

Functional enrichment analysis
The Gene Ontology (GO) analysis is extensively used to identify the characteristic biological attributes of genes, gene products, and sequences, including biological processes (BP), cell components (CC), and molecular function (MF) (37).The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of databases on genomes, biological pathways, diseases, and chemical substances (38).In this study, GO terms and KEGG pathway enrichment analyses were performed using STRING version 11.0.A hypergeometric test was used to estimate the statistical significance of enriched pathways and P-values were corrected for multiple tests using the Benjamini-Hochberg procedure, which provides a false discovery rate (FDR)-adjusted-P-value (q-value).GO terms and KEGG pathways associated with q < 0.05 were considered significantly enriched.

MeDEG candidate selection
To prioritize and identify candidate genes for obesity, the DisGeNET database (http://www.disgenet.org)was used.The obesity-related MeDEGs identified in this study were compared with experimentally validated and computationally predicted obesity-related genes (C0028754), according to DisGeNET.This database is a comprehensive platform that integrates information on human disease-associated genes and includes data from expert curated repositories, text mining data extracted from scientific literature, experimentally validated data, and referred data (39,40).

DMGs and DEGs in obesity
Figure 1 presents the flowchart of the analysis strategy used in this study.Regarding DEGs, we identified 131 upregulated and 143 downregulated genes after overlapping the three gene expression datasets.Regarding DMGs, we identified 6,083 hypermethylated and 5,473 hypomethylated genes after overlapping the two datasets.After overlapping the five datasets, we identified 29 hypomethylated-high expression (Figure 2A) and 25 hypermethylated-low expression genes (Figure 2B), totalizing 54 MeDEGs.

PPI network construction, module analysis, and identification of hub and bottleneck genes
The PPI network was constructed in the STRING database and includes 19 nodes (MeDEGs) and 17 edges (connecting lines) (Figure 3).The nodes represent the proteins encoded by each gene and the edges represent protein-protein interactions.Of the 54 MeDEGs, 35 had no interconnection within the network, thus, we excluded them from the further analysis.Among the 19 MeDEGs present in the network, 11 were hypomethylated-high expression genes and eight were hypermethylated-low expression genes.Moreover, the   highest interaction score (0.9) was between RNF144B, UBR2, CDC23, and FBXL20, and between HK2 and PMM1 (Figure 3).
We used two algorithms to measure centrality in the PPI network: degree (hub genes) and betweenness (bottleneck genes).We considered the three most connective nodes as hub-bottleneck genes: prostaglandin-endoperoxide synthase 2 (PTGS2), tumor necrosis factor α-induced protein 3 (TNFAIP3), and F-box and leucine rich repeat protein 20 (FBXL20), which might play a critical role in obesity.These three genes participate in several KEGG pathways, including TNF, NF-κB, IL-17, ubiquitin mediated proteolysis, Wnt signaling pathway, and the regulation of lipolysis in adipocytes (Figure 4).
We performed the further analysis of the PPI network to identify the most significantly connected and tightly clustered subnetworks.We selected from the PPI network a significant module using MCODE (Figure 5A).In this module, we identified four genes (CDC23, FBXL20, RNF144B, and UBR2).This module was mainly enriched for cell cycle and ubiquitin mediated proteolysis signaling pathways.Regarding GO terms, these genes were enriched in biological processes and molecular functions related to ubiquitination and nuclear division (Figure 5B).

Functional enrichment analysis of MeDEGs
Table S1 presents the full results of the GO terms and KEGG pathways enrichment analysis for the 54 MeDEGs.Results showed that the 29 hypomethylatedhigh expression genes were involved in biological processes of positive regulation of fibroblast growth factor (FGF) production.Regarding molecular function, MeDEGs were also related to arachidonic acid binding.The 25 hypermethylated-low expression genes were not significantly enriched in GO terms.
Regarding KEGG pathways, the 29 hypomethylatedhigh expression genes participate especially in the IL-17 signaling.The 25 hypermethylated-low expression genes participate in fructose and mannose metabolism, amino sugar and nucleotide sugar metabolism, lysine degradation, and glycolysis/gluconeogenesis pathways.
All 54 MeDEGs were mainly involved in biological processes of regulation of FGF production (Table S1).Moreover, these MeDEGs were related to the molecular function of arachidonic acid and ubiquitin-protein transferase activity (Table S1).

Comparison between the 54 MeDEGs and obesityrelated genes identified in the DisGeNET database
To identify key obesity-related genes, we accessed the DisGeNET database.Of the 54 MeDEGs identified in our study, 11 were also identified in the search in DisGeNET (ALDH2, ALOX5AP, BAMBI, DOCK5, FUZ, HK2, LRRFIP1, OLR1, PTGS2, PTPRJ, and S100A8) (Table 1).Moreover, of the three hubbottleneck genes (PTGS2, TNFAIP3, and FBXL2), only PTGS2 was obesity-related, according to DisGeNET.Thus, FBXL20 and TNFAIP3 could be new candidate genes to be studied in the context of obesity.

DISCUSSION
Previous studies have mainly focused on the association between the expression of individual genes and DNA methylation profiles in the context of obesity.However, bioinformatics analyses allow the combination of data from a large number of DEGs and DMGs and thus the identification of key genes that are simultaneously differentially expressed and methylation-regulated (18).Regarding obesity, studies showing the interaction between genes and epigenetic factors are specially interesting, as several environmental factors influence the development of this disease.
In this study, we identified 54 MeDEGs, of which 29 are hypomethylated-high expression genes and 25 are hypermethylated-low expression genes.Functional enrichment analysis showed that these MeDEGs participate in many biological processes and molecular functions, including the positive regulation of FGF production, arachidonic acid binding, and ubiquitinprotein transferase activity.
FGFs are secreted signaling proteins with various functions in cell proliferation, development, and wound healing (41,42).In the last years, several FGFs have been involved in the regulation of glucose and lipid metabolism (43) and FGF1, FGF19, and FGF21 are the main factors associated with energy metabolism (43,44).Evidence from studies with mice showed that Fgf1 is involved in the expansion of adipose tissue during HFD feeding, which suggests that this protein promotes preadipocyte proliferation and differentiation (45).In accordance with these findings, individuals with obesity have higher FGF1 secretion in SAT compared with lean individuals (46).The beneficial effects of FGF19 and FGF21 on metabolism are more established than those of other members of this family.These two proteins increase energy expenditure and decrease body weight, glucose intolerance, and blood glucose (43).Thus, FGFs and their derivatives might have great potential as new therapies to treat metabolic conditions (47).
Regarding the enrichment of MeDEGs in arachidonic acid binding, few studies reported arachidonic acid acts as a precursor of proinflammatory metabolites, including prostaglandin E2, leukotrienes, hydroxy-eicosatetraenoic acids, and diacylglycerol, which are molecules also involved in insulin resistance (48).Thus, arachidonic acid may also contribute to obesity.MeDEGs were also enriched in ubiquitin-protein transferase activity, which seems to have significant effects on obesity, mainly due to its involvement in inflammation, cholesterol, and glucose metabolism (49).Our results provide new information to understand obesity.
The PPI network analysis identified three hubbottleneck genes: PTGS2, TNFAIP3, and FBXL20.Hubs and bottlenecks are crucial components in signaling networks, as they are relevant genes for biologically significant processes in the disease development (32,48).PTGS2 and TNFAIP3 were identified as hypomethylated-high-expression genes and FBXL20 as a hypermethylated-low expression gene in SAT of individuals with obesity.Moreover, functional enrichment analysis showed that these three hub-bottleneck genes play an essential role in several signaling pathways, including TNF, NF-κB, IL-17, ubiquitin mediated proteolysis, Wnt, and the regulation of lipolysis, suggesting that these genes participate in obesity-related pathways and could be essential to understand the pathophysiology of obesity.
TNFAIP3 encodes TNFAIP3/A20, which is a ubiquitin-modifying enzyme (50,51).TNFAIP3 negatively regulates NF-κB activity, which mediates the effects of pro-inflammatory cytokines, including TNF and IL-1β (52,53).In mice, the overexpression of TNFAIP3 inhibits NLRP3 inflammasome complex, preventing lupus inflammation and renal injury (53).NLRP3 is negatively associated with obesity due to increased inflammation in adipose tissue (54).Accordingly, Rendo-Urteaga and cols.(55) observed a downregulation of TNFAIP3 in high-responder children with obesity to a dietary intervention program, possibly leading to a better inflammatory state by decreased TNF secretion.
PTGSs are rate-limiting enzymes in the conversion of arachidonic acid, a product of damaged cell membranes, into prostaglandins (56).The expression of the PTGS2 isoform (COX-2) is induced by IL-1β, IL-6, and TNF; thus, it is upregulated during inflammation (56).In obesity, adipocyte COX-2 activation seems to upregulate the macrophage migration inhibitory factor (MIF) production via NF-κB activation.In turn, during early stages of inflammation, MIF secretion by adipocytes recruits pro-inflammatory macrophages (M1), leading to the secretion of several pro-inflammatory cytokines, which characterizes a feedback cycle that results in chronic inflammation associated with insulin resistance in obesity (57).
FBXL20 is an E3 ubiquitin ligase that plays a key role in ubiquitin-mediated proteolysis.Evidence show that FBXL2, a protein of the same family, preserves cardiac homeostasis in the face of HFD-induced obesity (58).However, FBXL20 has not yet been associated with obesity or its related comorbidities.Thus, further studies are needed to assess its role in the development of obesity.
Further analysis of the PPI network using MCODE showed a module with the highest interaction between nodes, including four genes (CDC23, FBXL20, UBR2, and RNF144B).These genes were enriched for cell cycle and ubiquitin-mediated proteolysis, which is the major pathway for regulated degradation of intracellular proteins to prevent the accumulation of damaged proteins (59).The upregulation of FTO has been robustly associated with increased BMI and predisposition to obesity and T2DM (60).Zhu and cols.(61) showed that FTO undergoes active ubiquitination on the evolutionarily conserved Lys-216 residue, which leads FTO to proteasomal degradation.This might have a protective effect on energy metabolism, food intake regulation, fat metabolism, and body weight.Taking that into account, the lack of ubiquitin-mediated proteolysis could potentially lead to obesity.
Comparing our results with data available at DisGeNET, 11 MeDEGs identified in our study were also previously associated with obesity in the database.Interestingly, our study also suggested other genes that have a role in obesity that were not included in the database.Therefore, this set of genes might play an important role in the development/progression of obesity.
Although we identified potential candidate genes for obesity using bioinformatics techniques, our study has a few limitations.First, the DNA methylation and gene expression datasets had relatively small sample sizes, which might limit the application of our findings.Moreover, the samples of all DNA methylation datasets and one gene expression dataset (GSE94752) had only women, which might limit the application of our findings for men.Second, we analyzed both DNA methylation and gene expression only in SAT; thus, the results could be different if we had analyzed in other tissues.Lastly, we did not validate our bioinformatics data in SAT samples from patients with obesity and lean individuals due to unavailability of samples.Thus, further studies are needed to confirm our data.Despite these limitations, our findings are important to understand the complex regulation of gene expression in obesity and present new potential targets genes and pathways.
In conclusion, we mapped 54 MeDEGs possibly related to obesity by combined gene expression and DNA methylation profile data analyses.By a series of bioinformatics analyses, we identified hub-bottleneck genes and key pathways involved in obesity.We also presented a list of candidate genes for obesity using DisGeNET.These results may deepen the understanding of epigenetic regulation mechanisms involved in the development/progression of obesity.Finally, molecular biological experiments are needed to confirm the function of the identified genes in obesity and their interactions.

Figure 1 .
Figure 1.Flowchart of the identification of methylation-regulated differentially expressed genes (MeDEGs) and their function in obesity.DMG: differentially methylated gene; DEG: differentially expressed gene.

Figure 3 .
Figure 3. PPI network formed by 19 MeDEGs: protein-protein interaction (PPI).Green nodes represent hypomethylated-high expression genes in obesity and red nodes represent hypermethylated-low expression genes.Diamond nodes represent the hub-bottleneck genes identified in the network analysis.Nodes represent the proteins encoded by each gene and the edges represent the protein-protein interactions.Interaction score refers to "combined score" in the STRING database.The thicknesses of the edges was associated with the combined scored.This score ranges from 0.0 (weak interaction) to 1.0 (strong interaction).

Figure 4 .
Figure 4. Significant KEGG pathways in which participate each hub-bottleneck gene.The color of the circle represents statistical significance, according to the q-value.

Figure 5 .
Figure 5. Module identification within the PPI network.(A) Significant module identified by MCODE.Green nodes represent hypomethylated-high expression genes in obesity and red nodes represent hypermethylated-low expression genes.(B) Significant GO terms and KEGG pathways in which genes participate.The y-axis represents GO terms or KEGG pathways and the x-axis refers to the number of genes enriched on GO terms or KEGG pathways.