Article-based doctoral dissertation (G5)

Computational methods for studying epigenomic regulation

Authors: Faux Thomas

Publisher: University of Turku







In the nucleus, DNA is tightly wrapped around proteins in a structure called chromatin, which protects it from degradation. Chromatin is composed of nucleosomes, each consisting of eight histone proteins around which the DNA is wrapped. Enzymes can modify the histones at amino acids located on their N-terminal tails. These modifications allow the chromatin to open and close in targeted regions, providing control over gene expression.

At present, chromatin immuno-precipitation (ChIP) and the assay for transposase-accessible chromatin (ATAC), each combined with high-throughput sequencing (ChIP-seq and ATAC-seq), are the major high-throughput methods for studying histone modifications and genome-wide chromatin openness, respectively. Typically, ChIP-seq targets one histone modification at a time, using immuno-precipitation to enrich the regions of the genome bound by the modified histone, while ATAC-seq uses a transposase enzyme to cut the open chromatin into fragments of DNA. The DNA fragments obtained from both techniques can be sequenced and aligned against a reference genome. Once the location of the fragments is determined, the genome is scanned for significant enrichment in a process called peak calling. Differential analysis is then used to compare local enrichment-level variations between different biological conditions. Combining ChIP-seq and ATAC-seq data with other information, such as RNA-seq–derived transcriptomics data, can further help to build a comprehensive picture of the complex underlying biology. This work therefore focuses on the development of computational tools to help with the analysis of epigenomics research data.
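To make the idea of peak calling concrete, the following is a minimal sketch and not the workflow developed in the thesis. It assumes fixed-width windows, a uniform genome-wide background and a Poisson test; the function names `poisson_sf` and `call_peaks`, the window size and the threshold `alpha` are all hypothetical choices made for illustration. Real peak callers use local background models, control samples and multiple-testing correction.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(fragment_starts, genome_length, window=100, alpha=1e-3):
    """Flag windows whose fragment count exceeds a uniform background.

    Assumption: every position satisfies 0 <= pos < genome_length.
    Returns a list of (start, end, count, p_value) tuples.
    """
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in fragment_starts:
        counts[pos // window] += 1

    # Expected fragments per window if fragments were spread uniformly.
    lam = len(fragment_starts) / n_windows

    peaks = []
    for i, count in enumerate(counts):
        p_value = poisson_sf(count, lam)
        if p_value < alpha:
            end = min((i + 1) * window, genome_length)
            peaks.append((i * window, end, count, p_value))
    return peaks


# Toy data: one background fragment per window plus a cluster near 320.
background = [50 + 100 * i for i in range(10)]
cluster = [320 + i for i in range(20)]
peaks = call_peaks(background + cluster, genome_length=1000)
for start, end, count, p_value in peaks:
    print(f"{start}-{end}: {count} fragments, p = {p_value:.2e}")
```

A differential analysis would then compare the per-window (or per-peak) counts between biological conditions, taking variation between replicates into account.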

In this thesis, a robust workflow for the differential analysis of ChIP-seq and ATAC-seq data is developed and evaluated against existing tools using one synthetic dataset, two biological ChIP-seq datasets and two biological ATAC-seq datasets. RNA-seq data is then correlated with the detected peaks. An efficient replicate-driven tool for visualising the coverage of DNA fragments on the genome is also proposed and compared with two existing tools, highlighting its efficiency. Lastly, two studies are presented that showcase the usefulness of the differential analysis approaches for extracting knowledge in a real-life biological setting.

Last updated on 2023-03-22 at 09:30