A1 Refereed original research article in a scientific journal

TFBSFootprinter: a multiomics tool for prediction of transcription factor binding sites in vertebrate species




Authors: Barker, Harlan R.; Parkkila, Seppo; Tolvanen, Martti E.E.

Publisher: Informa UK Limited

Publishing place: PHILADELPHIA

Publication year: 2025

Journal: Transcription

Journal name in source: Transcription

Journal acronym: TRANSCR-AUSTIN

Volume: 16

Issue: 2-3

First page: 204

Last page: 223

Number of pages: 20

ISSN: 2154-1264

eISSN: 2154-1272

DOI: https://doi.org/10.1080/21541264.2025.2521764

Web address: https://doi.org/10.1080/21541264.2025.2521764

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/499343308


Abstract

Background: Transcription factor (TF) proteins play a critical role in the regulation of eukaryotic gene expression via sequence-specific binding to genomic locations known as transcription factor binding sites (TFBSs). Accurate prediction of TFBSs is essential for understanding gene regulation, disease mechanisms, and drug discovery. Such predictions are therefore relevant not only in humans but also in model organisms and in domesticated and wild animals. However, current tools for the automatic analysis of TFBSs in gene promoter regions have limited usability across species; to our knowledge, none currently supports this analysis for many species.

Methodology and Findings: The TFBSFootprinter tool combines multiomic transcription-relevant data for more accurate prediction of functional TFBSs in 317 vertebrate species. In humans, this includes vertebrate sequence conservation (GERP), proximity to transcription start sites (FANTOM5), correlation of expression between target genes and TFs predicted to bind promoters (FANTOM5), overlap with ChIP-Seq TF metaclusters (GTRD), overlap with ATAC-Seq peaks (ENCODE), eQTLs (GTEx), and the observed/expected CpG ratio (Ensembl). In non-human vertebrates, this includes GERP, proximity to transcription start sites, and CpG ratio.
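The observed/expected CpG ratio listed among the features above is commonly computed with the Gardiner-Garden and Frommer formula: (number of CpG dinucleotides × sequence length) / (number of C × number of G). The Python sketch below illustrates that standard formula only. The function name `cpg_obs_exp` is hypothetical, and the window size or any other processing that Ensembl or TFBSFootprinter applies is not stated in this record and is not reproduced here.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Illustrative helper (hypothetical name), using the standard formula
    (CpG count * length) / (C count * G count). This is an assumption about
    the general definition, not the tool's actual implementation.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        # No C or no G means no CpG is possible; report 0.0 instead of dividing by zero.
        return 0.0
    return n_cpg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    for s in ("CGCG", "ACGT", "AATT"):
        print(s, cpg_obs_exp(s))
```

A ratio well above that of the surrounding genome is typical of CpG islands, which are frequent in vertebrate promoters.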

TFBSFootprinter analyses are based on the Ensembl transcript ID for simplicity of use and require minimal setup steps. Benchmarking of TFBSFootprinter on a manually curated, experimentally verified dataset of TFBSs gave a higher average area under the receiver operating characteristic curve when using all multiomic data (0.881) than DeepBind (0.798), DeepSEA (0.682), FIMO (0.817), or a traditional position weight matrix (PWM) approach (0.854). Selecting the best overall combination of multiomic data improved the result further (0.910). Additionally, we determined the combinations of multiomic data that provide the best model of binding for each TF. TFBSFootprinter is available as Conda and Python packages.
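The benchmark figures above are areas under the receiver operating characteristic curve. That area equals the probability that a randomly chosen true binding site receives a higher score than a randomly chosen non-site, with ties counted as one half. The Python sketch below computes it from that pairwise definition. The function name `roc_auc` is hypothetical and the scores are invented toy values, not data from the article or output of any of the tools named.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    Illustrative helper (hypothetical name): counts the fraction of
    (positive, negative) pairs in which the positive scores higher,
    with tied pairs contributing 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    true_sites = [0.9, 0.3]
    non_sites = [0.5, 0.1]
    print(roc_auc(true_sites, non_sites))
```

The pairwise loop is quadratic in the number of scores, which is acceptable for a small curated benchmark; a rank-based computation gives the same value for larger sets.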


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This work was supported by the Finnish Cultural Foundation and Fimlab to HB, and Academy of Finland and Jane & Aatos Erkko Foundation to SP.


Last updated on 2025-19-08 at 07:39