Assessing the enrichment significance of a position weight matrix (PWM) along a DNA sequence
Abstract
We present a novel statistical approach for evaluating whether a given position weight matrix (PWM) is significantly enriched in a given sequence. We define the weighted count of a PWM in a sequence without choosing an arbitrary threshold, and we propose a compound Poisson approximation for the distribution of this weighted count, which appears more accurate than a Gaussian approximation. Our method, called PWMstat, relies on an efficient algorithm for simulating the ad hoc compound Poisson distribution and thereby provides an enrichment p-value. A comparison with Bioprospector, an existing motif discovery tool, shows that Bioprospector scores generally do not reflect the enrichment significance of a PWM well. We illustrate our method on the Noc binding site in Bacillus subtilis.
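To make the idea of a simulation-based enrichment p-value concrete, here is a minimal sketch in Python. It is not the PWMstat algorithm: the abstract does not specify how the weighted count is defined or how the compound Poisson parameters are derived from the PWM and the sequence model. The rate `lam`, the per-occurrence `weights`, and their probabilities `probs` are all hypothetical placeholders, and the sampler is a generic textbook one (a Poisson number of terms, each drawn from a discrete weight distribution).

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_compound_poisson(rng, lam, weights, probs):
    """Sum of N i.i.d. weights, where N ~ Poisson(lam).

    `weights` and `probs` describe a hypothetical discrete distribution
    of the contribution of a single occurrence (assumption, not from the paper).
    """
    n = sample_poisson(rng, lam)
    return sum(rng.choices(weights, weights=probs, k=n))


def empirical_p_value(observed, lam, weights, probs, n_sim=10_000, seed=0):
    """Estimate P(W >= observed) by Monte Carlo under the compound Poisson model."""
    rng = random.Random(seed)
    hits = sum(
        sample_compound_poisson(rng, lam, weights, probs) >= observed
        for _ in range(n_sim)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Illustrative, made-up parameters.
    p = empirical_p_value(observed=6.0, lam=2.0,
                          weights=[0.5, 1.0, 2.0], probs=[0.5, 0.3, 0.2])
    print(f"estimated enrichment p-value: {p:.4f}")
```

The add-one correction is a common convention for Monte Carlo p-values; with `n_sim` simulations the smallest reportable value is `1 / (n_sim + 1)`, so very small p-values need either more simulations or a more specialised method.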
Origin: Files produced by the author(s)