arXiv:1503.04992 Reliable scaling of Position Weight Matrices for binding strength comparisons between transcription factors
What started as a nuisance that many bioinformaticians face every day turned into an exciting question for Xiaoyan: How can we compare scores from PWMs? While it is clear that a PWM score is related to the affinity of a transcription factor for DNA through a scaling factor called lambda, it’s not easy to determine the actual value of that parameter. Experimental methods to do so are complex and expensive, and theoretical methods to approximate lambda usually rely on large amounts of noisy genomic data.
In this paper, we introduce two simple methods to derive lambda. They rest on different assumptions, but produce very similar parameter ranges. The first method focusses on the top hits that are deemed to be ‘reliable’ (on the basis of genome statistics), while the second method takes the calculated residence time of the factor on the DNA into account. The latter method is particularly useful in cases where two alternative PWMs exist for the same protein, and lambda serves to scale their scores so that they’re comparable.
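To make the role of lambda concrete, here is a minimal Python sketch of how a scaling factor could turn raw PWM scores into comparable quantities. It is an illustration under stated assumptions, not the procedure from the paper: the exponential (Boltzmann-like) form `exp((score - best_score) / lambda)`, the toy matrix `PWM`, and the helper names `pwm_score` and `relative_affinity` are all hypothetical, and the paper's own definition of lambda may differ in form.

```python
import math

# Hypothetical log-odds PWM for a 3 bp motif: one dict per position.
# The numbers are made up for illustration only.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -1.5, "T": -0.3},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -1.2},
    {"A": -0.5, "C": -1.1, "G": 1.4, "T": -0.9},
]


def pwm_score(pwm, site):
    """Sum the per-position weights of `site` (additive PWM model)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


def best_score(pwm):
    """Score of the consensus site: the largest weight at every position."""
    return sum(max(column.values()) for column in pwm)


def relative_affinity(score, top, lam):
    """Affinity relative to the best site.

    ASSUMPTION: a Boltzmann-like relation in which the score difference
    is divided by lambda before exponentiation. This form is chosen for
    illustration and is not taken from the paper.
    """
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return math.exp((score - top) / lam)


top = best_score(PWM)
for site in ("ACG", "TCG", "GAT"):
    s = pwm_score(PWM, site)
    # Two hypothetical lambda values: a larger lambda flattens the
    # differences between sites, a smaller one sharpens them.
    print(site, round(s, 2),
          relative_affinity(s, top, lam=1.0),
          relative_affinity(s, top, lam=2.0))
```

Under this assumed relation, the same raw score difference translates into a different affinity ratio depending on lambda, which is why raw scores from two PWMs cannot be compared directly until each has been scaled by its own lambda.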