PWM, PFM, and Content Logo.

The source code is available on GitHub.

This project creates an SVG file of the information content logo (or sequence logo) [1] for the PWM of a DNA-binding protein.

Logo creation algorithm

Here we do not consider the upstream steps, such as microarray binding assays, data collection, or sequence alignment. We assume that the frequency matrix $M_f$ is already available.

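The information content of position $i$ is given by [2]: $$ IC_{i} = 2 + \sum_{b} f_{b,i} \times \log_2 f_{b,i}$$ where $f_{b,i}$ is the relative frequency of base $b$ at position $i$, and the sum runs over the four bases $b \in \{A, C, G, T\}$. We use $\log_2$ so that the information content is measured in bits; the constant $2 = \log_2 4$ is the maximum value, reached when a single base has frequency 1.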

When the expected background frequency of each base is $25\%$, the information content can equivalently be written as: $$IC_i = \sum_{b} f_{b,i}\times \log_2 \frac{f_{b,i}}{0.25}.$$
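
The two forms agree because the relative frequencies in each column sum to 1. A short derivation, with the sum taken over the four bases $b$: $$\sum_{b} f_{b,i} \log_2 \frac{f_{b,i}}{0.25} = \sum_{b} f_{b,i} \log_2 f_{b,i} - \log_2(0.25) \sum_{b} f_{b,i} = \sum_{b} f_{b,i} \log_2 f_{b,i} + 2.$$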

The height of letter $b$ in column $i$ is then given by $$H_{b,i} = f_{b,i} \times IC_i.$$
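
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names and the example matrix are hypothetical, and it assumes $M_f$ is stored as a list of columns that already hold relative frequencies (each column sums to 1). Terms with $f_{b,i} = 0$ are skipped, following the usual convention $0 \times \log_2 0 = 0$.

```python
import math


def information_content(column):
    """Information content (in bits) of one column.

    `column` maps each base to its relative frequency f_{b,i}.
    Zero frequencies are skipped (convention: 0 * log2(0) = 0).
    """
    return 2.0 + sum(f * math.log2(f) for f in column.values() if f > 0)


def letter_heights(column):
    """Height H_{b,i} = f_{b,i} * IC_i of each base in one column."""
    ic = information_content(column)
    return {base: f * ic for base, f in column.items()}


# Hypothetical frequency matrix with three positions (illustration only).
M_f = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

for i, column in enumerate(M_f, start=1):
    print(i, information_content(column), letter_heights(column))
```

For this example matrix, the three columns have an information content of 2, 1, and 0 bits: a fully conserved position gets the full height, and a uniform position disappears from the logo.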


The figure format is SVG. All the information (the height of each base letter) is written “manually” into the SVG file.
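
One way to write the letter heights directly into SVG markup is sketched below. It is only an illustration under stated assumptions, not the project's implementation: the function name `logo_svg`, the colour scheme, and the pixel sizes are all hypothetical. Each letter is a `<text>` element stretched vertically with a `scale` transform, and the letters in a column are stacked from the smallest at the bottom to the largest at the top. Glyph heights are approximate, since the visible height of a capital letter is somewhat smaller than the font size.

```python
def logo_svg(heights, col_width=40, px_per_bit=50, font_size=40):
    """Return SVG markup for a logo.

    `heights` is a list of columns; each column maps a base to its
    letter height in bits. All sizes are in pixels (assumed values).
    """
    colors = {"A": "green", "C": "blue", "G": "orange", "T": "red"}
    total_height = 2 * px_per_bit  # 2 bits is the maximum column height
    width = col_width * len(heights)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{total_height}">'
    ]
    for i, column in enumerate(heights):
        baseline = float(total_height)  # start stacking at the bottom edge
        # Smallest letter first, so the tallest letter ends up on top.
        for base, bits in sorted(column.items(), key=lambda kv: kv[1]):
            if bits <= 0:
                continue  # nothing to draw for this base
            letter_px = bits * px_per_bit
            scale_y = letter_px / font_size  # vertical stretch factor
            parts.append(
                f'<text x="{i * col_width}" y="0" font-size="{font_size}" '
                f'font-family="monospace" fill="{colors[base]}" '
                f'transform="translate(0,{baseline:.2f}) '
                f'scale(1,{scale_y:.4f})">{base}</text>'
            )
            baseline -= letter_px  # the next letter sits on top of this one
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical letter heights in bits for three positions.
example_heights = [
    {"A": 2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
svg_text = logo_svg(example_heights)
```

The returned string can be saved to a file with an `.svg` extension and opened in a browser. Axes, tick labels, and exact glyph metrics are left out to keep the sketch short.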

  1. Sequence logo, Wikipedia.
  2. Schneider, T. D., Stormo, G. D., Gold, L. and Ehrenfeucht, A. Information content of binding sites on nucleotide sequences. J. Mol. Biol., 1986, 188, 415–431.