Genome Rearrangement

Genome rearrangements shuffle genes and move them to new locations within a genome. They are often harmful to an individual, but very occasionally they can produce an adaptation or even contribute to speciation. Genome rearrangements are significantly more powerful at modifying the “genomic code” than point mutations, since a single rearrangement can relocate or invert many genes at once.

Comparisons

A synteny block is a region of a genome containing a particular set of genes whose homologs also occur together in the genome being compared. Synteny diagrams use these blocks to compare the chromosomal locations of genes between two organisms.

For example, the synteny diagram below maps the locations of mouse genes (right) to their human homologs (left). Synteny diagrams also typically give each block a direction, since the genes within a block occur in a fixed order that may appear reversed in the other genome.

Comparing the X chromosome in mammals can be especially interesting, because it is highly conserved.

Figure: Human-mouse synteny diagram

If each synteny block has a unique ID and a sign (giving its direction), then $n$ synteny blocks can be arranged in $n! \times 2^n$ ways. There are $n!$ orderings, and each block independently takes one of $2$ signs.
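As a quick sanity check of the $n! \times 2^n$ count, the sketch below enumerates every signed arrangement of blocks $1, \ldots, n$ for small $n$. The helper name `signed_permutations` is hypothetical. Blocks are assumed to be labeled $1..n$, with a negative number denoting a reversed block.

```python
from itertools import permutations, product
from math import factorial


def signed_permutations(n):
    """Yield every arrangement of blocks 1..n, each block carrying a sign (+ or -)."""
    for order in permutations(range(1, n + 1)):          # n! orderings
        for signs in product((1, -1), repeat=n):         # 2^n sign choices
            yield tuple(s * b for s, b in zip(signs, order))


for n in range(1, 5):
    count = sum(1 for _ in signed_permutations(n))
    assert count == factorial(n) * 2 ** n
    print(n, count)  # 1 2, then 2 8, 3 48, 4 384
```

Brute-force enumeration is only practical for tiny $n$, since the count grows faster than exponentially.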

Reversal

A reversal inverts an interval of synteny blocks: it reverses their order and flips the sign of each block in the interval. The two ends of the interval are called breakage points. When comparing two genomes, one may wish to find a sequence of reversals that transforms one genome into the other.
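A minimal sketch of a single reversal follows, assuming a genome is a Python list of signed block IDs. The function name `reverse_interval` and its 0-based, inclusive indices `i` and `j` are illustrative choices, not part of these notes.

```python
def reverse_interval(perm, i, j):
    """Reverse blocks perm[i..j] (0-based, inclusive) and flip the sign of each one."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    segment = [-block for block in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


print(reverse_interval([1, 2, 3, 4, 5], 1, 3))  # [1, -4, -3, -2, 5]
```

Applying the same reversal twice restores the original arrangement, so every reversal is its own inverse.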

The number of possible reversals on $n$ synteny blocks equals the number of ways to choose two of the $n + 1$ cut positions (the $n - 1$ gaps between adjacent blocks plus the two ends), i.e.

$${n + 1 \choose 2} = \frac {(n + 1)!} {2! (n - 1)!} = \frac {n(n + 1)} 2$$
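The count can be checked by listing every pair of cut positions directly. In this sketch, the hypothetical helper `all_reversals` numbers the cut positions $0, \ldots, n$ (position $0$ before the first block, position $n$ after the last). A pair $(a, b)$ with $a < b$ then inverts blocks $a + 1$ through $b$.

```python
from math import comb


def all_reversals(n):
    """List every reversal of n blocks as a pair of cut positions (a, b), 0 <= a < b <= n.

    Cutting at positions a and b inverts blocks a+1 .. b (1-based).
    """
    return [(a, b) for a in range(n + 1) for b in range(a + 1, n + 1)]


for n in range(1, 8):
    assert len(all_reversals(n)) == comb(n + 1, 2) == n * (n + 1) // 2

print(all_reversals(2))  # [(0, 1), (0, 2), (1, 2)]
```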

One may also ask how many reversals are needed to transform one arrangement into another.

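A simple greedy strategy shows that at most $2n - 1$ reversals are needed in the worst case to sort $n$ blocks into the identity permutation $+1, +2, \ldots, +n$. For each position $k$ in turn, one reversal moves block $k$ into position $k$. If the block arrives with a negative sign, a second reversal of that single block flips it. This gives at most $n$ moves plus $n$ sign flips. However, after the first $n - 1$ blocks are in place, block $n$ must already be in the last position, so it needs no move. The total is therefore $(n - 1) + n = 2n - 1$. This is an upper bound, not necessarily the minimum.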
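The greedy strategy described above can be sketched as follows, assuming a signed permutation of $1..n$ given as a list. The function name `greedy_sorting` is hypothetical. It records every intermediate arrangement, so the number of reversals used is the length of the returned list.

```python
def greedy_sorting(perm):
    """Greedily sort a signed permutation of 1..n; return each intermediate arrangement."""
    p = list(perm)
    n = len(p)
    steps = []
    for k in range(n):
        target = k + 1
        if abs(p[k]) != target:
            # Locate block `target` and reverse p[k..j] to bring it to position k.
            j = next(i for i in range(k, n) if abs(p[i]) == target)
            p[k:j + 1] = [-b for b in reversed(p[k:j + 1])]
            steps.append(tuple(p))
        if p[k] == -target:
            # A reversal of this single block fixes its sign.
            p[k] = target
            steps.append(tuple(p))
    return steps


steps = greedy_sorting([-3, 4, 1, 5, -2])
print(len(steps))  # 7
print(steps[-1])   # (1, 2, 3, 4, 5)
```

For example, `[2, 1]` needs $3 = 2 \cdot 2 - 1$ greedy reversals, matching the worst-case bound.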

One can define adjacencies and breakpoints in a permutation of synteny blocks. An adjacency is a pair of consecutive elements that are in order, i.e. $x_{i+1} = x_i + 1$ in the sequence $x$. Under this definition $-3, -2$ also counts as an adjacency. A breakpoint is a pair of consecutive elements that do not form an adjacency. The reversal $-n, \ldots, -2, -1$ of the identity permutation $+1, +2, \ldots, +n$ would otherwise consist entirely of adjacencies. To distinguish the two, we add two implicit positions when counting: $x_0 = 0$ and $x_{n+1} = n + 1$.
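A minimal sketch of breakpoint counting follows, assuming a signed permutation of $1..n$ given as a list. The name `count_breakpoints` is hypothetical. The padding with $0$ and $n + 1$ implements the implicit positions $x_0$ and $x_{n+1}$.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n, padding with x_0 = 0 and x_{n+1} = n + 1."""
    x = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for i in range(len(x) - 1) if x[i + 1] != x[i] + 1)


print(count_breakpoints([1, 2, 3]))     # 0
print(count_breakpoints([-3, -2, -1]))  # 2: only the pairs (0, -3) and (-1, 4)
print(count_breakpoints([3, 4, 5, -12, -8, -7, -6, 1, 2, 10, 9, -11, 13, 14]))  # 8
```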

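The reversal distance is the minimum number of reversals needed to transform a permutation into the identity permutation. It must be at least $\frac{Count(breakpoints)}{2}$. A reversal of the interval $a_i, \ldots, a_j$ preserves every adjacency inside that interval. Reversing the order and flipping the signs turns each consecutive pair $(a_{k-1}, a_k)$ with $i < k \le j$ into $(-a_k, -a_{k-1})$, whose difference $-a_{k-1} - (-a_k) = a_k - a_{k-1}$ is unchanged. A reversal can therefore only affect the two pairs at its ends, so it removes at most two breakpoints. Since the identity permutation has none, at least $\frac{Count(breakpoints)}{2}$ reversals are required.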
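To see the lower bound on small cases, the sketch below computes exact reversal distances by breadth-first search from the identity permutation. Because every reversal is its own inverse, the distance from the identity to $p$ equals the distance from $p$ to the identity. It then checks that no permutation beats $\frac{Count(breakpoints)}{2}$. All helper names are hypothetical. This brute force is only feasible for very small $n$ and is not how reversal distance is computed in practice.

```python
from collections import deque


def reverse_interval(p, i, j):
    """Reverse blocks p[i..j] (0-based, inclusive) and flip their signs."""
    return p[:i] + tuple(-b for b in reversed(p[i:j + 1])) + p[j + 1:]


def count_breakpoints(p):
    """Breakpoints of p, with implicit 0 and n + 1 added at the ends."""
    x = (0,) + p + (len(p) + 1,)
    return sum(1 for a, b in zip(x, x[1:]) if b != a + 1)


def reversal_distances(n):
    """Exact reversal distance to the identity for every signed permutation of size n (BFS)."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                q = reverse_interval(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist


dist = reversal_distances(4)
# Every one of the 2^4 * 4! = 384 signed permutations is reached, and none beats the bound.
assert len(dist) == 384
assert all(d >= count_breakpoints(p) / 2 for p, d in dist.items())
```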

Graph Representation

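We can represent a genome as a graph in which each synteny block is an edge joining two nodes, the block's tail and head. The direction in which the edge is traversed (tail->head or head->tail) denotes the sign of the synteny block. Additional edges join the end of each block to the start of the next block along the chromosome.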
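One way to sketch this representation in code relies on two assumptions not stated in these notes. First, chromosomes are treated as circular. Second, it borrows the node-numbering convention from Compeau and Pevzner's *Bioinformatics Algorithms*: block $k$ has tail node $2k - 1$ and head node $2k$. A positive block is traversed tail->head and a negative block head->tail. The function names are hypothetical. `colored_edges` returns the edges that connect consecutive blocks, wrapping around at the end of each circular chromosome.

```python
def chromosome_to_cycle(chrom):
    """Map signed blocks to the node sequence visited when walking the chromosome."""
    nodes = []
    for block in chrom:
        if block > 0:
            nodes += [2 * block - 1, 2 * block]    # tail -> head
        else:
            nodes += [-2 * block, -2 * block - 1]  # head -> tail
    return nodes


def colored_edges(genome):
    """Edges joining the end of each block to the start of the next (circular chromosomes)."""
    edges = []
    for chrom in genome:
        nodes = chromosome_to_cycle(chrom)
        for k in range(len(chrom)):
            edges.append((nodes[2 * k + 1], nodes[(2 * k + 2) % len(nodes)]))
    return edges


print(chromosome_to_cycle([1, -2, -3, 4]))     # [1, 2, 4, 3, 6, 5, 7, 8]
print(colored_edges([[1, -2, -3], [4, 5, -6]]))
# [(2, 4), (3, 6), (5, 1), (8, 9), (10, 12), (11, 7)]
```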

2-break Sorting

A chromosome can split into two (fission), and two chromosomes can fuse into one (fusion). Fusion, fission, and reversal are all 2-break operations. In the graph representation, each one breaks two edges and reconnects the four exposed nodes with two new edges.
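The sketch below shows the same 2-break acting on one circular chromosome, once as a reversal and once as a fission. It assumes circular chromosomes and numbers block $k$'s tail and head as $2k - 1$ and $2k$. Block edges $(2k - 1, 2k)$ are left implicit, and only the edges between consecutive blocks are stored as pairs. The helper names `two_break` and `count_cycles` are hypothetical.

```python
def two_break(edges, i1, i2, i3, i4):
    """Replace edges {i1, i2} and {i3, i4} with (i1, i3) and (i2, i4)."""
    removed = {frozenset((i1, i2)), frozenset((i3, i4))}
    kept = [e for e in edges if frozenset(e) not in removed]
    if len(kept) != len(edges) - 2:
        raise ValueError("both edges must be present in the graph")
    return kept + [(i1, i3), (i2, i4)]


def count_cycles(edges, n_blocks):
    """Count chromosomes: connected cycles of block edges (2k-1, 2k) plus the given edges."""
    neighbors = {v: [] for v in range(1, 2 * n_blocks + 1)}
    for k in range(1, n_blocks + 1):
        neighbors[2 * k - 1].append(2 * k)
        neighbors[2 * k].append(2 * k - 1)
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, cycles = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        cycles += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(neighbors[v])
    return cycles


genome = [(2, 3), (4, 5), (6, 7), (8, 1)]      # circular chromosome (+1 +2 +3 +4)
reversal = two_break(genome, 2, 3, 6, 7)       # gives (+1 -3 -2 +4): still one chromosome
fission = two_break(genome, 2, 3, 7, 6)        # gives (+1 +4) and (+2 +3)
print(count_cycles(reversal, 4), count_cycles(fission, 4))  # 1 2
```

Reading the fission in reverse order (a 2-break that joins the two cycles) gives a fusion.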

2-break sorting finds a sequence of 2-breaks that transforms one synteny block arrangement $P$ into another arrangement $Q$. First, build a breakpoint graph by superimposing the edges of the graphs of $P$ and $Q$. Next, find a non-trivial alternating cycle: one that alternates between edges of $P$ and edges of $Q$ and contains more than two edges. Then apply a 2-break to $P$ that splits this cycle into two. Repeating this until every cycle is trivial transforms $P$ into $Q$.
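Because each 2-break in this procedure adds one cycle, counting cycles shows how far $P$ is from $Q$. The sketch below finds the alternating cycles of the breakpoint graph. It also uses a standard result not stated in these notes: the 2-break distance equals the number of synteny blocks minus the number of cycles. The edge lists assume circular chromosomes with block $k$'s tail and head numbered $2k - 1$ and $2k$. The function names are hypothetical.

```python
from collections import defaultdict


def breakpoint_cycles(p_edges, q_edges):
    """Return the alternating cycles of the breakpoint graph as lists of nodes."""
    neighbors = defaultdict(list)
    for a, b in list(p_edges) + list(q_edges):
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Every node has one P-edge and one Q-edge, so each connected component is a cycle.
    seen, cycles = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        component, stack = [], [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                component.append(v)
                stack.extend(neighbors[v])
        cycles.append(component)
    return cycles


def two_break_distance(p_edges, q_edges, n_blocks):
    """Blocks minus cycles: the number of 2-breaks needed to turn P into Q."""
    return n_blocks - len(breakpoint_cycles(p_edges, q_edges))


P_EDGES = [(2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 1)]   # P = (+1 +2 +3 +4 +5 +6)
Q_EDGES = [(2, 6), (5, 12), (11, 10), (9, 1), (4, 8), (7, 3)]  # Q = (+1 -3 -6 -5)(+2 -4)

cycles = breakpoint_cycles(P_EDGES, Q_EDGES)
print(sum(1 for c in cycles if len(c) > 2))          # 2 non-trivial cycles
print(two_break_distance(P_EDGES, Q_EDGES, 6))       # 3
```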