Zoonomia
The Zoonomia project compares genomes across a wide range of mammalian species. The paper is available here, with the full citation:
Zoonomia Consortium. A comparative genomics multitool for scientific discovery and conservation. Nature 587, 240–245 (2020). https://doi.org/10.1038/s41586-020-2876-6
The website is also a cool resource.
Approach
The key contributions of the Zoonomia project are to sequence many mammal genomes, and align them to find orthologous genes across species.
The authors used DISCOVAR to assemble sequences de novo, and used the Cactus pipeline (paper, code) to build large-scale alignments. Tools for reading and modifying the resulting HAL format are provided in the hal toolkit. In my opinion, this work is an impressive accomplishment of computational scaling and algorithmic innovation that pushes the limit on the number of genomes processed, and it makes smart use of evolutionary trees to make the many-to-many alignment problem more tractable.
The genome10k project seems to be a continuation and future direction for the Zoonomia project.
Browsing Output
(For more information on file formats for NGS, see note on Sequencing File Formats).
The data investigated here is the file 241-mammalian-2020v2.hal (browser link). It is 806GB and contains the genomes of 241 mammals.
A naive implementation would align every pair of genomes, which for 241 species means roughly 30k pairwise alignments (241 choose 2 = 28,920). The Cactus alignment algorithm provides a significant optimization, requiring only about 500 alignments, by building an alignment tree: rather than aligning every single genome to all the others, inferred ancestor genomes are aligned to other ancestor genomes, which allows pairwise alignments to be computed on demand relatively quickly. The HAL toolkit provides mechanisms for modifying this tree, as well as simple operations that traverse it, such as liftover, which converts gene annotations (BED, WIG) from one genome to other genomes by traversing the alignment graph.
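To make the scaling argument concrete, here is a minimal Python sketch of the two counts. The all-pairs figure is plain combinatorics. The tree-based figure assumes, purely as a simplification and not as a description of Cactus internals, one alignment per branch of a fully resolved rooted binary tree. Both function names are hypothetical.

```python
from math import comb


def all_pairs_alignments(num_genomes: int) -> int:
    """Number of alignments if every genome is aligned to every other genome."""
    return comb(num_genomes, 2)


def tree_branch_alignments(num_genomes: int) -> int:
    """Assumed cost model: one alignment per branch of a rooted binary tree.

    A rooted, fully resolved binary tree with n leaves has n - 1 internal
    nodes and 2n - 2 branches. This is an illustrative simplification only.
    """
    return 2 * num_genomes - 2


if __name__ == "__main__":
    n = 241
    print("all pairs:", all_pairs_alignments(n))
    print("tree branches (assumed model):", tree_branch_alignments(n))
```

Under this assumed model, 241 genomes give 480 branch alignments, which is in the same ballpark as the "about 500" figure quoted above.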
The overall tree can be found on the Zoonomia website. I made a copy with the Tree of Life project here for searching/browsing different branches in this post.
Analysis with HAL and BLAST
The following section attempts to perform some simple analysis on a subset of the Zoonomia dataset.
TP53
The Zoonomia paper hints that large mammals may contain extra copies of the tp53 gene, which could be part of the solution to Peto’s Paradox: cancer rates do not necessarily scale with the number of cells in an organism.
The p53 protein is known as the “guardian of the genome”: it helps to halt the division of damaged cells and to destroy them. The gene that codes for p53 in humans is TP53, which is given here. TP53 is one of the most studied genes, which is why we look at it here in the Zoonomia dataset.
Limiting analysis to Seals using halExtract
Limiting analysis to two similar seals can give us a sense for using HAL tools, and also test a hypothesis that larger sea mammals may have multiple copies of this gene.
Using halExtract, we pull out a sub-tree with just two seals, Mirounga angustirostris (Northern Elephant Seal, genome assembly link) and Leptonychotes weddellii (Weddell Seal, Antarctic, genome assembly link):
$ halExtract --root fullTreeAnc216 241-mammalian-2020v2.hal seals.hal
$ halStats seals.hal
The halStats output confirms that we extracted the data correctly:
GenomeName | NumChildren | Length | NumSequences | NumTopSegments | NumBottomSegments |
---|---|---|---|---|---|
fullTreeAnc216 | 2 | 2269286127 | 10692 | 44020772 | 25826238 |
Leptonychotes_weddellii | 0 | 3156886159 | 16710 | 25164644 | 0 |
Mirounga_angustirostris | 0 | 2407319723 | 321256 | 26521233 | 0 |
Finding tp53 in Large Mammal genomes with BLAST
Whole-genome analysis can be quite slow overall, so ideally we want to look at only the subset of the genome that is of interest. The “long” way to do this is a manual BLAST search for a particular gene (in this case, tp53). We will use BLAST to find the gene within the Homo sapiens Zoonomia genome, then use HAL to compare across species. The main species of interest are the two seal genomes mentioned above, as well as the savanna elephant, which is known to have multiple copies of this gene.
To find this gene (or a possibly orthologous one), we start by downloading the 25 isoform sequences for TP53 (from here) and putting them into the file tp53.fasta for querying.
To do the actual querying, we create a BLAST database for each of the genomes of interest, then search each database for the 25 tp53 isoforms downloaded above.
function make_blast_db() {
    # Pull out a genome into a FASTA file and index it into a BLAST DB.
    local GENOME_NAME=$1
    echo "Extracting $GENOME_NAME FASTA File"
    hal2fasta 241-mammalian-2020v2.hal "$GENOME_NAME" > "${GENOME_NAME}.fasta"
    mkdir -p "blast/${GENOME_NAME}"
    makeblastdb -dbtype nucl \
        -in "${GENOME_NAME}.fasta" \
        -out "blast/${GENOME_NAME}/db"
}
function search_blast_db() {
    # Search an existing BLAST DB for sequences in a given FASTA file.
    local GENOME_NAME=$1
    local QUERY_FASTA_FILE=$2
    # Quote the path so file names containing spaces do not break basename.
    local QUERY_NAME
    QUERY_NAME=$(basename "$QUERY_FASTA_FILE" | cut -d. -f1)
    echo "Searching $GENOME_NAME genome for $QUERY_FASTA_FILE"
    blastn \
        -db "blast/${GENOME_NAME}/db" \
        -query "$QUERY_FASTA_FILE" \
        -out "${GENOME_NAME}_${QUERY_NAME}.out"
}
# Index and search for tp53.
$ make_blast_db Mirounga_angustirostris
$ make_blast_db Leptonychotes_weddellii
$ search_blast_db Mirounga_angustirostris tp53.fasta
$ search_blast_db Leptonychotes_weddellii tp53.fasta
# Check matches in human and elephant genomes to confirm positive controls.
$ make_blast_db Homo_sapiens
$ search_blast_db Homo_sapiens tp53.fasta
$ make_blast_db Loxodonta_africana
$ search_blast_db Loxodonta_africana tp53.fasta
The results for searching for isoform g (NM_001276761.3) are:
- As a positive control, and as expected, the human genome (Homo sapiens) has many perfect alignments within chromosome 17, which matches the expected location of 17p13.1. There are also many smaller sub-matches across other chromosomes. The matching position is on chr17 from positions 7669699 to 7668421, aligned to query positions 1230 to 2509 in the Plus (query) -> Minus (chr17) direction. There are also many hits on the same range of tp53 that matches in the seal genomes.
- The African savanna elephant (Loxodonta africana) is said to have multiple copies of tp53, so we use it as a second positive control. I confirmed 16 copies of the gene with regions of ~800bp in length, as well as the same 100bp-long sequence found in the Weddell seal and northern elephant seal.
- The Weddell seal (Leptonychotes weddellii) has a good alignment in the 800kbp scaffold KB715705.1, with a match length of 115bp. The position in the seal genome is scaffold KB715705.1 from 333689 to 333803, which matches positions 807 to 921 in tp53 isoform g.
- The northern elephant seal (Mirounga angustirostris) has a good alignment in the 15kbp scaffold MirAng_flattened_line_76382, again with a match length of 115bp, at 91% identity. The position is scaffold MirAng_flattened_line_76382 from 2706 to 2820, which matches positions 807 to 921 in tp53 isoform g.
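As a quick arithmetic check on the coordinates listed above, the following Python sketch computes the span of each hit. It relies on BLAST reporting 1-based, inclusive coordinates, with minus-strand hits listed as start > end. The helper name and the dictionary labels are hypothetical.

```python
def hit_length(start: int, end: int) -> int:
    """Length of a BLAST hit given 1-based inclusive coordinates.

    Minus-strand hits are reported with start > end, so use the absolute span.
    """
    return abs(end - start) + 1


# Coordinates copied from the results listed above.
hits = {
    "Homo_sapiens chr17": (7669699, 7668421),
    "Leptonychotes_weddellii KB715705.1": (333689, 333803),
    "Mirounga_angustirostris MirAng_flattened_line_76382": (2706, 2820),
    "tp53 isoform g query range (seal hits)": (807, 921),
}

for label, (start, end) in hits.items():
    print(f"{label}: {hit_length(start, end)} bp")
```

The human hit spans 1279bp on chr17, and both seal hits and their query range span 115bp, which agrees with the reported match length.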
Finding tp53 in Seal genomes with halLiftover
The HAL format is optimized for finding similar sequences across genomes.
From the TP53 site we can create a BED file that annotates the gene in question on the Homo_sapiens genome. (Note that the BED file is expected to be tab-separated.) BLAST reports negative-strand matches as (end, start); for BED we swap these to (start, end), subtract one from the start because BED coordinates are 0-based and half-open, and keep the strand as -, per blast_to_bed.py.
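The coordinate conversion can be sketched in Python as follows. This is a hypothetical stand-in and not the actual blast_to_bed.py; the function name, the default score of 1000, and the assumption that the query is always on the plus strand are mine.

```python
def blast_hit_to_bed(chrom: str, sstart: int, send: int,
                     name: str, score: int = 1000) -> str:
    """Convert one BLAST subject range into a tab-separated BED6 line.

    BLAST subject coordinates are 1-based and inclusive, and a minus-strand
    hit is reported with sstart > send. BED is 0-based and half-open, so the
    smaller coordinate is decremented by one and the larger is kept as-is.
    Assumes the query itself is on the plus strand.
    """
    start = min(sstart, send) - 1
    end = max(sstart, send)
    strand = "+" if sstart <= send else "-"
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


if __name__ == "__main__":
    # The human chr17 hit reported by BLAST for isoform g.
    print(blast_hit_to_bed("chr17", 7669699, 7668421, "tp53"))
```

For the human hit this yields the fields chr17, 7668420, 7669699, tp53, 1000 and -, the same values used in the BED file for this region.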
# Put the location we found using BLAST into BED format. Use Python to print tabs.
python -c 'print(\
"\t".join(["chr17", "7668420", "7669699", "tp53", "1000", "-"]))' \
> tp53_homo_sapiens.bed
# Run Liftover from Homo_sapiens to Leptonychotes_weddellii.
halLiftover --bedType 6 \
241-mammalian-2020v2.hal \
Homo_sapiens tp53_homo_sapiens.bed \
Leptonychotes_weddellii Leptonychotes_weddellii_tp53_liftover.bed
The result is that halLiftover found a much longer alignment, spanning 336374 - 335437, but allowed for more gaps. It’s likely that halLiftover is more permissive than BLAST.
Retrying with the African elephant, Loxodonta_africana, as a positive control gives a large overlap spanning about 1kbp. However, I think that due to the way HAL handles duplicated genes, it only finds a hit on one single chromosome rather than the 16 copies that BLAST found.
halLiftover --bedType 6 \
241-mammalian-2020v2.hal \
Homo_sapiens tp53_homo_sapiens.bed \
Loxodonta_africana Loxodonta_africana_tp53_liftover.bed
Looking at Seal Mutation Stats
Let’s look at the mutations across the species with halSummarizeMutations seals.hal:
GenomeName | ParentName | BranchLength | GenomeLength | ParentLength | Subtitutions | Transitions | Transversions | Matches | GapInsertions | GapInsertedBases | GapDeletions | GapDeletedBases | Insertions | InsertionBases | Deletions | DeletionBases | Inversions | InvertedBases | Duplications | DuplicatedBases | Transpositions | TranspositionBases | Other |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Leptonychotes_weddellii | fullTreeAnc216 | 0.01 | 3156886159 | 2269286127 | 21591222 | 8662095 | 3832333 | 2172960184 | 723272 | 2072770 | 1406051 | 11984200 | 134433 | 167483329 | 38978 | 2957705 | 325 | 1059036 | 52888 | 15135856 | 1918 | 3560549 | 174259 |
Mirounga_angustirostris | fullTreeAnc216 | 0.00218 | 2407319723 | 2269286127 | 16091284 | 10763391 | 4749067 | 2239260145 | 1057598 | 3212915 | 1648709 | 7759364 | 213569 | 61815883 | 48405 | 3791930 | 299 | 461796 | 236031 | 25962771 | 3073 | 6330304 | 29965 |
Total | | 0.01218 | 5564205882 | 4538572254 | 37682506 | 19425486 | 8581400 | 4412220329 | 1780870 | 5285685 | 3054760 | 19743564 | 348002 | 229299212 | 87383 | 6749635 | 624 | 1520832 | 288919 | 41098627 | 4991 | 9890853 | 204224 |
Average | | 0.00609 | 2782102941 | 2269286127 | 18841253 | 9712743 | 4290700 | 2206110164 | 890435 | 2642842 | 1527380 | 9871782 | 174001 | 114649606 | 43691 | 3374817 | 312 | 760416 | 144459 | 20549313 | 2495 | 4945426 | 102113 |
To be honest, I’m not really sure what to make of these stats, but we can just keep them here for future reference.
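One standard summary that can be read straight off this table is the transition/transversion ratio for each branch. A minimal Python sketch, using only the Transitions and Transversions values copied from the two seal rows above (the helper name is hypothetical, and this is just one possible way to start digesting the table):

```python
def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transitions to transversions on a branch."""
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# (Transitions, Transversions) copied from the halSummarizeMutations output.
branches = {
    "Leptonychotes_weddellii": (8662095, 3832333),
    "Mirounga_angustirostris": (10763391, 4749067),
}

for genome, (ts, tv) in branches.items():
    print(f"{genome}: Ts/Tv = {ts_tv_ratio(ts, tv):.3f}")
```

Both branches come out between 2.2 and 2.3.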
Finding tp53 mutations in all of Zoonomia using halSnps
The analysis now concludes with a search for tp53 across all of Zoonomia mammals.
First, we need to confirm that the alignments we got from BLAST match the tp53 gene. As a quick sanity check, we export the sequence using hal2fasta and confirm that the reverse complement of the resulting FASTA file (per bioinformatics.org) matches the expected region for tp53 within the BLAST output.
hal2fasta --sequence chr17 --start 7668420 --length 1279 --upper \
--outFaPath Homo_sapiens_tp53_in_hal_file.fasta \
241-mammalian-2020v2.hal Homo_sapiens
Comparing the beginning of the match, we see:
BLAST:
Query 1230 CTCACTCCAGCCACCTGAAGTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCA 1289
||| || |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 7669699 CTC-CTACAGCCACCTGAAGTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCA 7669641
RC FASTA: CTC CTACAGCCACCTGAAGTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCA
The end of the match is the same:
BLAST:
Query 2490 ACAATAAAACTTTGCTGCCA 2509
||||||||||||||||||||
Sbjct 7668440 ACAATAAAACTTTGCTGCCA 7668421
RC FASTA: ACAATAAAACTTTGCTGCCA
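The reverse-complement step used in this check can also be done locally instead of through a web tool. A minimal Python sketch (the function name is hypothetical, and only A, C, G, T and N are handled):

```python
# Translation table for DNA bases; anything outside ACGTN is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # The last 20 bases of the BLAST match, as shown on the minus strand.
    minus_strand_end = "ACAATAAAACTTTGCTGCCA"
    # Reverse-complementing gives the plus-strand bases at the start of
    # the region exported by hal2fasta.
    print(reverse_complement(minus_strand_end))
```

Applying the function twice returns the original sequence, which is a handy sanity check when comparing strands.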
With more confidence that the extracted region is what we intended, we can run halSnps on our 4 previously studied genomes. Assuming halSnps handles strandedness as BLAST does, it should be fine to search for the reverse complement (it seems to, judging from the code).
For each target genome, halSnps reports the number of cross-species mismatches (totalSnps) and the number of cleanly aligned orthologous positions that were compared against the reference (totalCleanOrthologousPairs).
We run the command and check the output for positive control:
halSnps --refSequence chr17 --start 7668420 --length 1279 \
241-mammalian-2020v2.hal Homo_sapiens \
Homo_sapiens,Loxodonta_africana,Mirounga_angustirostris,Leptonychotes_weddellii
Predictably, Homo sapiens has a perfect match (so it has zero SNPs with the entire sequence length matching). The other species do contain more matches, which was not something discovered using BLAST. It seems like the orthologous genes may be “forced” in the alignment.
targetGenome | totalSnps | totalCleanOrthologousPairs |
---|---|---|
Homo_sapiens | 0 | 1279 |
Loxodonta_africana | 212 | 872 |
Mirounga_angustirostris | 221 | 842 |
Leptonychotes_weddellii | 216 | 844 |
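To compare species on an equal footing, the two columns can be combined into a per-site mismatch rate (totalSnps divided by totalCleanOrthologousPairs). The Python sketch below parses the table as printed above; the function names are hypothetical, and the pipe-separated layout is assumed to match this post rather than any particular halSnps output option.

```python
def parse_halsnps_table(text: str) -> dict:
    """Parse 'genome | totalSnps | totalCleanOrthologousPairs |' rows."""
    rows = {}
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header and separator rows, which have no numeric cells.
        if len(cells) != 3 or not cells[1].isdigit():
            continue
        rows[cells[0]] = (int(cells[1]), int(cells[2]))
    return rows


def snp_rate(snps: int, pairs: int):
    """Mismatches per compared position, or None if nothing was aligned."""
    return snps / pairs if pairs else None


TABLE = """
targetGenome | totalSnps | totalCleanOrthologousPairs |
---|---|---|
Homo_sapiens | 0 | 1279 |
Loxodonta_africana | 212 | 872 |
Mirounga_angustirostris | 221 | 842 |
Leptonychotes_weddellii | 216 | 844 |
"""

rates = {g: snp_rate(s, p) for g, (s, p) in parse_halsnps_table(TABLE).items()}
for genome, rate in rates.items():
    print(f"{genome}: {rate:.3f}")
```

Returning None for zero aligned positions avoids a division by zero for genomes where nothing aligned.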
Now that we’ve confirmed reasonable output, we can run halSnps on all the Zoonomia mammals. As a smoothing step, we require that at least 10% of the species (24) have a mutation at a position for it to be labeled as a SNP. The 10% threshold was selected somewhat arbitrarily, but it should account for some noise in the sequencing.
function all_genome_names() {
    # List all leaf genome names in a HAL file as one comma-separated string.
    local hal_file=$1
    halStats "$hal_file" | grep -v -e "fullTreeAnc\|hal v\|^$\|GenomeName" | cut -d "," -f1 | sort | uniq | paste -s -d, -
}
halSnps --refSequence chr17 --start 7668420 --length 1279 --minSpeciesForSnp 24 \
    241-mammalian-2020v2.hal Homo_sapiens \
    "$(all_genome_names 241-mammalian-2020v2.hal)" | sed "s/ /,/g"
Certain primates share nearly identical copies of this gene. A future direction could be to plot the similarity of this gene, and others, against expected lifespan. It could also be interesting to cluster the genes detected here, as there seems to be a modal peak of species with ~800bp in common with the human TP53 gene.
The full output for all 241 mammals is:
targetGenome | totalSnps | totalCleanOrthologousPairs |
---|---|---|
Myotis_lucifugus | 2 | 3 |
Homo_sapiens | 0 | 1279 |
Myotis_myotis | 33 | 207 |
Acinonyx_jubatus | 226 | 770 |
Myrmecophaga_tridactyla | 43 | 217 |
Acomys_cahirinus | 179 | 497 |
Nannospalax_galili | 35 | 120 |
Ailuropoda_melanoleuca | 228 | 833 |
Nasalis_larvatus | 65 | 1263 |
Ailurus_fulgens | 248 | 834 |
Neomonachus_schauinslandi | 223 | 844 |
Allactaga_bullata | 231 | 822 |
Neophocaena_asiaeorientalis | 201 | 885 |
Alouatta_palliata | 2 | 16 |
Noctilio_leporinus | 4 | 17 |
Ammotragus_lervia | 256 | 879 |
Nomascus_leucogenys | 50 | 1262 |
Anoura_caudifer | 25 | 143 |
Nycticebus_coucang | 24 | 97 |
Antilocapra_americana | 244 | 877 |
Ochotona_princeps | 145 | 420 |
Aotus_nancymaae | 2 | 10 |
Octodon_degus | 232 | 827 |
Aplodontia_rufa | 0 | 0 |
Odobenus_rosmarus | 219 | 829 |
Artibeus_jamaicensis | 27 | 145 |
Odocoileus_virginianus | 163 | 536 |
Ateles_geoffroyi | 0 | 3 |
Okapia_johnstoni | 253 | 878 |
Balaenoptera_acutorostrata | 194 | 885 |
Ondatra_zibethicus | 195 | 678 |
Balaenoptera_bonaerensis | 193 | 885 |
Onychomys_torridus | 70 | 209 |
Beatragus_hunteri | 256 | 879 |
Orcinus_orca | 205 | 885 |
Bison_bison | 250 | 875 |
Orycteropus_afer | 144 | 640 |
Bos_indicus | 253 | 875 |
Oryctolagus_cuniculus | 7 | 33 |
Otolemur_garnettii | 29 | 99 |
Bos_mutus | 249 | 875 |
Ovis_aries | 257 | 879 |
Bos_taurus | 252 | 875 |
Ovis_canadensis | 258 | 879 |
Bubalus_bubalis | 175 | 625 |
Callicebus_donacophilus | 1 | 6 |
Pan_paniscus | 10 | 1269 |
Callithrix_jacchus | 2 | 16 |
Panthera_onca | 245 | 861 |
Camelus_bactrianus | 218 | 884 |
Panthera_pardus | 245 | 861 |
Camelus_dromedarius | 219 | 884 |
Panthera_tigris | 227 | 758 |
Camelus_ferus | 219 | 884 |
Pantholops_hodgsonii | 258 | 877 |
Canis_lupus | 205 | 796 |
Pan_troglodytes | 8 | 1279 |
Canis_lupus_familiaris | 205 | 796 |
Papio_anubis | 57 | 1263 |
Capra_aegagrus | 257 | 879 |
Paradoxurus_hermaphroditus | 23 | 121 |
Capra_hircus | 257 | 879 |
Perognathus_longimembris | 0 | 0 |
Capromys_pilorides | 233 | 825 |
Peromyscus_maniculatus | 38 | 78 |
Carollia_perspicillata | 39 | 191 |
Petromus_typicus | 46 | 168 |
Castor_canadensis | 16 | 49 |
Phocoena_phocoena | 202 | 885 |
Catagonus_wagneri | 144 | 477 |
Piliocolobus_tephrosceles | 58 | 1264 |
Cavia_aperea | 237 | 806 |
Pipistrellus_pipistrellus | 7 | 54 |
Cavia_porcellus | 161 | 496 |
Pithecia_pithecia | 1 | 7 |
Cavia_tschudii | 161 | 496 |
Platanista_gangetica | 195 | 885 |
Cebus_albifrons | 2 | 11 |
Pongo_abelii | 28 | 1277 |
Cebus_capucinus | 2 | 11 |
Procavia_capensis | 0 | 0 |
Ceratotherium_simum | 220 | 884 |
Propithecus_coquereli | 53 | 257 |
Ceratotherium_simum_cottoni | 220 | 884 |
Psammomys_obesus | 117 | 354 |
Cercocebus_atys | 58 | 1263 |
Pteronotus_parnellii | 37 | 172 |
Cercopithecus_neglectus | 59 | 1263 |
Pteronura_brasiliensis | 251 | 834 |
Chaetophractus_vellerosus | 174 | 780 |
Pteropus_alecto | 191 | 814 |
Pteropus_vampyrus | 185 | 814 |
Cheirogaleus_medius | 35 | 140 |
Puma_concolor | 121 | 482 |
Chinchilla_lanigera | 210 | 819 |
Pygathrix_nemaeus | 64 | 1263 |
Chlorocebus_sabaeus | 58 | 1263 |
Rangifer_tarandus | 163 | 536 |
Choloepus_didactylus | 18 | 129 |
Rattus_norvegicus | 57 | 229 |
Choloepus_hoffmanni | 19 | 129 |
Rhinolophus_sinicus | 193 | 798 |
Chrysochloris_asiatica | 153 | 593 |
Colobus_angolensis | 41 | 930 |
Rhinopithecus_bieti | 65 | 1263 |
Condylura_cristata | 15 | 54 |
Rhinopithecus_roxellana | 64 | 1263 |
Craseonycteris_thonglongyai | 30 | 105 |
Rousettus_aegyptiacus | 197 | 809 |
Cricetomys_gambianus | 2 | 4 |
Saguinus_imperator | 4 | 21 |
Cricetulus_griseus | 60 | 195 |
Saiga_tatarica | 0 | 0 |
Crocidura_indochinensis | 25 | 66 |
Saimiri_boliviensis | 2 | 12 |
Cryptoprocta_ferox | 242 | 847 |
Scalopus_aquaticus | 12 | 50 |
Ctenodactylus_gundi | 249 | 855 |
Semnopithecus_entellus | 61 | 1263 |
Ctenomys_sociabilis | 233 | 830 |
Sigmodon_hispidus | 86 | 248 |
Cuniculus_paca | 221 | 837 |
Solenodon_paradoxus | 98 | 388 |
Dasyprocta_punctata | 216 | 837 |
Sorex_araneus | 237 | 800 |
Dasypus_novemcinctus | 195 | 786 |
Spermophilus_dauricus | 2 | 24 |
Daubentonia_madagascariensis | 40 | 203 |
Spilogale_gracilis | 238 | 842 |
Suricata_suricatta | 235 | 820 |
Delphinapterus_leucas | 200 | 885 |
Desmodus_rotundus | 27 | 144 |
Sus_scrofa | 144 | 494 |
Tadarida_brasiliensis | 18 | 137 |
Dicerorhinus_sumatrensis | 208 | 881 |
Tamandua_tetradactyla | 38 | 184 |
Diceros_bicornis | 216 | 884 |
Tapirus_indicus | 191 | 885 |
Dinomys_branickii | 219 | 817 |
Tapirus_terrestris | 191 | 885 |
Dipodomys_ordii | 2 | 12 |
Thryonomys_swinderianus | 0 | 1 |
Dipodomys_stephensi | 6 | 22 |
Tolypeutes_matacus | 70 | 307 |
Dolichotis_patagonum | 242 | 814 |
Tonatia_saurophila | 33 | 151 |
Echinops_telfairi | 223 | 689 |
Tragulus_javanicus | 260 | 858 |
Eidolon_helvum | 200 | 815 |
Trichechus_manatus | 198 | 854 |
Elaphurus_davidianus | 250 | 879 |
Tupaia_chinensis | 155 | 608 |
Elephantulus_edwardii | 8 | 15 |
Tupaia_tana | 145 | 558 |
Ellobius_lutescens | 2 | 3 |
Ellobius_talpinus | 79 | 264 |
Tursiops_truncatus | 206 | 885 |
Enhydra_lutris | 250 | 835 |
Uropsilus_gracilis | 9 | 48 |
Eptesicus_fuscus | 32 | 224 |
Ursus_maritimus | 231 | 833 |
Equus_asinus | 204 | 887 |
Vicugna_pacos | 222 | 884 |
Equus_caballus | 205 | 878 |
Vulpes_lagopus | 209 | 796 |
Equus_przewalskii | 205 | 878 |
Xerus_inauris | 171 | 788 |
Erinaceus_europaeus | 257 | 806 |
Zalophus_californianus | 224 | 838 |
Erythrocebus_patas | 55 | 1261 |
Zapus_hudsonius | 33 | 113 |
Eschrichtius_robustus | 191 | 885 |
Ziphius_cavirostris | 195 | 882 |
Eubalaena_japonica | 195 | 885 |
Eulemur_flavifrons | 29 | 122 |
Eulemur_fulvus | 31 | 124 |
Felis_catus | 237 | 824 |
Felis_nigripes | 0 | 0 |
Fukomys_damarensis | 210 | 825 |
Galeopterus_variegatus | 13 | 72 |
Giraffa_tippelskirchi | 249 | 876 |
Glis_glis | 195 | 817 |
Gorilla_gorilla | 8 | 1274 |
Graphiurus_murinus | 207 | 828 |
Helogale_parvula | 238 | 825 |
Hemitragus_hylocrius | 254 | 879 |
Heterocephalus_glaber | 201 | 818 |
Heterohyrax_brucei | 0 | 0 |
Hippopotamus_amphibius | 210 | 862 |
Hipposideros_armiger | 186 | 799 |
Hipposideros_galeritus | 200 | 818 |
Hyaena_hyaena | 236 | 837 |
Hydrochoerus_hydrochaeris | 232 | 822 |
Hystrix_cristata | 214 | 823 |
Ictidomys_tridecemlineatus | 18 | 70 |
Indri_indri | 32 | 135 |
Inia_geoffrensis | 41 | 220 |
Jaculus_jaculus | 17 | 49 |
Kogia_breviceps | 206 | 885 |
Lasiurus_borealis | 221 | 797 |
Lemur_catta | 27 | 113 |
Leptonychotes_weddellii | 216 | 844 |
Lepus_americanus | 5 | 12 |
Lipotes_vexillifer | 201 | 885 |
Loxodonta_africana | 212 | 872 |
Lycaon_pictus | 205 | 796 |
Macaca_fascicularis | 55 | 1257 |
Macaca_mulatta | 52 | 1175 |
Macaca_nemestrina | 54 | 1013 |
Macroglossus_sobrinus | 193 | 803 |
Mandrillus_leucophaeus | 54 | 1262 |
Manis_javanica | 201 | 830 |
Manis_pentadactyla | 203 | 829 |
Marmota_marmota | 13 | 64 |
Megaderma_lyra | 22 | 110 |
Mellivora_capensis | 251 | 836 |
Meriones_unguiculatus | 114 | 350 |
Mesocricetus_auratus | 65 | 202 |
Mesoplodon_bidens | 195 | 883 |
Microcebus_murinus | 38 | 175 |
Microgale_talazaci | 189 | 610 |
Micronycteris_hirsuta | 25 | 136 |
Microtus_ochrogaster | 195 | 680 |
Miniopterus_natalensis | 30 | 198 |
Miniopterus_schreibersii | 30 | 198 |
Mirounga_angustirostris | 221 | 842 |
Mirza_coquereli | 58 | 246 |
Monodon_monoceros | 200 | 885 |
Mormoops_blainvillei | 23 | 129 |
Moschus_moschiferus | 231 | 770 |
Mungos_mungo | 236 | 822 |
Murina_feae | 53 | 265 |
Muscardinus_avellanarius | 59 | 277 |
Mus_caroli | 75 | 242 |
Mus_musculus | 28 | 77 |
Mus_pahari | 71 | 246 |
Mus_spretus | 77 | 252 |
Mustela_putorius | 248 | 822 |
Myocastor_coypus | 236 | 806 |
Myotis_brandtii | 32 | 200 |
Myotis_davidii | 14 | 103 |
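The modal peak mentioned before the table could be checked by binning totalCleanOrthologousPairs. The Python sketch below does this for a handful of rows copied from the table as sample input; the bin width of 100 is an arbitrary assumption and the function name is hypothetical. Running it on all 241 rows would be the real test.

```python
from collections import Counter


def bin_counts(values, width: int = 100) -> Counter:
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter((v // width) * width for v in values)


# A small sample of (genome, totalCleanOrthologousPairs) rows from the table.
sample = {
    "Homo_sapiens": 1279,
    "Pan_troglodytes": 1279,
    "Nasalis_larvatus": 1263,
    "Orcinus_orca": 885,
    "Bos_taurus": 875,
    "Felis_catus": 824,
    "Canis_lupus": 796,
    "Mus_musculus": 77,
    "Myotis_lucifugus": 3,
    "Aplodontia_rufa": 0,
}

bins = bin_counts(sample.values())
for lower_edge in sorted(bins):
    print(f"{lower_edge:>5}-{lower_edge + 99:<5} {'#' * bins[lower_edge]}")
```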
Summary
Overall, it seems like HAL and Zoonomia are interesting places to start with comparative genomics. Simply having all the genomes aligned in one file is quite powerful, and combining this with AlphaFold (for certain proteins) might help us better understand how mutations affect function.
Some interesting directions might be to take the Fauna Bio approach and compare how hibernating mammals and reptiles can survive in sub-freezing temperatures, and to use halLiftover to see if there are any genes that these reptiles and mammals have in common. For example, according to this paper on Reptile Freeze Tolerance, reptiles express genes related to iron binding, antioxidant defense, and serine protease inhibitors in their heart and liver after being exposed to freezing temperatures.