I presented a talk at the Plant Chromosome Biology Conference in Vienna in September 2025. The abstract was:
Cytogenomics of oat (Avena) genome evolution: Intra- and inter-genomic chromosomal translocations, repetitive DNA dynamics, and reconstruction of ancestral grass karyotypes show genome evolution in action
Pat Heslop-Harrison¹,², Trude Schwarzacher¹,², Paulina Tomaszewska¹,³, Qing Liu²
¹University of Leicester, Institute for Environmental Futures, and South China Botanical Garden; ²South China Botanical Garden, Guangzhou, China; ³University of Wroclaw, Wroclaw, Poland
Chromosome-scale genome assemblies, sequence reads, and fluorescence in situ hybridization (FISH) of Avena species reveal key evolutionary mechanisms in the Aveneae tribe, complementing and often contrasting with the Triticeae wheats. We demonstrate how Avena’s 10- to 15-fold genome expansion from rice/Brachypodium enables visualization of the ancestral grass karyotype, with uniform expansion of chromosome arm size, conserved synteny, repetitive DNA divergence with homogenization, and chromosomal rearrangements including nesting. Notably, Avena diploid and polyploid species show large (>50 Mb) distal translocations during evolution, contrasting with the fewer, often centromeric, translocations in wheat. Avena polyploids exhibit frequent distal intergenomic exchanges, mapped in tetraploids, hexaploids (AACCDD) and octoploids using repetitive DNA probes and in genome assemblies. Repetitive DNA, comprising some 80% of the genome and dominated by Ty3/Gypsy retroelements, shows constrained family amplification and genome-specific homogenization, driving sequence evolution. Characterizing structural variation is critical: the patterns have implications both for speciation in the Aveneae and for how the biodiversity in the tribe can be exploited in breeding, through targeted strategies for introgressing traits from wild Avena into hexaploid oats.
We are grateful to our co-authors and collaborators; further information and references are at http://www.molcyt.org
THIS POST WILL BE UPDATED TO HAVE MORE THAN THE ABSTRACTS AND DOWNLOADABLE POWERPOINT TALK FILE LATER!
Here is the download link for the 55 MB talk file:
Here is the whole presentation. I am uploading it now and will expand my comments later. I look forward to many discussions about some of these thoughts!
Later in the year I will be giving another talk focussing on large language models (LLMs) and generative pre-trained transformers (GPTs). Here is the abstract:
Keynote lecture at https://modeldata2025.casconf.cn/ conference
Pat Heslop-Harrison – two-page abstract
AI and Multi-Omics Data for Genomics and Sustainable Crop Improvement: From sequences, phenotypes, and foundational LLMs to conservation, biodiversity and breeding
Alternative titles: AI-Driven Genomic Interaction Detection: Digital Twins and Deep Learning for Sustainable Agriculture and Crop Improvement; or The Grammar of Genomes: From Foundational Analysis of Plant Evolution to AI-Powered Predictive Models
Pat Heslop-Harrison, Trude Schwarzacher and Qing Liu
Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, UK, and South China Botanical Garden (SCBG), Chinese Academy of Sciences (CAS), Guangzhou, China.
Phh4@le.ac.uk or phh@molcyt.com; ts32@le.ac.uk; liuqing@scib.ac.cn
Keywords: Smart agriculture, deep learning, epistasis, structural variation, genomics, sequencing, genome evolution, Avena, multi-modal data, crop modelling, climate resilience, LLMs, AI, transformers.
Deep learning, digital twins, and foundational large language models for genomes offer enormous opportunities for changing our use of genetics and biodiversity, leading to better approaches to reach plant breeding targets, increasing agricultural sustainability, and enabling optimization of on-farm management. Currently, we have major limitations in the ways we can understand and exploit genomic data (DNA and RNA sequences), and there is ‘missing heritability’ for most complex traits, with genotype giving only weak predictions of phenotype. The non-linear nature of biological systems – from interactions between genes (epistasis, pleiotropy, heterosis) and the impact of structural variation to the contribution of the high proportion of non-coding regions of the genome – presents challenges that current analyses struggle to resolve.
My talk will give examples from this new frontier and from work towards improved use of genome data for evolutionary and predictive models. Genome sequences have parallels with natural languages, encoding short-range information whose behaviour (meaning) is modified by distant information. As with generative pre-trained transformers, short k-mers or tokens can be exploited to extract features from genomes. k-mers are a critical part of whole genome assembly, and we have been extracting them from short-read DNA sequences to give genome-wide models of diversity, measuring the diversification of repetitive DNA and the evolutionary divergence of species. We have used both k-mer analysis and graph-based read clustering to identify genome-specific features of grass genomes (Avena, oats; and Urochloa, a tropical forage) and their evolution in diploids and polyploids. With whole genome assemblies, and particularly long-read sequencing spanning chromosomal breakpoints, transposon insertions and fragments of tandemly repeated arrays, we can reconstruct features of chromosomal evolution. I will show examples within single plants (haplotype variation) and between species, in the banana family, grasses and sheep. Structural variation with phenotypic consequences can be characterized with these approaches in some cases, and there is potential for new analytical tools. A combination of long-read and short-read genome sequence data helps us understand genome size variation in plant families. In the grasses, we show that the 10- to 15-fold genome size increase from an ancestral grass to Avena was not due to interspersed blocks of repeats but occurred via a relatively uniform expansion along chromosome arms. This work also identified evolutionarily conserved synteny alongside unexpected and extensive translocations of terminal, gene-rich regions, highlighting a dynamic mode of chromosomal evolution in some grass genera.
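To make the "k-mers as tokens" idea concrete, here is a minimal sketch, assuming toy reads, k = 4 and hypothetical function names; it is not the pipeline used in our work. Each read is split into overlapping k-mers, each k-mer is reduced to a canonical form (the smaller of the k-mer and its reverse complement, so both DNA strands give the same token), k-mers are counted, and two samples are compared with a Jaccard index of their k-mer sets. Real analyses use dedicated k-mer counters, much larger k and millions of reads.

```python
from collections import Counter

# Complement table for A/C/G/T; used to treat a k-mer and its reverse complement as one token.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that reads from either DNA strand give the same token."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq: str, k: int):
    """Yield every overlapping canonical k-mer in a DNA sequence,
    skipping windows that contain N or other non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across a collection of short reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity of the sets of k-mers present in two samples
    (1.0 = identical k-mer content, 0.0 = no shared k-mers)."""
    shared, union = set(a) & set(b), set(a) | set(b)
    return len(shared) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical toy reads: sample A is dominated by an ACGT tandem repeat.
    sample_a = ["ACGTACGTACGTAC", "TTGACGTACGTA"]
    sample_b = ["ACGTACGTTTGGCA", "GGCATTGACG"]
    counts_a, counts_b = count_kmers(sample_a, 4), count_kmers(sample_b, 4)
    print("most frequent 4-mers in A:", counts_a.most_common(3))
    print("Jaccard similarity A vs B:", round(jaccard(counts_a, counts_b), 3))
```

Highly over-represented k-mers, such as the tandem-repeat tokens in sample A, are the kind of signal that k-mer-based repeat analyses pick up; a distance such as 1 minus the Jaccard index could then feed a clustering or tree-building step.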
Modulation of gene expression or penetrance through epigenetic mechanisms, genome interactions in polyploids, or epistasis and pleiotropy are fundamental to the evolution of complex traits, contributing to “missing heritability.” We identified the need for advanced statistical models and machine learning algorithms to dissect these non-allelic interactions. We can connect genome mining approaches to important agricultural outcomes, such as identifying genetic targets, for example for high-lipid forage grasses, that could reduce ruminant methane emissions. Modelling and digital twins can inform improved breeding approaches for methane reduction and animal management, with future data-rich studies enabling the interactions among forage, animal and rumen microbiome to be characterized.
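To illustrate why a purely additive view can miss part of the genetic signal, here is a toy two-locus sketch. It assumes a balanced 2 × 2 design with four equally frequent genotypes and hypothetical genotype means; the function name is also hypothetical. It is a textbook factorial contrast decomposition, not a method from the work described here: the variance among the four genotype means is split into main effects of locus A and locus B and their A × B interaction (epistasis).

```python
def decompose_two_locus(m00: float, m01: float, m10: float, m11: float) -> dict:
    """Split the variance among four equally frequent genotype means into the
    main effects of locus A and locus B and their A x B interaction.

    mIJ is the mean phenotype with allele state I at locus A and J at locus B.
    Allele state 0 is coded -1 and state 1 is coded +1, so each mean is
    mu + (a/2)*xA + (b/2)*xB + (ab/2)*xA*xB.
    Returns the fraction of variance due to each term (or zeros if the means are equal).
    """
    # Effect contrasts for a balanced 2x2 factorial.
    a = ((m10 + m11) - (m00 + m01)) / 2
    b = ((m01 + m11) - (m00 + m10)) / 2
    ab = ((m00 + m11) - (m01 + m10)) / 2
    # The +/-1 contrasts are orthogonal, so each term adds (effect / 2) ** 2 to the variance.
    parts = {"A": (a / 2) ** 2, "B": (b / 2) ** 2, "AxB": (ab / 2) ** 2}
    total = sum(parts.values())
    if total == 0:
        return parts
    return {name: value / total for name, value in parts.items()}


if __name__ == "__main__":
    # Hypothetical means: only the genotype with favourable alleles at both loci is raised.
    print(decompose_two_locus(m00=0.0, m01=0.0, m10=0.0, m11=1.0))
```

For these means, A, B and A × B each account for one third of the variance among genotype means, so a model with only additive terms would capture two thirds. With means 0, 1, 1, 0 the main effects vanish and all of the variance is interaction, which an additive model would not detect at all.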
With foundational genome models now possible, using data from the base-pair level to gene expression and species comparisons, deep learning and AI are likely to translate our understanding of genome structure and evolution into the predictive models required for breeding, smart agriculture, sustainability and conservation, exploiting structural and sequence variation and modifications.
Further information and publications are available from www.molcyt.org.
References
Dwivedi SL, Vetukuri RR, Kelbessa BG, Gepts P, Heslop-Harrison P, Araujo ASF, Sharma S, Ortiz R. 2025. Exploitation of rhizosphere microbiome biodiversity in plant breeding. Trends in Plant Science (online first). https://doi.org/10.1016/j.tplants.2025.04.004
Dwivedi SL, Heslop-Harrison P, Amas J, Ortiz R, Edwards D. 2024. Epistasis and pleiotropy-induced variation for plant breeding. Plant Biotechnology Journal 22: 2788-2807. https://doi.org/10.1111/pbi.14405
Cui D, Xiong G, Ye L, Gornall R, Wang Z, Heslop-Harrison P, Liu Q. 2024. Genome-wide analysis of flavonoid biosynthetic genes in Musaceae (Ensete, Musella, and Musa species) reveals amplification of flavonoid 3’,5’-hydroxylase. AoB Plants 16(5): plae049.
Liu Q, Ye L, Li M, Wang Z, Xiong G, Ye Y, Tu T, Schwarzacher T, Heslop-Harrison JS. 2023. Genome-wide expansion and reorganization during grass evolution: from 30 Mb chromosomes in rice and Brachypodium to 550 Mb in Avena. BMC Plant Biology 23: 627.
Liu Q, Cui D, Rouard M, Heslop-Harrison JS, Schwarzacher T, Wang Z. 2025. Haplotype-resolved T2T genome assemblies of Musella lasiocarpa characterize the mechanisms of chromosomal and genome evolution in Musaceae, and provide genetic insights into cold adaptation. Submitted.
Masters LE, Tomaszewska P, Hackel J, Zuntini AR, Schwarzacher T, Heslop-Harrison JS, Vorontsova MS. 2024. Phylogenomic analysis reveals five independently evolved African forage grass clades in the genus Urochloa. Annals of Botany 133: 725-742. https://doi.org/10.1093/aob/mcae022
Rathore P, Schwarzacher T, Heslop-Harrison JS, Bhat V, Tomaszewska P. 2022. The repetitive DNA sequence landscape and DNA methylation in chromosomes of an apomictic tropical forage grass, Cenchrus ciliaris. Frontiers in Plant Science 13: 952968. https://doi.org/10.3389/fpls.2022.952968
Li R, Gong M, Zhang X, Wang F, Liu Z, Zhang L, Xu M, Zhang Y, Dai X, Zhang Z, Fang W, Yang Y, Zhang H, Fu W, Cao C, Yang P, Amiri Ghanatsaman Z, Jafarpour Negari N, Asadollahpour Nanaei H, Yue X, Song Y, Lan X, Deng W, Wang X, Xiang R, Ibeagha-Awemu EM, Heslop-Harrison JS, Lenstra JA, Gan S, Jiang Y. 2023. A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes. Genome Research 33(3): 463-477. https://genome.cshlp.org/content/33/3/463.short
Anhalt UC, Heslop-Harrison JS, Piepho HP, Byrne S, Barth S. 2009. Quantitative trait loci mapping for biomass yield traits in a Lolium inbred line derived F2 population. Euphytica 170(1): 99-107.
Liu Q, Li X, Zhou X, Li M, Zhang F, Schwarzacher T, Heslop-Harrison JS. 2019. The repetitive DNA landscape in Avena (Poaceae): chromosome and genome evolution defined by major repeat classes in whole-genome sequence reads. BMC Plant Biology 19: 226. https://doi.org/10.1186/s12870-019-1769-z
And here is the text of the poster on grass diversification:
Cytogenomics of pasture grasses identifies rapidly evolving repetitive DNA to reveal features of chromosome evolution and genomes in polyploids for phylogenetics and exploitation of biodiversity
Negar Karimi¹, Lizo Masters²,³, Priyanka Rathore⁴, Hojjatollah Saeidi¹, Maria S. Vorontsova², Paulina Tomaszewska⁵,³, Qing Liu⁶, Pat (J.S.) Heslop-Harrison³,⁶, Trude Schwarzacher³,⁶
¹University of Isfahan, Department of Plant and Animal Biology, Isfahan, Iran, ²Royal Botanic Gardens, Accelerated Taxonomy, Kew, UK, ³University of Leicester, Department of Genetics and Genome Biology, Leicester, UK, ⁴University of Delhi, Daulat Ram College, Delhi, India, ⁵University of Wroclaw, Department of Genetics and Cell Physiology, Wroclaw, Poland, ⁶South China National Botanic Garden, Guangzhou, China
Introduction
Grasslands represent the majority of the world’s farmland, often in biodiverse areas with poor environments and low inputs. We present an overview of phylogenomic and cytogenomic findings from our work on the broadly defined genera Elymus (Triticeae), Urochloa and Cenchrus. Raw reads from high-throughput survey sequencing were used for cluster and k-mer analysis with RepeatExplorer (Novak et al. 2010) to identify the major repetitive components of genomes, including retroelements and satellite sequences. Fluorescence in situ hybridization (FISH) was used to determine the chromosomal distribution of selected repeats. We also looked at epigenetic effects and genome interactions, including DNA methylation.
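Graph-based read clustering treats reads as nodes joined by edges when they share sequence similarity, so that groups of highly connected reads correspond to abundant repeat families. RepeatExplorer does this with an all-to-all similarity search followed by graph partitioning; the sketch below is a much-simplified stand-in, assuming that sharing at least one k-mer (k = 5, a hypothetical choice) counts as similarity and using plain connected components (via union-find). The function names and reads are hypothetical.

```python
from collections import defaultdict


def read_kmers(read: str, k: int) -> set:
    """Set of k-mers in one read (forward strand only, for simplicity)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def cluster_reads(reads: list, k: int = 5) -> list:
    """Group reads into connected components of a graph in which two reads are
    linked if they share at least one k-mer.
    Returns lists of read indices, largest cluster first."""
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Index reads by k-mer: every read containing a k-mer is joined to the first
    # read seen with that k-mer, which is enough to connect all of them.
    first_read_with = {}
    for idx, read in enumerate(reads):
        for kmer in read_kmers(read, k):
            if kmer in first_read_with:
                union(first_read_with[kmer], idx)
            else:
                first_read_with[kmer] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    reads = ["ACGTTGCA", "TTGCAGGA", "CCCCCCCC", "GGATCCAT"]
    for cluster in cluster_reads(reads, k=5):
        print(len(cluster) / len(reads), [reads[i] for i in cluster])
```

In RepeatExplorer-style summaries, the fraction of all reads falling in a cluster is used as an approximate genome proportion for that repeat family, which is why the clusters above are reported largest first.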
Urochloa
The African Urochloa (syn. Brachiaria) grasses include the most important forage crops across the global tropics. Polyploids, hybrids and apomicts are frequent, with unclear species boundaries. Phylogenomic analysis identified U. eminii as a close relative of the cultivated U. ruziziensis and U. decumbens. Chromosome identification and inheritance studies using repetitive DNA sequences showed the diploids to be highly homologous, and the polyploids to have various genome compositions and introgressions. We found a repeat from U. brizantha at the centromeres of U. eminii chromosomes, but near the telomeres of a subset of tetraploid U. brizantha and U. decumbens chromosomes. U. decumbens is an allotetraploid hybrid formed between the ancestors of diploid U. brizantha and a combination of other diploid genomes.
[Image of U. eminii and U. brizantha chromosomes]
Hybrid network and morphological analyses confirm the cytogenomic findings.
[Phylogenetic tree]
[Figure panels: U. eminii (2n = 2x = 18), U. brizantha (2n = 4x = 36), U. decumbens (2n = 4x = 36)]
Elymus
Elymus s.l. (wheatgrass) belongs to the tribe Triticeae, which includes wheat and barley. Hybridization and polyploidization have resulted in complex genome compositions, with reticulate evolution among members and both monophyletic and polyphyletic taxa. Thus, the taxonomy of Elymus has changed frequently, at times including Thinopyrum, Kengyilia, Pseudoroegneria and other genera. All genomes have a basic chromosome number of x = 7; the St, H, P, W, E, J, B and Y genomes have been identified, with the St genome the only one found in all Elymus taxa.
We found one St-specific sequence; other repeats were also present in the H and Y genomes, but at variable chromosomal sites. ESat4 was at centromeres in the H genome of barley and Elymus, but sub-telomeric in the Y genome. ESat1 was sub-telomeric on half of the St-genome chromosomes and absent from the Elymus H genome, but hybridized strongly to centromeres in barley.
[Image of H. vulgare (H) chromosomes with FISH signals]
[Image of E. transhyrcanus (StStH) chromosomes with FISH signals]
[Image of E. ciliaris (StY) chromosomes with FISH signals]
We conclude that the H genomes of barley and Elymus have evolved independently since the two genera split, and that there are St sub-genomes distinct from the Y genome.
Cenchrus
Cenchrus ciliaris is a widespread allotetraploid tropical pasture grass. Because it propagates apomictically, various aneuploid genotypes are found; here we analyzed a 2n = 4x + 3 = 39 accession. Universal retrotransposon probes did not distinguish the genomes of C. ciliaris, giving signals in pericentromeric regions of all chromosomes, and most of the retroelement sites overlapped with 5-methylcytosine signals. Abundant repetitive DNA motifs, however, gave strong in situ hybridization signals on about half of the chromosomes, indicating that they differentiate the two ancestral genomes.
[Image of C. ciliaris chromosomes with FISH signals for different repeats and methylation]
References
- Heslop-Harrison JS, Schwarzacher T, Liu Q. 2023. Polyploidy: its consequences and enabling role in plant diversification and evolution. Annals of Botany 131: 1-10.
- Masters LE, Tomaszewska P, Schwarzacher T, et al. 2024. Phylogenomic analysis reveals five independently evolved African forage grass clades in the genus Urochloa. Annals of Botany 133: 725-742.
- Rathore P, Schwarzacher T, Heslop-Harrison JS, et al. 2022. The repetitive DNA sequence landscape and DNA methylation in chromosomes of an apomictic tropical forage grass, Cenchrus ciliaris. Frontiers in Plant Science 13: 952968.
- Saeidi H, Rahiminejad MR, Heslop-Harrison JS. 2008. Retroelement insertional polymorphisms, diversity and phylogeography within diploid, D-genome Aegilops tauschii (Triticeae, Poaceae) sub-taxa in Iran. Annals of Botany 101: 855-861.
- Tomaszewska P, Pellny TK, Hernández LM, et al. 2021. Flow cytometry-based determination of ploidy from dried leaf specimens in genomically complex collections of the tropical forage grass Urochloa s.l. Genes 12: 957.
- Tomaszewska P, Vorontsova MS, Renvoize SA, et al. 2023. Complex polyploid and hybrid species in an apomictic and sexual tropical forage grass group: genomic composition and evolution in Urochloa (Brachiaria) species. Annals of Botany 131: 87-108.
[Diagram: cytogenomic data showing the distribution of Repeat A and Repeat B in Genome 1, Genome 2 and Genome 3]
Conclusions
Rapidly evolving repetitive DNA families let us characterize relationships and patterns of evolution at the species, chromosome and genome level. We found repeats that are genome-specific or show defined chromosomal distributions, either dispersed along chromosomes or enriched near centromeres or telomeres. Our work revealed phylogenomic and cytogenomic relationships and the evolutionary divergence of genomes in hybrids and polyploids, providing vital information for selecting wild species to introduce into breeding programs.
