A new study to be presented at the American Association for the Advancement of Science (AAAS) in February 2020 will report on the generation of the world’s first artificially created bacterial genome using a digital design algorithm along with the synthesis of DNA building blocks on a large scale. This genome takes form by chemical rather than template-based synthesis. The research is published in the journal Proceedings of the National Academy of Sciences.
Beat Christen, Assistant Professor of Experimental Systems Biology, and Dr. Matthias Christen in the Christen Lab at ETH Zurich. Image Credit: ETH Zurich / Agnieszka Wormus
Synthesizing genomes chemically
In the field of synthetic biology, synthetic genomics has recently taken a front seat. The groundwork, however, was laid at the start of this century, when the genomes of the poliovirus and the bacteriophage phiX174 were synthesized chemically, without a complementary DNA or RNA template.
The digital revolution has affected biology at its most fundamental level, that is, DNA sequencing, which helps synthetic biologists to read and to write, and even to rewrite, DNA molecules to design useful microorganisms with the help of computer design.
The successful creation of viral genomes of moderate size has led to the synthesis of the genomes of more complex organisms, such as some mycoplasma species. Genomic synthesis jumped from a few kilobases to megabases, which spurred better strategies for the chemical synthesis of DNA as well as for transporting the genome. On the other hand, a single mutation could not only spell nonsense but also prevent the initiation of transcription.
Building a lean mean genome version
The team behind the earlier mycoplasma genome synthesis then built a minimized version of one of these DNA molecules, including only essential genes. Moreover, an international group of 21 institutions is now synthesizing the 16-chromosome genome of the common yeast, Saccharomyces cerevisiae. By 2018, the group had successfully completed about 40% of the genome.
The designers focused chiefly on eliminating repeating nucleic acid sequences, such as genes for tRNA, introns (intervening noncoding sequences), and transposons, so that they could replace homologous genes between species more accurately. They also incorporated new loxP sites, which make accurate and controlled insertion of genes at specific sites possible. They aimed to reduce the size of the genome step by step once the yeast chromosomes were completely synthesized. This approach was necessary because the powerful CRISPR gene editor was not yet available at that time.
Redundancy and rewriting codons
The genetic code is redundant, that is, multiple codons encode the same amino acid, and this offers opportunities for engineering codons throughout the genome. This redundancy has been exploited to rewrite codons across whole genomes in viruses and in a few bacteria and yeast. The experiments also showed that rewriting with synonymous codons is not always feasible, especially when the changes fall near the 5’ or 3’ ends of protein-coding sequences.
More recently, gene cassettes, or self-contained complete gene units, have been rewritten in tandem with the replacement of genes in the whole genome, via the synthesis of DNA from scratch. Yet more work remains to be done, since in many instances successfully synthesized genes remain non-functional. It may be that some signals for transcription or translation are embedded at the ends of the exons, or that we don’t really know where a gene starts in some cases.
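The redundancy described above can be illustrated with a short Python sketch. It is not taken from the study; it simply builds the standard genetic code table and groups codons by the amino acid they encode. The names CODON_TABLE, SYNONYMS, and translate are illustrative choices.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS[amino_acid].append(codon)


def translate(dna):
    """Translate an in-frame DNA string; any trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(SYNONYMS["L"])  # the six codons for leucine
print(SYNONYMS["M"])  # methionine has a single codon, ATG
print(translate("ATGCTGTAA") == translate("ATGTTATGA"))
```

Leucine has six synonymous codons while methionine has only one, so the two toy sequences in the last line differ in their DNA but translate to the same product.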
This was the motive for the current study, which attempts to rewrite the genome on a large scale, using synonymous codons, in combination with finding the causes of errors in the genome in a systematic manner. The need is for “a broadly applicable high-throughput error diagnosis approach,” whereby the rewriting of the whole genome can be evaluated.
The researchers set out to synthesize a minimized, stripped-down version of the genome of the freshwater bacterium Caulobacter crescentus, a very productive and responsive model organism for studying the cell cycle. The rewritten genome is named Caulobacter ethensis-2.0 (C. eth-2.0). Measurements of the native bacterium’s transcriptome, ribosomes, and many other features are already available as a detailed genome model.
The aim was to develop a no-frills design-test-build approach that helps produce a customized genome incorporating the essential functions of the cell and little else. The basis of this approach is to build the genome chemically so that it can be analyzed for what it tells us about essential genes.
The first step was to extract the DNA parts needed from the native bacterium and join them to form a digital genome, keeping the organization and orientation of genes intact. They also included some marker genes, self-replicating sequences, and a shuttle vector sequence that would allow tight low copy replication in this bacterium at the origin of replication.
They ran into a roadblock here when they found that many of the DNA building blocks were not commercially available because of constraints on synthesis. They then decided to test their theory that the use of synonymous codons would make it easier to synthesize the sequence chemically but still keep the biological functions unchanged. Accordingly, they created an optimized design based on the earlier one, with over 10,000 changes in the DNA bases, and successfully removed almost 5700 synthesis constraints.
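How synonymous recoding might remove a synthesis constraint can be sketched in Python. The study's actual design algorithm and its list of constraints are not described here, so this is only an illustration under stated assumptions: the sole constraint considered is a long run of a single base, and the threshold MAX_RUN, the helper names, and the greedy strategy are all hypothetical.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

MAX_RUN = 6  # hypothetical threshold: runs of this length or longer count as a constraint


def translate(dna):
    """Translate an in-frame DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def homopolymer_runs(seq, max_run=MAX_RUN):
    """Count runs of a single base that are at least max_run bases long."""
    pattern = "|".join("%s{%d,}" % (base, max_run) for base in BASES)
    return len(re.findall(pattern, seq))


def recode(seq, max_run=MAX_RUN):
    """Greedy pass: at each codon, keep the synonym that leaves the fewest long runs."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for i, codon in enumerate(codons):
        options = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        # Put the original codon first so that ties leave the sequence unchanged.
        options.sort(key=lambda c: c != codon)
        codons[i] = min(
            options,
            key=lambda c: homopolymer_runs("".join(codons[:i] + [c] + codons[i + 1:]), max_run),
        )
    return "".join(codons)


native = "ATGAAAAAAAAAGGGGGGTAA"  # toy sequence, not from C. crescentus
rewritten = recode(native)
print(homopolymer_runs(native), homopolymer_runs(rewritten))
print(translate(native) == translate(rewritten))
```

A greedy pass like this changes as few codons as it can and never alters the encoded protein. A real design tool would have to weigh many constraints at once, which this sketch does not attempt.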
They also wanted to test their theory that chemical synthesis would allow them to find out how accurately the genome is annotated and identify inbuilt functions. They, therefore, included over 100,000 base substitutions within the exons. Thus, they replaced about 56% of all codons by their synonyms.
In essence, the amino acid sequence remained the same. Still, all other layers of genetic regulation, including alternative reading frames and other hidden controls within the exons, were reduced to the bare minimum. In fact, from 77% to 95% of these elements were removed. This was to allow the identification of genes which need more than the amino acid sequence to be specified to work properly, by seeing which of the rewritten genes remain functional. The non-functional genes must then be repaired, to eventually yield a completely artificial cell with a full complement of essential functions.
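The two properties reported above, the share of codons replaced and an unchanged amino acid sequence, can be checked for any pair of sequences with a few lines of Python. This is a minimal sketch, not the authors' pipeline; recoding_report and the toy sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split an in-frame DNA string into consecutive three-base codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def recoding_report(native, rewritten):
    """Compare two in-frame sequences of equal length, codon by codon."""
    if len(native) != len(rewritten) or len(native) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    pairs = list(zip(split_codons(native), split_codons(rewritten)))
    return {
        "codons_changed": sum(a != b for a, b in pairs),
        "codons_total": len(pairs),
        "bases_changed": sum(x != y for x, y in zip(native, rewritten)),
        # True only if every codon pair encodes the same amino acid or stop.
        "same_protein": all(CODON_TABLE[a] == CODON_TABLE[b] for a, b in pairs),
    }


# Toy sequences for illustration only, not taken from the study.
native = "ATGAAAAAAAAAGGGGGGTAA"
rewritten = "ATGAAAAAGAAAGGTGGGTAA"
report = recoding_report(native, rewritten)
print(report)
```

In this toy pair, two of the seven codons differ while the encoded protein is identical. The same kind of comparison across a whole genome would yield a figure like the 56% quoted above.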
They found that over 90% of enzyme-encoding genes retained functionality. In all, over 432 genes remained functional. About 100 genes became non-functional, probably because of wrong annotations, the presence of unknown genetic controls, or because they do not code for protein at all. This approach is, therefore, useful for testing the accuracy of a genome's annotation.
Four of these genes were repaired in a targeted manner, leading to the identification of an essential element upstream of one of them. In some of them, there were noncoding controls within the exons, implying these genes had been wrongly annotated earlier.
Taken together, these findings show that while the scientists have not yet produced a living cell, they have tested the approach and found it a promising way to produce designer genes. It promises a highly flexible design capability with low-cost, reliable chemical manufacture. The biggest challenge, as always, is not technical but lies in the ethical and social issues that could arise if the technology is misused.
Jonathan E. Venetz, Luca Del Medico, Alexander Wölfle, Philipp Schächle, Yves Bucher, Donat Appert, Flavia Tschan, Carlos E. Flores-Tinoco, Mariëlle van Kooten, Rym Guennoun, Samuel Deutsch, Matthias Christen, Beat Christen. Chemical synthesis rewriting of a bacterial genome to achieve design flexibility and biological functionality. Proceedings of the National Academy of Sciences, Apr 2019, 116 (16) 8070-8079; DOI: 10.1073/pnas.1818259116