ESM3 and Bridge RNA - What do they mean for synthetic biologists?
Two new techniques dropped this week and both immediately garnered huge attention. But are they as promising as they seem?
This week we’ve seen two major releases in synthetic biology: CRISPR-less gene editing without any DNA breakages, and ESM3, a generative language model for proteins. One state-of-the-art technique for use in the wet lab, the other for the dry lab. And yet, despite occupying opposing ends of this wet/dry lab spectrum, both look set to be huge for all synthetic biologists.
When taken in the wider context of the bioeconomy, I believe this could be the best time in history for these two techniques to have dropped. But making grand claims like that is easy (and dare I say extremely common…) whenever AI is involved.
So let me ease you into why I think these might actually be the real deal for once.
Bridge RNA: better than smashing LEGOs together?
Now if you’ve been keeping your eyes on the news or on Twitter lately, you would’ve seen the same headlines as I did about the new bridge RNA system. But the one that stuck out to me was this article from Nature:
No CRISPR: oddball ‘jumping gene’ enzyme edits genomes without breaking DNA.
And I immediately thought… Without breaking DNA?
Conceptually, I didn’t really understand how that could be possible. Immediately I started thinking about how that could even work, given that all other DNA manipulation techniques in existence pretty much mandate that you break DNA and stick various ends together. A bit like if you had two fully-built LEGO sets and wanted to swap the halves across each other. How could you possibly swap those two bits of LEGO from one set to another without breaking them apart first?
So I dug into the paper, and it turns out I was right - you can’t. Except, well, you sort of can. It’s a little complicated, and requires I get a little bit technical. But essentially:
The bridge RNA (bRNA) system works by using a bispecific RNA molecule to guide a recombinase enzyme, IS621, in joining two separate DNA sequences. This bRNA has two distinct loops: a target-binding loop (TBL) and a donor-binding loop (DBL).
Each loop contains guide sequences that are able to form base-pair bonds with specific regions of the target DNA and donor DNA, respectively.
The IS621 recombinase, guided by the bRNA, brings these two DNA molecules together.
It then cleaves one strand of each DNA molecule while forming a covalent bond with the cleaved ends. This creates a 5'-phosphoserine intermediate where the DNA remains attached to the enzyme.
The cleaved strands are then exchanged and stuck together, forming a Holliday junction intermediate. Finally, the other strands are cleaved and exchanged, completing the recombination.
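For the dry-lab folks, the targeting logic described above can be sketched in a few lines of Python. This is a toy model only: the `BridgeRNA` class, the `pairs_with` and `locate_sites` helpers, and the short guide sequences are all hypothetical names and values made up for illustration, and the sketch treats each loop as a single guide that base-pairs with one strand. The real bRNA loops are more intricate than this, and nothing here models the recombinase chemistry itself.

```python
from dataclasses import dataclass

# Watson-Crick partners for an RNA guide pairing with a DNA strand.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


@dataclass
class BridgeRNA:
    """Hypothetical, heavily simplified bridge RNA: one guide per loop."""
    target_guide: str  # guide in the target-binding loop (TBL)
    donor_guide: str   # guide in the donor-binding loop (DBL)


def pairs_with(guide_rna: str, dna_strand: str) -> bool:
    """True if the guide (5'->3') base-pairs antiparallel with the DNA (5'->3')."""
    if len(guide_rna) != len(dna_strand):
        return False
    return all(
        RNA_TO_DNA_PAIR[g] == d
        for g, d in zip(guide_rna, reversed(dna_strand))
    )


def find_site(guide_rna: str, dna: str) -> int:
    """Index of the first window of `dna` the guide can pair with, or -1."""
    n = len(guide_rna)
    for i in range(len(dna) - n + 1):
        if pairs_with(guide_rna, dna[i:i + n]):
            return i
    return -1


def locate_sites(bridge: BridgeRNA, target_dna: str, donor_dna: str) -> tuple[int, int]:
    """Where each loop would bind: (target index, donor index)."""
    return (
        find_site(bridge.target_guide, target_dna),
        find_site(bridge.donor_guide, donor_dna),
    )


# Made-up 4-nt guides; real guides are longer.
bridge = BridgeRNA(target_guide="AUGC", donor_guide="GGAU")
sites = locate_sites(bridge, target_dna="TTGCATTT", donor_dna="CCATCCGG")
print(sites)
```

The point of the sketch is just the "bispecific" part: one RNA carries two independent addresses, so reprogramming where the recombinase acts comes down to rewriting two short guide strings.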
Extending the slightly tired LEGO analogy, this would be like if you broke apart the two sets and then immediately swapped some kind of flexible adaptor onto the exposed ends of the LEGO, thereby preventing any of the bobbly ends from ever being exposed to the air.
And in the likely case that isn’t very clear, essentially this means that throughout the process, the cleaved DNA ends remain covalently attached to the enzyme at all times - preventing free, exposed ends and potentially reducing the risk of unintended modifications. Bridge RNAs thus allow for programmable, site-specific DNA recombination without ever fully "breaking" the DNA in the conventional sense. CRISPR, on the other hand, involves cutting through both strands of the DNA backbone, leaving exposed ends for the cell’s own repair machinery to patch up, ideally using the donor sequence.
What’s of particular interest to me as a science communicator is that, since it sidesteps any broken DNA, it has the potential to shrug off a lot of the criticisms levelled at CRISPR. Essentially it’s a more gentle alternative that still retains all of the same power as CRISPR. A modern tool for the modern age of gene editing.
Or, put it this way - this could be a really compelling and much safer-sounding alternative to CRISPR when communicating with lawmakers and the public about the future of genetic engineering. Lawmakers in the West have long been enthusiastic about CRISPR yet remained cautious when drafting legislation due to the perceived risk - this safer-seeming alternative could then be huge for future genomic engineering and bio-entrepreneurship.
ESM3 - AI with an actual use!
Time to flip then to the other end of the wet/dry lab spectrum. ESM3. This, too, has been making headlines this week.
Now, don’t get me wrong. I love AI and LLMs. I’m on r/LocalLLaMA (a subreddit for LLM enthusiasts) literally daily. But I think it’s that very same daily exposure that means I’ve seen even more AI start-ups than most which are just nothingburgers with huge marketing budgets. And so, jaded as I was, I had sort of subconsciously chalked ESM3 up as just another AI hypetrain.
Separate point - I’ve got nothing against business-to-business solutions, but I assumed that even if ESM3 was as good as touted, it’d end up being something similar to what we saw with Chroma from Generate Biomedicines. That is to say, a super powerful model for protein generation that garners a lot of public attention, but not something that ever gets put in the hands of the general public.
In the end then, I think what particularly caught my eye about ESM3 was when even one of my lecturers back from my university days took notice.
(Tweet from Tom Ellis, Professor of Synthetic Genome Engineering at Imperial College London)
Tom is quite outspoken on Twitter, but I generally consider him quite reserved when it comes to getting swept up in hyped-up new technologies. Seeing that even he was speculating about its impact was what tipped the scale from scepticism to curiosity for me.
To give a very brief bit of background: BioLMs have existed for some time. We covered this ourselves when we published about the topic a few months back. Essentially, unlike diffusion models like Chroma, BioLMs are language models - much more similar to OpenAI’s GPT-4 or Meta’s Llama-3. The difference is that they’re trained on billions and billions of tokens of sequence data from public databases, rather than works of text and prose.
ESM3 is a bioLM like many others - but what stands out is that with 98 billion parameters, it's one of the largest models of its kind, which should make it unusually capable of capturing complex patterns within protein sequences across evolutionary scales.
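If "tokens of sequence data" sounds abstract, here is a minimal sketch of what it means for a protein. To be clear, this is not ESM3's actual tokenizer: the `VOCAB` table, the special tokens and the `tokenize` function are assumptions made up for illustration. The idea is simply that each amino acid letter becomes an integer, in the same way a text LLM turns word pieces into integers.

```python
# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical special tokens; a real model defines its own set.
SPECIAL_TOKENS = ["<pad>", "<mask>", "<bos>", "<eos>"]

# Map every token to an integer id.
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def tokenize(sequence: str) -> list[int]:
    """Turn a protein sequence into token ids, wrapped in start/end markers.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    ids = [VOCAB["<bos>"]]
    ids.extend(VOCAB[residue] for residue in sequence.upper())
    ids.append(VOCAB["<eos>"])
    return ids


print(tokenize("MKV"))
```

A language model then learns to predict missing or next ids in these lists, which is all "trained on sequence data" really amounts to at the input level.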
From the abstract of their paper:
We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.
This leads to their marketing hook that they’re “Simulating 500 million years of evolution with a language model”. And exaggeration or not, the fact that they were able to create a new fluorescent protein is a huge achievement, especially given that, if their GitHub page is any indication, they prompted this LLM in the same way that you can prompt ChatGPT to write you an article. Granted, a fluorescent protein is maybe a little mundane. But generating a protein by describing it using natural language is what makes this a landmark achievement.
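As an aside, the "58% identity" figure quoted above is a measure of how many positions two aligned sequences share. A minimal sketch of that calculation follows, assuming the two sequences have already been aligned to the same length with "-" for gaps. The `percent_identity` function and the toy sequences are hypothetical, and definitions of identity vary in what they use as the denominator, so this is one common convention rather than necessarily the one used in the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared positions where two pre-aligned sequences match.

    Positions where either sequence has a gap are skipped entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example: four of five positions match.
print(percent_identity("MKVLA", "MKILA"))
```

On this measure, a protein at 58% identity differs from its nearest known relative at roughly four in every ten aligned positions.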
It’s also worth remembering that at this moment in time, this is the worst that this AI technology will ever be. Over the coming months and years as more models come out, the ability of bioLMs to reason about and understand genetic sequences, and ultimately generate new ones, is only going to increase. For normal LLMs, this isn’t necessarily that exciting - humans have always been able to write stories. LLMs getting better at that just means they’re approaching human-level.
But for bioengineering, this is huge. ‘Writing’ a protein from scratch in the language of amino acids is nearly impossible for us. ESM3 has already demonstrated that it can reason over the various patterns in protein sequences in ways we never could.
The impact on bio-entrepreneurship
So why do I think now was the best time in history for these techniques to come out? Well, I do have to admit that I was being at least a little hyperbolic. But I genuinely do believe this is indicative of a wave of changes hitting the bioeconomy, which will start to manifest in entrepreneurship very soon. As tools like ESM3 become better, faster, and more accessible, this may catalyse a shift in how biotech research and development is conducted.
The combination of more accessible AI-driven protein design and potentially safer genetic engineering techniques like bridge RNA, taken in the context of recent advancements in lab automation - such as Opentrons' GPT-4 powered interface for their robots - could lower the barriers to entry for new biotech ventures.
And since the biotech industry is experiencing unprecedented numbers of layoffs, there’s an ever-growing pool of experienced talent looking for new opportunities. This confluence of factors - improved tools, lower costs, available talent, and the potential for more favourable legislation - may just create hyper-fertile ground for future biotech startups. The only thing missing, then, is translational support for researchers whose work is stuck in the lab, to get those ideas into the real world… which is exactly what ValleyDAO aims to provide.