The evolution of ML in early-stage drug development
Early Promises
The maturation of machine learning tools, coupled with large volumes of data in the life sciences, has infused new resources into the early-stage drug development industry and attracted a diverse range of stakeholders. Recently, I have been reviewing the landscape of early-stage companies operating in this space. My goal has been to trace the trajectory of applied ML in drug development to understand where it has been and where it is going.
As is the case with any industry undergoing technological disruption, initial assumptions regarding the impact of ML on drug development were broad and speculative. This pioneering phase was driven by an underlying hope that ML could revolutionize drug discovery, a distinct subset of drug development. In those early stages, ML promised to transform the scientific discovery process by accelerating our understanding of disease biology and, in turn, surfacing novel druggable targets. The fruits of discovery research are first-in-class therapeutics, the primary assets of biotech and pharma.
A Shift from Discovery to Design
Over the past few years, I believe there has been a shift from an emphasis on ML-driven discovery to ML-driven design. Initially, ML was seen as a potential disruptor for scientific discovery, promising to unveil new mechanisms and insights into disease biology. However, the field has undergone a recalibration of expectations, with ML now finding a narrower, albeit more applicable, role in design-oriented processes. This represents a transition from a 'first-in-class' to a 'best-in-class' approach, where ML contributes to accelerating design iterations for established targets and known biology.
Of course, most companies lie somewhere on the discovery / design spectrum. I have personally found this framework helpful for simplifying the proposed upside and risk profile of a particular venture.
Changing Market Dynamics
In private markets, teams can no longer survive on the potential of a platform. First of all, everyone has a platform. Second, capital is now flowing towards more de-risked applications of ML, where there are clear optimizations and rapid feedback cycles. I roughly see this as an application of the technology to more of an engineering problem, rather than a scientific problem.
These changes reflect broader trends in technology advancements, investor sentiment, and market dynamics (including the hype surrounding generative AI). An additional variable is that two distinct investor groups are active in this space: traditional biotech investors, who are intrigued by how new technology will change current processes, and tech-bio investors, who are attracted to the potential of unlocking growing datasets through computation. Crucially, the different perspectives of these investor groups can significantly influence the capital-raising strategies, and therefore directions, of early-stage teams.
Platform vs. Pipeline
Investment strategies have evolved as evidence from earlier years has accumulated, driven by the need for more tangible outcomes and a clear path to marketable therapeutic assets. The focus has shifted from research acceleration to pipeline advancement, with resources reallocated accordingly. This pivot has created a dilemma for many early-stage ventures, as they must balance their original vision of an ML-driven platform against the necessity of prioritizing pipeline programs. This is the classic 'platform versus pipeline' tradeoff.
In practice, resources are allocated to specific programs rather than platform maturation. This approach is a high-risk gamble for teams that are resource-constrained compared to big pharma or biotech, as time and money are rapidly consumed in pursuit of uncertain outcomes.
Current Applications
Currently, investment trends favor ML-driven ventures that quickly demonstrate potential therapeutic assets. Structure-based generative models are particularly aligned with this requirement. However, questions remain about the iterative design cycle's duration and efficiency compared to traditional methods (for example, evolutionary antibody refinement and rational medicinal chemistry). Unsurprisingly, ML is not the all-encompassing solution it was once envisioned to be, but rather a powerful tool for a few specific phases in the drug development process. I think the most impacted steps are hit to lead and lead optimization. These phases are just a few aspects of drug development and, importantly, are not the largest resource bottlenecks. I’m thinking about this more right now and I hope to follow up.
Applications of ML on the discovery side have huge potential, but they are still in their early stages. One of the motivations for this review was the unexpected engagement with my tweet about the challenges of using ML for target identification. Without a very precise biological niche or disease focus (i.e., strong priors), steering discovery efforts with ML is risky.
Looking Ahead
The prevailing focus on structure-based generative models to expedite therapeutic asset development stems from both a viable technological niche and external financing demands. A critical question is whether ML really offers significant efficiency advantages over conventional approaches in the design phase. If so, such a shift could even pave the way for the emergence of a novel segment of drug design firms. These firms would be characterized by unique partnering strategies and deal-making approaches, veering away from the traditional overreliance on a few development programs and their conversion into assets.
Looking forward, it's likely that ventures with a focus on specific biology, and complementary insights / assays, will initially benefit the most from ML tooling. Over a longer horizon, I expect ML's role in discovery research will become prominent, although it is unclear how much private capital is willing to make this bet in the short term. At the moment, academia appears poised to lead the charge.