Bringing Home The Bacon

Friday, 19 June 2020

Blockchain, crowdsourcing and compounds for open science

There are many vendors of various compound sets for physical screening, for example approved drugs for repositioning studies, GPCR ligand sets, etc. Getting these sets to be comprehensive, well annotated and current (with, for example, the correct chemical structures, and prodrugs and controlled substances annotated) can be a frustrating challenge. This community need (from small companies, institutes and individual labs) has in turn led to many national and institutional efforts to assemble and curate such physical compound sets. However, these institutional or infrastructure sets are typically only available through collaborative relationships, under MTAs and so on, rather than as a product you can purchase, or use freely with third parties.

Here’s an idea for a novel (I think) way of assembling useful, focussed compound sets in a crowdsourced setting, using tenuous analogies to blockchain. The method also avoids the problems of the well-intentioned coordinated-consortium approach, which in practice runs at the speed of the slowest participant, often suffers from cost overruns, and typically carries a large management and communication overhead. That said, in some settings the consortium approach can work very well.

I’m not a blockchain expert (or an expert in anything, really), and the analogy to blockchain is weak, so don’t freak out about the blockchain part.

Imagine you want access to a reasonably sized set of compounds with a common theme: approved drugs in Japan that aren’t approved in Europe, oncology pipeline candidates, irreversible enzyme inhibitors, chemical probes for epigenetic modulators, etc. It doesn’t matter what the theme is for now; for illustration, imagine that there are 300 compounds that you want, a ‘block’ of compounds. The set might even be available already, maybe. Purchasing these is a big pain unless you’re really set up to do it: ordering from multiple vendors, invoicing, accounts, sample handling, QC, plating the compounds, etc. You probably already have some of them in your collection, or you synthesised some because they weren’t available commercially, or not at a low enough cost.

There are likely to be a number of other organisations and companies with a similar need for, say, ‘oncology clinical candidates’. Sometimes the ‘mining’ involves assembling the interesting list in the first place; other times the mining is in procuring or synthesising the compounds of interest.

The basic idea is simple. I hope.

A definition of a block of compounds is published: the list of ‘wanted compounds’. The aim is then for the community to complete this block and publish it (make it available) at the lowest cost, with the least effort, and as fast as possible. The contents of the block, in terms of chemical structures, are freely published. Miners then work to complete the block by providing physical samples to a central registry. Once a miner has added a compound to the physical collection, that compound is claimed, its claim is published on the public compound ledger, and further mining must target the missing compounds in order to complete the block.
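The claiming rule above behaves like a first-come ledger keyed on compound identity. Here is a minimal Python sketch, assuming each wanted compound is identified by a string key (an InChIKey, say); `BlockLedger` and its methods are hypothetical names for illustration, not an existing system:

```python
from dataclasses import dataclass, field


@dataclass
class BlockLedger:
    """Public ledger for one block of wanted compounds (hypothetical sketch)."""

    wanted: frozenset[str]  # the published list of wanted compound keys
    claims: dict[str, str] = field(default_factory=dict)  # compound -> claiming miner

    def submit(self, miner: str, compound: str) -> bool:
        """Record a physical sample arriving at the registry.

        Returns True if this sample claims the compound; False if the compound
        is not on the wanted list or another sample already claimed it.
        """
        if compound not in self.wanted or compound in self.claims:
            return False
        self.claims[compound] = miner  # first past the post takes the claim
        return True

    def missing(self) -> set[str]:
        """Compounds still open for mining."""
        return set(self.wanted) - set(self.claims)

    def is_complete(self) -> bool:
        return not self.missing()


ledger = BlockLedger(wanted=frozenset({"A", "B", "C"}))
ledger.submit("miner1", "A")      # True: first sample of A claims it
ledger.submit("miner2", "A")      # False: A is already claimed
print(sorted(ledger.missing()))   # ['B', 'C']
```

Publishing `claims` is all the "public compound ledger" needs to be in this sketch; nothing here is a real blockchain, in keeping with the weak analogy.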

Speed is clearly of the essence. The cheapest compounds, and those already in internal collections, give an advantage to those who commit early with compounds already in hand, at no cash outlay. Completing the block then becomes progressively more difficult, as only the harder-to-obtain compounds remain. You could imagine miners coordinating to allocate the compounds between them, but that suffers from the issues above and is easy to game, so open competition amongst the miners is probably optimal. Once the block is complete, it’s plated out and divided amongst the participants that contributed materially to it. They are free to use the compounds for whatever purposes they see fit, potentially even placing their portion into the public domain or using it to build collaborations (though it’s a finite resource, so it won’t last for ever).

So for this hypothetical 300-compound block, imagine that you define a threshold number of compounds, 15 for the sake of argument, and that in order to claim shared ownership of the entire block, you need to submit 15 compounds. Any submitted in excess of these 15 would seed a new replicate block. So in this case, 20 organisations (300 / 15) would receive a share of the entire set. It is first past the post: the first sample of a compound received by the central registry claims that compound and removes it from the available set.
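The share arithmetic can be checked with a short sketch. It assumes, as the rule about excess submissions implies, that a miner holds at most 15 claims in any one block, so a full 300-compound block has at most 20 shareholders; fewer qualify if some miners contribute less than the threshold. `shareholders` is a hypothetical helper, and claims are represented as a compound-to-miner map:

```python
from collections import Counter

BLOCK_SIZE = 300  # compounds in the hypothetical block
THRESHOLD = 15    # claims needed for a share of the whole block

# If no miner may hold more than THRESHOLD claims in one block, a full
# block can have at most BLOCK_SIZE // THRESHOLD shareholders.
MAX_SHAREHOLDERS = BLOCK_SIZE // THRESHOLD  # 20


def shareholders(claims: dict[str, str], threshold: int = THRESHOLD) -> list[str]:
    """Miners with at least `threshold` claimed compounds in one block.

    `claims` maps compound -> miner, as recorded on the block's ledger.
    """
    per_miner = Counter(claims.values())
    return sorted(miner for miner, n in per_miner.items() if n >= threshold)


# Example: three miners contribute 15, 15 and 5 compounds respectively.
claims = {f"cmpd{i}": ("m1" if i < 15 else "m2" if i < 30 else "m3") for i in range(35)}
print(shareholders(claims))  # ['m1', 'm2']
```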

Any excess, or duplicates from different miners, would not be wasted but would seed new blocks, which in turn would be completed and released in due course.
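One way the overflow into replicate blocks could work, sketched in Python under assumptions not spelled out in the post: replicate blocks are tried in order, a sample moves on to the next replicate if its compound is already claimed there or its miner already holds the threshold number of claims there, and a new replicate is opened when no existing block can take it. `route_submission` is a hypothetical name, and each block is again just a compound-to-miner claim map:

```python
from typing import Optional


def route_submission(blocks: list[dict[str, str]], wanted: frozenset[str],
                     miner: str, compound: str, threshold: int) -> Optional[int]:
    """Place one sample in the first replicate block that can accept it.

    Each block is a compound -> miner claim map over the same wanted list.
    A sample skips a block if its compound is already claimed there, or if
    its miner already holds `threshold` claims there. If no existing block
    accepts it, a new replicate block is opened (seeded) with this sample.
    Returns the index of the block used, or None if the compound is not wanted.
    """
    if compound not in wanted:
        return None
    for index, claims in enumerate(blocks):
        held = sum(1 for m in claims.values() if m == miner)
        if compound not in claims and held < threshold:
            claims[compound] = miner
            return index
    blocks.append({compound: miner})  # seed a new replicate block
    return len(blocks) - 1


blocks: list[dict[str, str]] = []
wanted = frozenset({"A", "B"})
route_submission(blocks, wanted, "m1", "A", threshold=1)  # 0: claims A in block 0
route_submission(blocks, wanted, "m2", "A", threshold=1)  # 1: duplicate seeds block 1
route_submission(blocks, wanted, "m1", "B", threshold=1)  # 1: m1 is at its cap in block 0
```

Trying blocks in order means the original block always fills first and the replicates complete later, which fits the "in due course" release of seeded blocks.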

There is a role for a central registry (aka a coordinating organisation, an ‘honest broker’, etc.) to publish the completed portions of the block, handle and distribute compounds, and so on. That organisation would most likely have some skin in the game itself, with an interest in the compounds, and would therefore naturally take a proportionate cut of the compound set for its own purposes. There would be no restrictions on other registries running the same scheme, but if everyone becomes a registry the ecosystem will die. A registry would also need compound handling, weighing, plating and similar infrastructure in place to be credible. It would all be open, though, so the compound block could be forked, someone could choose a different block design and run the same process independently, or the block could be made into a commercial set by a classical compound vendor, which would in fact be a good thing.

There, that’s it. Simples.

Would it work, and what sort of thematic blocks would people find valuable?

Jpo

Friday, 29 June 2018

We're Hiring

We are recruiting again, expanding the Informatics Lab here at the Medicines Discovery Catapult with eight new positions: data scientists, software developers and a few specialist roles. I'll post more later about the new team as a whole, and some of the projects they will be working on.

But for now, check out the roles at md.catapult.org.uk/about/careers

Wednesday, 20 June 2018

Why such an odd name for the blog?

I got a message asking why the blog was called "Bringing Home The Bacon". Well, it's a really nerdy pun: it can be read as "Bringing Home Thebacon". Thebacon is the INN of an opiate drug, and the idiom "bringing home the bacon" means providing for the family. So it captures one of the very serious themes I want to cover here, the link between drug discovery and the economic benefits of our work in this sector, albeit expressed as a pun on a class 1 regulated substance.

The pun was shamelessly appropriated from the excellent xkcd cartoon pictured above.

Monday, 18 June 2018

In the Olden Days, I used to commute to Cambridge from south-west London, and to while away the long, long hours I wrote a blog about drugs and data. I used to love doing it. It was a good way to organise my thoughts about why discovering drugs was so, so hard, and why there is true wonder in the chemistry that Nature has invented (no, not that Nature, but the real one, the important one). Writing those ramblings and toy analyses down on a blog forced me to think in a specific way about the type of data science I was interested in. Writing things down is always a good idea, it turns out. I learned that too late in life. But I loved writing about my Science (no, not that one, but the real one, the important one).

In the Olden Days, I wrote about the sort of things that could never end up in papers: they were wrong, supported by insufficient data, too early or too late, or simply idle speculation. But now and then, those blog posts would lead to something. The (few) very best led to something specific and unexpected: a link to a new collaborator, or the refinement of an idea that grew into something more useful. Many also led to long-lasting friendships that I value to this day.

But that was the Olden Days, and these are the Modern Days, and the world is very different.

These Days I'm working at the Medicines Discovery Catapult, building an informatics team that, I hope, will improve the way we, as a community, execute drug discovery. Many of the wiser people I used to meet argued that the time for data collection and citation collecting was coming to an end; I have now come round to that view too. We need to translate that data, and that historical public and private investment, into new drugs. We need new ideas, new technologies, new companies, and new jobs that last and allow one to build a career. Not like it was in the Olden Days.

So I've decided to start writing about drug discovery informatics again. If you like anything, get in touch.

jpo