Data scientists: the Alexander Flemings of the future
26 June 2012
The process by which drugs, treatments and cures are developed could soon be radically improved by people with absolutely no biological, chemical or pharmacological knowledge whatsoever. That is the conclusion of a groundbreaking data science experiment, which showed that crowdsourcing can radically improve the power of models that predict a compound's pharmaceutically relevant biological response based solely on its calculated molecular properties, for example its size, shape, or elemental constitution.
Carried out as a partnership between the pharmaceutical company Boehringer Ingelheim, in Ridgefield, CT, and the revolutionary data science company Kaggle, in San Francisco, CA, the experiment took the form of a crowdsourcing competition. Hundreds of data scientists from all over the world competed for a $20,000 prize pool, shared among the teams that developed the three best predictive models.
In all, more than 800 data scientists submitted nearly 9,000 entries to the competition, which ran for 91 days. Kaggle's unique approach to data science uses a real-time leaderboard to encourage competitors to repeatedly refine their solutions in order to leapfrog their rivals. In this competition, contestants were given 1,776 variables, each a molecular descriptor of some characteristic of the molecule, such as its size, shape or chemical composition, together with experimental data on an actual biological response. Combined, these formed the training data that contestants used to develop and test their models.
In principle, in silico modeling can make the drug discovery process more efficient and effective by narrowing down the extraordinarily large search space of potential candidate molecules. By using models to predict which compounds hold the greatest promise, researchers can focus experimental effort on those compounds and avoid testing others that, for a variety of complex reasons, would ultimately prove not to work.
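To make the idea of narrowing the search space concrete, here is a minimal, hypothetical sketch in Python. It assumes a simple logistic-regression classifier (chosen here for illustration; it is not the method used by Boehringer Ingelheim or by the competition winners), trains it on a handful of invented compounds described by two made-up descriptors rather than the 1,776 in the real data, and then ranks untested candidates by predicted probability of a biological response so that only the top k would go forward to the lab. The names `train_logistic_regression`, `predict_proba`, `shortlist`, `TRAIN_X` and `TRAIN_Y` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data: each row is a compound described by two invented
# molecular descriptors (the real competition data had 1,776 of them).
TRAIN_X: List[List[float]] = [[0.1, 0.3], [0.2, 0.7], [0.9, 0.3], [0.8, 0.7]]
TRAIN_Y: List[int] = [0, 0, 1, 1]  # 1 = compound showed the response in the lab


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that compound x shows the biological response."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 1000
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def shortlist(
    w: Sequence[float], b: float, candidates: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Indices of the k candidates with the highest predicted probability."""
    scores = [predict_proba(w, b, x) for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    w, b = train_logistic_regression(TRAIN_X, TRAIN_Y)
    # Untested candidate compounds (also invented for illustration)
    candidates = [[0.15, 0.5], [0.95, 0.5], [0.5, 0.5], [0.85, 0.5]]
    print(shortlist(w, b, candidates, k=2))  # → [1, 3]
```

With these toy values the two candidates with the highest first descriptor come out on top. In a real screening setting, k would presumably reflect how many compounds the lab can afford to test, and a small gain in ranking accuracy means fewer wasted experiments.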
The results of the competition showed the best models to be 25.6% more effective than those currently used in the industry. That is despite the fact that the Kaggle competitors had no expert knowledge of the field; and even if they had, the data they worked with had been stripped of anything indicating which chemical descriptor each value represented.
“This experiment just goes to show that expertise in a field only gets you so far. Sometimes the answers are buried in the data,” said Jeremy Howard, Kaggle’s Chief Scientist and President. “But this is just the beginning. Given how successful this experiment has been we anticipate this approach to become the norm for the pharmaceutical industry.”
“Because the chemical search space is so vast, even a small improvement in the predictive power of biological models will be of significant value in filtering this search space,” said Dr. James Baxter, Vice President of Development at Boehringer Ingelheim. “Congratulations to the winners and I thank them for their help with our quest to build more accurate modelling.”
The winning solution came from a team of three, Sergey Yurgenson, Jeremy Achin and Tom DeGodoy, who will share the $10,000 first prize. Second place went to Mitrofan, who will receive $6,000, and the third-place prize of $4,000 will be awarded to Wang Qing.
About Kaggle
Kaggle is the global leader in running predictive modeling competitions. The company has run approximately 100 competitions with major enterprise, government, and academic customers, including Allstate Insurance, Dunnhumby, Ford, Heritage Health, Microsoft, NASA, Stanford, Wikipedia and Deloitte, and it is currently running the $3 million Heritage Health Prize, the largest medical prize ever, to help prevent unnecessary hospitalization. More than 39,000 data scientists worldwide have contributed more than 122,000 entries to competitions that have tackled the toughest predictive problems in the marketing, life sciences, insurance, financial services, travel, and science industries. Kaggle's investors include Index Ventures and Khosla Ventures. It was founded in 2010 and is based in San Francisco, Calif.
About Boehringer Ingelheim
The Boehringer Ingelheim group is one of the world's 20 leading pharmaceutical companies. Headquartered in Ingelheim, Germany, it operates globally with 145 affiliates and more than 42,000 employees. Since it was founded in 1885, the family-owned company has been committed to researching, developing, manufacturing and marketing novel products of high therapeutic value for human and veterinary medicine.
As a central element of its culture, Boehringer Ingelheim pledges to act in a socially responsible way. Involvement in social projects, caring for employees and their families, and providing equal opportunities for all employees form the foundation of its global operations. Mutual cooperation and respect, as well as environmental protection and sustainability, are intrinsic factors in all of Boehringer Ingelheim's endeavours.
In 2010, Boehringer Ingelheim posted net sales of approximately $16.7 billion (about 12.6 billion euros), while spending almost 24% of net sales in its largest business segment, Prescription Medicines, on research and development.