Matter Lab launches vast library of virtual organic chemical compounds for catalysis

April 29, 2021 by Dan Haves

An open-access tool for chemists that promises to save time and money in the discovery of chemical reactions has been launched this week by the research group of Professor Alán Aspuru-Guzik from the Departments of Chemistry and Computer Science.

Kraken – created in a collaboration among Aspuru-Guzik's Matter Lab, the Sigman group at the University of Utah, IBM Research and AstraZeneca – is a library of virtual organic compounds calculated with machine learning, roughly 300,000 of them, each with 190 descriptors.

“The world has no time for science as usual,” says Professor Aspuru-Guzik. “Neither for science done in a silo. This is a collaborative effort to accelerate catalysis science that involves a very exciting team from academia and industry.”

“It takes a long time, a lot of money, and a whole lot of human resources to discover, develop and understand new catalysts and chemical reactions,” says co-lead author and Banting Fellow Dr. Gabriel dos Passos Gomes. “These are some of the tools that allow molecular scientists to precisely develop materials and drugs, from the plastics in your smartphone to the probes that allowed for humanity to achieve the COVID-19 vaccines at an unforeseen pace. This work shows how machine learning can change the field.”

When developing a transition-metal-catalyzed chemical reaction, a chemist must find a suitable combination of metal and ligand. Despite the innovations in computer-optimized ligand design led by the Sigman group, ligands are still typically identified by trial and error in the lab. With Kraken, chemists will eventually have a vast, data-rich collection at their fingertips, reducing the number of trials necessary to achieve optimal results.

The Kraken library features organophosphorus ligands, which Dr. Tobias Gensch – one of the co-lead authors of this work – describes as “some of the most prevalent ligands in homogeneous catalysis.”

“We worked extremely hard to make this not only open and available to the community, but as convenient and easy to use as we possibly could,” says Dr. Gomes, who worked with CS graduate student Theophile Gaudin in the development of the web application. “With that in mind, we created a web app where users can search for ligands and their properties in a straightforward manner.”

The team also notes that while 330,000 compounds will be available at launch, a larger library of over 190 million ligands will be made available in the future. By comparison, similar libraries have been limited to hundreds of compounds with far fewer properties.

“This is very exciting as it shows the potential of AI for scientific research. In this context, the University of Toronto has launched a global initiative called the Acceleration Consortium which hopes to bring academia, government, and industry together to tackle AI-driven materials discovery. It is exciting to have Professor Matthew Sigman on board with the consortium and seeing results of this collaborative work come to fruition,” says Aspuru-Guzik.

In January 2022, Dr. Gomes will join Carnegie Mellon University as an Assistant Professor in the Departments of Chemistry and Chemical Engineering, where he aims to pioneer research on the design of catalysts and reaction discovery.

Kraken can be freely accessed online. The preprint describing how the dataset was built and how the tool can be used for reaction optimization is available on ChemRxiv.