IBM has introduced a new cloud analytics application to allow researchers to speed up their searches of massive amounts of patents and scientific journals to find information on pharmaceutical chemicals.
Search algorithms in the IBM Strategic IP Insight Platform (SIPP), announced on Dec. 8, extract drawings, figures and articles from scientific publications faster than humans can by laboriously sifting through pages.
IBM donated to the National Institutes of Health (NIH) a large amount of data the company curated using SIPP. Researchers at the NIH will use the information to discover new medications and research cures for cancer.
The data contains 12 million patents and 20 million Medline scientific abstracts. Medline is a National Library of Medicine database of biomedical and life sciences journal citations. Life sciences and consumer goods companies can also access the NIH data to research chemicals.
Universities such as Berkeley, Johns Hopkins and Stanford are interested in using the data donated to the NIH, Dr. Ying Chen, a research scientist at IBM and a developer of SIPP, told eWEEK.
IBM pulled the data from millions of patents and scientific literature published from 1976 to 2000. Scientists can search the data at the National Center for Biotechnology Information's PubChem site, a database that aggregates scientific data on chemical structures. PubChem allows scientists to research chemicals for new drugs, cancer treatments and consumer products.
Researchers also use PubChem to see which pharmaceutical companies may have registered patents for certain drugs, according to Chen.
Scientists traditionally had to search for chemical names in paper journals. IBM's cloud-based platform will now help them curate data on molecules and chemicals within 24 hours of publication. In the database, each chemical name maps to its synonyms.
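To illustrate what mapping chemical names to synonyms can look like, here is a minimal Python sketch. It is not IBM's or PubChem's implementation; the sample records, the `normalize` rule and the function names are all assumptions made for the example.

```python
def normalize(name):
    """Lowercase and collapse whitespace so lookups ignore casing and spacing."""
    return " ".join(name.lower().split())


def build_synonym_index(records):
    """Map every normalized name (canonical or synonym) to its canonical name."""
    index = {}
    for canonical, synonyms in records.items():
        for name in [canonical, *synonyms]:
            index[normalize(name)] = canonical
    return index


# Hypothetical sample records: canonical name -> list of synonyms.
RECORDS = {
    "aspirin": ["acetylsalicylic acid", "2-acetoxybenzoic acid"],
    "ibuprofen": ["2-(4-isobutylphenyl)propanoic acid"],
}

INDEX = build_synonym_index(RECORDS)


def lookup(name):
    """Return the canonical name for any known synonym, or None if unknown."""
    return INDEX.get(normalize(name))
```

With an index like this, a search for "Acetylsalicylic Acid" and a search for "aspirin" would land on the same record, which is the practical benefit of synonym mapping for someone scanning patents that use different names for one compound.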
"We've invented a machine-curation technology that would automatically read patents and scientific literature and extract chemical names," Chen said.
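Chen did not describe how the extraction works. As a rough illustration only, the simplest form of the idea is a dictionary match: scan a passage for names from a known list. The name list, the function name and the sample sentence below are assumptions for the sketch, and a production system would need far more than this.

```python
import re

# Hypothetical list of known chemical names; a real system would use a large curated dictionary.
KNOWN_NAMES = ["acetylsalicylic acid", "aspirin", "ibuprofen"]


def extract_chemical_names(text, known_names=KNOWN_NAMES):
    """Return the known names found in text, lowercased, in order of first appearance."""
    # Try longer names first so multiword names are preferred over shorter overlaps.
    ordered = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    found = []
    for match in pattern.finditer(text):
        name = match.group(0).lower()
        if name not in found:
            found.append(name)
    return found


sample = "A composition comprising Aspirin and ibuprofen, wherein the aspirin is coated."
names = extract_chemical_names(sample)
```

A dictionary match only finds names it already knows, so it would miss newly coined compounds and chemical structures shown as images.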
For the NIH project, IBM took pharmaceutical data from AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer.
"They really contributed their domain expertise around the chemistry, biology and drug research to develop the technologies around chemical names extraction and curation," Chen said. "It's their contributions that made it possible for us to be able to extract chemicals and make the information available to the public."
IBM extracted data on 2.4 million chemical compounds from 4.7 million patents and 11 million biomedical journal abstracts.
The SIPP software runs on IBM's software-as-a-service (SaaS) SmartCloud platform. SIPP speeds up automated image analysis and enhanced optical recognition of chemical images and symbols taken from patents and literature. Researchers can access this information in real time using analytics and natural language processing.
SIPP will allow the NIH as well as other organizations to build similar databases in the cloud, according to Chen.
"We have set up SIPP in a very scalable cloud model that allows us to grow our underlying data content on an ongoing basis," Chen said. "Today we're mining abstracts of content, and tomorrow we can analyze full articles."
This article was originally published on 12-12-2011