Wednesday, 22 August 2012

NCATS repurposing compounds in PubChem: Part II

Updates: the unmapped code names have been appended to the end of this post for crowdsourcing, a subsequent post has added in the AZ/MRC compound list, and we now have a paper out on the combined results.

**************************************************

Interest in the NCATS set of 58 compounds for academic repurposing has recently been invigorated by two slide presentations at the Philadelphia ACS meeting, blog posts and some personal e-mail contacts. It turns out I was alone neither in pointing out the problems associated with project tendering for blinded clinical candidates nor in expending some effort in trying to map the names to structures (the other groups are mentioned in the slides from Antony Williams and the blog from Sean Ekins). Suitably inspired, I managed to track down three more name-to-structure mappings, as shown in the image hits below.


While OSRA did well on some previous images, it only picked up a ring or two for these three new cases, so I actually had to sketch them. The "orphan" provenance of a Taiwanese chemical supplier for the JNJ39393406 structure is interesting and somewhat unusual (did they pick it from SciFinder perhaps?). I could find no corroboration for the SMILES output from the sketcher (C1=CC(=CC2C1OC(O2)(F)F)NC3=N[N](C(=N3)C4=CC=NC=C4)CCC(N(C)C)=O) because this had no exact matches or high similarities in PubChem or SureChemOpen. However, the information supplied by Janssen to NCATS specifies the compound as a "positive allosteric modulator at the nicotinic α7 receptor" and the closest match in PubChem is CID 24850110 (below).
Not only does this look like a plausible analog of the vendor structure but it also has a SureChemOpen exact match to US-20090253691-A1 from Janssen, where the abstract states: "invention particularly relates to positive allosteric modulators of nicotinic acetylcholine receptors". Lo and behold, browsing the PDF revealed the vendor structure as compound 33 on page 40 (below).
This is listed with a pEC50 of 6.2, which is mid-potency within the range covered in the large SAR table on page 74. The interesting corollary here is that SureChemOpen has not yet completed their image extraction back-fill, so this structure is likely to be dropped in eventually (see patent mining section below). So there we have it... possibly. If anyone from Janssen is prepared to corroborate the identity of JNJ39393406 I would be pleased to acknowledge this in an update. Note that it is arguably more important for them to do this if the vendor structure-to-code-name assignment is wrong, rather than right! (see Live-chemical-structure-blogging). Below I have included my revised identification list (now with three more structures than the previous post) as brief provenance descriptions with PubChem CID links.


JNJ-39393406  sketched from vendor entry, analog is CID 24850110 C1=CC(=CC2C1OC(O2)(F)F)NC3=N[N](C(=N3)C4=CC=NC=C4)CCC(N(C)C)=O
CP-945598  otenabant http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10052040
LY500307 (Erteberel?) http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10286159
AZD0530 saracatinib http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10302451
JNJ-18038683    Chemicalize supp.data from PMID:22570363 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11151899
PF-04136309  =  PF-4136309 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11192346
AZD1981   (TTD mis-map to CID 5311037) OSRA via PMID: 21944852 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11292191
CE-326597 OSRA via PMID: 21493064 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11541667
AVE8134   PMID: 22212431 > OSRA  http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11625114
CE-210666  OPSIN  http://issx.confex.com/issx/15na/webprogram/Paper11299.html http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11697831
PH-670187 deramciclane, EGIS-3886 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=119590
SB223412 talnetant http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=133090
SAR115740 , sketched from image in PMID: 19063991 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=15984196
AZD1656  OSRA  from http://www.citeulike.org/user/cdsouthan/article/10861475 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16039797
PF-03654746  chemicalized from  PMID: 21928839  http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16119086
ABT-089 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=178052
HMR1766, ataciguat http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=213037
PF-05416266 (senicapoc, ICA-17043) http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=216327
AZD7325  Chemicalize from PMID: 22122233 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=23581869
GSK1004723  (TTD mis-map to famitodene CID 5702160 ?) http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24803482
PF-04191834  OPSIN from PMID: 20378715  http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24986635
GSK835726    (MeSH + TTD, TFA salt is CID 16219413) http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5311268
PF-00913086 prinaberel  ERB-041 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5326893
PF-03463275  = PF3463275  PMID: 20186106 > MeSH > OPSIN http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=56657376
AVE5530 canosimibe http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=56841608
CP-448187 elzasonan  (Cl salt is CID 6506051) http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6914152
AZD0328 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9794392
GW274150 http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9797017
SSR149744C  celivarone http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9807128
AZD3355 lesogaberan http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9833984
BMS-562086 pexacerfont http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9884366
ZD4054  zibotentan http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9910224
AZD2171 cediranib   http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9933475
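
Several entries above are the same code written in different ways (PF-04136309 = PF-4136309, PF-03463275 = PF3463275, and CP-601927 = CP-601,927 in the addendum). A minimal sketch of how such variants could be collapsed to one lookup key is given here. The function name and the rule of stripping leading zeros are my own assumptions for illustration, not something any database is known to do, and zero-stripping could in principle conflate two distinct codes, so treat the key as a search aid only.

```python
import re


def normalize_code(name: str) -> str:
    """Collapse formatting variants of a development code to one key.

    Hypothetical helper: drops spaces, commas and hyphens, upper-cases,
    and strips leading zeros from the numeric part (an assumption).
    """
    s = re.sub(r"[\s,\-]", "", name.upper())
    # Expect letters, then digits, then an optional letter suffix (e.g. the "C" in SSR149744C).
    m = re.fullmatch(r"([A-Z]+)(\d+)([A-Z]*)", s)
    if not m:
        # Anything else (e.g. "SD-7300/SC-81490") is returned cleaned but otherwise untouched.
        return s
    prefix, digits, suffix = m.groups()
    return f"{prefix}{int(digits)}{suffix}"  # int() removes leading zeros
```

With this, `normalize_code("PF-04136309")` and `normalize_code("PF-4136309")` give the same key, so both spellings would land on the same row when matching a list like the one above against another source.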

To consolidate this update I searched each of the CIDs against SureChemOpen. This was done with the canonical SMILES string, starting with exact matches but, if these were negative, backing off to a similarity search. The links presented below are generally the oldest and presumed first publication. Note that the three older compounds with INNs have been named as prior art and in mixtures in hundreds of patents. It should be possible to use date cutting in SureChemOpen to find the earliest filings as IUPACs or image-extracted structures (but I can't be bothered just now). The other thing I have not done is check each publication to see if the presumed assignee, target, SAR data etc. tally with those in the NCATS PDFs, but the ones I glanced at seemed to fit (anyone interested in details can contact me). Note that 30 out of 33 patent whacks is not bad going and indicates that, at least for this set, most structures have been exemplified and successfully extracted rather than being specified only in a Markush nest. The results are listed below.


CID  Patent match
10052040  https://open.surechem.com/en/document/EP-1890767-A2/
10286159  https://open.surechem.com/en/document/EP-1626974-B1/
10302451  484 patent hits
11151899  https://open.surechem.com/en/document/WO-2005040169-A2/
11192346  https://open.surechem.com/en/document/WO-2005060665-A2/
11292191  https://open.surechem.com/en/document/WO-2004106302-A1/
11541667  https://open.surechem.com/en/document/US-20110130365-A1/
11625114  https://open.surechem.com/en/document/US-6635655-B1/
11697831  https://open.surechem.com/en/document/WO-2005090300-A1/
119590  https://open.surechem.com/en/document/EP-0694299-A1/
133090  1650 patent hits
15984196  https://open.surechem.com/en/document/US-20080125459-A1/
16039797  https://open.surechem.com/en/document/WO-2007007041-A1/
16119086  https://open.surechem.com/en/document/WO-2007049123-A1/
178052  https://open.surechem.com/en/document/WO-2000042044-A1/
213037  https://open.surechem.com/en/document/WO-2000002851-A1/
216327  https://open.surechem.com/en/document/WO-2000050026-A1/
23581869  https://open.surechem.com/en/document/US-20070142382-A1/
24803482  https://open.surechem.com/en/document/WO-2007122156-A1/
24986635  https://open.surechem.com/en/document/US-20080125474-A1/
5311268  https://open.surechem.com/en/document/WO-2002096934-A1/
5326893  no hits down to 0.8
56657376  https://open.surechem.com/en/document/US-20060229455-A1/
56841608  https://open.surechem.com/en/document/WO-2002050027-A1/
6914152  https://open.surechem.com/en/document/EP-1451166-A1/
9794392  https://open.surechem.com/en/document/WO-1999003859-A1/
9797017  no hits down to 0.75
9807128  https://open.surechem.com/en/document/US-20030225100-A1/
9833984  https://open.surechem.com/en/document/WO-2001042252-A1/
9884366  https://open.surechem.com/en/document/WO-2002072202-A1/
9910224  https://open.surechem.com/en/document/WO-1996040681-A1/
9933475  450 patent hits including deuterateds
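
For anyone wanting to repeat the exact-then-similarity search on a larger set, the logic can be sketched roughly as follows. I did the searches by hand in the SureChemOpen web interface, so everything here is an assumption: `exact_search` and `similarity_search` are hypothetical callables you would have to supply for whatever search service you use, and the threshold ladder is only illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def search_with_backoff(
    smiles: str,
    exact_search: Callable[[str], Sequence[str]],
    similarity_search: Callable[[str, float], Sequence[str]],
    thresholds: Sequence[float] = (0.95, 0.9, 0.85, 0.8),
) -> Tuple[str, Optional[float], List[str]]:
    """Try an exact structure match first, then back off through similarity thresholds.

    Both search callables are placeholders (assumptions), not a real API.
    Returns (match_type, threshold_used, hits).
    """
    hits = list(exact_search(smiles))
    if hits:
        return ("exact", None, hits)
    # Walk down the ladder and stop at the first threshold that returns anything.
    for threshold in thresholds:
        hits = list(similarity_search(smiles, threshold))
        if hits:
            return ("similarity", threshold, hits)
    # Nothing found: report the lowest threshold tried, as in "no hits down to 0.8".
    floor = thresholds[-1] if thresholds else None
    return ("none", floor, [])
```

Recording the match type and the threshold alongside the hits keeps the provenance of each row explicit, which matters when a similarity hit is later mistaken for an exact one.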

The utility of patent mapping (n.b. there are additional open patent links via PubChem sources for some of these entries) in the context of in silico and/or in vitro investigations on these compounds is at least threefold. Firstly, some may include substantially larger SAR data sets (e.g. IC50 tables) than were eventually included in journal articles. Secondly, they may include other unpublished biological and/or ADMET data. Thirdly, analogs that are very useful (essential even?) for a range of comparative investigations will not only have their synthesis routes described but, one might assume, could also be donated by the companies concerned in cases where the NCATS proposals have been approved.

We can compare the current small-molecule efforts as outlined in the Collaborations-to-get-the-ncats-library-of-industry-provided-reagents post, where it is reported that Chris Lipinski found 36 (via SciFinder, Thomson Reuters Integrity and web searches) and Tudor Oprea et al. found 41 (via IBM US Patents database, Google and publications). This leaves me trailing in third place with 33 structures, but note that no commercial databases were used and some relevant publications were not on the Göteborg University Library subscription list. I did receive some useful comments on my original post, including the Google images trick.

There are a lot of interesting corollaries to all of this but I shall just introduce some brief ones here (they also depend on intersecting the three sets to determine concordance). The first is that it would be useful to know what the sources were for the three or more mappings that I "missed" but were presumably explicitly curated in SciFinder and/or Thomson. The reason is that these products, comprehensive as they are, cannot (I presume) disclose proprietary mappings even via company CDAs because their content is licensed to many users (~ 0.3 million globally?). Thus any code-name-to-struc they capture has to have a public primary source (including subscription publications) even if this is just a meeting poster or slide image that never got Google crawled. The only possible exception I can think of is where CAS may be in possession of a code-name-to-struc as a necessary prerequisite for an INN and/or USAN application, but presumably it cannot disclose to users until the WHO PDF has appeared. The second corollary is code-name-to-struc occurrence in patents. This is unlikely to be in first filings because the identity of the eventual clinical candidate (that they may not have selected or given a development code to at filing time anyway) is exactly what applicants generally try to obfuscate but also exemplify and claim as an IUPAC. Code names can thus only be back-mapped to structures in the early filings (as in the list above). I have come across code names with their associated IUPACs in patents but these tend to be associated with later filings of formulations or combinations and not the first disclosure of a code-name-to-struc.
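
The intersection of the three sets mentioned above is simple to compute once the lists are in hand. The sketch below is only an illustration: I do not have the other two lists, so the function name, the group labels and the placeholder codes in the usage note are all made up.

```python
from typing import Dict, Set, Tuple


def concordance(sets: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (codes mapped by every group, codes each group missed).

    `sets` maps a group label to the set of code names that group mapped.
    "Missed" means mapped by at least one other group but not by this one.
    """
    if not sets:
        return set(), {}
    all_codes = set().union(*sets.values())      # mapped by anyone
    shared = set.intersection(*sets.values())    # mapped by everyone
    missed = {label: all_codes - members for label, members in sets.items()}
    return shared, missed
```

For example, `concordance({"me": {"X1", "X2"}, "group_b": {"X2", "X3"}})` returns `{"X2"}` as the shared set and shows that "me" missed `X3`. Comparing the structures behind the shared codes (not just the names) would be the real test of concordance, and that needs a structure-level comparison this sketch does not attempt.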

Last but not least, here are the sharing bits: 

1) The links above should be live  (but you will need the free SureChemOpen sign-up for the patents)
2) The complete Excel sheet is available for download at  http://figshare.com/articles/NCats_Compounds_with_identifications/92850
3) You can now "View my collection, "32 NCATS CIDs" from NCBI". If you open these up there is a lot of information in the constitutive filters on the right-hand side, including 15 active in assays and 15 available from vendors. Note also that you can save this to your own MyNCBI, perform a range of analyses with the PubChem toolbox and download the structures as a set of SD files or any other format.
4) As a test, I have submitted one new synonym to PubChem in the form of AZD1656 in SID 136946384.  I may do more but I am awaiting imminent enhancements to their submission system and I would also prefer to eventually do this collaboratively, so the mapping provenances can be independently corroborated (perhaps even by the companies concerned?) before they become enshrined in the PubChem synonym compilations.

Addendum 25 Aug. Those small-molecule codes that I have been unable to map, or that remain equivocal, are pasted below (but note other parties may have dug some of them out). If anyone can resolve any of these from declarable sources (but not necessarily be personally held to their provenance, unless they were the project leader or portfolio manager!) they are most welcome to post such new information (e.g. even just a pointer to an image) and thereby be attributed for extending the mappings. Ideally they could add a comment to this blog post but any open channel would do.

ABT-639  
LY2828360
SSR150106  
AZD2423
JNJ-39269646
PF-05190457
BMS-820132
ABT-288
PF-04995274  publication links are PubMed -ve
LY2590443
SD-7300/SC-81490  referenced  in PMID 20726512 but points to SC-78080/SD-2590
AZD1236   possibly in PMID: 21624491  but TTD-only mapping to CID 56603698
BMS-830216
AZD5904  (TTD-only mapping to CID 177992 )
SAR103168
CP-601927,   CP-601,927 possibly in PMID:     21594972
SD-6010 (SC-84250)  assuming SC-842  possibly in PMID: 17672879
AZD7268
AVE0847
PF-05019702  (PRA-27) = WAY-257027
AZD9056  possibly in PMID: 21440623  
LY2245461
SSR97225 

1 comment:

brun said...

ABT-288 can be found here
http://jpet.aspetjournals.org/content/343/1/233.full