Daniel Lowe

28 posts

@dan2097

Joined January 2010

3 Following · 27 Followers

Daniel Lowe@dan2097·16 Feb

@cdsouthan No, there is not. I think the most common cause of "Too many radicals for esters" would be erroneous additional whitespace

Christopher Southan@cdsouthan·10 Feb

. @dan2097 (or anyone) Is there an index of #opsin IUPAC failure codes with possible fixes? I can correct obvious typos/whitespace issues but things like "locant 7 unsaturated" or "Too many radicals for esters" not obvious to fix

Daniel Lowe@dan2097·9 May

@iChemLabs Incidentally (Optical) Chemical Structure Recognition, abbreviated OCSR or CSR, is another common way of describing this without calling the process "OCR"

iChemLabs@iChemLabs·8 May

Chemical Image Recovery (CIR) is today's article in my 3-part series on #chemical data recovery. A few years of work went into producing this #ChemDoodle algorithm. #chem #molecule #chemicaldrawing Details are described & a demo, if you are interested. ichemlabs.com/news/read?post…

Daniel Lowe@dan2097·10 Apr

@houndcl In a real-world setting you could potentially also get improvements by processing all images from a given document and preferring the interpretation that gave more groups in common with the other images, as often a series of related compounds is synthesized
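
One way the document-level idea above could be sketched, purely as an illustration: each image has several ranked candidate interpretations, each reduced to a set of group labels, and the candidate sharing the most groups with the other images is preferred. The function name `pick_consistent`, the data layout, and the scoring rule are all assumptions, not part of any existing recognition tool.

```python
from collections import Counter

def pick_consistent(candidates_per_image):
    """Pick, for each image, the candidate sharing the most groups with other images.

    candidates_per_image: one non-empty list per image, ranked best-first,
    of (label, set_of_group_names) pairs. Hypothetical layout for illustration.
    """
    # Count, for every group, how many images mention it in any candidate.
    counts = Counter()
    for candidates in candidates_per_image:
        seen = set().union(*(groups for _, groups in candidates))
        counts.update(seen)

    chosen = []
    for candidates in candidates_per_image:
        # counts[g] - 1 removes this image's own contribution to the count.
        # max() keeps the first (highest-ranked) candidate on ties.
        best = max(candidates, key=lambda c: sum(counts[g] - 1 for g in c[1]))
        chosen.append(best[0])
    return chosen

images = [
    [("A1", {"phenyl", "chloro"}), ("A2", {"phenyl", "cyano"})],
    [("B1", {"phenyl", "cyano", "methyl"})],
    [("C1", {"pyridyl", "cyano"})],
]
print(pick_consistent(images))
```

Here the second interpretation of the first image wins because cyano also appears in the other two images, while chloro appears nowhere else.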

Daniel Lowe@dan2097·10 Apr

@houndcl Many compounds will be in PubChem, so there are real-world benefits, but I still wonder whether a fragment-based approach could be more general e.g. for documents with novel compounds; likely no advantage for the Kaggle comp, but probably within their new restrictions.

Daniel Lowe@dan2097·10 Nov

@jwmay @uspto @ReedTechLifeSci They appear to have now switched their naming scheme from _oldest.tar, _old.tar and no suffix to no suffix, _r1.tar and _r2.tar. In the case I looked at, the re-released files had a supplementary information zip that was missing from the original

John Mayfield@jwmay·3 Nov

Hi @uspto, updates to your bulk data look to have left some stray files "*_old.tar" lying around. bulkdata.uspto.gov/data/patent/ap…. @ReedTechLifeSci bulk data file sizes match the "*_old.tar" rather than the new ones. Any info on what changed? Old files should probably be removed.

Daniel Lowe@dan2097·25 Oct

@cdsouthan @markussitzmann @vfscalfani @theNCI The issue is the forward slash. Many web servers don't allow them as part of the URL, even when encoded, ostensibly for security reasons. As well as OSRA, there's also Imago from Epam, and MOLVEC from NCATS: molvec.ncats.io

Christopher Southan@cdsouthan·24 Oct

@markussitzmann @vfscalfani @theNCI If I were a conspiracy theorist ..... calling @dan2097

Christopher Southan@cdsouthan·22 Oct

OSRA down @theNCI ?

Daniel Lowe@dan2097·10 May

@ChemConnector @markussitzmann @baoilleach I'm sure @baoilleach will correct me if I'm wrong, but I think LeadMine's current code for NMR extraction for the most part comes from that project. It's unfortunate that the next step with MestreLab, to do the NMR prediction to verify the spectra, didn't come to fruition.

ChemConnector@ChemConnector·10 May

@markussitzmann @baoilleach Your hypothesis may be proven correct, or incorrect, don't know until the work is done. And that's exactly the work I wanted to do but we never made progress. However, I STILL want to do it if there is interest.

Noel O'Boyle (@baoilleach@mstdn.science)@baoilleach·9 May

Just been counting up the number of molecule/NMR spectrum pairs LeadMine extracts from US patent applications: ~500K. Entries like: 1H NMR (500 MHz, DMSO-d6) δ 8.94 (d, J = 7.3 Hz, 1H), 8.39 (s, 1H), 8.32-8.23 (m, 2H), 7.73 (t, J = 7.9 Hz, 1H), 7.10 (dd, J = 7.3, 5.2 Hz, 1H),...
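
For readers wondering how an entry like the one above breaks down into peaks, here is a minimal regex sketch. It is not LeadMine's method; the pattern, the function name `parse_peaks`, and the tuple layout are assumptions that only cover the simple shape shown in the quoted example.

```python
import re

# Matches peaks such as "8.94 (d, J = 7.3 Hz, 1H)" or "8.32-8.23 (m, 2H)".
PEAK = re.compile(
    r"(?P<shift>\d+\.\d+(?:\s*-\s*\d+\.\d+)?)\s*"   # shift or shift range
    r"\((?P<mult>[a-z]+)"                           # multiplicity: s, d, dd, m ...
    r"(?:,\s*J\s*=\s*(?P<j>[\d.,\s]+?)\s*Hz)?"      # optional coupling constants
    r",\s*(?P<nh>\d+)H\)"                           # number of hydrogens
)

def parse_peaks(text):
    """Return (shift, multiplicity, couplings_in_Hz, n_hydrogens) per peak."""
    peaks = []
    for m in PEAK.finditer(text):
        couplings = [float(x) for x in m.group("j").split(",")] if m.group("j") else []
        peaks.append((m.group("shift"), m.group("mult"), couplings, int(m.group("nh"))))
    return peaks

entry = ("1H NMR (500 MHz, DMSO-d6) δ 8.94 (d, J = 7.3 Hz, 1H), 8.39 (s, 1H), "
         "8.32-8.23 (m, 2H), 7.73 (t, J = 7.9 Hz, 1H), 7.10 (dd, J = 7.3, 5.2 Hz, 1H)")
for peak in parse_peaks(entry):
    print(peak)
```

Real patent text varies far more than this (different dashes, "br s", missing J values), so a production extractor would need a much more forgiving grammar.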

Daniel Lowe@dan2097·9 May

@egonwillighagen @baoilleach Are you alluding to the Experimental Data Checker which tried to check whether a textual spectrum was plausible by comparison with the structure?

Daniel Lowe@dan2097·8 Jan

@cdsouthan The issue is actually the forward slash: quite a few web servers (including this one) don't correctly handle encoded slashes, ostensibly for security reasons. I should probably send requests in a different way to work around this issue.
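
To make the slash problem concrete, here is a small sketch using only the standard library. The base URL and the `name` query parameter are hypothetical and are not a documented OPSIN interface; the point is only to show where the encoded slash ends up.

```python
from urllib.parse import quote, urlencode

BASE = "https://example.org/opsin"  # hypothetical endpoint, for illustration only

def path_style_url(base: str, name: str) -> str:
    """Put the name in the URL path; '/' becomes %2F, which some servers reject."""
    return f"{base}/{quote(name, safe='')}.smi"

def query_style_url(base: str, name: str) -> str:
    """Put the name in the query string, where an encoded slash is ordinary data."""
    return f"{base}?{urlencode({'name': name})}"

name = "(+/-)-ibuprofen"
print(path_style_url(BASE, name))
print(query_style_url(BASE, name))
```

Both forms percent-encode the slash as %2F. The difference is that servers which refuse encoded slashes typically apply that check to the path only, so moving the name into the query string (or a POST body) is one plausible workaround.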

Christopher Southan@cdsouthan·10 Dec

Ho hum @dan2097 (server fallen over?)

Daniel Lowe@dan2097·6 Oct

@baoilleach You can also encounter mentions of Tyr(O-Me), which possibly makes the reason for the O clearer. I think it's to indicate which atom on the tyrosine is the substitution point, but given the normal meaning of an O in a formula, this just ends up being even more unclear

Noel O'Boyle (@baoilleach@mstdn.science)@baoilleach·3 Oct

Peptide scientists, please decide: is it Phe(4-OMe), Tyr(Me) or my favourite, Tyr(OMe)? Fortunately, Tyr(Me) appears to be much preferred in PubMed Abstracts. This follows the rule of minimising the substituent size.

Daniel Lowe@dan2097·21 Aug

@baoilleach @marwinsegler @ChemProfCramer @ACSCOMP @phisch124 @nmsoftware It is technically in the CML version of the dataset. It can be found by searching for reactions with a reactionAction where the action is "Irradiate". This action however doesn't distinguish between sonication and EM radiation (but the text can be checked)
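
A rough sketch of the search described above, using the standard library XML parser. The sample document is invented for illustration, including its namespace URI, the `phraseText` child, and the reaction ids; only the `reactionAction` element with `action="Irradiate"` comes from the description. Matching on local tag names avoids depending on the dataset's actual namespaces.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real CML files may be structured differently.
SAMPLE = """<reactionList xmlns="http://www.xml-cml.org/schema" xmlns:dl="http://example.org/hypothetical">
  <reaction id="r1"><dl:reactionActionList>
    <dl:reactionAction action="Irradiate"><dl:phraseText>irradiated with UV light for 2 h</dl:phraseText></dl:reactionAction>
  </dl:reactionActionList></reaction>
  <reaction id="r2"><dl:reactionActionList>
    <dl:reactionAction action="Stir"><dl:phraseText>stirred overnight</dl:phraseText></dl:reactionAction>
  </dl:reactionActionList></reaction>
  <reaction id="r3"><dl:reactionActionList>
    <dl:reactionAction action="Irradiate"><dl:phraseText>sonicated for 10 min</dl:phraseText></dl:reactionAction>
  </dl:reactionActionList></reaction>
</reactionList>"""

def local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def irradiated_reactions(xml_text: str):
    """Return (reaction id, action text) for every 'Irradiate' action."""
    hits = []
    for reaction in ET.fromstring(xml_text).iter():
        if local(reaction.tag) != "reaction":
            continue
        for el in reaction.iter():
            if local(el.tag) == "reactionAction" and el.get("action") == "Irradiate":
                text = " ".join("".join(el.itertext()).split())
                hits.append((reaction.get("id"), text))
    return hits

for rid, text in irradiated_reactions(SAMPLE):
    print(rid, text)
```

Because the action label alone does not separate sonication from EM radiation, the returned text is what would need checking, for instance for words like "UV" or "sonicated".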

Noel O'Boyle (@baoilleach@mstdn.science)@baoilleach·20 Aug

@marwinsegler @ChemProfCramer @ACSCOMP @phisch124 @nmsoftware Just spoke to John. I think we already have this information - though it's not in the dataset. Though the weird catalysts are already pretty distinctive I think for these reactions.

Noel O'Boyle (@baoilleach@mstdn.science)@baoilleach·20 Aug

At @ACSCOMP AI symposium, @ChemProfCramer asks @phisch124 whether it's possible to include light as a reagent. Hmmm....interesting....

Daniel Lowe@dan2097·22 Jul

@rmcgibbo @baoilleach If there are specific cases where ChemAxon works, but OPSIN doesn't I'm happy to look into them.

Noel O'Boyle (@baoilleach@mstdn.science)@baoilleach·20 Jul

I use this for basic IUPAC name->SMILES: echo ethane | py opsin.py, where opsin.py is:

    import sys
    import requests
    url = "https://opsin.ch.cam.ac.uk/opsin/%s.json"
    data = sys.stdin.read().strip()  # drop the trailing newline added by echo
    res = requests.get(url % data)
    print(res.json()['smiles'])

Daniel Lowe@dan2097·20 Apr

@i_vishalll @cdsouthan Apologies for the issue, OPSIN was moved to a new server earlier this week that was not configured quite right for JNI-InChI. This is now resolved.

Vishal Siramshetty@i_wwwish·16 Apr

@cdsouthan @dan2097 I think the service as such does not generate any InChI. Try benzene for e.g.

Christopher Southan@cdsouthan·16 Apr

. @dan2097 OPSIN usually brilliant but an odd InChI failure here? I think this is pubchem.ncbi.nlm.nih.gov/compound/11843… from citeulike.org/user/cdsouthan…

Daniel Lowe@dan2097·21 Jan

@eawRDM InChI is the answer to the question...but if I wanted to use the chemical structure I would also want a format that more precisely captured the structure. As @baoilleach commented, the InChI/InChIKey can be generated precisely from the SMILES, but not vice versa.

Daniel Lowe@dan2097·24 Aug

@JanStanstrup @egonwillighagen I'll make it more obvious next week. I think flashing would be a step too far ;-)

Daniel Lowe@dan2097·18 May

@aclarkxyz @dr_greg_landrum Sometimes there is no sketch eg text-mining. The main alt. is problematic esp. OD stereo molmatinf.com/whynotmolsdf.h…

Daniel Lowe@dan2097·7 Oct

@cdsouthan Will try to fix as it's not even an experimental section! Tricky in general as you don't want to find fragments of names

Daniel Lowe@dan2097·24 Mar

@MatToddChem @cdsouthan @jessicabaiget @OpenSourceTB You can paste the reaction SMILES into MarvinSketch (if you replace &gt; with >)
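
If the reaction SMILES is copied straight out of XML, the ">" separators can appear as the escaped entity "&gt;". A minimal sketch of undoing that with the standard library follows; the reaction string is an illustrative esterification, not taken from the dataset.

```python
import html

# Reaction SMILES as it might look when copied from raw XML (illustrative).
raw = "CCO.CC(=O)O&gt;&gt;CC(=O)OCC"

# html.unescape turns entities such as &gt; back into literal characters.
reaction_smiles = html.unescape(raw)
print(reaction_smiles)
```

Reading the file with a real XML parser does this decoding automatically, so the manual step only matters when copying text from the raw file.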

Daniel Lowe@dan2097·24 Mar

@MatToddChem @cdsouthan @jessicabaiget @OpenSourceTB Each compound in the XML has SMILES/InChI (when the name can be converted to a structure) + Reaction SMILES
