I was all set to write a review of an interesting study of bowtiene 1: its rearrangement to other C10H6 isomers and its dimerization. But as I was gathering my information, I wanted to prepare the images of the optimized geometries, and so I went to get the supplementary materials.
The author has a section on the supplementary materials that indicates it contains Cartesian coordinates – just what I need. (This section ends with the curious line “This material is available for free of charge via the internet at http://pubs.acs.org.”; it’s curious because the article is in a journal not published by ACS. I’ll leave for speculation just what happened here, but clearly the copy-editing done by the Canadian Journal of Chemistry is not quite up to snuff!)
So, I went to the website and clicked on the link for the supplementary material and was then told that I did not have access to this material and that either I had to become a subscriber or I had to purchase access to the article. (I should point out here that I received this article through interlibrary loan.) This is the first time that I have run into a paywall to get supporting materials! I know I am probably lucky that it took 7 years before running into this problem. But that makes this situation so frustrating – just why is the Canadian Journal of Chemistry placing supplementary material behind a paywall, especially when so few other publishers are doing this?
Well, until I get the supplementary materials, I will not write a post about this article.
New policy: I will not blog about an article unless (a) there is information on the 3-D structure of the molecules, typically in supporting materials, and (b) this information is available for free. This requirement should really be the minimum for publishing computational chemistry results. Now, I would also hope that the coordinates are readily reusable – see Henry Rzepa’s post about recent problems he’s run into!
Henry Rzepa responded on 29 Jul 2014 at 4:22 am #
I agree, it very often is a sub-optimal experience.
1. A DOI for all journal articles points merely to what is called the landing page.
2. This has links to the various components of an article, including the PDF and HTML versions, and various supporting information links.
3. If the article is GOLD open access, all these links should work regardless of any paywall.
4. And as Steve says, the links to supporting information should all work regardless of any paywall. But apparently with the Canadian J. Chem, this is not true. I do so hope this is a mistake rather than a policy.
But where journals really let us down is that they in effect put NO EFFORT into curating the supporting information. I suppose their argument is that if they are making it free, they have no resources earned from anywhere to curate it. It is entirely up to the submitting authors to prepare a fit-for-purpose document (or document collection), and they in turn will do the minimum required to “get the article published”. This work is also often done by a relatively inexperienced research student (and quite possibly only skimpily inspected by the main author).
I would raise further points. Since the “landing page” does not have to conform to any standard, each publisher constructs it parochially. This makes any automated retrieval of information from it quite a lottery. If supporting information is truly freely available, it should also be minable (the right to read is the right to mine). Surely, acquiring the SI should be simply a matter of quoting the DOI and perhaps some flag that creates a request for the SI. But no, few publishers allow this (some might even disable an institutional access if they detect any attempt at automated accessing by any user, and attempts to automate access to SI may well be caught up in such a block). However, there ARE standards that can be used to achieve this; you can see one in operation at http://doi.org/qcc where the diagram you see there does indeed query a “landing page” and retrieve what it wants from there automatically. It then goes on to automatically populate the diagram with a visual rendering of that data. I do encourage you to go visit, but I have at least one ulterior motive for asking this. Every time such a request is made, my “impact factor” index is incremented! This in turn is called “deposition with recognition”. It is a small “carrot”.
Of course, automated access to SI is only the start. Once the document(s) have been retrieved, data in a structured format has to be extracted from them (i.e., Cartesian coordinates). And that is a new story which I will not start upon here.
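To give a flavour of what that extraction step involves, here is a minimal sketch in Python. It assumes the SI has already been reduced to plain text and that each atom sits on its own line as an element symbol followed by three decimal numbers; real SI documents vary far more than this. The function name, the pattern and the sample fragment are all hypothetical and are not taken from any particular article.

```python
import re

# Assumed layout: element symbol, then x, y, z as decimal numbers on one line.
_ATOM_LINE = re.compile(
    r"^\s*([A-Z][a-z]?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def extract_coordinates(text):
    """Return a list of (symbol, x, y, z) tuples for lines matching the assumed layout."""
    atoms = []
    for line in text.splitlines():
        match = _ATOM_LINE.match(line)
        if match:
            symbol, x, y, z = match.groups()
            atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


# Hypothetical fragment of supporting information, for illustration only.
sample = """Optimized geometry (hypothetical fragment)
C   0.000000   0.000000   0.000000
H   0.000000   0.000000   1.089000
Energy = -1.0 (not a coordinate line)
"""

for atom in extract_coordinates(sample):
    print(atom)
```

Even this toy version shows the problem: it silently skips anything that does not fit the one layout it expects, which is why coordinates buried in a PDF are so much less useful than coordinates deposited in a defined format.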
I was in discussion yesterday with a publisher (of a very novel and very new science journal that is actually one of the very few that does attempt to solve some of these problems). He tells me that there is a strong antipathy amongst very many scientists to providing “useful data” as part of their publication. It’s almost a cultural thing, he tells me. The data that IS provided is only there because of a “stick” (“carrots” apparently do not work), and sticks rarely provide optimal solutions.
Perhaps we need to think about more attractive carrots that will make much of the above possible. Any ideas anyone?
Steven Bachrach responded on 29 Jul 2014 at 7:49 am #
Not so sure that “sticks” don’t work – think about how all crystal structures are now deposited in semantically rich form, either to CCDC or PDB. This has been a condition for publishing, especially by the top-tier journals, and so everyone does this because they want to be published in these top journals.
If the top journals mandated deposition of, for example, computed structures in some specific format, say XML or as a molfile or whatever, wouldn’t we all do this because we want to get published in these top journals?
I think that what is needed is for chemists to express a greater demand for data. The x-ray structure situation came about because it was so difficult to solve a structure back in the day that people wanted access to this data in a way that allowed them to make direct comparisons and reuse it in their own labs. We need to make more significant use of SI and then make it known to editors and publishers when this SI is not appropriately reusable.
Henry Rzepa responded on 29 Jul 2014 at 9:07 am #
Well, with CCDC there was a clear organisation which publishers could target with their sticks. They were founded some 50 years ago, and currently survive because they are seen to add value to deposited structures (validation, and the creation of a rich search engine for subtle structural features). Nevertheless, they do charge organisations for access to this, and from this income they continue to add value. A reasonably virtuous circle one might say, although absolutely not open access. A caveat is that eg http://www.crystallography.net is catching up in various ways. Thus the latter has 300,000 clearly open access structures, compared to about twice as many for CCDC. I do not know much about the validation/added value that http://www.crystallography.net themselves add, but it is noteworthy that they only came on to the scene some 50 years after the closed CCDC model launched. Are they the future?
So the question is whether either of these two particular successful models will work as a carrot for other communities that generate molecular structures? Both crystallographic models work because of scale; 300,000+ structures clearly enables new science. SI does not work because any individual publisher gathers SI on a relatively tiny scale, and applies no validation or added value (and there is already very little value in much SI, at least in a semantic machine sense). It is incredibly fragmented, far too so to act as either a stick or a carrot.
There are many facets to why computationally generated structures are different from crystallographically measured ones, and I will not try to list them here. But I might pose one question for the community. CCDC uses CheckCif for (some degree of) validation. Is there a CheckComp? (And no, it’s not just a matter of checking they are valid Cartesian coordinates, but also very much the provenance and metadata.)
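As a thought experiment only, a very small “CheckComp” might look like the Python sketch below. Everything in it is an assumption made for illustration: the record layout, the list of required metadata fields, the 0.5 Å minimum-distance threshold and the function name itself. No such tool or standard is implied, and a serious validator would need to check far more about provenance than this.

```python
import math

# Assumed minimum provenance fields; a real standard would define many more.
REQUIRED_METADATA = ("program", "method", "basis_set")


def check_comp(record, min_distance=0.5):
    """Return a list of problems found in a hypothetical computed-structure record.

    The record is assumed to be a dict holding metadata strings and an
    "atoms" list of (symbol, x, y, z) tuples with coordinates in angstroms.
    """
    problems = []

    # Provenance: every required metadata field must be present and non-empty.
    for key in REQUIRED_METADATA:
        if not record.get(key):
            problems.append(f"missing metadata: {key}")

    atoms = record.get("atoms", [])
    if not atoms:
        problems.append("no atoms")

    # Coordinates: each atom needs exactly three finite numbers.
    valid = []
    for i, atom in enumerate(atoms):
        xyz = atom[1:]
        if len(xyz) == 3 and all(math.isfinite(v) for v in xyz):
            valid.append((i, xyz))
        else:
            problems.append(f"atom {i}: bad coordinates")

    # Geometry sanity: flag pairs of atoms that sit implausibly close together.
    for a in range(len(valid)):
        for b in range(a + 1, len(valid)):
            (i, p), (j, q) = valid[a], valid[b]
            d = math.dist(p, q)
            if d < min_distance:
                problems.append(f"atoms {i} and {j} are only {d:.2f} angstrom apart")

    return problems


# Hypothetical records for illustration.
good = {
    "program": "ExampleProg",
    "method": "B3LYP",
    "basis_set": "6-31G(d)",
    "atoms": [("C", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 1.089)],
}
bad = {
    "program": "ExampleProg",
    "atoms": [("C", 0.0, 0.0, 0.0), ("C", 0.0, 0.0, 0.1)],
}

print(check_comp(good))
print(check_comp(bad))
```

The coordinate checks are the easy part; the metadata checks are where agreement across the community would be needed, since somebody has to decide which fields count as adequate provenance.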
Henry Rzepa responded on 29 Jul 2014 at 9:31 am #
An afterthought about SI being behind a paywall. Even worse are incidents where an article retraction is itself behind a paywall. You would have to pay to find out why an article had been withdrawn!
And of course errors. I have paid APCs (article processing charges) to make an article open access. On two occasions, for a month or more, these articles appeared behind paywalls (both eventually emerged, but only after I complained).
It might seem that some (many?) journals have no particularly pro-active mechanism for ensuring such errors are trapped. They do appear to respond largely to complaints (which for them is presumably a very cheap way of error-checking!).
Steven Bachrach responded on 29 Jul 2014 at 10:29 am #
I think Henry has hit on a really important aspect here – data is simply too important and too domain specific to be turned over to journal publishers for handling and archiving. Just as CCDC is a third-party repository, I think we need a third-party repository of computational chemistry data. This would include not just the coordinates, but loads of other computed information and metadata – like what method was utilized. Oh to dream…
Henry Rzepa responded on 29 Jul 2014 at 1:06 pm #
It does not have to be a single, global repository, as was the case for many years with crystal structures. But what does need to happen in any alternative distributed model is that the repositories are able to communicate with each other, in the sense of exchanging metadata and annotation (= more metadata). SWORD is a metadata exchange and synchronisation protocol for repositories which we now have working (in trial mode) between two repositories. One might envisage how any particular repository might specialise: for example, in validation, or perhaps in computing a property where local expertise is present. These attributes can then be added to the metadata and SWORD can be used to synchronise this to other repositories.
Oh to dream. I hope the above sounds both like a dream and like an achievable reality at the same time. And of course it is predicated on open data. The closed CCDC style model (= closed ecosystem if you like) we know about. But the potential of open models has only just started to be dreamt about!