NEJM editorial calls data scientists "research parasites." Can Joe Biden fix this?

Vice President Joe Biden recently called for a "moonshot" to cure cancer, which President Obama announced in his State of the Union address last week. Motivated by the tragic death of his son Beau, who died last year of brain cancer, Biden says he will devote his remaining time in office, and many years after, to helping fight cancer. On his VP blog, he writes that he wants to do two things:

  1. Increase resources — both private and public — to fight cancer.
  2. Break down silos and bring all the cancer fighters together — to work together, share information, and end cancer as we know it.

I'm 100% behind the Vice President on these efforts, and I hope he succeeds beyond his wildest ambitions. But he might discover, paradoxically, that raising money–his first goal–is easy compared to the challenge of getting scientists to share data.

Exhibit A is an editorial titled "Data Sharing" that appeared in last week's New England Journal of Medicine, written by Dan Longo and Jeffrey Drazen, the deputy editor and editor-in-chief of the journal. Drazen and Longo wrote that scientists who wish to use other people's data to make new discoveries are "research parasites." Or, to be more precise, they wrote that "some front-line researchers" (none of whom are named) have this view. They also argued that "someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters" and thus have no business re-analyzing the data.

The condescension implicit in this statement is deeply troubling. Drazen and Longo are saying, essentially, that only the people who originally collect a data set can truly understand it, and anyone else who wants to take a look is a parasite.

The editorial has led to a firestorm on social media. For example, Nobel Laureate Barry Marshall tweeted that
"Plenty of Nobel prizes came from a new look at other people’s data."
UC Davis professor Jonathan Eisen tweeted that the "editorial by @nejm is simply deranged," and a new Twitter account under the name ResearchParasite quickly drew many followers.

I asked Dr. Drazen if he really meant to imply that scientists who use other people's data are parasites. He and I spoke on the phone, and he emphasized that he's a strong supporter of data sharing, and that he's been traveling the country promoting a new policy to share the information from clinical trials (something that rarely happens). Just a few days ago, he and other medical journal editors proposed a new policy on clinical trial data sharing, a policy that (while not perfect) would be a big step forward.

So why, I asked him, did he use the harshly negative phrase "research parasites"? Dr. Drazen pointed out that he had heard this term from others, and that's why he enclosed the phrase in quotation marks in his editorial (true). He shared with me an update that will appear in NEJM this week, in which he and Longo will explain further; however, the journal asked that I not quote from that.

I was relieved to hear that Dr. Drazen and his NEJM colleagues are supportive of data sharing, and that they are implementing new, more open policies on clinical trial data sharing for the journal. I asked him if he would also state directly that he did not believe the phrase "research parasites" was accurate or appropriate. He declined to comment, though he reiterated the point that this phrase came from others, not from him or Dr. Longo.

So the attitude is clearly out there. Indeed, it's not that unusual: I have encountered similar attitudes many times in my own career, although I should quickly add that they are far from universal.

It's a simple fact today that biomedical researchers (take note, Mr. Vice President) rarely share their data with others. Unless a funding agency or a journal in which they wish to publish requires them to share, they will sit on their data forever. I've personally been involved in projects where the various participants–funded by NIH or other federal agencies–refuse to share data even with other groups in the same consortium. For example (and this is just one of thousands I could point to), the raw data behind this clinical exome sequencing study, led by Baylor College of Medicine and published in 2013 in NEJM, is not available. The data collected by the famous Framingham Heart Study, running since 1948, has been locked up by Boston University scientists for half a century, and only recently (after considerable pressure from their funders) have they agreed to let others take a look at small pieces of the data, if they beg hard enough.

Let's go back to Vice President Biden's blog, where he wrote:
"We’ll encourage leading cancer centers to reach unprecedented levels of cooperation, so we can learn more about this terrible disease and how to stop it in its tracks.... Data and technology innovators can play a role in revolutionizing how medical and research data is shared and used to reach new breakthroughs."

Again, I'm 100% behind the VP here. Biden is already meeting with cancer researchers to see what he can do to accomplish these goals, and I'm sure they will tell him what he wants to hear. In contrast, let's see what Drazen and Longo wrote in their NEJM editorial:
"...a new class of research person will emerge — people who use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”"
Shocking! If you share your data, someone might try to disprove your results! Could it be that a published result relies on misinterpreted data and is wrong? It took me less than a minute on Retraction Watch to find multiple articles retracted by the NEJM itself, including some that were retracted because the original data could not be found.

Disproving a claim using the same data is what reproducibility is all about, and this is one of the most important reasons that data needs to be shared. After all, if someone has distorted their data in order to reach a conclusion that isn't really justified, we need someone else–someone not invested in proving the same result–to re-analyze the data using independent methods. This is how science corrects itself.

These sentiments of the unnamed "front-line researchers" quoted by Drazen and Longo reveal the dangerously arrogant assumption that only they understand the data, and that no one should question their findings. And there's also that concern that another scientist might discover something that was missed by the original group. In what view of reality is this "stealing from the research productivity" of that group?

The phrase "research parasites" also reflects the view of some scientists that the data they collect is their property, despite the fact that their research is (frequently) funded by the public. It's time for the funding agencies to set some new ground rules: if the government funds a study, then we all own the data. Scientists who don't like the rule can find another source of funding (and believe me, they might grumble and complain, but they will do what their funders demand).

One final note: a quick scan of recent articles in the NEJM reveals that, not surprisingly, many of them rely on the human genome sequence. Did any of those authors contact the "data gatherers" to get permission to use the genome in their work? Did they offer to include the human genome sequencers as co-authors on their papers, a step that Drazen and Longo recommend? Of course not–and they shouldn't. When we publish papers, we cite the sources of our data, but we don't ask their permission nor do we include them as co-authors. Citations are the currency of modern science.

So here's some advice to Vice President Biden: don't just talk to scientists and urge them to collaborate. They'll all agree, and tell you wonderful things about their numerous collaborations, but once you leave the room, they'll go back to business as usual. If you really want to change the culture, Mr. Vice President, change the rules.

1 comment:

  1. This post reminded me of a piece of astronomy history. Tycho Brahe collected the most accurate and comprehensive set of astronomical observations to date in the late 16th century, but then misinterpreted them as support for an Earth-centric model of the solar system. Later Kepler reanalyzed the data to describe his famous and accurate laws of planetary motion. I suppose Kepler is a research parasite, but that is a good thing. And it should comfort the data gatherers of the world that Brahe's technical achievement in taking such accurate observations is still appreciated today.

