Genomics, Medicine, and Pseudoscience: Good news for "Research Parasites": NEJM takes it back, 8 years later

After years of debate, the National Institutes of Health finally rolled out a data sharing policy early this year, one that should greatly increase the amount of data that biomedical researchers share with the public. This week, three prominent scientists from Yale described, in an op-ed in the New England Journal of Medicine, how “the potential effects of this shift ... toward data sharing are profound.”

For some of us, it’s deliciously ironic that this op-ed appeared in NEJM, which just a few years ago coined the term “research parasites” to describe anyone who wants to make discoveries from someone else’s data. That earlier piece, written in 2016 by the NEJM’s chief editors, was simply dripping with disdain. It caused a huge outcry, including a response from me in these pages and a sharply worded response from the Retraction Watch team, published in STAT News. The editor backed down (slightly) in a follow-up letter just a few days later, but the damage was done.

One interesting consequence was that a group of scientists created a Research Parasite Award, now awarded each year (entirely seriously, despite the tongue-in-cheek name) at a major biomedical conference, for “rigorous secondary data analysis.”

The 2016 op-ed in NEJM was itself a response to a call for greater data sharing published in the New York Times by cardiologists Eric Topol and Harlan Krumholz–and Krumholz, we should note, is a co-author of the latest piece in NEJM. Meanwhile, the former editor of NEJM retired years ago, and it appears that the journal is now ready to join the 21st century, even if it’s a few decades late.

What is all this fuss about? Well, many people outside of the scientific research community probably don’t realize that vast amounts of data generated by publicly-funded research–work that is paid for by government grants–are not usually released to the public or to any other scientists.

On the contrary: in much of biomedical research, data sets collected with government funding are zealously kept private, often forever. The usual reasons for this are simple (although rarely admitted openly): the scientists who collected the data want to keep mining it for more discoveries, so why share it? Sometimes, too, researchers package up the data and sell it, which is completely legal, even though the government paid for the work.

(It’s not just medical research data, either: once I tried to get some data from a paleontologist, only to learn that he treated every fossil he ever collected as his personal property. But that’s a blog for another day.)

Many scientists have been fighting this culture of secrecy for a long time. Our argument is that all data should be set free, at least if it’s the subject of a scientific publication. It’s not just scientists making this argument: starting in the early 2000s, patient groups began to realize they couldn’t even read the studies about their own diseases unless they paid a for-profit journal to access the paper. Those groups lobbied–successfully, after a years-long fight–for a requirement that any publicly-funded research be published on a free website, not locked behind the doors of private publishers. Their effort led to an NIH database called PubMed Central, which contains the full text of millions of articles.

The new NIH data sharing policy is one consequence of the Open Science movement, which I’m a part of. The movement argues that science moves much faster when it’s done in the open. This means sharing data, software, methods, and everything else. There’s now a U.S. government website dedicated to Open Science, open.science.gov, which brings together more than a dozen federal agencies, among them NIH, NSF, and the CDC.

A bit more history: as far as I can tell, the earliest voices for data sharing emerged during the Human Genome Project, an international effort beginning in 1989 that produced the first draft of the human genome in 2001. When a private company (Celera Genomics) emerged in 1998, a dramatic race ensued, and as one strategy for competing, the public groups announced that, in contrast to the private group, they would release all their data openly on a weekly basis, long before publication. That wasn’t how things had worked before.

Very soon after that, scientists in genomics (my own field) realized that all genome data, whether from bacteria, viruses, animals, or plants, ought to be released freely. The publicly-funded sequencing centers received millions of dollars to generate the data, but they weren’t the only places that could analyze it. NIH and NSF agreed, and pretty soon they required all sequencing data to be released promptly.

This same spirit didn’t touch most medical research, though. Even though far more money–billions of dollars a year in NIH funds–is spent on disease-focused research, data from those studies remained locked up in the labs that got the funds. This is now changing.

As the Yale scientists (Joseph Ross, Joanne Waldstreicher, and Harlan Krumholz) point out in their NEJM op-ed, open data sharing has already yielded tremendous benefits. For example, they note that hundreds of papers have been published using public data from the NIH’s National Heart, Lung, and Blood Institute, including studies that revealed new findings about the efficacy of digoxin, a common drug used to treat heart failure.

The new NIH policy covers all of NIH, not just one institute, and we can hope it will unlock new discoveries by allowing many more scientists to look at the valuable data currently kept behind closed firewalls.

But simply requiring scientists to have a “data management and sharing plan,” as the NIH is now doing, might not be enough. Many thousands of scientific papers already say they share data and materials–but as it turns out, the authors don’t always want to share.

A study published last year illustrated how toothless some current policies are. That study identified nearly 1,800 recent papers in which the authors said they would share their data “upon request.” The researchers running the study wrote to all of those authors, only to find that 93% of them either didn’t respond at all, or else declined to share their data. That’s right: only 7% of authors shared their data, despite publishing a statement that they would.

The NEJM op-ed proposes a different solution, one that could be far more effective: putting scientific data into a government repository. This is something the government itself can enforce (because it controls the funding), and once the data is in a public repository, the authors won’t be able to sit on it as (some of them) now do.

It’s good to see NEJM joining the open science movement. Science that is shared openly will inevitably move faster, and everyone–except, perhaps, a few data hoarders–will benefit.
