Text-mining tool seeks out ‘hidden data’

Source: Nature

Forgotten to free your data? A tool called Wide-Open can search out instances of locked online research data sets that are supposed to be public, and it has already flagged hundreds of such instances in genetics research, according to a study (ref. 1) published in PLoS Biology on 8 June.

Scientists often post ‘hidden’ data online in repositories while their related studies are going through peer review, intending to make data sets public later.

Two popular repositories that let researchers keep genetics data hidden, for example, are the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), both run by the US National Center for Biotechnology Information. Both sites require data sets to be made public once the associated papers are published. But in practice, scientists often forget to do this, says Maxim Grechkin, a computer scientist at the University of Washington in Seattle.