Caveat lector: This blog is where I try out new ideas. I will often be wrong, but that's the point.


Something ghoti with science citations

Science has a lot of problems. Or rather, scientometrics has a lot of problems. Scientific careers are built on the publish-or-perish foundation of citation counts. Journals are ranked by impact factors. There are serious problems with this system, and many ideas have been offered on how to change it, but so far little has actually changed. Many journals, including the PLoS and Frontiers series, are making efforts to bring about change, but they are mostly taking a social tactic: ranking and commenting on articles.

I believe these methods are treating the symptom, not the problem.

Publish or perish reigns because our work needs to be cited for us scientists to gain recognition. Impact factors are based on these citation counts. Professorships are given and tenure awarded to those who publish in high-ranking journals. However, citations are biased, and critical citations are often simply ignored.

Bear with me here for a minute. How do you spell "fish"? g-h-o-t-i: "g-h" sounds like "f", as in "laugh". "o" sounds like "i", as in "women". "t-i" sounds like "sh", as in "scientific citations". This little linguistic quirk is often (incorrectly) attributed to George Bernard Shaw; it's used to highlight the strange and inconsistent pronunciations found in English. English spelling is selective. You can find many spelling examples that look strange, but support your spelling argument.

Just like scientific citations.

There are a lot of strange things in the peer-reviewed scientific literature. Currently, PubMed contains more than 18 million peer-reviewed articles with approximately 40,000-50,000 more added monthly. Navigating this literature is a crazy mess. When we created brainSCANr, our goal was to simplify complex neuroscience data. But now we want to shoot for more.

At best, as scientists we have to be highly selective about what studies we cite in our papers because many journals limit our bibliographies to 30-50 references. At worst, we're very biased and selectively myopic. On the flip side, across these 18+ million PubMed articles, a scientist can probably find at least one peer-reviewed manuscript that supports any given statement no matter how ridiculous. Don't believe me? Here's my first whack at a questionable series of statements supported by peer-reviewed literature:

Human vision extends into the ultraviolet frequency range [1], possibly mediated by an endogenous violet receptor [2].

The effects of retroactive prayer are well-described in improving patient outcomes [1]. Herein we examine the hypothesis that such retroactive healing is mediated by an innate human ability for "psi"; that is, for distance healing mediated by well known quantum effects [2].

What we need is a way to quickly assess the strength of support of a statement, not an author's biased account of the literature. By changing the way we cite support for our statements within our manuscripts, we can begin to address problems with impact factors, publish or perish, and other scientometric downfalls.

brainSCANr is but a first step in what we hope will be a larger project to address what we believe is the core issue with scientific publishing: manuscript citation methods.

We argue that, by extending the methods we present in brainSCANr to find relationships between topics, we can adopt an entirely new citation method. Rather than citing only a few articles to support any given statement made in a manuscript, we can create a link to the entire corpus of scientific research that supports that statement. Instead of a superscript number indicating a specific citation within a manuscript, any statement requiring support would be associated with a superscript number that represents the strength of support that statement has based upon the entire literature.
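To make the idea a bit more concrete, here is a rough sketch of how a statement might be rendered with its support number and a link to the matching PubMed search. Everything here is an assumption for illustration: the function names `pubmed_link` and `annotate` are hypothetical, the bracketed number stands in for the superscript, the query is a plain AND of two quoted topic terms, and the score is simply passed in rather than computed.

```python
from urllib.parse import urlencode

# Public PubMed search page; the query goes in the "term" parameter.
PUBMED_SEARCH = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_link(term_a, term_b):
    """Build a PubMed search URL for articles mentioning both terms.

    Assumption: a simple quoted AND query is a good enough stand-in
    for "the literature that supports this statement".
    """
    query = f'"{term_a}" AND "{term_b}"'
    return PUBMED_SEARCH + "?" + urlencode({"term": query})


def annotate(statement, term_a, term_b, score):
    """Render a statement followed by its support score and search link."""
    return f'"{statement}" [{score:.5f}] ({pubmed_link(term_a, term_b)})'


# The score is an input here; this sketch does not compute it.
print(annotate(
    "working memory processes are supported by the prefrontal cortex",
    "working memory",
    "prefrontal cortex",
    0.00674,
))
```

The point of the design is that the number and the link travel together: the reader gets a quick summary and a one-click route to the whole body of matching articles, not a hand-picked handful.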

For example, "working memory processes are supported by the prefrontal cortex" [0.00674] gets strong support and a link to PubMed showing the articles that support that statement. Another statement, "prefrontal cortex supports breathing" [0.00033], also gets a link, but notice how much smaller that number is. It has far less scientific support. (The method for extracting these numbers uses a simple co-occurrence algorithm outlined in the brainSCANr paper.)
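For the curious, here is a minimal sketch of what a co-occurrence score can look like. I'm assuming a Jaccard-style ratio (articles mentioning both terms divided by articles mentioning either one); the exact formula is in the brainSCANr paper and may differ from this. The function name `cooccurrence_score` and the toy abstracts are made up for illustration, and naive substring matching stands in for a real PubMed query.

```python
def cooccurrence_score(corpus, term_a, term_b):
    """Fraction of texts mentioning both terms among texts mentioning either.

    Assumption: a Jaccard-style ratio with case-insensitive substring
    matching. This is an illustrative stand-in, not the published method.
    """
    a, b = term_a.lower(), term_b.lower()
    has_a = {i for i, text in enumerate(corpus) if a in text.lower()}
    has_b = {i for i, text in enumerate(corpus) if b in text.lower()}
    either = has_a | has_b
    if not either:
        return 0.0  # neither term appears anywhere
    return len(has_a & has_b) / len(either)


# Toy stand-ins for article abstracts (invented, not real PubMed records).
abstracts = [
    "Working memory load modulates prefrontal cortex activity.",
    "Prefrontal cortex lesions impair working memory.",
    "Working memory capacity in older adults.",
    "Brainstem nuclei control breathing rhythm.",
    "Prefrontal cortex and decision making.",
    "Breathing rate during sleep.",
    "Prefrontal cortex activity during paced breathing.",
]

wm = cooccurrence_score(abstracts, "working memory", "prefrontal cortex")
br = cooccurrence_score(abstracts, "breathing", "prefrontal cortex")
print(f"working memory vs. prefrontal cortex: {wm:.3f}")  # 2 of 5 -> 0.400
print(f"breathing vs. prefrontal cortex:      {br:.3f}")  # 1 of 6 -> 0.167
```

With seven toy abstracts the numbers come out far larger than they would over millions of real articles, but the ordering is what matters: the better-supported pairing scores higher.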

My citation method removes citation biases. It provides the reader a quick indication of how well-supported an argument is. If I'm reading this paper and I see a large number, I might not bother to look it up as the scientific consensus is relatively strong. But if I see an author make a statement with a low number--that is, a weak scientific consensus--then I might want to be a bit more skeptical about what follows.

We live in a world where the entirety of scientific knowledge is easily available to us. Why aren't we leveraging these data in our effort to uncover truth? Why are we limiting ourselves to a method of citations that has not substantially changed since the invention of the book? My method may have flaws, but it is much harder to game than the current citation biases that only give us the narrowest slice of scientific support. My citation method entirely shifts the endeavor of science from numbers and rankings of journals and authors (a weak system for science, to say the least!) to a system wherein research is about making statements about truth. Which is what science should be.

