So as some of you may be aware, I gave a talk at Google last month. The intent was to talk about some of the computational roadblocks we neuroscientists hit when we're analyzing our data. The details are not important right now; what is important is the line of thought that stemmed from it, namely, how important writing scripts and code is for neuroscience research, and how poorly trained most of us neuroscientists are at it.
I count myself lucky that I had a bit of a head start. Spending my teen years in the mid-90s as a 100% geek meant that I was, by default, really into computers: I built my own systems out of spare parts, hacked together settings in my config.sys and autoexec.bat files to optimize for different games, and generally broke things to see what happened. Then, as I got older, this interest in hardware slowly segued into an interest in software so that, when I got to college, I took a few programming classes to expand this knowledge a bit.
Now, by no means am I a "programmer". Hell, these days I do almost everything in Matlab, but what I do generally works, and when I need to do something new I can usually figure out how. A lot of neuroscientists use Matlab. More specifically, most cognitive neuroscientists who work with fMRI use SPM, which is a toolbox built using Matlab. SPM and other toolboxes and software are great in that they allow a lot of people to easily analyze data.
fMRI is not a technique I choose to use. However, there are lots of other pieces of software (proprietary and closed-source, semi-open, and open-source) for electrophysiology analysis, which is a technique I do use. When I first started my PhD I did my analyses in a plug-and-play manner using a variety of software packages. But I got interested in what was happening to my data. I wanted to know how to do this stuff myself. So I started hacking together my own scripts and obsessively plotting, analyzing, coding, debugging, and re-analyzing my data from scratch. And wow did I learn a lot. I learned what "good", clean, artifact-free data looked like, and how it differed from data with artifacts both large and subtle.
And I began seeing those artifacts in published research papers.
The researchers in my field are really trying to be good scientists. In cognitive neuroscience that usually means a solid grounding in statistics, psychology, and hopefully philosophy and a few other fields. Usually, though, we're not signal analysts, programmers, mathematicians, or engineers. Yet we are increasingly relying upon tools and methods that necessitate an understanding of those fields. Sometimes poor understanding leads to mistakes that get published, and those mistakes propagate as "truth" until verified or debunked. Either way, those mistakes add noise to the system and cost someone time and money before they get sorted out. (One of the nice things about the scientific method is that errors really do get sorted out over time as people try to replicate findings.)
But how much noise is there in the scientific literature? How many studies have been published making claims based on erroneous or artifactual data? Informally, when commiserating with colleagues over beers, the consensus seems to be that this number is probably quite large. And this is probably not an issue just in neuroscience.
I've got a lot more to say about this, but I'll have to continue in another post, as this one is getting a bit long...