Science has always been a discipline in which analysis, conclusions, and communication of results are interwoven, but with the increase in the scale and diversity of data, a dangerous rift is opening. This rift is happening at every level of scientific processing. The age of big data should be a time of great analysis, but limitations in data management and reproducibility are often holding scientific exploration back. It is becoming increasingly difficult to replicate data-driven experiments, which stalls scientific progress, since that progress relies on replication of previous research. The split has separated the data from the analysis, and both from the communication of scientific conclusions. These separations make dissemination of scientific results incredibly difficult, but the problem is not insurmountable. The scientific community is becoming increasingly aware of these rifts, and a movement for a more open science is gaining momentum. I was fortunate enough to be part of an event organized by the rOpenSci project, aimed at developing tools that benefit science by reuniting scientific data, analysis, results, and publishing. The event was a two-day hackathon in which people from many disciplines met at GitHub headquarters in San Francisco to work on tools that foster and enhance open science.
The tools we worked on varied widely, from easier manipulation of specific types of data, to better handling and visualization of map data, to a better connection between data and data analysis. There was a focus on the statistical programming language R, but there was also an evident enthusiasm for using whichever language does the job best.
About a month before the hackathon, there were casual discussions on GitHub about which projects could be tackled. I became interested in the conversations on best practices for reproducibility and realized this information would be a great resource if it were all in one place. So I suggested that we build a home for it. I was fortunate enough to have others come on board, and we formed a small team to create a community-driven place for scientists to learn about current practices in reproducible science.
We have been working on the project in our spare time ever since the hackathon. The whole topic of reproducibility is heavily based on technology and is constantly evolving. Our goal is for the guide to evolve along with it, through contributions from people with knowledge and ideas about how to enhance reproducibility in science. Both the code and the site for the Guide to Reproducibility in Science are hosted on GitHub to encourage collaboration. So stop by, check it out, and become part of the team by contributing.
For me, the overall theme of the hackathon was collaboration. I was impressed with the level of collaboration possible when everyone is working with the philosophy of open science in mind. We were able to make progress on projects without scheduled meetings, without file transfer problems, and even without ever meeting each other. At one point I had been discussing over chat some recent contributions to the project from Ben Marwick, and realized I didn’t remember meeting him. I asked where he was, and it turned out he wasn’t even in San Francisco, where the hackathon was taking place; he was in Seattle! The same was true of Jeff Holister and Eduard Szöcs, who were working with us from Rhode Island and Germany. We were working effectively and efficiently as a team throughout the event, across the world, with people I had never even met. The thought that this sort of collaboration could become part of how our scientific community operates is incredibly exciting.
Everyone was already convinced of the power of open source, which made working together a breeze. No one was using software that wasn’t open to everyone else, and everyone was hyper-aware of how to easily disseminate their work to others. Every project could be replicated on any computer. This was the first time I had collaborated on a project through GitHub. I fell in love with how easy it was to work together through the framework of version control, which became extremely meta since we were literally sitting on couches in GitHub headquarters.
It is worth working within the frameworks that foster collaboration and open science, no matter the field. I do not consider myself a computational biologist, as most of my time is spent doing wet lab work. You do not have to consider yourself a programmer to take advantage of these resources; they can help every type of scientist, from data collection to publication. The discrepancy in use, in my opinion, lies in attitude. The attitude of everyone involved! If you are someone who absolutely hates doing your analysis in something that allows others to easily reproduce your results, like R, or hates version control because you think it makes your job harder, change your attitude. On the other end, if you are a programmer who hates the people who make you use Microsoft Word to collaborate on papers, or who becomes irate at the misuse of your favorite programming syntax, change your attitude as well. Acting this way is harmful, and you are actually discouraging people from even trying to adopt new strategies. The shift to a more open science will take time, so everyone involved must be patient and helpful. Whether you are a programmer or a scientist whose main skills lie elsewhere, spend the time to share your expertise on what standards should be set, and how, for tool development and maintenance, reproducibility in analysis, and publishing criteria. The scientific community needs to come together to mend these rifts and build a modern science that is open to all and capable of handling all the amazing data that is flowing around us.