Open Source Research

Open source is changing the way geeks work. In a span of time that is a mere spark compared to its own lifetime, the software development industry has been adapting to accommodate them. It's happened so quickly, and so effectively, that I hope the time is right to also change the way another big sector of geeks does its work: those in science. The difference, as I see it, is that young geeks in the software playing field have at least a chance of making a name for themselves, by themselves, on the merits of their work alone. The big machine of scientific research today, however, provides much less support for new techniques. The funding mechanisms, the information distribution techniques, and the communication methods all sort of work, and too many people are afraid to tear them down to build something better. Research is currently a monolithic agency of the US federal government and, being so, is hardly adaptable.

Communications within the scientific research community are based on centuries-old technology. The primary carrier medium is still the printed journal, where results are presented to the community in finished form. Scientific success is too often measured by the number of articles published in journals. No karma points are given to those who help mold and direct the ideas of others. No benefit accrues to the presenter of a half-finished idea, so many ideas are carried through to terminus, whether of value or not. In software, by contrast, for hundreds of ideas one can find several different projects in various states of completeness and effectiveness. Sir Isaac Newton is credited with attributing his success to the fact that he had ``stood on the shoulders of giants.'' The giants provide fantastic stepping stones, but the midgets also provide assistance in furthering the progress of science. Unfinished ideas have value, but they are currently swept under the carpet. Statistical studies have suggested that the e-folding time of an idea under the current system is about 20 years: the older an article in a journal is, the less likely it is to be referenced. If the articles were dynamic, they could be modified and updated with current ideas and technology, rather than scratched. Every idea in one's head remains static only if the mind is closed.

Many workers in the field are more than willing to go out of their way to help others carry out work similar to theirs. However, it's generally not possible to go out and grab a tarball of their data, processing tools, and finished reports, and submit diffs for their approval. A paper published is an idea dead to the world. There is no revision control, no periodic updates, no maintainer of a given document. Papers are tossed onto the stack and forgotten about, with the separation of the gems from the detritus left to the individual. If I publish a paper and receive a letter from a reader pointing out omissions, incompleteness, or (more likely) general inadequacy, there's nothing the commenter or commentee can do to change the document in its ancient static form. It doesn't have to be this way.

revoluti0n

A large body of scientific work could be referenced in centralized, but mirrorable, repositories of metadata. Information is currently distributed by a commercial industry, the journal publishers. In the US and many other countries in its region of the scientific world, research is funded by the public. The products of this funding, however, are not uniformly available to the public which provides the means. Scientific journals are too expensive for an individual to maintain subscriptions to, and only the larger universities hold anything but the most mainstream publications. Tidbits of this information are available through news agencies, but often in dumbed-down form. The net has set us free in many ways, but it should mean much more than being able to buy a Tamagotchi revision #II in teal at the lowest bid. Yet if one wants to tap the raw scientific information pipe right at the source, the only options are reading it on paper, provided one can get to a library, or waiting the time it takes for a library to photocopy and mail the information, if that is possible at all. Sneakernet has wonderful bandwidth, but the latency sucks. There is very little standing in the way of redirecting output away from the journals to more efficient and sensible outlets. One obstacle is the creation of these new repositories. The second lies in breaking traditions built of steel-reinforced concrete. Making a scalable version of slashcode might be one way to start out on the former. The only way I can imagine surmounting the latter is by properly executing the former.

A majority of scientific work is still done in small groups behind closed doors. Ideas are shared, but too often the need to produce ``original'' work throttles the free exchange of information. Data can easily be fudged, and results twisted to look nice for the funding agencies. I myself have given in to my bosses' insistence that I ``make the data look prettier.'' I felt unclean doing it, especially knowing that this wouldn't be the first tainted product to make it into print. Open source development has shown itself effective at weeding out inconsistencies and mysterious, confusing bits of work. Scientific morals would be upheld all the more strongly under the scrutiny of the Thousands of Eyes.

Implementing these suggestions will meet great resistance. The scientific community today is bloated and filled with dead wood. Researchers often work in disjointed parallel, in their respective scientific cubicles, publishing the same work under different titles. Scientists are publishing worthless results again and again under different names, just because they are buddies with the publishers and funding agencies. A large number of researchers depend upon routinely submitting papers in order to pay their bills and feed their families. Comfort is a difficult thing to abandon.

Work amongst the open source community proceeds in bits of punctuated development, as opposed to steady forced growth. It moves ahead only when there is work of a valid nature to be done, not for the purpose of paying Dr. X's next monthly salary. Thoughts, ideas, and creativity are not things that can be pressed into a pristine little 9-to-5 package.

The time is beyond ripe for change in the methods of information transfer in the field for which the net was first created. Sure, point-to-point exchanges of scientific data have been greatly enhanced through networking technologies, but the ideas behind and on top of these data still go through their old channels. Email is used more often than letters, but in a manner analogous to its paper-based predecessor. Open discussion media, such as Slashdot, Usenet, and the like, are required to bring scientific research into the medium which was designed for it. OSS development works great for producing and distributing code. Now it's time to extend these ideas to tear down the dinosaurs which need it.

...keep on going to Internet Groupware for Scientific Collaboration.

