Beginning Bioinformatics
for Perl Programmers
by James D. Tisdall
January 02, 2002
The importance of programming in biology stretches back before the previous decade. And it certainly has a significant future now that it is a recognized part of research in many areas of medicine and basic biology. This may not be news to biologists. But Perl programmers may be surprised to find that their handsome language has become one of the most popular, if not the most popular, of the computer languages used in bioinformatics.
My new book Beginning Perl for Bioinformatics from O'Reilly & Associates addresses the needs of biologists who want to learn Perl programming. In this article, I'm going to approach the subject from another, almost opposite, angle. I want to address the needs of Perl programmers who want to learn biology and bioinformatics.
First, let me talk about ways to go from Perl programmer to "bioinformatician". I'll describe my experience, and give some ideas for making the jump. Then, I'll try to give you a taste of modern biology by talking about some of the technology used in the sequencing of genomes.
My Experience
Bioinformaticians generally have either a biology or programming background, and then receive additional training in the other field. The common wisdom is that it's easier for biologists to pick up programming than the other way around; but, of course, it depends on the individual. How does one take the skills learned while programming in, say, the telecommunications industry, and bring them to a job programming for biology?
I used to work at Bell Labs in Murray Hill, N.J., in the Speech Research Department. It was my first computer programming job; I got to do stuff with computer sound, and learn about speech science and linguistics as well. I also got to do some computer music on the side, which was fantastic for me. I became interested in the theory of computer science, and entered academia full time for a few years.
When it became time for me to get back to a regular salary, the Human Genome Project had just started a bioinformatics lab at the university where I was studying. I had a year of molecular biology some years before as an undergraduate, but that was before the PCR technique revolutionized the field. (PCR, the polymerase chain reaction, is the way we make enough copies ("clones") of a stretch of DNA to be able to do experiments on it. Once you've learned the basics of DNA -- keep reading! -- PCR would be a great first topic for learning about molecular biology techniques. I'll explain how in just a bit.) At that time, I read Watson's classic "Molecular Biology of the Gene" and so I had an inkling about DNA, which probably helped, and I knew I liked the subject. I went over to meet the directors and leveraged my Unix and C and Bell Labs background to get a job as the systems manager.
In my new job I started working with bioinformatics software, both supporting and writing it. In previous years, I'd done practically no programming, having concentrated on complexity theory and parallel algorithms. Now I was plunged into a boatload of programming -- C, Prolog, Unix shell and FORTRAN were the principal languages we used. At that time, just as I was starting the job, a friend at the university pressed his copy of Programming Perl into my hands. It made a strong impression on me, and in short order I was turning to Perl for most of the programming jobs I did.
I also started hanging out with the genome project people. I took some graduate courses in human genetics and molecular biology, which helped me a lot in understanding what everyone around me was doing.
After a few years, when the genome project closed down at my university, I went to other organizations to do bioinformatics, first at a biotech startup, then at a national comprehensive cancer center, and now consulting for biology researchers. So that's my story in a nutshell, which I offer as one man's path from programming to bioinformatics.
Bringing Programming to Biology
Now that bioinformatics is seen as an important field, many biology researchers are adding bioinformatics to their grant proposals and research programs. I believe the kind of path that I took is even more possible now than it was then, simply because of the amount of bioinformatics funding and the number of jobs now being staffed. Find biology research organizations that are advertising for programmers, and let them know you have the programming skills and the interest in biology that would make you an asset to their work.
But what about formal training? It's true that the ideal bioinformatician has graduate degrees in both computer science and biology. But such people are extremely rare. Most workers in the field have a good mix of computer and biology skills, but their degrees tend to come from one field or the other. Still, formal training in biology is a good way for a computer programmer to learn about bioinformatics, either before or concurrently with a job in the field.
I can understand the reluctance to face another degree. (I earned my degrees with a job and a family to support, and it was stressful at times.) Yes, it is best to get a degree if you're going to be working in biology. A master's degree is OK, but most of the best jobs go to those who have a doctorate. People with doctorates are, however, in ample supply and often get relatively low pay, as in postdoc positions that are frequently held for many years. So the economic benefit of formal training in biology is not great, compared to what you may be earning as a computer expert. But at present bioinformatics pays OK.
On the other hand, to really work in biology, training is a good thing. It's a deep subject, and in many ways quite dissimilar to computer science or electrical engineering or similar fields. It has many surprises, and the whole "wet lab" experimental approach is hard to get out of books.
For self-study, there's one book that I think is a real gem for Perl programmers who want to learn about modern biology research. The book is called "Recombinant DNA," by the co-discoverer of the structure of DNA, James Watson, and his co-authors Gilman, Witkowski, and Zoller. The book was deliberately written for a wide audience, so you can start at the beginning with an explanation of what exactly DNA and proteins are, the two most important types of molecules in biology. But it goes on to introduce a wide range of fundamental topics in biology research, including explanations of the molecular biology laboratory techniques that form the basis of the revolution and the golden age in biology that we're now experiencing. I particularly like the use of illustrations to explain the techniques and the biology -- they're outstanding. In my jobs as manager of bioinformatics, I've always strongly urged the programmers to keep the book around and to dip into it often.
The book does have one drawback, however. It was published in 1992. Ten years is as long in biology as it is in computer technology, so "Recombinant DNA" does not cover newer stuff such as microarrays or SNPs. (And don't get the even earlier "Recombinant DNA: A Short Course" -- the 1992 edition is the one to get for now.) But what it does give you is a chance to really understand the fundamental techniques of modern molecular biology; and if you want to bring your Perl programming expertise to a biology research setting, then this is a great way to get a good start.



