Bioinformatics is one area of
computing where you'll still want to pay special attention to performance. With the
human genome consisting of 3 billion bases, using one byte per base gives you
three gigabytes of data to work with. Clearly, at that scale even something that gives
you only a constant-factor speedup can result in huge time savings.
Because of that concern for performance, I expect to be working in C++ regularly this
semester. In fact, the first day of class was a nice review of it, and I welcome the
change since it's been many years since I've done much of anything in the language.
One thing that struck me as particularly painful was memory management and pointers.
When was the last time you had to remember to delete [] p;? The power of
being able to do such low-level manipulation may be inebriating, but you'd better not get
too drunk. How ever would you be able to keep the entire program in your head?
(Paul Graham's timing was amazing, as I saw he posted that article about 10 minutes
before my re-introduction to C++.)
C++ works against that goal on so many levels, particularly with the indirection pointers
provide. Something like this simple program is relatively easy to understand and remember:
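#include <cstdlib>   // system, EXIT_SUCCESS
#include <iostream>  // cout
using namespace std;
int main(int argc, char *argv[])
{
    int *i = new int(1);   // allocate one int on the heap, initialized to 1
    *i = 1;                // set the value through the pointer
    cout << *i;            // get the value through the pointer
    delete i;              // a single object from new is freed with delete, not delete []
    system("PAUSE");       // Windows-only: waits for a keypress
    return EXIT_SUCCESS;
}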
It is easy to see that i is a pointer to a location in heap memory
that's holding data to be interpreted as an integer. To set or get that value you
need to dereference the pointer, using the unary * operator.
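One detail that is easy to get wrong here: a single object allocated with new is released with delete, while an array allocated with new [] is released with delete []. Mixing the two is undefined behavior. Here is a minimal sketch of both pairings; the function names are made up for illustration.

```cpp
#include <cstddef>

// Allocate a single int on the heap, set and read it through the
// pointer, then release it. Scalar new pairs with scalar delete.
int single_value()
{
    int *p = new int(1);   // one int, initialized to 1
    *p = 42;               // set the value by dereferencing
    int result = *p;       // get the value by dereferencing
    delete p;              // not delete [] p
    return result;
}

// Allocate an array of ints, fill and sum it, then release it.
// Array new [] pairs with array delete [].
int array_sum(std::size_t count)
{
    int *p = new int[count];
    int sum = 0;
    for (std::size_t k = 0; k < count; ++k) {
        p[k] = static_cast<int>(k);
        sum += p[k];
    }
    delete [] p;           // not delete p
    return sum;
}
```

Calling single_value() gives back the 42 that was written through the pointer, and array_sum(4) adds up 0 + 1 + 2 + 3.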
But what happens when you increase the complexity a little? Here we'll take a reference
to a pointer to int.
void printn(int *&n)   // returns nothing, so declare it void
{
    cout << *n;
}
The idea stays the same, and is still relatively simple. But you can tell it is
starting to get harder to decide what's going on. All this program does is set a
variable and print it. Can you imagine working with pointers to pointers or just a
couple of hundred lines of this? Three cheers for the people who do.
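For what it's worth, printn only reads through n, so it doesn't really need the reference. The usual reason to take a reference to a pointer is so the function can change where the caller's pointer points. A small sketch of that, with function names I made up for illustration:

```cpp
// Takes a reference to the caller's pointer, so assigning to p
// changes where the caller's own pointer points.
void point_at(int *&p, int *target)
{
    p = target;
}

// Returns true if the caller's pointer was actually redirected.
bool reference_reseats_caller()
{
    int a = 1;
    int b = 2;
    int *p = &a;        // p starts out pointing at a
    point_at(p, &b);    // p now points at b
    return p == &b && *p == 2;
}
```

Had point_at taken a plain int *p, the assignment would only have changed the function's local copy and the caller would still be pointing at a.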
What if we change it a bit?
void printn(int *n)   // returns nothing, so declare it void
{
    cout << *n;
}
Are we passing a pointer by value, an int by reference, or is something else
going on?
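For the record, int *n passes a pointer by value; passing an int by reference would be spelled int &n. A rough sketch of the difference, using two hypothetical functions that are not part of the program above:

```cpp
// The pointer itself is copied. Writing through it changes the
// caller's int, but reassigning the copy leaves the caller's
// pointer untouched.
void by_pointer(int *n)
{
    *n = 10;       // visible to the caller
    n = nullptr;   // only changes the local copy of the pointer
}

// A reference is an alias for the caller's int, so no dereference
// is needed and there is no pointer to reassign.
void by_reference(int &n)
{
    n = 20;        // visible to the caller
}
```

After by_pointer(p), the int that p points at holds 10 and p still points at it. After by_reference(x), x holds 20.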
It makes me wonder how many times people try adding or removing a * when trying to
fix broken code, as opposed to actually tracing through it and understanding what is
going on. I recall doing a lot of that as an undergrad.
I'm not convinced mapping everything out would have been quicker. (I'm not
convinced throwing asterisks around like
hira shuriken was either.) One thing is for sure though - getting back into C++ will make my head hurt, probably more than trying to understand the real bioinformatics subject matter.
as my C++ teacher used to say:
"there are only two types of programmers in this world: the ones that understand pointers in C and the ones that don't"
Posted by barry.b on Sep 02, 2007 at 06:11 PM UTC - 6 hrs
Certainly there are a lot of people who don't understand pointers. But I still think among those of us who do, their existence can lead to long chains of code that become hard to follow.
That's true in general without pointers, but I think they exacerbate the problem. You have to keep the code incredibly simple to follow them without impediment.
Posted by Sam on Sep 02, 2007 at 06:23 PM UTC - 6 hrs
Here's a non-pointer oriented tidbit one of my professors mentioned:
Friend classes:
"The thing to remember about friends is that they can touch your private parts. (From a design standpoint) The question is, should friends be touching your private parts?"
Just thought you might enjoy that.
Mike.
Posted by Mike Kelp on Sep 02, 2007 at 08:46 PM UTC - 6 hrs
Hey buddy,
Wow -- sounds so exciting!
One thing you could do is use one of these nice C++ garbage collection systems that works behind the scenes. I've heard good things about these two:
Boehm: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Giggle: http://giggle.sourceforge.net/
Coming from the LISP world, I love not having to worry about such things. If you are into using LISP, you could use Gnu Common Lisp (GCL), and once you're ready to roll, just issue the form `(comp t)' and all of your functions will be compiled down to machine code, and will run at near C-speed. I find it wonderful both for the fact that I need only worry about the real functionality, and at the same time, I can use tools like ACL2 (http://www.cs.utexas.edu/users/moore/acl2/) to help me prove properties of my code.
-brother grant
Posted by grant on Sep 04, 2007 at 10:47 AM UTC - 6 hrs
@Mike - Maybe they should (at least the way they are designed, it seems that is the purpose), but do you want them to? I've got some hot friends, so ... =)
@grant - Awesome. Thanks for the links. I wasn't aware of any of that, in particular I had never considered the possibility that Lisp could be compiled down to machine code. As you know, I've used it very sparingly in the past, but I plan at some point to learn it well. I'm especially interested in Paul Graham's Arc if he ever finishes the thing.
I'm not particularly worried about it at this point, but I do remember the pain of C++ compared to the lack thereof in the even higher-level languages I've been using since then.
Posted by Sam on Sep 07, 2007 at 11:04 AM UTC - 6 hrs