Posted by Sam on Jul 04, 2008 at 02:02 AM UTC - 5 hrs
Seriously, be a purple cow.
Not [the] best cow or most milk-giving cow or prettiest cow. A purple
cow would stand out in a crowd of best, most milk-giving, and prettiest
cows. [Indeed,] It would be the purple one that you would talk about if you saw
that group.
-Chad Fowler in My Job Went To India, with the idea from Seth Godin (whose book is linked in the quote)
I've never believed I'm smart or gifted or otherwise especially endowed with intelligence. To me, working hard and putting in the effort has always been the key to success. Not until many years after I came to that conclusion did I read about how "eastern" educational philosophy outpaces that of the "west" in that way.
This is hard for me to say, given those beliefs, but it's something I've come to realize in the past few weeks: it's not hard to be remarkable. Let me explain how easily not sucking leads to remarkability, from stories I've personally experienced in the last two weeks.
A Tale of Two Types Of Customer Service
Ring... Ring
Hello?
Hi, I just bought a reel for my garden hose, and there's supposed to be a short 3 foot hose that connects the faucet to the reel which connects to the hose. The reel I picked up didn't have one on it, can I get one?
Sure, is there something I can "steal" and give you from another one?
...
Ring... Ring
Hello?
Hi, I just bought a ceiling fan and installed most of it until I realized one of the blades is scratched. Do I have to disassemble the fan and bring it all back in?
No, just bring the blade and the receipt...
...
Marsha, check out this 61 inch slim DLP TV. Isn't it awesome?
Well, the viewing angle sucks. Can't we get a flat screen LCD?
Wow. You're right.
We don't have one. The nearest one in the computer is in Corpus Christi [3.5 hours drive time], but there's also one in the warehouse that will be here Friday.
That's not good enough, I need it by Monday.
[After making some calls to check around for TVs not in the system] I located one for you. It can be here tomorrow at 1 PM.
Great, I'll see you then.
...
Contrast those short stories with taking Wednesday off to get telephone service, calling AT&T and asking why a technician never showed up to get told they'll be there Thursday, to be taking off all day Thursday and then calling asking why a technician never showed up to get told they'll be there Friday, to be taking off all day Friday and calling to be told there was a problem with your order and no one ever bothered to call to tell you about it, to be taking off and calling ...
Also contrast those short stories with waiting a week for an appointment with Comcast on Monday, to be calling after the time frame ended, to being told to wait up as late as you can because they have flashlights, to being told they'll certainly be there Tuesday, to waiting until Wednesday, and again until Thursday.
Aside from the monopolistic consequences we observe, the difference in customer service is striking. Normal would have been acceptable, but the first three examples were remarkable, especially given the crap that came later.
The Point
There may be a lot of companies who aren't looking for great hackers, so perhaps being one isn't going to be in your best interests for finding just any job. It may be in your interest for finding an incredible job, however.
I used to think being lazy might be a quality of a good software developer. Instead, I learned that you should be proud, not lazy. It's not the negative side of pride we should strive for, mind you - it's the limit of not sucking x_amount as not sucking x_amount approaches infinity.
I don't want to be around people who only want to succeed. I want to be around people who want to excel.
It's that easy to be remarkable to most people. It's the difference between being the one who blows the feather that ends up floating in the wind versus being the feather.
Being remarkable is the difference between being the one to flush the toilet as opposed to being the piece of shit that rides the wave down.
As always, comments, thoughts, and criticism are encouraged and appreciated.
Posted by Sam on Jun 27, 2008 at 09:49 AM UTC - 5 hrs
Linus Torvalds, Yukihiro Matsumoto, David Heinemeier Hansson, and Larry Wall. They're all famous software developers
you may have heard of. If you haven't heard of them, surely you know about some of their creations:
Linux, Ruby, Rails, and Perl.
Checking the Rails core alumni list, I had heard of half of them before I knew they were ever Rails team members.
You know other names as well - many inside what you might consider your core community, and probably several
outside of it.
Even if these developers weren't trying to do so, they
did a fantastic job of marketing themselves. As Chad Fowler notes in this week's chapter of MJWTI,
Anyone can write Struts or Nant on their
résumé. Very few can write Struts committer or Nant committer.
When thinking about marketing yourself as a programmer, keep that in mind. In other words, it's not just
about the fame from a hugely successful project you started, or the love from all the sexy kittens who
know your name.
Simply having participated in the project shows not only your passion for software
development, but also that you're well versed in the technology you intend to use. You helped develop it, after all.
My own experience in this sphere has been limited, but it's something I hope to rectify.
I have released a couple of projects, to little fanfare, but since then, I've been wanting to work on projects that someone else started, because being responsible for
the life of the project is something I'm just not interested in at the moment.
In pursuit of that goal, at the beginning of the year, I resolved to get more involved in OSS
(among other things), and for a couple of weeks I actually stuck to it. But once school started I quickly
realized that there just wasn't enough time to do everything I wanted, and open source contributions were
some of the first to go.
Another limitation was that while I wanted to work on JRuby, I wasn't using it for any major projects
(just many small scripting tasks) - so
I couldn't even help in the most obvious way of filing bug reports. However, now I'm working on a Ruby on Rails
application that we expect to deploy on .NET using IronRuby, so I may get some good opportunities to help
on that project, even if just by a little.
These are all baby steps. I expect to get more involved in the future, even if in small ways to various projects that
never lead to "committer" status.
I know a lot of you already have your own projects and collaborate or participate in others. Perhaps you
can help answer the concerns of everyone else.
For the rest of you, here are a few questions for discussion:
What are you waiting for? Is there a way you can help your favorite project this weekend? It will make you
a better developer than reading this blog will, that's for sure. =)
Do you feel too intimidated to start, or just don't know how?
Posted by Sam on Jun 25, 2008 at 08:01 AM UTC - 5 hrs
I don't like to have too many microposts on this blog, so I've decided to save them up and start
a Programming Quotables series. The idea is that I'll post quotes about programming that have one or more of the
following attributes:
I find funny
I find asinine
I find insightfully true
And stand on their own, with little to no comment needed
Here's the seventh in that series. I hope you enjoy them as much as I did:
In the software industry, we've been chasing quality for years. The interesting thing is there are a number of things that work. Design by Contract works. Test Driven Development works. So do Clean Room, code inspections and the use of higher-level languages.
All of these techniques have been shown to increase quality. And, if we look closely we can see why: all of them force us to reflect on our code.
That's the magic, and it's why unit testing works also. When you write unit tests, TDD-style or after your development, you scrutinize, you think, and often you prevent problems without even encountering a test failure.
Perhaps I've made it seem like I'm on the side of the pirates. Just to make it clear that I'm not sailing under the jolly roger: In my own view, piracy is wrong. It's wrong even when the people making and selling the game are senseless, self-destructive fools. It's wrong even if the game sucks. It's wrong if you're broke. It's wrong even if "you weren't going to buy it anyway." It's wrong and I don't do it, ever.
It is not my intention to preach at pirates and get them to change their habits. I'm not anyone's mum, and it's not my place to tell people how to act. I actually think that having lots of people repent of piracy right now would be horrible. The managers would conclude their monstrous policies were working, and we'd get a double helping of the same, forever after, in every game they put out.
Regardless of the approach taken, I definitely no longer believe that sprocs should play any significant role in any application. The current mandate in the software industry is to strive to lower costs by increasing developer productivity and ORM's clearly help to do this by eliminating the need to write and maintain countless simple CRUD sprocs.
It's definitely time for all of us .NET developers to abandon our convention sproc wisdom and start playing catch-up with the rest of the industry when it comes to using ORM's.
I am not at the mercy of some big up-front UML diagrams or "non-agile" models grounded in getting something wrong in its entirety and very thoroughly before you take measures to fix it (or even begin to detect it).
Posted by Sam on Jun 18, 2008 at 07:44 AM UTC - 5 hrs
Just two years ago, I was beyond skeptical towards the forces telling me that comments are
worse-than-useless, self-injuring blocks of unexecutable text in a program. I thought the idea was downright ludicrous. But as I've made an effort towards reaching this nirvana called "self-documenting code," I've noticed it's far more than a pipe dream.
The first thing you have to do is throw out this notion of gratuitously commenting for the sake of commenting that they teach you in school. There's no reason every line needs to be commented with some text that simply reiterates what the line does.
After that, we can examine some seemingly rational excuses people often use to comment their code:
The code is not readable without comments. Or, when someone (possibly myself) revisits the code, the comments will make it clear as to what the code does. The code makes it clear what the code does. In almost all cases, you can choose better variable names and keep all code in a method at the same level of abstraction to make it easy to read without comments.
We want to keep track of who changed what and when it was changed. Version control does this quite well (along with a ton of other benefits), and it only takes a few minutes to set up. Besides, does this ever work? (And how would you know?)
I wanted to keep a commented-out section of code there in case I need it again. Again, version control systems will keep the code in a prior revision for you - just go back and find it if you ever need it again. Unless you're commenting out the code temporarily to verify some behavior (or debug), I don't buy into this either. If it stays commented out, just remove it.
The code is too complex to understand without comments. I used to think this case was a lot more common than it really is. But truthfully, it is extremely rare. Your code is probably just bad, and hard to understand. Re-write it so that's no longer the case.
Markers to easily find sections of code. I'll admit that sometimes I still do this. But I'm not proud of it. What's keeping us from making our files, classes, and functions more cohesive (and thus, likely to be smaller)? IDEs normally provide easy navigation to classes and methods, so there's really no need to scan for comments to identify an area you want to work in. Just keep the logical sections of your code small and cohesive, and you won't need these clutterful comments.
Natural language is easier to read than code. But it's not as precise. Besides, you're a programmer, you ought not have trouble reading programs. If you do, it's likely you haven't made it simple enough, and what you really think is that the code is too complex to understand without comments.
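As a rough illustration of the first excuse, compare a version that leans on comments with one where the names carry the meaning. Everything here (the order example, the method names, the threshold and rate) is made up for the sketch and not taken from any real codebase:

```ruby
# Before: terse names, so the comments have to do the explaining.
def calc(o)
  # add up the price of every item
  t = 0
  o.each { |i| t += i[:price] * i[:qty] }
  # orders over 100 get 10% off
  t = t * 0.9 if t > 100
  t
end

# After: the names explain themselves, and each method stays at a
# single level of abstraction.
BULK_DISCOUNT_THRESHOLD = 100
BULK_DISCOUNT_RATE = 0.10

def order_total(line_items)
  subtotal = subtotal_of(line_items)
  qualifies_for_bulk_discount?(subtotal) ? apply_bulk_discount(subtotal) : subtotal
end

def subtotal_of(line_items)
  line_items.sum { |item| item[:price] * item[:qty] }
end

def qualifies_for_bulk_discount?(subtotal)
  subtotal > BULK_DISCOUNT_THRESHOLD
end

def apply_bulk_discount(subtotal)
  subtotal * (1 - BULK_DISCOUNT_RATE)
end
```

Both versions compute the same thing; the second one just doesn't need anyone to narrate it.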
There are only four situations I can think of at the moment where I need to comment code:
In the styles of Javadoc, RubyDoc, et cetera for documenting APIs others will use.
In the off chance it really is that complex: For example, on a bioinformatics DNA search function that took 5 weeks to formulate and write out. That's how rare it is to have something complex enough to warrant comments.
TODOs, which should be the exception, not the rule
Explaining why the most obvious code wasn't written. (Design decisions)
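A hypothetical example of that last case (the method and the scenario are invented for illustration): the comment earns its place because it records a decision the code cannot express by itself.

```ruby
def median(values)
  # Design decision: sort a copy (sort, not sort!) even though sorting in
  # place would save an allocation, because callers keep using their array
  # in its original order afterwards.
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end
```

Without the comment, the next person to tidy up might reasonably "optimize" it into the in-place version.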
In what other ways can you reduce clutter comments in your code? Or, if you prefer, feel free to tell me how I'm wrong. I often am, and I have a feeling this is one of those situations.
What are some other reasons you comment your code?
Posted by Sam on Jun 16, 2008 at 06:12 AM UTC - 5 hrs
Is there a perfect way to teach programming to would-be programmers? Let's ask the Magic 8-Ball.
Outlook not so good.
Does that mean we shouldn't teach them? Of course not. Does it mean we shouldn't look for better methods of teaching them? Emphatically I say again, "of course not!"
And what of the learner? Should beginners seek to increase their level of skill?
Only if they want to become a level 20 Spaghetti Code Slingmancer (can you imagine the mess?). Or, that's how some make it seem.
All it means to me is that we shouldn't let our paranoia about the wrong ways of learning stop us from doing so. For instance, take this passage about the pitfalls of reading source code:
Source code is devoid of context. It's simply a miscellaneous block of instructions, often riddled with a fair bit of implicit assumptions about preconditions, postconditions, and where that code will fit in to the grand scheme of the original author's project. Lacking that information, one can't be sure that the code even does what the author wanted it to do! An experienced developer may be able to apply his insight and knowledge to the code and divine some utility from it ("code scavenging" is waxing in popularity and legitimacy, after all), but a beginner can't do that.
Josh also mentions that source code often lacks rationale behind bad code or what might be considered stupid decisions, and that copy and paste is no way to learn.
They're all valid points, but the conclusion is wrong.
Which one of us learned the craft without having read source code as a beginner? Even the author admits that he was taught that way:
Self-learning is what drives the desire to turn to source code as an educational conduit. I have no particular problem with self-learning -- I was entirely self-taught for almost three quarters of what would have been my high school career. But there are well-known dangers to that path, most notably the challenge of selecting appropriate sources of knowledge for a discipline when you are rather ill-informed about that selfsame discipline. The process must be undertaken with care. Pure source code offers no advantages and so many pitfalls that it is simply never a good choice.
This is a common method of teaching - "do as I say, not as I do." It's how we teach beginners anything, because their simple minds cannot grasp all the possible combinations of choices which lead to the actual Right Way to do something. It's a fine way to teach.
But I'd wager that for all X in S = {good programmers}, X started out as a beginner reading source code from other people. And X probably stumbled through the same growing pains we all stumble through, and wrote the same crapcode we all do.
Of course, there are many more bad programmers than good, so let's not make another wrong conclusion - that any method of learning will invariably produce good programmers.
Instead, let's acknowledge that programming is difficult as compared to many other pursuits, and that there's not going to be a perfect way to learn. Let's acknowledge that those who will become good programmers will do so with encouragement and constant learning. Instead of telling them how they should learn, let them learn in the ways that interest them, and let's guide them with the more beneficial ways when they are open to it.
Let's remember that learning is good, encourage it, and direct it when we can. But let people make mistakes.
Learning in the wrong manner will produce good programmers, bad programmers, and mediocre ones.
Independent, orthogonal, and irrelevant are all words that come to mind. The worst it will do is temporarily delay someone from reaching their desired level of skill.
I would be knowledgeable having read programming books with no practical experience. But I wouldn't have any understanding. Making mistakes is fundamental to understanding. Without doing so, we create a bunch of angry monkeys, all of whom "know" that taking the banana is "wrong," but none of whom know why.
Posted by Sam on Jun 04, 2008 at 09:01 AM UTC - 5 hrs
If you get too smart, you start to think a lot. And when you think a lot, your mind explores the depths of some scary places. If you're not careful, your head could explode.
So to combat the effects of increasing intelligence due to reading books like The Mythical Man Month and Code Complete, I'm careful about maintaining a subscription to digg/programming in my feed reader. Incidentally, this tactic is also useful in preemptive head explosion. However, this second type of explosion is usually caused by asininity, as opposed to the combinatorial explosion due to choices you gain from reading something useful.
Ohloh, a company that ranks the nation's top open source coders, is opening its service to let other developers to track and rank their own teams. [Strong emphasis is mine.]
It's the latest move by Ohloh, a Bellevue, WA company that already distributes its coder profiles and related data to about 5,000 open source sites. The Ohloh profiles can serve as advertising for these sites, because the profiles show how active their open source development projects are.
Here's how it works. Ohloh ranks individual coders by tracking their activity. Ohloh can do this because open source projects publish their code, along with a record of updates each coder makes. Ohloh exploits this publicly available information and analyzes which coders are the most active in making key contributions to the most important open source projects. It assigns them a "KudoRank" to each coder between 1 (poor) through 10 (best).
Teams now have access to Ohcount - "a source code line counter" that "identifies source code files in most common programming languages, and prepares total counts of code and comments."
Unfortunately, since Ohcount helps power the normal Ohloh website, I'd bet it can track commits and lines of code by committer.
As is well known to many people, if you want something done, measure it. In this case, presumably you want more lines of code.
And what makes measuring lines of code per developer (and saying more == better) completely stupid is that program size is code's worst enemy. You'll end up doing the opposite of what you intended.
Still, Ohloh has some interesting stats for you to look at. And you know you want to be ranked #1.
Posted by Sam on Jun 02, 2008 at 07:57 AM UTC - 5 hrs
I don't like to have too many microposts on this blog, so I've decided to save them up and start
a Programming Quotables series. The idea is that I'll post quotes about programming that have one or more of the
following attributes:
I find funny
I find asinine
I find insightfully true
And stand on their own, with little to no comment needed
Here's the sixth in that series. I hope you enjoy them as much as I did:
Ahhh... configuration. I sometimes think this is a misnomer. At least in the way that the Java and .NET community have approached config in practice. We've had this trend in which we started jamming everything into XML configuration.
So much so, we often get asked to provide XML to configure features I think ought to be set in code along with unit tests. We've turned XML into a programming language, and a crappy one at that. Ayende talks about one issue with sweeping piles of XML configuration under a tool. This is not an intractable problem, but it highlights the fact that XML is code, but it is code with a lot of ceremony compared to the amount of essence. To understand what I mean by ceremony vs essence read Ending Legacy Code In Our Lifetime.
I thought about this for a minute, and realized that implicit in the conversation are several assumptions (or perhaps more accurately, conventionally perceived "truths") with regard to the craft of software development.
1) "there isn't that much to it" = "software is really easy to write"
2) "We have everything important figured out" = "in a business, the actual software is just icing on the cake"
3) "All we need is a programmer" = "software developers are cogs in a machine, or interchangeable components of an assembly line"
But do I belong to the company I work for? No! Never!
If that means I'm doomed to walk the Earth for eternity writing code and building beautiful ideas, then that's ok.
No matter how much my job makes me happy, my family and my life outside work are just as important and more. Obsessing about anything is not good. Moderation is good. Do everything well but know when to stop. Do your job well but remember to go and hang out with your friends. Put down the mouse and call someone to go out. Liking your life outside work does not mean you suck at work. It means you are good at living.
Posted by Sam on May 30, 2008 at 07:42 AM UTC - 5 hrs
What do you do? What have you done?
According to Chad Fowler in this week's chapter of MJWTI, those are two of the worst questions someone can ask about you. Why? Because it means they don't already know.
You might as well move to the basement and spend your days mumbling about your red Swingline stapler. Get used to the idea that no one will notice you're missing.
You were fired years ago. There was just a glitch in the payroll system that caused your paycheck to get printed anyway. Don't worry, they'll fix it soon.
Instead of being a "no-talent ass clown" that simply does what's expected of you, why not take some initiative and change some things for the better?
Can you automate TPS reports? (Can I reference anything else in that movie?)
Can you show them the light and get them to ditch ceremony in favor of essence when it makes sense?
IronRuby now runs unmodified Rails.
And JRuby's been on Rails for a while. There's Groovy and Grails and IronPython and ColdFusion and Jython and Scala and frameworks in the platform-anointed languages that relieve pain points in Java and .NET (while still running on them and integrating well!). It's true - they make Vicodin for the programmer's broken spirit. Chicken noodle soup for the coder's lost soul.
So why are you still always writing home-brew apps from scratch with all the pomp required by Her Majesty?
Can you sell essence to your boss and coworkers?
You can start unit testing. You can set up source control. You can tell people when you think their ideas are wrong, and fix other WTFs as you encounter them.
Then no one will need to ask what you do. They'll know.
Ruffled any feathers lately? Been a force for Good? Let's hear about it!
Posted by Sam on May 28, 2008 at 06:15 AM UTC - 5 hrs
In the field of bioinformatics, one way to measure similarities between two (or more) sequences of
DNA is to perform sequence alignment:
"a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may
be a consequence of functional, structural, or evolutionary relationships between the sequences."
Think of it this way: you've got two random strands of DNA - how do you know where one ends and the other begins?
How do you know if they come from the same organism? A closely related pair? You might use sequence alignment
to see how the two strands could line up in relation to each other - subsequences may indicate similar
functionality, or conservation through evolution.
In "normal" programming terms, you've got a couple of strings and want to find out how you might align them so they look
as much like one another as possible.
There are plenty of ways to achieve that goal. Since we haven't done much programming on here lately,
I thought it would be nice to focus on two very similar algorithms that do so:
Needleman-Wunsch and
Smith-Waterman.
The first @substitution_matrix is fairly simplistic - give one point for each match, and ignore any mismatches or gaps introduced.
In @substitution_matrix2
what score should be given if "s" is aligned with "a"? (One.) What if "d" is aligned with another "d"? (Six.)
The substitution matrix is simply a table telling you how to score particular characters when they are in the same position in two
different strings.
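A minimal sketch of what such a table might look like. The layout (characters in the first row and first column, scores everywhere else) is inferred from the s method in the class further down. The name EXAMPLE_MATRIX, the three-letter alphabet, the helper score_for and every score other than the two quoted above (s/a = 1, d/d = 6) are assumptions made up for illustration:

```ruby
# Hypothetical substitution matrix: row 0 and column 0 hold the characters,
# every other cell holds the score for aligning the row character with the
# column character. Only the s/a (1) and d/d (6) scores come from the text.
EXAMPLE_MATRIX = [
  ["",  "a", "d", "s"],
  ["a",  4,  -2,   1],
  ["d", -2,   6,   0],
  ["s",  1,   0,   4]
]

# Look up the score for aligning characters x and y.
def score_for(matrix, x, y)
  row = matrix.index { |r| r[0] == x.downcase }
  col = matrix[0].index(y.downcase)
  matrix[row][col]
end

puts score_for(EXAMPLE_MATRIX, "s", "a")  # 1
puts score_for(EXAMPLE_MATRIX, "d", "d")  # 6
```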
After you've determined a scoring scheme, the algorithm starts scoring each pairwise alignment, adding to or
subtracting from the overall score to determine which alignment should be returned. It uses
dynamic programming, storing calculations
in a table to avoid re-computation, which allows it to reverse course after creating the table to find and return
the best alignment.
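To make the table-filling step concrete, here is a stripped-down sketch that only computes the score table, not the alignment itself. It assumes a flat scoring scheme (the match, mismatch and gap values are arbitrary defaults picked for the example) rather than a substitution matrix, and nw_score_table is a made-up name:

```ruby
# Fill the Needleman-Wunsch score table for strings a and b.
# Scoring values are arbitrary illustrative defaults.
def nw_score_table(a, b, match: 1, mismatch: -1, gap: -1)
  table = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  # First column and first row: a prefix aligned against nothing but gaps.
  (1..a.length).each { |i| table[i][0] = i * gap }
  (1..b.length).each { |j| table[0][j] = j * gap }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      pair_score = a[i - 1] == b[j - 1] ? match : mismatch
      table[i][j] = [
        table[i - 1][j - 1] + pair_score, # align a[i-1] with b[j-1]
        table[i - 1][j] + gap,            # a[i-1] aligned with a gap
        table[i][j - 1] + gap             # b[j-1] aligned with a gap
      ].max
    end
  end
  table
end

table = nw_score_table("GATTACA", "GCATGCU")
puts table.last.last # score of the best global alignment
```

Each cell only looks at its three already-computed neighbors, which is what lets the full table be built in one pass and then walked backwards to recover the alignment.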
It feels strange to implement this
as a class, but I did it to make it clear how trivially easy it is to derive Smith-Waterman (SW) from Needleman-Wunsch (NW). One design that jumps out at me would be to have a SequenceAligner where you choose which algorithm to run by calling a method - then SW could use the NW algorithm with min_score passed as a parameter to the method. Perhaps you can think of something even better.
Anyway, here's the Ruby class that implements the Needleman-Wunsch algorithm.
class NeedlemanWunsch
  @min_score = nil

  def initialize(a, b, substitution_matrix, gap_penalty)
    @a = a
    @b = b
    # convert to array if a/b were strings
    @a = a.split("") if a.class == String
    @b = b.split("") if b.class == String
    @sm = substitution_matrix
    @gp = gap_penalty
  end

  def get_best_alignment
    construct_score_matrix
    return extract_best_alignment_from_score_matrix
  end

  def construct_score_matrix
    return if @score_matrix != nil # return if we've already calculated it
    initialize_score_matrix
    traverse_score_matrix do |i, j|
      if i == 0 && j == 0
        @score_matrix[0][0] = 0
      elsif i == 0 # if this is a gap penalty square
        @score_matrix[0][j] = j * @gp
      elsif j == 0 # if this is a gap penalty square
        @score_matrix[i][0] = i * @gp
      else
        up = @score_matrix[i-1][j] + @gp
        left = @score_matrix[i][j-1] + @gp
        # @a and @b are off by 1 because we added cells for gaps in the matrix
        diag = @score_matrix[i-1][j-1] + s(@a[i-1], @b[j-1])
        max, how = diag, "D"
        max, how = up, "U" if up > max
        max, how = left, "L" if left > max
        @score_matrix[i][j] = max
        @score_matrix[i][j] = @min_score if @min_score != nil and max < @min_score
        @traceback_matrix[i][j] = how
      end
    end
  end

  def extract_best_alignment_from_score_matrix
    i = @score_matrix.length - 1
    j = @score_matrix[0].length - 1
    left = Array.new
    top = Array.new
    # keep walking until both sequences are used up, so leading gaps are
    # kept (the first row and column of the traceback hold "L" and "U")
    while i > 0 || j > 0
      if @traceback_matrix[i][j] == "D"
        left.push(@a[i-1])
        top.push(@b[j-1])
        i -= 1
        j -= 1
      elsif @traceback_matrix[i][j] == "L"
        left.push "-"
        top.push @b[j-1]
        j -= 1
      elsif @traceback_matrix[i][j] == "U"
        left.push @a[i-1]
        top.push "-"
        i -= 1
      else
        puts "something strange happened" # this shouldn't happen
      end
    end
    return left.join.upcase.reverse, top.join.upcase.reverse
  end

  def print_score_visualization
    construct_score_matrix
    print_as_table(@score_matrix)
  end

  def print_traceback_matrix
    construct_score_matrix
    print_as_table(@traceback_matrix)
  end

  def print_as_table(the_matrix)
    puts
    puts "a=" + @a.to_s
    puts "b=" + @b.to_s
    puts
    print " "
    @b.each_index { |elem| print " " + @b[elem].to_s }
    puts ""
    traverse_score_matrix do |i, j|
      if j == 0 and i > 0
        print @a[i-1]
      elsif j == 0
        print " "
      end
      print " " + the_matrix[i][j].to_s
      puts "" if j == the_matrix[i].length - 1
    end
  end

  def traverse_score_matrix
    @score_matrix.each_index do |i|
      @score_matrix[i].each_index do |j|
        yield(i, j)
      end
    end
  end

  def initialize_score_matrix
    @score_matrix = Array.new(@a.length + 1)
    @traceback_matrix = Array.new(@a.length + 1)
    @score_matrix.each_index do |i|
      @score_matrix[i] = Array.new(@b.length + 1)
      @traceback_matrix[i] = Array.new(@b.length + 1)
      @traceback_matrix[0].each_index { |j| @traceback_matrix[0][j] = "L" if j != 0 }
    end
    @traceback_matrix.each_index { |k| @traceback_matrix[k][0] = "U" if k != 0 }
    @traceback_matrix[0][0] = "f"
  end

  def s(a, b) # check the score for bases a, b being aligned
    for i in 0..(@sm.length - 1)
      break if a.downcase == @sm[i][0].downcase
    end
    for j in 0..(@sm.length - 1)
      break if b.downcase == @sm[0][j].downcase
    end
    return @sm[i][j]
  end
end
Needleman-Wunsch follows that path, and finds the best global alignment possible. Smith-Waterman truncates
all negative scores to 0, with the idea being that as the alignment score gets smaller, the local alignment
has come to an end. Thus, it's best to view it as a matrix, perhaps with some coloring to help you visualize
the local alignments.
All we really need to get Smith-Waterman from our implementation of Needleman-Wunsch above is this:
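Judging by the SmithWaterman class further down, the change in question is a constructor that sets @min_score to 0 before handing off to the parent. The sketch below shows the same idea against a deliberately tiny stand-in base class. MiniAligner, MiniLocalAligner, the flat +1/-1 scoring and the score_matrix method name are all invented here so the snippet runs by itself; it is not the NeedlemanWunsch class above:

```ruby
# Hypothetical score-only stand-in for the NeedlemanWunsch class.
class MiniAligner
  def initialize(a, b, gap_penalty)
    @a, @b, @gp = a, b, gap_penalty
    @min_score ||= nil # a subclass may set a floor before calling super
  end

  # Build the score table. As in the class above, only interior cells are
  # clamped to @min_score; the gap-penalty row and column are left alone.
  def score_matrix
    m = Array.new(@a.length + 1) { Array.new(@b.length + 1, 0) }
    (1..@a.length).each { |i| m[i][0] = i * @gp }
    (1..@b.length).each { |j| m[0][j] = j * @gp }
    (1..@a.length).each do |i|
      (1..@b.length).each do |j|
        diag = m[i - 1][j - 1] + (@a[i - 1] == @b[j - 1] ? 1 : -1)
        best = [diag, m[i - 1][j] + @gp, m[i][j - 1] + @gp].max
        best = @min_score if @min_score && best < @min_score
        m[i][j] = best
      end
    end
    m
  end
end

# The whole "derivation": set a floor of 0, then defer to the parent.
class MiniLocalAligner < MiniAligner
  def initialize(a, b, gap_penalty)
    @min_score = 0
    super(a, b, gap_penalty)
  end
end

p MiniAligner.new("AAA", "TTT", -1).score_matrix.last.last
p MiniLocalAligner.new("AAA", "TTT", -1).score_matrix.last.last
```

With two sequences that share nothing, the global score keeps sinking while the clamped version sits at the floor, which is the behavior the truncation described above is after.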
However, it would be nice to be able to get a visualization matrix. This matrix should be able to use windows
of pairs instead of
each and every pair, since there can be thousands or millions or billions of base pairs we're aligning. Let's add a couple of methods to that
effect:
#modify array class to include extract_submatrix method
class Array
  def extract_submatrix(row_range, col_range)
    self[row_range].transpose[col_range].transpose
  end
end

require 'needleman-wunsch'

class SmithWaterman < NeedlemanWunsch
  def initialize(a, b, substitution_matrix, gap_penalty)
    @min_score = 0
    super(a, b, substitution_matrix, gap_penalty)
  end

  def print_score_visualization(window_size=nil)
    return super() if window_size == nil
    construct_score_matrix

    #score_matrix base indexes
    si = 1
    #windowed_matrix indexes
    wi = 0
    windowed_matrix = initialize_windowed_matrix(window_size)

    #compute the windows
    while (si < @score_matrix.length)
      sj = 1
      wj = 0
      imax = si + window_size - 1
      imax = @score_matrix.length - 1 if imax >= @score_matrix.length
      while (sj < @score_matrix[0].length)
        jmax = sj + window_size - 1
        jmax = @score_matrix[0].length - 1 if jmax >= @score_matrix[0].length
        current_window = @score_matrix.extract_submatrix(si..imax, sj..jmax)
        current_window_score = 0
        current_window.flatten.each { |elem| current_window_score += elem }
        begin
          windowed_matrix[wi][wj] = current_window_score
        rescue
        end
        wj += 1
        sj += window_size
      end
      wi += 1
      si += window_size
    end

    #find max score of windowed_matrix
    max_score = 0
    windowed_matrix.flatten.each { |elem| max_score = elem if elem > max_score }
    max_score += 1 #so the max normalized score will be 9 and line up properly

    #normalize the windowed matrix to have scores 0-9 relative to percent of max_score
    windowed_matrix.each_index do |i|
      windowed_matrix[i].each_index do |j|
        begin
          normalized_score = windowed_matrix[i][j].to_f / max_score * 10
          windowed_matrix[i][j] = normalized_score.to_i
        rescue
        end
      end
    end

    #print the windowed matrix
    windowed_matrix.each_index do |i|
      windowed_matrix[i].each_index do |j|
        print windowed_matrix[i][j].to_s
      end
      puts
    end
  end

  def initialize_windowed_matrix(window_size)
    windowed_matrix = Array.new(((@a.length + 1).to_f) / window_size)
    windowed_matrix.each_index do |i|
      windowed_matrix[i] = Array.new(((@b.length + 1).to_f) / window_size)
    end
    return windowed_matrix
  end
end
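The double transpose in extract_submatrix is the least obvious line in there, so here is a small sketch of what it does. It is written as a standalone method (submatrix and the sample grid are made-up names for this example) so the snippet does not need to reopen Array:

```ruby
# Slice rows first, flip so columns become rows, slice again, flip back.
def submatrix(matrix, row_range, col_range)
  matrix[row_range].transpose[col_range].transpose
end

grid = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]

window = submatrix(grid, 1..2, 0..1)
p window             # [[4, 5], [7, 8]]
p window.flatten.sum # 24, the kind of per-window total the visualization adds up
```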
And now we'll try it out. First, we take two sequences and perform a DNA dotplot analysis on them:
Then, we can take our own visualization, do a search and replace to colorize the results by score, and have a look:
Lo and behold, they look quite similar!
I understand the algorithms are a bit complex and not particularly well explained, so I invite questions about
them. And as always, comments and (constructive) criticisms are encouraged as well.
Posted by Sam on May 07, 2008 at 06:21 AM UTC - 5 hrs
Dave Mark raises some interesting questions about artificial intelligence in games over at AIGameDev.com. First, he explains that although we're seeing more and better AI in games, a common complaint heard from gamers runs along the lines of "why can't they combine such and such AI feature from game X in game Y." Then, Dave poses the questions for developers to answer:
We can only cite limited technological resources for so long.
...
Perhaps, from a non-business standpoint... that of simply an AI developer, we should be asking ourselves what the challenges are in bringing all the top AI techniques together into the massive game environments that are so en vogue. What is the bottleneck? Is it money? Time? Hardware? Technology? Unwillingness? Unimaginativeness? A belief that those features are not wanted by the gamer? Or is it simply fear on the part of AI programmers to undertake those steps necessary to put that much life into such a massive world?
Let me first admit that I'd wager Dave Mark knows a lot more about this stuff than I do. That's how he makes a living, after all. My experience in developing game AI comes from choose-your-own-adventure text-based games as a kid (where the algorithm was very deterministic, with few options), making villagers walk around in Jamaicanmon!, and making spaceships run away from you instead of seeking you out in Nebulus: Ozone Riders.
I even asked Glitch, Wes, and Jonathan (teammates on the project) to remind me of some simple vector math and draw it out on the wet erase board for Nebulus. And I still made them go the wrong direction (which ended up being pretty cool, actually).
In other words, I haven't had much experience with AI as it's typically implemented in games, and what little experience I have had is limited to things I wouldn't (personally) classify as AI to begin with.
Still, I have had some experience in what I might call "classical" AI (perhaps "academic" is a better term).
Stuart Russell and Peter Norvig wrote the Microsoft of books on Artificial Intelligence (90% market share for AI textbooks), and I've read through a fair bit of it. I've implemented a couple of algorithms, but mostly I've just been introduced to concepts and skimmed the algorithms. In addition, I've been through Ethem Alpaydin's book, Introduction to Machine Learning, which would have comparatively fewer ideas applicable to games.
I guess what I'm trying to say is: although I have some knowledge of AI, consider the fact that Dave's experience dwarfs my own when I disagree with him here: It is precisely the fact that we don't have enough processing power that gets in the way of more realistic AI in our games. Or, put more accurately, the problems we're trying to solve are intractable, and we've yet to find ways to fake all of them.
I'm not convinced you can separate the failures of AI from the intractability of the problems, or the inability to design and implement a machine to run nondeterministic algorithms for those problems in NP.
Compared to deciding how to act in life, deciding how to act in Chess is far simpler: there is a finite set of states, and if you had the time, you could plot each of the states and decide which move gets you the "closest" (for some definition of close) to where you'd like to be N moves from the decision point. Ideally N would get you to the end, but even for Chess, our ability to look ahead is limited to a small number of moves. Luckily for Deep Blue (and others), it turns out that's enough to beat the best humans in the world.
Even though Chess is a complex problem whose number of game states prevents us from modeling the entire thing and deciding it, we can cheat and model fewer states - when we can make an informed decision that a particular path of the decision tree will not be followed, we can forgo computation of those nodes. Even so, the problem will be huge.
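Skipping branches that will never be followed is usually called alpha-beta pruning. Here is a minimal sketch of it on a toy tree. The nested-array tree, the alphabeta method, and the leaf counter are all assumptions made for illustration; this shows the general technique, not how Deep Blue or any real engine is built.

```ruby
# A leaf is an Integer score; an inner node is an Array of child nodes.
# counter[:leaves] records how many leaves we actually had to score.
def alphabeta(node, alpha, beta, maximizing, counter)
  if node.is_a?(Integer)
    counter[:leaves] += 1
    return node
  end

  best = maximizing ? -Float::INFINITY : Float::INFINITY
  node.each do |child|
    score = alphabeta(child, alpha, beta, !maximizing, counter)
    if maximizing
      best = [best, score].max
      alpha = [alpha, best].max
    else
      best = [best, score].min
      beta = [beta, best].min
    end
    # The other player already has a better option elsewhere,
    # so this line of play will never be chosen: skip the rest.
    break if alpha >= beta
  end
  best
end

tree = [[3, 5], [2, 9], [0, 1]]
counter = { leaves: 0 }
best = alphabeta(tree, -Float::INFINITY, Float::INFINITY, true, counter)
puts "best score: #{best}, leaves evaluated: #{counter[:leaves]} of 6"
```

On this tiny tree the search scores four of the six leaves and still finds the same answer a full search would. The savings grow with the tree, but so does the tree.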
There are other ways of "faking" answers to the AI problems that face us in development. Approximation algorithms can get us close to the optimal solutions - but not necessarily all the way there. In these cases, we might notice the computer doing stupid things. We can tweak it - and we can make special case after special case when we notice undesired behavior. But we are limited in the balance between human-like and rational. Humans are not rational, but our programs (often) are made to be.
Presumably, they give the policeman in GTA 4 a series of inputs and a decision mechanism, and he's thinking purely rationally: so sometimes the most rational thing for him to do based on the decision model we gave him is to run around the front of the car when he's being shot at. Sometimes he jumps off buildings. (Video via Sharing the Sandbox: Can We Improve on GTA's Playmates?)
It may not be smart, but it is rational given the model we programmed.
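To make "rational given the model" concrete, here is a small sketch. The options, the numbers, and the method names are all invented for this example; nothing here is taken from GTA 4 or any real game.

```ruby
# Hypothetical options the policeman is choosing between.
options = {
  run_around_front_of_car: { distance_to_suspect: 3,  under_fire: true  },
  take_cover_behind_car:   { distance_to_suspect: 12, under_fire: false }
}

# The model only knows "closer to the suspect is better".
def naive_choice(options)
  options.min_by { |_name, facts| facts[:distance_to_suspect] }.first
end

# Same model plus one special case: being shot at is costly.
def tuned_choice(options, fire_penalty = 20)
  options.min_by do |_name, facts|
    facts[:distance_to_suspect] + (facts[:under_fire] ? fire_penalty : 0)
  end.first
end

puts naive_choice(options) # run_around_front_of_car
puts tuned_choice(options) # take_cover_behind_car
```

The first choice is perfectly rational for a model that only measures distance. The second fixes this one case, and every fix like it is another term someone has to think of, tune, and pay for at run time.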
You can make the decision model more complex, or you can program special cases. At some point, development time runs dry or computational complexity gets too high. Either way, game AI sucks because the problems we're trying to solve have huge lower bounds for time and space complexity, and that requires us to hack around it given the time and equipment we have available. The problems usually win that battle.
Game AI has come a long way since Pacman's cologne (or maybe he just stunk), and it will get better, especially as we move more gaming to super-powerful servers. Still, it's far from ideal at the moment.
What do you have to say? (Especially the gamer geeks: programmers or players)
Posted by Sam on May 05, 2008 at 02:56 PM UTC - 5 hrs
Since I do a lot of maintenance work, I get to see a lot of crapcode. Even better, I get to work in it. It's discouraging that I wrote a lot of it.
The smell isn't pleasant, but opportunities to do good things are abundant. Thus, it's easy to do something to beautify the code, to leave it in a better state as a result of refactoring. Moments where you think, "hey, that's cool" are anything but rare.
Ok, that's the positive spin. While I'm making my way through the muck, people within earshot (or just the building in general) will hear expletive after WTFing expletive. Usually, it's emanating from my general direction.
But when I'm done, I get to look back on it with a sense of accomplishment. It's an accomplishment that goes above and beyond the typical new feature or bug fix. That's what I'm doing now.
I was just working on an ordering system where the calculations in one file came out to be a couple of thousand lines of code. It was taking forever - like hacking through jungle with nothing but a dull machete. Most of the small changes I was making resulted in new bugs being introduced to the system. As the number of bugs increased, it started to feel like a jungle. After a while I gave up.
I gave up on my quick-n-dirty add-another-hack-to-a-bunch-of-hacks approach and decided it was time to do something positive for myself, the code, and future programmers who have to touch it.
Extract method (or in this case, extract-template) came to the rescue. Just by extracting related bits of functionality I was able to get the file into manageable, largely cohesive chunks. I didn't go to the extreme: I stopped when it was good enough, touching only those parts I was prepared to fix if I broke them, which meant limiting it to what I was doing there in the first place.
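For anyone who hasn't used it, extract method looks roughly like the sketch below. The order fields, rates, and method names are hypothetical and far smaller than the real file; the point is only the shape of the change.

```ruby
# Before: one method doing everything.
def order_total_before(items, tax_rate, shipping_flat)
  subtotal = 0
  items.each { |item| subtotal += item[:price] * item[:quantity] }
  tax = subtotal * tax_rate
  shipping = subtotal >= 100 ? 0 : shipping_flat
  subtotal + tax + shipping
end

# After: each related bit of the calculation gets a name of its own.
def subtotal_for(items)
  items.sum { |item| item[:price] * item[:quantity] }
end

def tax_for(subtotal, tax_rate)
  subtotal * tax_rate
end

def shipping_for(subtotal, shipping_flat)
  subtotal >= 100 ? 0 : shipping_flat
end

def order_total(items, tax_rate, shipping_flat)
  subtotal = subtotal_for(items)
  subtotal + tax_for(subtotal, tax_rate) + shipping_for(subtotal, shipping_flat)
end

items = [{ price: 20, quantity: 2 }, { price: 5, quantity: 1 }]
puts order_total_before(items, 0.1, 8)
puts order_total(items, 0.1, 8)
```

Both versions compute the same total. The second one is simply easier to read, and each small piece can be fixed on its own if it breaks.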
The code isn't incredible, but it's in better shape than it was before I went in. Now, the main calculation file is about 1200 lines, and the other 800 are broken out into different, cohesive files. Forget about reuse, this was just for literacy.
Posted by Sam on Apr 28, 2008 at 08:29 AM UTC - 5 hrs
When I was younger I was "an arrogant know-it-all prick" at one point in the "middle years" of my programming experience, as many of you know from the stories I often relate on this weblog.
The phrase "middle years" doesn't give us a frame of reference for my age though. For instance, if I were 50 years old right now, my "middle years" of programming may have been when I was in my thirties. That's not the case, and I want to give you that frame of reference: I'm 28 at the time of this writing. The middle years as I talked about them would have referred to my late teens to early twenties. Maybe even up to the middle of my twenties.
By most standards, that's young.
And I know a thing or two about being set in your ways. We can all see the laugh I have at myself with the title here being "My Secret Life as a Spaghetti Coder" and some of the stories I've told as well.
In fact, let me add to the wealth of stodginess, idiocy, and all around opposite-of-good-developerness here:
I once said I preferred Windows to Linux. While that's not a completely shocking statement, the reason behind it was: I said I preferred Windows because 14 year olds work on Linux. Not because of any experience I'd had with it, but because of my fear of learning it.
Because of my prior experience being unwilling to learn, I was quite interested when I read this:
When you are young, you don't have that sense of self to protect. You're driven by a need to find out who you are, to turn the pages of your biography and see how the story turns out. If people around you are doing something you don't understand, you assume the problem is your inexperience and you go to work trying to understand it.
But when you are old, when you know who you are, everything is different. When people around you are doing something you don't understand, you have no trouble at all explaining why they are assholes mistaken.
. . .
If you want a new idea, you have to silence your inner critic. Your sense of right and wrong, of smart and stupid works by comparing new ideas to what you already know. Your sense of what would be a good fit for you works by comparing new things to who you already are. To learn and grow, you must let go of you, you must be young again, you must accept that you don't understand and seek to understand rather than explaining why it doesn't make any sense.
In a couple of paragraphs, Reg sums up almost precisely some of what I've been thinking and writing about for the last several months. He's so close, but misses a fundamental point: the old and young parts are incidental.
My hypothesis is that the level of learning and idea absorption you can attain has little to do with age. Instead, it is influenced more by your perceived level of experience. Normally, age is highly correlated to experience - but it doesn't have to be. In my case, when I was younger I thought I knew everything. Now that I've aged, I've come to the realization that I know very little.
My conclusion is not that different from Reg's, and this is not some scientific experimental contest, so let me explain why I feel the difference is worth noting: If we blame our reluctance to try new things on age, we are dooming ourselves to think of it as some unchangeable, deterministic process. By thinking of it in terms of perception of experience, we admit to being able to control it with more ease. (My belief is that we have control over what and how we perceive things.)
In other words, we lose our ability to blame anyone but ourselves. That's a powerful motivator sometimes.
Thoughts? Disagreements? Please be kind enough to let me know.
Posted by Sam on Apr 16, 2008 at 07:22 AM UTC - 5 hrs
I don't like to have too many microposts on this blog, so I've decided to save them up and start
a Programming Quotables series. The idea is that I'll post quotes about programming that have one or more of the
following attributes:
I find funny
I find asinine
I find insightfully true
And stand on their own, with little to no comment needed
Here's the fifth in that series. I hope you enjoy them as much as I did:
At this stage, if you've heard of Rails and you haven't converted, it's entirely possible that you never will. It's also entirely possible that anybody who still isn't even taking Rails seriously by this point might just be some kind of idiot.
...
Every programmer should also read Chad Fowler's "My Job Went To India" book, where he explains that as larger and larger numbers of programmers adopt a particular skill, that skill becomes more and more a commodity. Rails development becoming a commodity is really not in the economic interest of any Rails developer. This is especially the case because programming skill is very difficult to measure, which - according to the same economics which govern lemons and used-car markets - means that the average price of programmers in any given market is more a reflection of the worst programmers in that market than the best. An influx of programmers drives your rates down, and an influx of incompetent programmers drives your rates way the fuck down.
Instead, I want to talk about my first attempt at solving the puzzle, which was an utter failure. A glorious, spectacular failure. Perhaps the single most impressive failure of my career. Failures are often much more interesting than successes, but for some unfathomable reason, people are often reluctant to discuss them.
Me: read file blah.txt and display it on system output
Java: How should I name the class?
Me: Test
Java: How should I handle errors?
Me: I don't care right now, I just need to display that data to system output
Java: But I need to know this, what if something unexpected happens?!
Me: I just want to make a prototype damn it!
Java: Sorry, can't do it.
Me: Ok, do nothing on error.
Java: And which implementation of Stream class should I use for reading?
Everyone knows that diversification is the key to managing financial risk, but few people seem to apply this principal to their professional careers. Most developer shops are relatively limited when it comes to the number of technologies and problem domains they deal with. If you want to diversify your resume without job hopping every year, then it makes sense to actively seek out technology experiences that are different from the ones you use in your day job.
Neal Ford and others have been talking about the distinction between dynamic and static typing as being incorrect. The real question is between essence and ceremony. Java is a ceremonious language because it needs you to do several dances to the rain gods to declare even the simplest form of method. In an essential language you will say what you need to say, but nothing else. This is one of the reasons dynamic languages and type inferenced static languages sometimes look quite alike - it's the absence of ceremony that people react to.
It started with Larry quoting himself on praising extreme programming, and mentioning 110 thousand lines of test code to 30 thousand lines of application code, with the application having been developed in Python. Allen Holub took that as an indictment of dynamic languages, with Larry quoting him as saying:
I want to take exception to the notion that Python is adequate for a real programming project. The fact that 30K lines of code took 110K lines of tests is a real indictment of the language. My guess is that a significant portion of those tests are addressing potential errors that the compiler would have found in C# or Java. Moreover, all of those unnecessary tests take a lot of time to write, time that could have been spent working on the application.
In fact, many people were shocked at the number of tests compared to code, and that's what the discussion (at least the part I was interested in) centered around. Four times as much test code as application logic is too much. It would shackle you, instilling fear in your heart and soul. No changes would ever be made with that kind of viscosity. Furthermore, tests can provide a false sense of security, a blankie, if you will.
You've got to be kidding. Having a test that will tell you when you broke existing functionality is pressure to avoid changes? To me, that's liberating!
Contrast that with not having a test to tell you when something broke. Does it even make sense to say having tests pressures you to avoid changes? Only if you fear having a program that works over having one that you think works.
Let me try a different approach. Take the following simple program:
if (someCondition is true)
    do something
else
    something else
if (anotherCondition is true)
    do another thing
There are four execution paths: One where both someCondition and anotherCondition are true, one where they are both false, and one each where one is true and the other isn't.
In other words, we have six lines of code and at least four tests we should write to cover all the cases. If each test is just a single line, we still need to write the method names and end lines, so that would give us three lines per test - for a total of twelve lines of test code.
The test code size is already double the number of lines in our application code for this simple, six line program with four execution paths.
How many execution paths are in a 30 thousand line program?
Seeing as the number of execution paths in code is more likely to grow exponentially than linearly with each new line that gets written, 110 thousand lines of test code isn't actually all that much.
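Here is a rough sketch of that arithmetic. The method six_line_program is a made-up stand-in for the pseudocode above, and path_count assumes every branch is an independent two-way choice made one after another, which real code only approximates.

```ruby
# Stand-in for the six-line program: returns which steps were taken.
def six_line_program(some_condition, another_condition)
  steps = []
  if some_condition
    steps << :something
  else
    steps << :something_else
  end
  steps << :another_thing if another_condition
  steps
end

# One case per execution path: every true/false combination of the two conditions.
paths = [true, false].product([true, false])
paths.each { |a, b| puts "#{a}, #{b} -> #{six_line_program(a, b).inspect}" }

# With n independent two-way branches in sequence, the path count doubles each time.
def path_count(independent_branches)
  2**independent_branches
end

puts path_count(2)  # 4
puts path_count(20) # 1048576
```

Twenty such branches is not a lot of code, and it already allows over a million paths. Nobody writes a test per path at that scale, which is the point: 110 thousand lines of tests is nowhere near exhaustive for 30 thousand lines of code.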
Further, the solution to the blankie problem is not "have fewer tests," it is to recognize that passing the tests is a necessary - not sufficient - condition of working software.
Following the blankie argument to its logical conclusion - that fewer tests mean you write better code because you are more careful - we should have no tests.
In fact part of the reason we want the tests is that we can make changes to the code with less fear of unknowingly breaking existing functionality or introducing defects in the software.
In the end, if someone is using the tests as a tool to do wrong, there is something wrong with the person, not the test. They will find another way to do wrong, even if we remove the tests from their arsenal.
Posted by Sam on Apr 09, 2008 at 07:34 AM UTC - 5 hrs
Something's been bothering me lately. It's nothing, really. ∅, Ø, null, nil, or whatever you want to call it. I think we've got it backwards in many cases. Many languages like to throw errors when you try to use nothing as if it were something else - even if it's nothing fancy.
I think a better default behavior would be to do nothing - at most log an error somewhere, or allow us a setting - just stop acting as if the world came to an end because I *gasp* tried to use null as if it were a normal value.
In fact, just because it's nothing, doesn't mean it can't be something. It is something - a concept at the minimum. And there's nothing stopping us from having an object that represents the concept of nothing.
Exceptions should be thrown when something exceptional happens. Maybe encountering a null value was, at some time, exceptional. But in today's world of databases, form fields, and integrated disparate systems, you don't know where your data is or where it's coming from - and encountering null is the rule, not the exception.
Expecting me to write paranoid code and add a check for null to avoid every branch of code where it might occur is ludicrous. There's no reason some sensible default behavior can't be chosen for null, and if I really need something exceptional to happen, I can check for it.
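One way to get a sensible default is an object that stands in for nothing, often called the Null Object pattern. Below is a minimal sketch of the idea. The NullValue class and its logging behavior are hypothetical, written only for this example, and no language does this for you out of the box.

```ruby
# NullValue is a made-up class that represents "nothing" as a real object.
class NullValue
  def initialize(log = [])
    @log = log
  end

  # Any unknown message is logged and answered with the same null object,
  # so chained calls keep going instead of raising NoMethodError.
  def method_missing(name, *args, &block)
    @log << "ignored call to #{name} on null"
    self
  end

  def respond_to_missing?(name, include_private = false)
    true
  end

  def to_s
    ""
  end

  def nil?
    true
  end
end

log = []
customer = NullValue.new(log) # e.g. the lookup found no customer
city = customer.address.city  # no exception, just more nothing
puts "City: '#{city}'"
puts log
```

If something exceptional really does need to happen, the caller can still ask city.nil? and decide for itself.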
Really, aren't you sick of writing code like this:
string default = "";
if(form["field"] != null and boolFromDBSaysSetIt != null
   and boolFromDBSaysSetIt)
    default = form["field"];
when you could be writing code like this:
if(boolFromDBSaysSetIt)
    default = form["field"];
I think this is especially horrid for conditional checks. When I write if(someCase) it's really just shorthand for if(someCase eq true). So why, when someCase is null or not a boolean, should it cause an error? It's not true, so move on - don't throw an error.
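For what it's worth, some languages already treat conditionals this way. In Ruby, nil and false are the only falsy values, and a Hash returns nil for a missing key, so the short version works as written. Here is a minimal sketch; pick_default, the form hash, and the flag are hypothetical names standing in for the snippet above.

```ruby
def pick_default(form, flag_from_db)
  default = ""
  # A nil (NULL) flag is simply "not true": the branch is skipped, no error raised.
  # nil.to_s is "", so a missing form field also falls back to the empty string.
  default = form["field"].to_s if flag_from_db
  default
end

puts pick_default({ "field" => "abc" }, true).inspect # "abc"
puts pick_default({ "field" => "abc" }, nil).inspect  # ""
puts pick_default({}, true).inspect                   # ""
```

It isn't exactly the behavior described above, though: Ruby counts every non-boolean value other than nil (0 and the empty string included) as true rather than as "not true."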
Someone tell me I'm wrong. It feels like I should be wrong. But it feels worse to have the default path always be the one of most resistance.
Posted by Sam on Apr 07, 2008 at 07:46 AM UTC - 5 hrs
Lately I've been thinking about which charit[y|ies] I'd like to endow with $100 million when I make my first billion. I know that sounds stingy, but considering the tax comes out first, that billion shrinks rather quickly.
Before I continue, I want to make it absolutely clear that I'm not endorsing any of the following charities, and I have not researched how well they do their purported missions, so they could be frauds for all I know. I just want to discuss the ideas.
Naturally, I wanted to look for computer-related charities, and more specifically, those with a focus on programming. I first browsed a couple of charity-ranking websites and didn't find anything that I was searching for.
Everyone knows about One Laptop Per Child, whose mission is to educate children in developing nations, who otherwise wouldn't have as much of an education, by providing them with a low-cost, low-energy required laptop.
OLPC is not, at heart, a technology program, nor is the XO a product in any conventional sense of the word. OLPC is a non-profit organization providing a means to an end - an end that sees children in even the most remote regions of the globe being given the opportunity to tap into their own potential, to be exposed to a whole world of ideas, and to contribute to a more productive and saner world community.
On a bit of a smaller scale, I found Computers With Causes, which, like so many charities that let you donate your car, takes your donated computer and puts it to use for charitable purposes. Mac Heist puts on a two week bonanza of a sale and donates proceeds to the charity you choose.
That's a start, but we could do better.
There is another group, charityfocus, which lets you volunteer your time to help build websites for different charities. That certainly sounds interesting, and more to my point - but it's not quite there.
These are all noble goals, but I'm more interested in a cause that's closer to home, one where technology is not ancillary, but where it is part of the goal. So, I thought I'd do a domain search and start one myself. One of the domains I entered was code4cause.org, but it turns out they're already doing some good work. They don't teach children to program, but they do take on your IT projects and donate proceeds to charities.
I was surprised I wasn't able to find more than what I did.
Are programmers that selfish?
Of course not. We donate a lot of our time to open source, for one thing, and I'm sure there are plenty of us who have our favorite causes that aren't computer-related at all.
I like Code4Cause's mission - that would let programmers donate their time to projects and convert that into money to send to charity. But, Code4Cause is based in Europe and Asia, and I'm more interested in something closer to home (which for me, is the United States). What are U.S. software developers doing?
I don't know, but I wouldn't mind seeing something like Code4Cause in the US. Or, at least it would be nice if we could donate the money our time produces to whatever organization we chose. Ideally, I'd like to see something where programming is more of the point, but I'm not sure how it would work. Even if coding remained auxiliary, something like Code4Cause would still be great.
Anyway, where are the coding organizations? Do you know of any I haven't listed? Share them below. Interested in trying to start one yourself? Let me know privately, and if there's enough interest, maybe we can figure out how to start an organization, and what we'd like it to do.
Sorry I don't have any more answers - I'm just fleshing these thoughts out, and throwing them out there to see if it helps.
Posted by Sam on Apr 02, 2008 at 02:10 PM UTC - 5 hrs
When I wrote about things I'd like to have in a job, I didn't expect that one of the items on my list would draw the kind of reaction it did. A couple of comments seemed to think I'm off my rocker about personal project time:
Why should I pay for the time you spend doing your own projects? You are free to have 20% of the time to yourself if you can take 20% pay cut (from Adedeji O.)
Did I understand you correct? You want to work on something that has no relation to your employer and still get paid by him?
...
I don't know the exact rates in the US, but, these 20% will easily sum up to more than 10,000 USD per developer per year. Would YOU really pay that for nothing?
(from Christoph S.)
To be fair, Christoph did say that he's "not talking about time for personal and technical development and not about work related projects that MAY include benefit for your employer. I'm talking about working on 'my projects' and 'open source projects.'"
Here's what I had said that provoked the reactions:
I like the idea of Google's 20% time. I have other projects I'd like to work on, and I'd like the chance to do that on the job. One thing I'd like to see though, is that the 20% time is enforced: every Friday you're supposed to work on a project other than your current project. It'd be nice to know I'm not looked down upon by management as selfish because I chose to work on my project instead of theirs.
I wouldn't mind seeing something in writing about having a share of the profits it generates. I don't want to be working on this at home and have to worry about who owns the IP. And part of that should allow me to work on open source projects where the company won't retain any rights on the code I produce.
Calling it "my project," talking about who owns the intellectual property, and working on open source appear to be where I crossed the line. At Google, we see things like News and Gmail coming from the personal time. What I stated could mean that a programmer for a bank's website that's done in ColdFusion could end up working on a GUI for Linux in C. It's rather hard to make the connection there.
First, I didn't mean to imply that the company would not own any part of what I worked on. If I am willing to take a 20% cut in pay, then certainly I wouldn't expect them to own anything. What I was looking for is some equitable way to share what I create with my time. For example, I might take one day a week at work to build my new widget, but if I'm taking 2 days on the weekend to work on it, I ought to get more than a "thank you" if the project goes on to make hundreds of millions of dollars.
And, I was just saying there needs to be some way to allow me to work on open source software during that time. I don't know how the details would work, but there surely is an equitable way to deal with it. I certainly could have explained it better.
In any case, I don't expect that 20% of my time spent away from my main project translates to the project taking 20% longer, or 20% less profit or revenue or productivity for my company. The degradation may be linear if we are data entry clerks - if all we are doing is typing instructions given to us from on high (if the work is physical). But that's not typically what programming is about.
I'm not running a company though, and I've not done a study about such companies. If I were to do suc