My Secret Life as a Spaghetti Coder
home | about | contact | privacy statement
A little while ago I was trying to think of ways to have a program compare strings and discover patterns among them (as opposed to knowing patterns and looking for particular ones).

Over the years, I've learned about a lot of algorithms, but there's no way I could recall all of them. I knew I'd probably need to look at artificial intelligence, or more specifically, machine learning. But that's all I had to go on.

At the time, I decided it would be helpful to have a list of algorithms and data structures with short descriptions to browse and jog my memory.

Most of the problems you'll solve in your programming career don't require a lot of thought to arrive at a correct solution. But algorithms, data structures, and approaches to problems aren't just limited to the realm of programming. Reg Braithwaite reminds us of another reason to have these things at your disposal - even the problem of determining who to hire can be reduced to Naïve Bayes Classification.

And when you have those problems where there is no human solution (how can I discover patterns in several strings which may have hundreds of characters?), or the computer solution takes too long to find the optimal one where good enough will do, or there just isn't necessarily a right answer -- those are the hard ones where you aren't likely to stumble upon an answer -- where do you turn?

It turns out, a lot of problems can be reduced to others we already know how to solve. In fact, the basis of proving complexity class for an algorithm utilizes that: reduction of one problem to another will prove that if you solve the first one, you can solve the second one, and it will be just as complex. A famous example is SAT.

I haven't yet compiled the list I spoke of above, but luckily for all of us, Wikipedia has a good starting point. It's missing a couple that stand out in my mind (or that have a different name I didn't look for, or multiple classifications and it didn't make it to the one I looked at), but that's just something I can put on my to-do list to improve. The Machine Learning category, for instance, seems fairly light.

So just browsing a list and short description of algorithms may enlighten you as to how you can reduce your problem to one that's already been solved. If you can do that, you've got a solution from someone who's probably much smarter than you are. It's as if you have Donald Knuth and the rest of computer science academia on your team, and you don't even have to pay them (except, perhaps by buying their book, or subscribing to a journal that will disseminate their knowledge).

Thoughts?

Hey! Why don't you make your life easier and subscribe to the full post or short blurb RSS feed? I'm so confident you'll love my smelly pasta plate wisdom that I'm offering a no-strings-attached, lifetime money back guarantee!


Comments
Leave a comment

Good list. The data structures list off that page is good. It mentions suffix trees, a more sophisticated [I don't understand them yet!] version of the suffix arrays described in "Programming Pearls", excellent for finding commun substrings in a string. For odd (less common) forms of similarity look at "Programming Collaborative Intelligence" by Toby Segaran.
Another way to get a similarity metric is a co-occurrence matrix as per Haralick criteria in texture analysis. I've found that one a quick way to detect similar files, comparing 256*256 numbers, rather than N megabytes.

Posted by hgs on Feb 20, 2008 at 09:33 AM UTC - 6 hrs

I considered using simple diff (or Longest Common Subsequence) for my problem at hand, but I haven't been in that direction yet because we actually knew more about the pattern than I originally thought, so now we can generate all such patterns that fit a pattern-template (? I don't know the word). You could say out of all possible patterns, we are narrowed down to a finite subset of that.

I haven't seen Haralick's, matrix, but we immediately ruled out other distance matrices because they would just be entirely too slow and a bit overfitting.

Always glad to have you point me at new things though - I will check out the ones you mentioned at some point just to familiarize myself with them. Thanks!

Posted by Sammy Larbi on Feb 20, 2008 at 02:40 PM UTC - 6 hrs

Leave a comment

Leave this field empty
Your Name
Email (not displayed, more info?)
Website

Comment:

Subcribe to this comment thread
Remember my details
Google
Web CodeOdor.com

Me
Picture of me

Topics
.NET (19)
AI/Machine Learning (14)
Answers To 100 Interview Questions (10)
Bioinformatics (2)
Business (1)
C and Cplusplus (6)
cfrails (22)
ColdFusion (78)
Customer Relations (15)
Databases (3)
DRY (18)
DSLs (11)
Future Tech (5)
Games (5)
Groovy/Grails (8)
Hardware (1)
IDEs (9)
Java (38)
JavaScript (4)
Linux (2)
Lisp (1)
Mac OS (4)
Management (15)
MediaServerX (1)
Miscellany (76)
OOAD (37)
Productivity (11)
Programming (168)
Programming Quotables (9)
Rails (31)
Ruby (67)
Save Your Job (58)
scriptaGulous (4)
Software Development Process (23)
TDD (41)
TDDing xorblog (6)
Tools (5)
Web Development (8)
Windows (1)
With (1)
YAGNI (10)

Resources
Agile Manifesto & Principles
Principles Of OOD
ColdFusion
CFUnit
Ruby
Ruby on Rails
JUnit



RSS 2.0: Full Post | Short Blurb
Subscribe by email:

Delivered by FeedBurner