Posted by Sam on May 07, 2008 at 12:00 AM UTC - 5 hrs
Dave Mark raises some interesting questions about artificial intelligence in games over at AIGameDev.com. First, he explains that although we're seeing more and better AI in games, a common complaint heard from gamers runs along the lines of "why can't they combine such and such AI feature from game X in game Y." Then, Dave poses the questions for developers to answer:
We can only cite limited technological resources for so long.
...
Perhaps, from a non-business standpoint... that of simply an AI developer, we should be asking ourselves what the challenges are in bringing all the top AI techniques together into the massive game environments that are so en vogue. What is the bottleneck? Is it money? Time? Hardware? Technology? Unwillingness? Unimaginativeness? A belief that those features are not wanted by the gamer? Or is it simply fear on the part of AI programmers to undertake those steps necessary to put that much life into such a massive world?
Let me first admit that I'd wager Dave Mark knows a lot more about this stuff than me. That's how he makes a living, after all. My experience in developing game AI comes from choose your-own-adventure text-based games as a kid (where the algorithm was very deterministic, with few options), making villagers walk around in Jamaicanmon!,
and making spaceships run away from you instead of seeking you out in Nebulus: Ozone Riders.
I even asked Glitch, Wes, and Jonathan (teammates on the project) to remind me of some simple vector math and draw it out on the wet erase board for Nebulus. And I still made them go the wrong direction (which ended up being pretty cool, actually).
In other words, I haven't had much experience with AI as it's typically implemented in games, and what little experience I have had is limited to things I wouldn't (personally) classify as AI to begin with.
Still, I have had some experience in what I might call "classical" AI (perhaps "academic" is a better term).
Stuart Russell and Peter Norvig wrote the Microsoft of books on Artificial Intelligence (90% market share for AI textbooks), and I've read through a fair bit of it. I've implemented a couple of algorithms, but mostly I've just been introduced to concepts and skimmed the algorithms. In addition, I've been through Ethem Alpaydin's book, Introduction to Machine Learning, which would have comparatively fewer ideas applicable to games.
I guess what I'm trying to say is: although I have some knowledge of AI, consider the fact that Dave's experience dwarfs my own when I disagree with him here: It is precisely the fact that we don't have enough processing power that gets in the way of more realistic AI in our games. Or, put more accurately, the problems we're trying to solve are intractable, and we've yet to find ways to fake all of them.
I'm not convinced you can separate the failures of AI from the intractability of the problems, or the inability to design and implement a machine to run nondeterministic algorithms for those problems in NP.
Compared to deciding how to act in life, deciding how to act in Chess is infinitely more simple: there are a finite set of states, and if you had the time, you could plot each of the states and decide which move gets you the "closest" (for some definition of close) to where you'd like to be N moves from the decision point. Ideally N would get you to the end, but even for Chess, our ability is limited to look ahead only a small number of moves. Luckily for Deep Blue (and others), it turns out that's enough to beat the best humans in the world.
Even though Chess is a complex problem whose number of game states prevent us from modeling the entire thing and deciding it, we can cheat and model fewer states - when we can make an informed decision that a particular path of the decision tree will not be followed, we can forgo computation of those nodes. Still yet, the problem will be huge.
There are other ways of "faking" answers to the AI problems that face us in development. Approximation algorithms can get us close to the optimal solutions - but not necessarily all the way there. In these cases, we might notice the computer doing stupid things. We can tweak it - and we can make special case after special case when we notice undesired behavior. But we are limited in the balance between human-like and rational. Humans are not rational, but our programs (often) are made to be.
Presumably, they give the policeman in GTA 4 a series of inputs and a decision mechanism, and he's thinking purely rationally: so sometimes the most rational thing for him to do based on the decision model we gave him is to run around the front of the car when he's being shot at. Sometimes he jumps off buildings. (Video via Sharing the Sandbox: Can We Improve on GTA's Playmates?)
It may not be smart, but it is rational given the model we programmed.
You can make the decision model more complex, or you can program special cases. At some point, development time runs dry or computational complexity gets too high. Either way, game AI sucks because the problems we're trying to solve have huge lower bounds for time and space complexity, and that requires us to hack around it given the time and equipment we have available. The problems usually win that battle.
Game AI has come a long way since Pacman's cologne (or maybe he just stunk), and it will get better, especially as we move more gaming super-powerful servers. Still, it's far from ideal at the moment.
What do you have to say? (Especially the gamer geeks: programmers or players)
Hey! Why don't you make your life easier and subscribe to the full post
or short blurb RSS feed? I'm so confident you'll love my smelly pasta plate
wisdom that I'm offering a no-strings-attached, lifetime money back guarantee!
Posted by Sam on Oct 17, 2011 at 03:20 PM UTC - 5 hrs
It's a small step, but emcee-3PO can now identify the
staves in an image of sheet music for my single test case of "My Darling Clementine." I need to include
hundreds more test cases, and I plan to when I implement code to make the tests mark the sheet music
with what emcee3po detected so I can visually inspect the accuracy.
Do a y-projection on the image.
A projection just reduces the number of dimensions in an image. In this case, we just take
the number of dark-colored pixels in a row of the image. It's similar in theory to
3D projection, but instead of projecting
three dimensions onto a plane, we're projecting a plane onto a line.
I used a threshold of 50% to determine if a pixel was dark enough to include in the projection.
So, if R+G+B < (FF+FF+FF) / 2, I count the pixel as dark.
Find the local maxima.
We want to find the places where the number of dark pixels in a row is highest - those will indicate the horizontal
lines on the staff. To do that, we find all the places where the number of pixels stops growing and starts getting smaller -- or where the
slope changes from positive to negative. To ignore noise, we set a threshold as Fujinaga suggests at
the average of each row, so we don't include anything less than that in our collection of local maxima.
Find the tightest groups of 5.
We want to find all the places where 5 local maxima are the smallest distance apart, which should indicate
the 5 lines in a staff. This part is accomplished by examining each 5-element window in the array of
local maxima, and finding the one with the smallest distance between its points. Then you can remove
all the windows that include any of those points, and continue until there are no more windows.
Expand those indexes to collect the places where the notes fall outside the staff lines.
I don't remember Fujinaga mentioning this in the paper I linked to above, but I'm thinking it must be in there.
Essentially, since the local maxima get us only what's in between the 5 lines of the staff, we need
to expand it a bit so we can get the notes that don't fall directly between the 5 lines. Right now,
I've used 1/4 of the average of the rows in the projection, but I think it will need to be
an even smaller threshold because I'm still not reliably getting all of the notes.
Up next: reading the notes on the staves. That's going to be cool.
Posted by Sam on Sep 15, 2008 at 12:00 AM UTC - 5 hrs
Even though they have a special hero(ish) status, it's a popular pastime (some might say cliché) to complain about medical doctors making so much money when nurses and other supporting cast in the medical industry "do all the work."
The justification for paying out the ass for just a few minutes of your proctologist's time is that they need to pay their insane educational loans. And don't forget about how hard they worked in school that long just to learn their asscraft.
"Still," we always say, "do we really need the doctor? My nurse did everything I needed done, and I could have told you I have strep throat and needed some pennicilin." Even if we don't trust the medical support staff to make the decisions, we have ways of making those decisions without a doctor.
As medical expert systems become better, we might expect the doctor to become obsolete. From my vantage point, I'm unable to see what about a doctor could be better than inputting a list of symptoms to a machine and getting back most likely diagnoses, which could ask questions to further refine the results. It could even consult a list of prescriptions (drugs, therapies, surgeries, et cetera), cross reference it with your medical profile (or DNA, when we have medicines tailored to individual genomes), and give you advice on what to do next.
Some people might even think an application like that would be better than human doctors.
Under such a system, it's unlikely all human doctors would become obsolete. For example, we'll always have a need for the maverick hacker doctor who can think outside the box.
But in most cases, most doctors are going to continue to diagnose the same diseases and prescribe the same (potentially biased) treatments to each patient who comes in. It's a factory, and we're on the conveyor belts.
Of course, even if we were to have the capability to replace most doctors (my belief is that we do), I don't think most people would feel comfortable consulting a machine about their problems. Dr. Sbaitso only gets us so far. We want the comfort of another human telling us what's wrong with our health, not a heartless machine.
Still, I think it would be interesting to see the results of large scale experiments pitting man vs. machine in the field of medicine. How much more successful is one over the other? Is that success due only to non-life-threatening conditions? Would that benefit to society be outweighed if one failed more often on the catastrophic problems of a few individuals?
I'm not a doctor, and I can't tell you exactly what they bring to the table. Could be something I've completely overlooked, or something we're unlikely to know unless we're in that industry. But that's how it looks to me. I'd like to confirm or disprove that hypothesis with a test.
Posted by Sam on Nov 14, 2007 at 08:30 AM UTC - 5 hrs
The Turing Test was designed to test the ability of
a computer program to demonstrate intelligence.
(Here is Alan Turing's proposal of it.)
It is often described as so: if a computer can fool a person into believing it too is a person, the computer
has passed the test and demonstrated intelligence. That view is a simplified version.
Quoth Wikipedia about the rules:
A human judge engages in a natural language conversation with one human and one machine,
each of which try to appear human; if the judge cannot reliably tell which is which, then the machine is
said to pass the test.
Specifically, the test should be run multiple times and if the judge cannot decipher which respondent
is the human about 50% of the time, you might say he cannot tell the difference, and therefore, the
machine has demonstrated intelligence and the ability to hold a conversation with a human.
I suppose the idea of poker bots passing the test comes about because (presumably) the human players at
the table don't realize they are playing against a computer. But if that is the case,
even a losing bot would qualify - human players may think the bot player is just an idiot.
More...
More interesting is his idea that the Turing test was beaten in the 1970s.
In defense of that thought, Giles mentions that the requirement having the questioner
"actively [try] to determine the nature of the entity they are chatting with" is splitting hairs.
Even if you agree with that, the poker bot certainly does not qualify - there is no chat, and there
is certainly no natural language.
If that still constitutes hair splitting, I'm
sure we can eventually
split enough hairs to reduce the test so that most any program can pass. Give the sum(x,y) x,y ε {0..9}
program to a primary school math teacher, and the fact that it can add correctly may lead the teacher to believe it is a smart kid.
Then add in the random wrong answer to make it more believable.
In any case, if you accept the hair-splitting argument about the program that actually
did chat in natural language, then certainly PARRY
passed the test.
On the other hand,
while you may consider
that the test was passed in fact, it was not done so in spirit.
I'll admit that I think the requirement of a tester talking to both a human and a computer program is an important one.
Even disregarding the "no occasion to
think they were talking to a computer," if we limit the computer's response to a particular set of people
who think they are in a particular context,
we could (for example) use Markov-model generated
sentences with some characters in keyboard-distance-proximity replaced within the words. We now have a drunk typist.
Perhaps in his drunken stupor he is not at his sharpest, but he would likely still be considered an intelligent being.
If the human tester is able to ask questions of both the computer and another human, do you think he would
ever choose the computer program that rephrases his questions to sound paranoid as the human over the
person who answers
to the best of his ability?
The point is that without that requirement, the test becomes meaningless. Giles notes that
All the test really does is expose how little we know about what makes people people.
The fact that it's easy to fool people into thinking a computer is human doesn't actually teach you
anything about the difference between computers and humans; all it does is teach you that it's easy to fool people.
I think that's only true if you conceive of the test as he does. By keeping all of the requirements Turing
proposed, it becomes quite a bit harder to pass, and retains its utility.
Further, removing that quality changes the essence
of the test.
Since we can assume the test was
designed to be useful, can we really say that insisting the test retain its essence is unreasonable?
For the monetary hurdles? A computer to prove anything on you.
If you are serving with talented coders so it doesn't change. Can you understand the case study present in different paradigm shifts in science. I was on vacation, "I can't see my old search terms every time you run into a code monkey? Are you a code monkey? Are you stick to standards so you could have implement our own develop good sound-editor to do it. Even more diplomatic terms, of course of action - if a proxy) the objectives and different approach much better, and how can you believe it?" I think it would solve those domain you're willing to him speak than after a little over a million users so it doesn't "mean that a competitors. Of course of action - if a proxy) the objectives and different paradigms to help out in the to-read list: The Power of integration in asking what other hand, JSF, Seam, and Tapestry are based on my to-read list.) Because I think you can keep the view updated - but there are made, and borders on absurd side of the elements in the code for you to record an improvement. That's not a far leap to see that you do?" Here he actually pointless. How do I know - I was doing before I started to deploy to XBOX 360 or PC (Developers created." Instead, you should use is a contest to avoid conversations. It can be threatened." What are the Maximizers are drive-in traffic and implementation of an old professor of mine, be honest about it. Thanks XNA!
As always came back in January. Perhaps you understand it. If you want to conversations. It can be threatened." What are the number of integration in asking people something to do it. Even more diplomatic terms, of course, you can do. In particular, by posting a good developers created." Instead, you should.
The point is that you're not passion.
There are a couple of repeats noticeable, which is mostly due to my lack of source text. At least, it worked a little better using one-hundred 8.5" x 11" pages of Great Expectations (still gibberish though).
Anyway, here's the Ruby source code:
classMarkovModeldefcreate_model(file,order,unit)#unit = "word" or "character"@unit=unitentire_file=""@order=order#read the entire file inFile.open(file,"r")do|infile|cnt=0while(line=infile.gets)entire_file+=linecnt+=1endend#split the file according to characters or wordsifunit=="word"split_on=/\s/elsesplit_on=""end@entire_file_split=entire_file.split(split_on)#construct a hash like:#first 'order' letters, letter following, increment count@model=Hash.newi=0@entire_file_split.eachdo|c|this_group=@entire_file_split[i,order]next_letter=@entire_file_split[i+order,1]#if group doesn't exist, create itif@model[this_group]==nil@model[this_group]={next_letter=>1}#if group exists, but this "next_letter" hasn't been seen, insert itelsif@model[this_group][next_letter]==nil@model[this_group][next_letter]=1#if group and next letter exist in model, increment the countelsecur_count_of_next_letter=@model[this_group][next_letter]+1@model[this_group][next_letter]=cur_count_of_next_letterendi+=1endenddefgenerate_and_print_text(amount)start_group=@entire_file_split[0,@order]printstart_groupthis_group=start_group(0..(amount-@order)).eachdo|i|next_letters_to_choose_from=@model[this_group]#construct probability hashnum=0probabilities={}next_letters_to_choose_from.eachdo|key,value|num+=valueprobabilities[key]=numend#select next letterindex=rand(num)matches=probabilities.select{|key,value|index<=value}sorted_by_value=matches.sort{|a,b|a[1]<=>b[1]}next_letter=sorted_by_value[0][0]print" "if@unit=="word"#if we're splitting on wordsprintnext_letter#shift the groupthis_group=this_group[1,@order-1]+next_letter.to_aryendenddefprint_modelrequire'pp'PP.pp(@model)endendfile=File.expand_path"source_text.txt"mm=MarkovModel.newmm.create_model(file,7,"character")mm.generate_and_print_text(2000)mm.create_model(file,1,"word")mm.generate_and_print_text(250)#mm.print_model
Posted by Sam on Oct 04, 2007 at 04:00 PM UTC - 5 hrs
A couple of weeks ago the UH Code Dojo embarked on the
fantastic voyage that is writing a program to solve Sudoku puzzles, in Ruby. This week, we
continued that journey.
Though we still haven't completed the problem (we'll be meeting again tenatively on October 15, 2007 to
do that), we did construct what we think is a viable plan for getting there, and began to implement some
of it.
The idea was based around this algorithm (or something close to it):
More...
while (!puzzle.solved)
{
find the most constrained row, column, or submatrix
for each open square in the most constrained space,
find intersection of valid numbers to fill the square
starting with the most constrained,
begin filling in open squares with available numbers
}
With that in mind, we started again with TDD.
I'm not going to explain the rationale behind each peice of code, since the idea was presented above.
However, please feel free to ask any questions if you are confused, or even if you'd just like
to challenge our ideas.
Obviously we need to clean out that commented-out line, and
I feel kind of uncomfortable with the small amount of tests we have compared to code. That unease was
compounded when we noticed a call to get_submatrix instead of get_submatrix_by_index.
Everything passed because we were only testing the first most constrained column. Of course, it will
get tested eventually when we have test_solve, but it was good that the pair-room-programming
caught the defect.
I'm also not entirely convinced I like passing around the index of the most constrained whatever along
with a flag denoting what type it is. I really think we can come up with a better way to do that, so
I'm hoping that will change before we rely too much on it, and it becomes impossible to change without
cascading.
Finally, we also set up a repository for this and all of our future code. It's not yet open to the public
as of this writing (though I expect that
to change soon). In any case, if you'd like to get the full source code to this, you can find our Google Code
site
at http://code.google.com/p/uhcodedojo/.
If you'd like to aid me in my quest to be an idea vacuum (cleaner!), or just have a question, please feel
free to contribute with a comment.
Posted by Sam on Sep 19, 2007 at 01:37 PM UTC - 5 hrs
A couple of days ago the UH Code Dojo met once again (we took the summer off). I had come in wanting to
figure out five different ways to implement binary search.
The first two - iteratively and recursively - are easy to come up with. But what about three
other implementations? I felt it would be a good exercise in creative thinking, and pehaps it
would teach us new ways to look at problems. I still want to do that at some point, but
the group decided it might be more fun to tackle to problem of solving any Sudoku board,
and that was fine with me.
Instead, we agreed that certainly there are some aspects that are testable, although
the actual search algorithm that finds solutions is likely to be a unit in itself, and therefore
isn't likely to be testable outside of presenting it a board and testing that its outputted solution
is a known solution to the test board.
We promptly commented out the test though, since we'd never get it to pass until we were done. That
doesn't sound very helpful at this point. Instead, we started writing tests for testing the validity
of rows, columns, and blocks (blocks are what we called the 3x3 submatrices in a Sudoku board).
Our idea was that a row, column, or block is in a valid state
if it contains no duplicates of the digits 1 through 9. Zeroes (open cells) are acceptable in
a valid board. Obviously, they are not acceptable in a solved board.
To get there, we realized we needed to make initialize take the initial game board
as an argument (so you'll need to change that in the TestSudokuSolver#setup method and
SudokuSolver#solve, if you created it), and then we added the following tests (iteratively!):
The implementation wasn't difficult, of course. We just need to reject all zeroes from the row, then run
uniq! on the resulting array. Since uniq! returns nil if each
element in the array is unique, and nil evaluates to false, we have:
The test failed, so we had to make it pass. Basically, this method is also the same for columns as it was
for rows. We just need to call Array#transpose# on the board, and follow along. The
valid_column? one-liner was @board.transpose[col_num].reject{|x| x==0}.uniq! == nil.
We added that first, made sure the test passed, and then refactored SudokuSolver
to remove the duplication:
All the tests were green, so we moved on to testing blocks. We first tried slicing in two dimensions, but
that didn't work: @board[0..2][0..2]. We were also surprised Ruby didn't have something
like an Array#extract_submatrix method, which assumes it it passed an array of arrays (hey,
it had a transpose method). Instead, we created our own. I came up with some nasty, convoluted code,
which I thought was rather neat until I rewrote it today.
Ordinarily, I'd love to show it as a fine example of how much prettier it could have become. However, due
to a temporary lapse into idiocy on my part: we were editing in 2 editors, and I accidentally saved the
earlier version after the working version instead of telling it not to save. Because of that,
I'm having to re-implement this.
Now that that sorry excuse is out of the way, here is our test for, and implementation of extract_submatrix:
Unfortunately, that's where we ran out of time, so we didn't even get to the interesting problems Sudoku
could present. On the other hand, we did agree to meet again two weeks from that meeting (instead of a month),
so we'll continue to explore again on that day.
Thoughts? (Especially about idiomatic Ruby for Array#extract_submatrix)
Update: Changed the misspelled title from SudokoSolver to SudokuSolver.
Our goal is to give commands to the partial order planner, telling it what the goal is, the initial state (if it exists), and actions it can perform. The actions contain the name of the action, any preconditions that must be fulfilled before that action can be performed, and a set of effects the action has on the world state. After giving this information to the planner, it should output a plan if one exists.
For simplicity's sake, I've used a STRIPS-like notation,
without the complexity of existentially or universally quantified variables, among other things.
Further, only one possible plan is returned, rather than attempting to find all plans.
The one returned is not guaranteed to be optimal, though (inadequate) tests have shown that it is correct.
Plans are to improve these limitations in the future, moving to a less restrictive ADL-style syntax,
and adding support for returning multiple plans.
More...
In the meantime, the first task is to allow a user to enter commands in a syntax that looks like:
PlanName("Put on Shoes")
Goal(:RightShoeOn ^ :LeftShoeOn)
The domain described in Code Sample 1 should produce a plan such as: LeftSock → LeftShoe →
RightSock → RightShoe and RightSock → LeftSock → LeftShoe → RightShoe.
As one can surmise from looking at the domain as it is written, any plan where the socks are on before
the shoes is sufficient.
Finally, the domain given in Code Sample 2 should render plans like Remove(Flat,Axle) →
Remove(Spare,Trunk) → PutOn(Spare,Axle), switching the first two actions depending on which
it decides to do first (since either one would work).
Implementing (or allowing) such syntax in Ruby turns out to be simple. To get the conjunction operator ^, we simply define a module with ^ as a method, and include that module in Ruby's String, Symbol, and Array classes, since we'll be using each of these as symbols in our "new" language (See Code Sample 3).
module Logic
def ^(condition)
[self, condition]
end
end
#modify the symbol class to include the ^ operation
class Symbol
include Logic
end
#modify the array class to include the ^ operation
class Array
include Logic
end
#modify the string class to include the ^ operation
class String
include Logic
end
Next, we need to allow the use of function-style symbols, such as Remove(:Spare,:Trunk).
As with most things in Ruby, this is also simple. We just use the method_missing method in our module:
# when the user enters a function, turn it into an action
def method_missing(method_id, *args)
symbol_name = "#{method_id}("
args.each { |arg| symbol_name += arg.to_s + "," }
symbol_name[0,symbol_name.length-1] + ")"
end
We now have the ability to use the syntax we laid forth in the first two blocks of code
to define our problems that need planning. All that remains are to implement
the functions in our "language" that allow us to define the problem domain, and an
algorithm to solve for plans.
To do so, first, we initialize the start state with an Init() function that simply stores
the conditions it is passed. Similarly, the goal state and initial open preconditions are simply stored
into member variables as they are passed via the Goal() method.
Finally, actions are constructed from a name and a hash with keys PRECOND and EFFECT.
#constants to use to store hash for precondition and effect
#(only for purposes of keeping the DSL looking close to the original)
PRECOND = :precondition
EFFECT = :effect
#store the start-state conditions
def Init(conditions)
@start_state = conditions
end
alias init Init
#store the goal defined by the user
def Goal(conditions)
@goal = conditions
@open_preconditions = @goal
end
alias goal Goal
# store actions defined by the user
def Action(name, precondition_effect)
action= ["name" => name,
"precondition" => precondition_effect[PRECOND],
"effect" => precondition_effect[EFFECT]]
@actions = [] if !@actions
@actions = @actions + action
end
alias action Action
Finally, we come to the meat of the problem, the partial-order planning algorithm. The algorithm itself follows a fairly simple path:
From the list of open preconditions, choose one.
Find an action whose effect is the same as the precondition we chose and add it to the plan.
Add to the list of preconditions any requirements for that action.
Remove from the list of preconditions any that match the effects for the chosen action.
Repeat steps 1-4 until the set of open preconditions is empty, or no action that satisfies a precondition can be found.
Remove any preconditions from the open list that match the starting state.
If the set of open preconditions is empty, return the plan. Otherwise, fail.
The algorithm in Ruby follows:
def make_plan
action_plan = []
fail = false
while (@open_preconditions.size > 0 && !fail)
#randomize the open_preconditions and actions to show order
#doesn't matter
@open_preconditions=@open_preconditions.sort_by { rand }
#find an action that solves it the first open precondition
attempted_precondition = @open_preconditions.shift
action_to_take = find_action_for attempted_precondition
if (action_to_take != nil)
add_preconditions_for action_to_take
remove_preconditions_fulfilled_by action_to_take
#add the action to the plan
action_plan.push(action_to_take["name"])
else
#put the precondition back on the open_preconditions, since
#it wasn't fulfilled by an action
fail = true if @open_preconditions.size == 0
@open_preconditions.push attempted_precondition
remove_preconditions_matching_start_state
fail = false if @open_preconditions.size == 0
end
end
if @open_preconditions.size > 0 || fail
puts "There appears to be no plan that satisfies the problem."
puts "Open preconditions: "
puts @open_preconditions
action_plan = []
end
sanitize_plan(action_plan.reverse)
end
Most of the code is aptly named where there are functions
(see RubyPOP.zip for the complete source), but two issues in this algorithm immediately
jump to the forefront. The first is: why aren't we also randomizing the list of actions?
Clearly, if there are two actions that satisfy the same precondition, the first one encountered will always win. This was done because randomizing the list of actions (if two or more satisfy the same precondition) has the potential to cause a loop of preconditions/effects, and thus cause incorrect plans to be generated. Since no attempt was made at finding the optimal plan, I didn't want to clutter the code and make it harder to follow. Correct plans are still generated, and a future version meant for more demanding environments would indeed allow a random action to be chosen.
The second issue that is not immediately clear begs the question: "just what is that sanitize_plan method doing there?" Some actions may add duplicate preconditions to the set of open preconditions. The algorithm as it stands allows this to happen for readability purposes, and simply cleans up the plan later with the sanitize_plan function.
Finally, it is also clear that a more "elegant" solution may have been to take actions as functions, which receive preconditions as their parameters, and whose output are effects. The thought of such an implementation is interesting and worthy of exploration, though time constraints prevented me from doing so in this case.
As mentioned above, a complete version of the code and three manual tests can be found in the RubyPOP.zip file.
As David Chapman noted in 1985, "planners of the most promising ('nonlinear') sort have been complicated, heuristic, ill-defined AI programs, without clear conditions under which they work." And since the time of NOAH, INTERPLAN, and NONLIN (and others I've left out for space reasons), there have been various improvements in the realm of partial-order planning. Chapman's program, TWEAK, is one of them, and was followed by the UCPOP and RePOP planners.
More...
Chapman said that he "decided to copy exactly" Sacerdoti's work on NOAH to implement his own planner, but struggled to make it work (and TWEAK was a result of it when he did finally get it to work). In explaining the reason for publishing his work on TWEAK, Chapman quotes Sacerdoti who said NOAH's "basic operations" (Chapman, 1) "were developed in an ad hoc fashion. No attempt has been made to justify the transformations that they perform, or to enable them to generate all transformations." (Sacerdoti, quoted in Chapman, 1) He then goes on to compare what he has done with Sacerdoti's work to the longstanding AI varieties of "scruffy" versus "neat." In general, the "scruffies" just want to try solutions and figure out what seems to work, and tolerate less mathematical rigor and proof than the "neats" (Russell, 25).
In essence, TWEAK was Chapman's successful attempt at formalizing the "theory about the ways in which interacting subgoals can be dealt with" (Sacerdoti, quoted in Chapman, 1), or formalizing partial-order planning.
Chapman called TWEAK "a rigorous mathematical reconstruction of previous nonlinear planners," and "an implemented, running program," which he described and proved correct in his paper. He was right: eighteen years later Russell and Norvig said his work "led to what arguably the first simple and readable description of a complete partial-order planner," which was incarnated as SNLP (Russell, 410) (on which Russell and Norvig base the POP in their book, which I in turn implement in Ruby in the following section).
After SNLP came UCPOP in 1992 from Daniel Weld (who also was an author of SNLP) and J. Scott Penberthy. As the title says, it is "a sound, complete, partial order planner for ADL." (ADL is a more advanced way to represent problems, Action Description Language). Their desire to create UCPOP stemmed from two problems they found with existing research into POP algorithms. The first hearkens back to the scruffies vs. neats debate: Many researchers "looked at formal characteristics of languages for describing change," as others built "actual planners, often losing a precise understanding of their programs in a forest of pragmatic choices" (Penberthy, 1). Further, the very few that had "complete algorithms" (emphasis added) either only implemented "the restrictive STRIPS representation" of problems (citing TWEAK and SNLP), or represented "plans as totally ordered sequences of actions" (Penberthy, 1).
Penberthy and Weld described UCPOP as "a theorem prover" at its heart. The algorithm itself requires three parameters: a plan, the set of goals that remain, and a set of actions. The goals are expressed as a tuple with two elements: a precondition and a step in the plan (Penberthy 6-7). An overview of the algorithm can be described as follows:
If the set of goal states is empty, return the plan (reporting "success").
Select a goal from the set of goals. If there is an invalid link that makes the plan impossible, exit, reporting failure.
Choose an operator.
Generate subgoals.
Protect against threats that may "cause the undoing of a needed goal if that step is done at the wrong time" (Dyer).
Recursively call the algorithm if the plan is not inconsistent.
(Penberthy, 6)
Although UCPOP was novel for its time, a visit today to its website (maintained by the authors at the University of Washington) shows it is quite outdated. The authors note, "UCPOP is an aging system - we recommend Sensory Graphplan (SGP) instead. SGP handles a superset of UCPOP functionality and is much, much faster" (Weld). On the other hand, research by XuanLong Nguyen and Subbarao Kambhampati in 2000 has shown that quite a few improvements can be made to UCPOP in particular, and partial-order planning in general.
Nguyen and Kambhampati argue that "the prevailing pessimism about the scalability of partial order planning (POP) algorithms" is perhaps unwarranted (Nguyen, 1). The pair seem to regret that research on POP algorithms seemed to stop (around 5 years prior) and make that point that advances in "heuristic state space planners … and CSP-based planners" like Graphplan are perhaps "(mis)interpreted as establishing the supremacy of state space and CSP-based approaches over" those of the POP variety (Nguyen, 1).
As evidence of their claim, they created RePOP, which is a partial-order planner based on UCPOP. But the novelty of RePOP lies in the authors' "key insight … that the techniques responsible for the efficiency of the currently successful planners … can also be adapted to dramatically improve the efficiency of the POP algorithm" (Nguyen, 1).
The techniques they applied to RePOP that other strategies had applied, "distance based heuristics, disjunctive representations for planning constraints and reachability analysis," led to outstanding results in "several 'parallel planning' domains" (Nguyen, 6). In fact, since in general, partial ordered planners are more flexible than their CSP counterparts, they obtained greater flexibility than Graphplan, while also outperforming it in most of their experiments (Nguyen, 5-6). The performance increase wasn't slight however- in one problem that RePOP took less than 3 seconds to solve, Graphplan took 47 minutes. On average, when GraphPlan could solve a problem within 3 hours or without using all 250 MB of memory, it still took around 10 times longer than RePOP to find a solution (Nguyen, 5).
The systems described above have a common link in that they are all focused on generic problems. However, it should be noted that in some domains, such as medicine, planners of these sorts do not tend to work well. As a consequence, great improvements – in performance and in accuracy – can be made by building a planner that can be supplied with domain-specific information (Miksch). Despite the interesting nature of such systems, covering planners of that sort is outside the scope of this paper, and I now turn to the second task of this report – building a partial order planner in Ruby.
* * * *
I've been doing almost nothing besides reading academic papers the last couple of weeks, and I'm
mostly done on that front. I enjoy reading them, but at times they can be dense, and take several
readings just to understand what's going on. In any case, I'm looking forward to doing some actual
programming now.
As before, questions, comments, observations, and critiques on how to improve this are appreciated.
References:
Chapman, David. "Planning for Conjunctive Goals." Massachusetts Institute of Technology. 1985. Available online from MIT
Dyer, C.R. "Partial-Order Planning: Chapter 11." CS 540 Lecture Notes. Retrieved on the World Wide Web on 4/24/2007 from this place.
Figure 1: Retrieved on 4/20/2007 from this place and verified in Russell and Sacerdoti.
Miksch, Silvia. "Plan management in the medical domain." AI Communications 12, pp 209-235. 1999.
Nguyen, XuanLong and Kambhampati, Subbarao. "Reviving Partial Order Planning." 2000. Retrieved on 4/20/2007 from around here on the World Wide Web.
Penberthy, J. Scott and Weld, Daniel S. "UCPOP: A Sound, Complete, Partial Order Planner for ADL." Third International Conference on Principles of Knowledge Representation and Reasoning, pp. 189-197. Cambridge, MA.
Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach 2nd Edition. New Jersey: Pearson Education, 2003.
Sacerdoti, Earl D. "The Nonlinear Nature of Plans." 1975. Proceedings of the Fourth International Joint Conference on Artificial Intelligence. Retrieved on 4/20/2007 from here on the World Wide Web.
Tate, Austin [A]. "INTERPLAN: a plan generation system which can deal with interactions between goals" Research Memorandum MIP-R-109, Edinburgh: Machine Intelligence Research Unit, December 1974. (Can be found online at here)
Tate, Austin [B]. "Generating Project Networks", Proceedings of the Fifth International Joint Conference on Artificial Intelligence (IJCAI-77) pp. 888-893, Boston, Mass. USA, August 1977. (Can be found online here)
Weld, Dan and Penberthy, Scott. "The UCPOP Planner." Retrieved on the World Wide Web on 4/19/2007 from Washington Univ.
Posted by Sam on Apr 22, 2007 at 03:18 PM UTC - 5 hrs
It was apparent as far back as 1975 that "linear planning" (or totally ordered planning, as described in What is Partial-Order Planning?) was not sufficient. Russell and Norvig relay the story of Allen Brown's experiment: that it could not solve simple problems such as the Sussman anomaly, where given 3 blocks labeled A, B, and C, with block B on the table and C on top of A which is on the table, get to the goal state of A on top of B on top of C (Russell, 410, 414) (See Figure 1). Around that time, as part of his Ph.D., Austin Tate released a paper which described INTERPLAN, a system to solve the problem of interleaving shown by Brown using Sussman's HACKER program (the Sussman Anomaly) (Tate [A] / Russell 410).
More...
Shortly thereafter, Earl D. Sacerdoti released his paper, "The Nonlinear Nature of Plans" which presented a new data structure to represent plans and gave a glimpse of the first partial order planners, Nets Of Action Hierarchies (NOAH) (and compared that to INTERPLAN, among others) (Sacerdoti, 8).
Figure 1. The Sussman Anomaly.
Sacerdoti begins by acknowledging that, "we usually think of plans as linear sequences of actions … because plans are usually executed one step at a time." However, he observed, the "plans themselves are not constrained by limitations of linearity." Because of this, a new structure is needed, which he calls a "procedural net," that would represent "a plan as a partial ordering of actions with respect to time" (Sacerdoti, 1). He backs up his assertion showing the famous Sussman anomaly (described above), and moves on to describe procedural nets and NOAH itself.
Procedural nets are networks of four different types of nodes (GOAL, PHANTOM, SPLIT, and JOIN) and each node represents an action. The nodes are then linked together to form plans. GOAL nodes, clearly, represent a goal that should be achieved. PHANTOM nodes "represent goals that should already be true at the time they are encountered." And, as one might imagine, SPLIT nodes represent splits in the plan, while JOIN nodes represent forks that are coming to an end. Finally, as it is implemented, each of the nodes has a pointer to some code (also known as a closure, lambda expression, or anonymous function) (Sacerdoti, 2).
NOAH uses that structure to represent plans, and a "generic" domain specific language called SOUP (Semantics of a Users' Problem) to give the system knowledge about the task domain (Sacerdoti, 2). To clarify, I call it generic here because it is suitable for describing problems in many domains, but it is specific in that its sole purpose is to describe problems.
To create a new plan, first give NOAH a goal to achieve and tell it about the problem with SOUP. Then it builds a procedural net with only a goal node, which contains a "list of all relevant SOUP functions as its body." Finally, run the planning algorithm. The planning algorithm is described as (quoting Sacerdoti, 3):
Simulate the most detailed plan in the procedural net. This will have the effect of producing a new, more detailed plan.
Criticize the new plan, performing any necessary reordering or elimination of redundant operations.
Go to Step 1.
(Sacerdoti, 3).
Step one is performed in effect by calling the anonymous function pointed to by the node. Step two runs the plan against several critics, namely "Resolve Conflicts," "Use Existing Objects," and "Eliminate Redundant Preconditions," which all perform the actions one would expect from knowing their names (Sacerdoti, 4).
After the release of NOAH, Tate created a new partial order planner dubbed NONLIN. From Sacerdoti, he recognizes "that ordering constraints should only be imposed between the actions comprising a plan if these are necessary for the achievement of the overall purpose of the plan" and bases his work upon it (Tate [B], 889). On the other hand, Tate realized the need for NONLIN because NOAH
still had to make choices as to the order that actions were to be placed-in a plan to correct for interactions. NOAH made this choice in one particular way. It did not keep any backtrack choice points, so this decision, once made, was irreversible. This leads to an incompleteness of the search space which can render some simple block pushing tasks unachieveable (sic) by NOAH… NONLIN is capable of correcting for an interaction by suggesting two orderings (which are sufficient to ensure the incompleteness of NOAH mentioned above is avoided…). (Tate [B], 889)
Additionally, Tate said that "other operations performed by NOAH deterministically … should also be considered as choice points" and gives two examples where "if such decisions cannot be undone, some problems are unsolvable." Of course, NONLIN fixed these problems (Tate [B], 889).
Questions, comments, observations, and critiques on how to improve this are appreciated, as always.
References:
Figure 1: Retrieved on 4/20/2007 from this place and verified in Russell and Sacerdoti.
Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach 2nd Edition. New Jersey: Pearson Education, 2003.
Sacerdoti, Earl D. "The Nonlinear Nature of Plans." 1975. Proceedings of the Fourth International Joint Conference on Artificial Intelligence. Retrieved on 4/20/2007 from here on the World Wide Web.
Tate, Austin [A]. "INTERPLAN: a plan generation system which can deal with interactions between goals" Research Memorandum MIP-R-109, Edinburgh: Machine Intelligence Research Unit, December 1974. (Can be found online at here)
Tate, Austin [B]. "Generating Project Networks", Proceedings of the Fifth International Joint Conference on Artificial Intelligence (IJCAI-77) pp. 888-893, Boston, Mass. USA, August 1977. (Can be found online here)
Posted by Sam on Apr 05, 2007 at 09:48 AM UTC - 5 hrs
Last night I had the good fortune to be in attendance at a talk given by Ray Kurzweil entitled "The Web Within Us: When Minds and Machines Become One." For those unfamiliar with Ray, part of his bio as given in the program distributed at the presentation reads
Ray Kurzweil has been described as "the restless genius" by the Wall Street Journal and "the ultimate thinking machine" by Forbes. Moreover, Inc. magazine ranked him eighth among entrepreneurs in the United States, calling him the "rightful heir to Thomas Edison." ...
As one of the leading inventors of our time, Kurzweil was the principle developer of the first CCD flat-bed scanner, the first omni-font optical character recognition system, the first print-to-speech reading machine for the blind, the first text-to-speech synthesizer, the first music synthesizer capable of recreating the sound of a grand piano and other orchestral instruments, and the first commercially marketed large vocabulary speech recognition system.
The talk was simply mind-blowing. He showed off a camera for the blind that could correct for 3D rotation and angles to read a page in a book, and some still-in-the-works but much better than currently available speech-to-speech translation (using speech-to-text, then translation, then text-to-speech). Some of the more interesting bits reminded me of a couple of ideas I had back in high school - one of combatting cancer with "machines" where they could be injected into the blood stream, detect cancer cells, and destroy them - and the other was like a "printer" that could build things, given the specs and materials (my words, not his). I should have been an inventor!
He also mentioned that we are coming ever closer to being able to reverse engineer the entire brain - saying the cerebral cortex (if I remember correctly... I could just be making that up) has already been well synthesized (along with 20 or so other parts of the brain) and that soon, we'd be able to do all the parts of the brain. Another cool bit was the fat-inhibitor pill, which would detect the gene responsible for storing fat, and switch it off.
There was a lot of AI and biotechnology involved, but the talk would still have been easy to follow for just about anyone I think (there were 1500 people or so in attendance). And the breadth and depth of his knowledge on various subjects is just simply amazing. I wish I could have made it to the follow-up this morning where it would have been just a small classroom-sized audience with a lot more interaction, but I just couldn't get away from the office for the amount of time I would have needed to (plus, it would have been rush hour, where the drive would take me 1.5 hours instead of the normal 30 minutes).
One last thing - he talked about full-immersion virtual reality- where we could switch off the actual receptors (well, more like switch what they are receiving) in our brain and replace the signal. Very cool stuff indeed, but also scary. If you're into that branch of technology, or just like to know what's coming up in the future, I'd recommend checking out his website or the books. I certainly plan to.
Posted by Sam on Apr 02, 2007 at 01:41 PM UTC - 5 hrs
For our case, we'll explore partial-order planning in a classical planning environment. Such an environment is fully observable (as opposed to only partially so) and deterministic (as opposed to having randomness, or being stochastic). Further, the space is finite and static in nature - it does not change in the middle of deliberation. Finally, the environment is "discrete (in time, action, objects, and effects)," as opposed to continuous along any of these axes (Russell, 375. For further reading on the characteristics of environments, see Russell page 41-42).
More...
To understand what partial-order planning entails, it might be helpful to know what planning is, and then what ordered planning entails. To that end, planning is "the task of coming up with a sequence of actions that will achieve a goal" (Russell, 375). That is a fairly straightforward - exactly as we would expect if we weren't speaking of computers and programs.
An example is simple: given a set of actions I can perform, which ones do I choose (and in what order should I apply them) in order to reach my goal? I've got to get to work this morning - what should I do to get there? I might need to wake up, turn off the alarm, shower, take off my wet pajamas and put on something suitable for doing business, and so on, until I reach work in the morning (or, afternoon if it was a late night). However, we wouldn't consider having our agent perform the actual driving of the car in a classical environment. This is because in doing so, our environment loses several of the characteristics we laid forth above - chief among these are that the new environment would become stochastic and continuous.
There are several types of algorithms that allow us to construct a plan. For instance, we briefly examine progression planning, regression planning, and our main topic, partial-order planning. Progression planning is done with a forward state-space search, which is to say that, "we start in the problem's initial state [and consider] sequences of actions until we find a sequence that reaches a goal state" (Russell, 383). This can pose major performance problems because it considers even completely irrelevant actions. As you might guess from its name, regression planning is the opposite - it works backwards from the goal state. This removes the problems associated with examining irrelevant actions, but as Russell notes, it is not without its problems: oftentimes it is not "obvious how to generate a description of the possible predecessors of the set of goal states" (Russell, 384).
What distinguishes partial-order planning from the other two is all in the name - it is not totally-ordered as we see in progression and regression planning. Instead, partial-order planning enables us to "take advantage of problem decomposition." The algorithm "works on several subgoals independently, solves them with several subplans, then combines the subplans" (Russell, 387). In addition, he notes that, "such an approach also has the advantage of flexibility in the order in which it constructs the plan. That is, the planner can work on 'obvious' or 'important' decisions first, rather than being forced to work on steps in chronological order."
We can see a similar phenomenon in application design - we may not choose to work in any chronological order, but instead work on the parts to construct the whole. Whereas some of the most important design decisions are often made at the beginning of a project, when we know the least about it, the prudent designer may choose to work on more obvious decisions, or decide to choose to work on the most important one at the time.
Reference:
Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach 2nd Edition. New Jersey: Pearson Education, 2003.
----------------------------------------------
Does that make any sense? Are there areas that could be better explained, or worded? Any input is appreciated.
Posted by Sam on Mar 29, 2007 at 03:54 PM UTC - 5 hrs
Wow, six posts in one day. I'm exhausted. My last link to InfoQ today comes in thanking them for the timely post Ruby Domain Specific Languages Buzz. There, I got that out of the way.
It is timely for me, because a couple of weeks ago I decided I was going to try to implement a Partial-Order Planner DSL in Ruby. I haven't yet started, nor have I decided on a full course of action. But, I do have a vague strategy outlined in my head, and while I have yet to decide if I will be using the code provided by Reginald Braithwaite in his post about "an Approach to Composing Domain-Specific Languages in Ruby," the content will probably prove helpful. Another link they put was to Creating DSLs with Ruby posted by Jim Freeze to Artima's Ruby Code and Style.
I'll let you know how my progress goes. I'll probably start a little survey of what Partial-Order Planning is - similar to my paper on k-means - and cover some research about it first and post piece by piece to build my paper this time, rather than waiting to post it all at once. I'm not married to that approach, so you might see some code first ... but I'm leaning that way.
Posted by Sam on Dec 04, 2006 at 08:31 AM UTC - 5 hrs
Just finished writing a survey on some of relatively current literature on k-means, focusing on introducing it, some practical applications of it, some difficulties in it, and how to find k, the number of clusters. I'm still new to the area, so don't expect much groundbreaking to be done.
The second half focuses on my own experiment, trying to find k using two similar, but slightly different techniques. I failed, but if you'd like to go over it and either laugh at me, or perhaps figure out what I've done wrong, you are free to. =)
Obviously, this isn't going to interest many people, so I didn't take time to mark it up - it's just available as a DOC (I had planned on having a PDF version, but my PDF writer has taken a crap on me). If you don't have Word or Open Office, and would like to read it, contact me and I'll try to get the PDF for you in some way or another.
Anyway, the DOC is here if you want to read it. It's over 3600 words, so beware!
I'm interested to know if anyone has built any machine learning libraries or done anything with machine learning in Coldfusion? My immediate thought is "no way!" because I don't think Coldfusion has the performance for it. But, I wouldn't know, since I haven't tried it. Have you? What's been your experience? Drop me a line if you care to.