My Secret Life as a Spaghetti Coder
It's a small step, but emcee-3PO can now identify the staves in an image of sheet music for my single test case, "My Darling Clementine." I need hundreds more test cases, and I plan to add them once the tests can mark up the sheet music with what emcee-3PO detected, so I can visually inspect the accuracy.

Ichiro Fujinaga's "Optical Music Recognition using Projections" (PDF) explains the process in detail, but it turns out to be relatively simple.

To locate the staves:
  1. Do a y-projection on the image.
    A projection just reduces the number of dimensions in an image. In this case, we count the dark pixels in each row of the image. It's similar in theory to 3D projection, but instead of projecting three dimensions onto a plane, we're projecting a plane onto a line.

    I used a threshold of 50% to decide whether a pixel was dark enough to include in the projection. So, if R+G+B < (FF+FF+FF) / 2, with each channel running from 00 to FF, I count the pixel as dark.

  2. Find the local maxima.
    We want to find the places where the number of dark pixels in a row is highest, since those indicate the horizontal lines of the staff. To do that, we find every place where the count stops growing and starts shrinking, that is, where the slope changes from positive to negative. To ignore noise, we set a threshold, as Fujinaga suggests, at the average of the row counts in the projection, and we leave anything below that out of our collection of local maxima.

  3. Find the tightest groups of 5.
    We want to find all the places where 5 local maxima are the smallest distance apart, which should indicate the 5 lines in a staff. This part is accomplished by examining each 5-element window in the array of local maxima and finding the one with the smallest distance between its points. Then you remove every window that includes any of those points, and repeat until there are no more windows.

  4. Expand those indexes to collect the places where the notes fall outside the staff lines.
    I don't remember Fujinaga mentioning this in the paper I linked to above, but I'm thinking it must be in there. Essentially, since the local maxima only give us what lies between the 5 lines of the staff, we need to expand that range a bit to catch the notes that fall above or below those lines. Right now, I use 1/4 of the average of the rows in the projection as the cutoff, but I think it will need to be even smaller, because I'm still not reliably getting all of the notes.
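
Here's a minimal sketch of the four steps above in Python. It is not emcee-3PO's actual code: the function names, the image format (a list of rows of (R, G, B) tuples), reading "smallest distance" as the last point minus the first, and expanding outward row by row until the projection drops below the cutoff are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255 (assumed image format)


def y_projection(image: Sequence[Sequence[Pixel]]) -> List[int]:
    """Step 1: count the dark pixels in each row (dark: R+G+B below 50%)."""
    cutoff = (255 * 3) / 2
    return [sum(1 for r, g, b in row if r + g + b < cutoff) for row in image]


def local_maxima(projection: Sequence[int]) -> List[int]:
    """Step 2: rows where the count stops rising and then falls.

    Peaks below the mean of the projection are dropped as noise. A flat
    run of equal values is reported once, at its first row.
    """
    if not projection:
        return []
    threshold = sum(projection) / len(projection)
    padded = [0] + list(projection) + [0]  # lets edge rows count as peaks
    peaks, candidate = [], None
    for i in range(1, len(padded)):
        if padded[i] > padded[i - 1]:
            candidate = i  # still climbing: remember the latest high point
        elif padded[i] < padded[i - 1] and candidate is not None:
            if padded[candidate] >= threshold:
                peaks.append(candidate - 1)  # undo the padding offset
            candidate = None
    return peaks


def tightest_groups(peaks: Sequence[int], size: int = 5) -> List[List[int]]:
    """Step 3: repeatedly take the window of `size` peaks with the smallest
    span, then discard every window that shares a peak with it."""
    windows = [list(peaks[i:i + size]) for i in range(len(peaks) - size + 1)]
    staves = []
    while windows:
        best = min(windows, key=lambda w: w[-1] - w[0])
        staves.append(best)
        taken = set(best)
        windows = [w for w in windows if taken.isdisjoint(w)]
    return sorted(staves)


def expand_staff(projection: Sequence[int], staff: Sequence[int],
                 fraction: float = 0.25) -> Tuple[int, int]:
    """Step 4: grow the staff's row range outward while the projection
    stays at or above `fraction` of the mean row count."""
    limit = fraction * sum(projection) / len(projection)
    top, bottom = staff[0], staff[-1]
    while top > 0 and projection[top - 1] >= limit:
        top -= 1
    while bottom < len(projection) - 1 and projection[bottom + 1] >= limit:
        bottom += 1
    return top, bottom


def find_staves(image: Sequence[Sequence[Pixel]]) -> List[Tuple[int, int]]:
    """Run all four steps; returns a (top_row, bottom_row) pair per staff."""
    projection = y_projection(image)
    peaks = local_maxima(projection)
    return [expand_staff(projection, staff) for staff in tightest_groups(peaks)]
```

Calling find_staves(image) gives one (top_row, bottom_row) pair per detected staff. Because step 3 keeps going until no windows are left, any stray maxima that survive the noise threshold can still form a bogus group, which is one reason marking up the image and inspecting it by eye is worthwhile.
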
Up next: reading the notes on the staves. That's going to be cool.

Comments
Shouldn't this be a prime example for Hough Transform for lines? Projecting along the y axis will screw up if the image is not straight (or if you have keystone effect due to distance parallax).

Posted by Dat Chu on Oct 28, 2011 at 11:33 AM UTC - 5 hrs

I think if the image is reasonably straight (meaning the slope of the staves is low) it corrects for itself, because it's looking for maxima in groups of 5, and then expands outward until the magnitude of the projection is lower than some threshold.

That said, I do have an example of a scan-gone-wrong that made the music not straight, so when I get a chance, I'll run that through it to see how well it does.

Also, it does sound like a great example to use Hough Transform. I'll look into doing an implementation of that too, if for no other reason than the fun of it.

Thanks for pointing it out to me!

Posted by Sammy Larbi on Oct 31, 2011 at 02:18 PM UTC - 5 hrs
