In an effort to spark a discussion about scaling (and hopefully spread some information on it), last week I told the story of Origin Shabamtech and Gulfomatic's Solutioneers, which left an open question about the myth of how Gulfomatic solved Shabamtech's scaling woes.
After telling the story, I asked,
I want to figure out the mystery. Do you have any ideas? How would you determine what's causing the site to crash? What might you look at? What might you do to fix it?
Let's recount what we know about the situation:
- The site was making a ton of money.
- There was an application running it.
- There was no source code for the application, and no vendor to contact to get it.
- The application had to access a database by knowing the DB file location.
- Gulfomatic's Solutioneers tried several potential solutions before finding the one that worked.
That's all we know initially. What does that tell us?
- Our solution needs to be implemented quickly, and we have money to spend if we need to.
- Even if the application itself is the cause of our problems, we can't change it because we lack the source code, so our solution has to be done outside of the app itself.
- Whatever solution we come up with has to take into account the fact that we're using a file path to access a database.
- We can work methodically, trying the simplest solutions first and revising them as more information presents itself.
But where do we go from here? The answer came from shag in the comments to the original post:
any clue as to what we're working with here? what kind of app are we talking about? what are the pieces in the puzzle that makes it go? what are the symptoms? what kind of access do we have? what kind of layers to we have? what kind of hardware do we have? what kind of os do we have?
i think we need to address what we are dealing with prior to the how to resolve the problem.
that being said, from a high level (and i mean like jupiter), some basic things are:
- identify symptoms
- scour logs
We start by identifying the symptoms. In this case, the application is crashing. What are the potential bottlenecks in the system that might cause it to crash? Are we lacking bandwidth? Is the web server crawling to a halt? Is the application itself using too many resources? Can the DB handle the load being thrown its way? Maybe the combination of these things is just causing the computer to crash.
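To make those questions a little more concrete, here is a minimal sketch of a coarse resource snapshot, using only Python's standard library. The function name `resource_snapshot` and the choice of readings are my own assumptions for illustration; nothing in the story says what tooling or operating system was actually involved, and the load average reading is only available on Unix-like systems.

```python
import os
import shutil


def resource_snapshot(path="."):
    """Collect a few coarse machine-level readings (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    snapshot = {
        # Number of logical CPUs, or None if it cannot be determined.
        "cpu_count": os.cpu_count(),
        # Fraction of the disk holding `path` that is in use.
        "disk_used_fraction": usage.used / usage.total if usage.total else 0.0,
    }
    # os.getloadavg() exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


if __name__ == "__main__":
    for name, value in resource_snapshot().items():
        print(f"{name}: {value}")
```

A single snapshot tells you very little. The point would be to take readings repeatedly, especially around the times the site crashes, and see which number moves first.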
Reading the logs and monitoring the different processes will probably give you an idea of who is the culprit.
Check the application logs if it has them. Check the web server and DB logs. Check your OS logs. Just like in programming, you cannot just make changes in random places to improve performance. You need to analyze the system's behavior to find the bottleneck, which will tell you where your changes will be most effective.
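As a rough illustration of comparing the logs side by side, the sketch below counts error-like lines per log source so the noisiest component stands out. The keyword list and the sample log lines are invented for the example; real application, web server, DB, and OS logs will use their own wording and formats.

```python
import re
from collections import Counter

# Hypothetical keywords; adjust to whatever the real logs actually say.
ERROR_PATTERN = re.compile(r"\b(error|fatal|timeout|out of memory)\b", re.IGNORECASE)


def count_error_lines(logs):
    """Map each log source to the number of its lines that look like errors.

    `logs` is a dict of source name -> iterable of log lines.
    """
    counts = Counter()
    for source, lines in logs.items():
        for line in lines:
            if ERROR_PATTERN.search(line):
                counts[source] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample data, standing in for lines read from real log files.
    sample_logs = {
        "web": ["GET /index 200", "GET /cart 500 internal error"],
        "db": ["ERROR lock timeout", "FATAL could not open file", "checkpoint ok"],
        "os": ["cron job finished"],
    }
    for source, count in count_error_lines(sample_logs).most_common():
        print(source, count)
```

A raw count like this only tells you where to start reading. The timestamps around each crash are what tie a symptom to a cause.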
That gives us several potential bottlenecks:
- Insufficient computational capacity: processor speed, disk space or speed, or memory. This is purely a hardware problem.
- The web server + database + application combined are just too much for the current hardware to handle.
- The database itself just cannot handle the number of requests being sent its way.
- The application (which lacks source code!) is the bottleneck. It hogs processor cycles and memory like a squirrel hoards nuts.
(What did I miss?)
Given what we know about the situation, what would you do in each scenario?