low cou-pling and high co-he-sion
n.
-
A standard bit of advice for people who are learning to design their code better, who want to
write software with intention as opposed to coincidence, often parroted by the advisor
with no attempt to explain the meaning.
Motivation
It's a great scam, don't you think? Someone asks a question about how to design their code,
and we have these two nebulous words to throw back at them: coupling and cohesion.
We even memorize a couple of adjectives that go with the words: low and high.
Cohesion Good. Coupling, Baaaaad!
It's great because it shuts up the newbie who asks the question -- he doesn't want to appear dumb, after all --
and it gets all of those in-the-know to nod their heads in approval. "Yep, that's right. He's got it. +1."
But no one benefits from the exchange. The newbie is still frustrated, while the professional doesn't
give a second thought to the fact that
he probably doesn't know what he means. He's just parroting
back the advice that someone gave to him. It's not malicious or even conscious, but nobody is getting smarter
as a result of the practice.
Maybe we think the words are intuitive enough. Coupling means that something is depending on something else, multiple
things are tied together. Cohesion means ... well, maybe the person asking the question heard something about
it in high school chemistry and can recall it has something to do with sticking together.
Maybe they don't know at all.
Maybe, if they're motivated enough (and not that we've done anything to help in that department), they'll look it
up:
Types of Cohesion and Coupling
Types of Cohesion
Coincidental cohesion (worst)
is when parts of a module are grouped arbitrarily (at random); the parts have no significant relationship (e.g. a module of frequently used functions).
Logical cohesion
is when parts of a module are grouped because they logically are categorised to do the same thing, even if they are different by nature (e.g. grouping all I/O handling routines).
Temporal cohesion
is when parts of a module are grouped by when they are processed - the parts are processed at a particular time in program execution (e.g. a function which is called after catching an exception which closes open files, creates an error log, and notifies the user).
Procedural cohesion
is when parts of a module are grouped because they always follow a certain sequence of execution (e.g. a function which checks file permissions and then opens the file).
Communicational cohesion
is when parts of a module are grouped because they operate on the same data (e.g. a module which operates on the same record of information).
Sequential cohesion
is when parts of a module are grouped because the output from one part is the input to another part like an assembly line (e.g. a function which reads data from a file and processes the data).
Functional cohesion (best)
is when parts of a module are grouped because they all contribute to a single well-defined task of the module
Types of Coupling
Content coupling (high)
is when one module modifies or relies on the internal workings of another module (e.g. accessing local data of another module).
Therefore changing the way the second module produces data (location, type, timing) will lead to changing the dependent module.
Common coupling
is when two modules share the same global data (e.g. a global variable).
Changing the shared resource implies changing all the modules using it.
External coupling
occurs when two modules share an externally imposed data format, communication protocol, or device interface.
Control coupling
is one module controlling the logic of another, by passing it information on what to do (e.g. passing a what-to-do flag).
Stamp coupling (Data-structured coupling)
is when modules share a composite data structure and use only a part of it, possibly a different part (e.g. passing a whole record to a function which only needs one field of it).
This may lead to changing the way a module reads a record because a field, which the module doesn't need, has been modified.
Data coupling
is when modules share data through, for example, parameters. Each datum is an elementary piece, and these are the only data which are shared (e.g. passing an integer to a function which computes a square root).
Message coupling (low)
is the loosest type of coupling. Modules are not dependent on each other, instead they use a public interface to exchange parameter-less messages (or events, see Message passing).
No coupling
[is when] modules do not communicate at all with one another.
What does it all mean?
The Wikipedia entries mention that "low coupling often correlates with high cohesion" and
"high cohesion often correlates with loose coupling, and vice versa."
However, that's not the intuitive result of simple evaluation, especially on the part of someone who doesn't
know in the first place.
In the context of the prototypical question
about how to improve the structure of code, one does not lead to the other. By reducing coupling, on the face of
it the programmer is going to merge unrelated units of code, which would also reduce cohesion. Likewise, removing
unrelated functions from a class will introduce another class on which the original will need to depend, increasing
coupling.
To understand how the relationships become inversely correlated requires a larger step in logic, where examples
of the different types of coupling and cohesion would prove helpful.
Examples from each category of cohesion
Coincidental cohesion often looks like this:
class Helpers;
class Util;
int main(void) {
where almost all of your code goes here;
return 0;
}
In other words, the code is organized with no special thought as to
how it should be organized.
General helper and utility classes,
God Objects,
Big Balls of Mud, and other
anti-patterns
are epitomes of coincidental cohesion.
You might think of it as the lack of cohesion: we normally talk about cohesion being a good thing, whereas
we'd like to avoid this type as much as possible.
(However, one interesting property of coincidental cohesion is that even though the code in question should
not be stuck together,
it tends to remain in that state because programmers are too afraid to touch it.)
With
logical cohesion, you start to have a bit of organization. The Wikipedia example mentions "grouping
all I/O handling routines." You might think, "what's wrong with that? It makes perfect sense." Then consider that
you may have one file:
IO.somelang
function diskIO();
function screenIO();
function gameControllerIO();
While logical cohesion is much better than coincidental cohesion, it doesn't necessarily go far enough in terms
of organizing your code. For one, we've got all IO in the same folder in the same file, no matter what type of
device is doing the inputting and outputting. On another level, we've got functions that handle both input and
output, when separating them out would make for better design.
Temporal cohesion
is one where you might be thinking "duh, of course code that's executed based on some other
event is tied to that event." Especially considering the Wikipedia example:
a function which is called after catching an exception which closes open files,
creates an error log, and notifies the user.
But consider we're not talking about simple the relationship in time. We're really interested in the code's structure.
So to be temporally cohesive, your code in that error handling situation should keep the
closeFile,
logError, and
notifyUser functions close to where they are used. That doesn't mean
you'll always do the lowest-level implementation in the same file -- you can create small functions that take
care of setting up the boilerplate needed to call the real ones.
It's also important to note that you'll almost never want to implement all of that directly in the
catch
block. That's sloppy, and the antithesis of good design. (I say "almost" because I am wary of absolutes, yet I cannot think
of a situation where I would do so.) Doing so violates functional cohesion, which is what we're really
striving for.
Procedural cohesion
is similar to temporal cohesion, but instead of time-based it's sequence-based. These are similar because
many things we do close together in time are also done in sequence, but that's not always the case.
There's not much to say here. You want to keep the definitions of functions that do things together structurally
close together in your code, assuming they have a reason to be close to begin with. For instance,
you wouldn't put two modules of code together if they're not at least logically cohesive to begin with. Ideally,
as in every other type of cohesion, you'll strive for functional cohesion first.
Communicational cohesion
typically looks like this:
some lines of code;
data = new Data();
function1(Data d) {...};
function2(Data d) {...};
some more lines of code;
In other words, you're keeping functions together that work on the same data.
Sequential cohesion
is much like procedural and temporal cohesion, except the reasoning behind it is that functions would
chain together where the output of one feeds the input of another.
Functional cohesion is the ultimate goal.
It's
The Single Responsibility Principle [PDF] in
action. Your methods are short and to the point. Ones that are related are grouped together locally in a file.
Even files or classes contribute to one purpose and do it well. Using the IO example from above, you might have
a directory structure for each device, and within it, a class for Input and one for Output. Those would be children
of abstract I/O classes that implemented all but the device-specific pieces of code.
(I use inheritance terminology here only
as a subject that I believe communicates the idea to others. Of course, you don't have to even have inheritance
available to you to achieve the goal of keeping device agnostic code in one locale while keeping the device
specific code apart from it).
Examples from each category of coupling
Content coupling is horrific. You see it all over the place. It's probably in a lot of your code, and
you don't realize it. It's often referred to a violation of encapsulation in OO-speak, and it looks like one
piece of code reaching into another, without regard to any specified interfaces or respecting privacy. The problem
with it is that when you rely on an internal implementation as opposed to an explicit interface, any time that
module you rely on changes, you have to change too:
module A
data_member = 10
end
module B
10 * A->data_member
end
What if data_member was really called
num_times_accessed? Well, now you're screwed since you're
not calculating it.
Common coupling
occurs all the time too. The Wikipedia article mentions global variables, but this could be just a member in a class
where two or more functions rely on it if you consider it. It's not as bad when its encapsulated behind an interface,
where instead of accessing the resource directly, you do so indirectly, which allows you to change internal
behavior behind the wall, and keeps your other units of code from having to change every time the shared resource
changes.
An example of
external coupling is a program where one part of the code reads a specific file format that
another part of the code wrote. Both pieces need to know the format so when one changes, the other must as well.
unit A
write_csv_format();
end
unit B // in another file, probably
read_csv_format();
end
Control coupling
might look like:
// unit A
function do(what){
if(what == 1) do_wop;
else if (what == 2) ba_ba_da_da_da_do_wop;
}
// unit B
A.do(1);
Stamp coupling (Data-structured coupling)
involved disparate pieces of code touching the same data structure in different ways. For example:
employee = { :age => 24, :compensation=> 2000 }
def age_range(employee)
range = 1 if employee[:age] < 10
range = 2 if employee[:age] > 10 && < 20;
...
return range
end
def compensation_range(employee)
... only relies on employee[:compensation] ...
end
The two functions don't need the employee structure, but they rely on it and if it changes, those two functions
have to change. It's much better to just pass the values and let them operate on that.
Data coupling
is starting to get to where we need to be. One module depends on another for data. It's a typical function call with parameters:
// in module A
B.add(2, 4)
Message coupling
looks like data coupling, but it's even looser because two modules communicate indirectly without ever passing
each other data. Method calls have no parameters, in other words.
No coupling, like Wikipedia says, is when "modules do not communicate at all with one another."
There is no dependency from code A to code B.
Concluding Remarks
So how do we reconcile the thought that "if I separate code to increase functional cohesion, I introduce dependencies
which is going to increase coupling" with the assertion that low coupling and high cohesion go hand in hand? To do that,
you must recognize that the dependencies already exist. Perhaps not at the class level, but they do at the lines of
code level. By pulling them out into related units, you remove the spaghetti structure (if you can call it that)
and turn it into something much more manageable.
A system of code can never be completely de-coupled unless it does nothing. Cohesion is a different story.
I can't claim that your code cannot be perfectly cohesive, but I can't claim that it can. My belief is it
can be very close, but at some point you'll encounter diminishing returns on your quest to make it so.
The key takeaway is to start looking at your code and think about what you can do to improve it as you notice
the relationships between each line you write start take shape.
Comments and corrections(!) are encouraged.
What are your thoughts?
Hey! Why don't you make your life easier and subscribe to the full post
or short blurb RSS feed? I'm so confident you'll love my smelly pasta plate
wisdom that I'm offering a no-strings-attached, lifetime money back guarantee!