My Secret Life as a Spaghetti Coder
low cou-pling and high co-he-sion
n.
  1. A standard bit of advice for people who are learning to design their code better, who want to write software with intention as opposed to coincidence, often parroted by the advisor with no attempt to explain the meaning.

Motivation

It's a great scam, don't you think? Someone asks a question about how to design their code, and we have these two nebulous words to throw back at them: coupling and cohesion. We even memorize a couple of adjectives that go with the words: low and high.

Cohesion Good. Coupling, Baaaaad!

Metallica Good, Napster Bad.

It's great because it shuts up the newbie who asks the question -- he doesn't want to appear dumb, after all -- and it gets all of those in-the-know to nod their heads in approval. "Yep, that's right. He's got it. +1."

But no one benefits from the exchange. The newbie is still frustrated, while the professional doesn't give a second thought to the fact that he probably doesn't know what he means. He's just parroting back the advice that someone gave to him. It's not malicious or even conscious, but nobody is getting smarter as a result of the practice.

Maybe we think the words are intuitive enough. Coupling means that something is depending on something else, multiple things are tied together. Cohesion means ... well, maybe the person asking the question heard something about it in high school chemistry and can recall it has something to do with sticking together. Maybe they don't know at all.

Maybe, if they're motivated enough (and not that we've done anything to help in that department), they'll look it up:

Types of Cohesion and Coupling

Types of Cohesion
Coincidental cohesion (worst) is when parts of a module are grouped arbitrarily (at random); the parts have no significant relationship (e.g. a module of frequently used functions).

Logical cohesion is when parts of a module are grouped because they logically are categorised to do the same thing, even if they are different by nature (e.g. grouping all I/O handling routines).

Temporal cohesion is when parts of a module are grouped by when they are processed - the parts are processed at a particular time in program execution (e.g. a function which is called after catching an exception which closes open files, creates an error log, and notifies the user).

Procedural cohesion is when parts of a module are grouped because they always follow a certain sequence of execution (e.g. a function which checks file permissions and then opens the file).

Communicational cohesion is when parts of a module are grouped because they operate on the same data (e.g. a module which operates on the same record of information).

Sequential cohesion is when parts of a module are grouped because the output from one part is the input to another part like an assembly line (e.g. a function which reads data from a file and processes the data).

Functional cohesion (best) is when parts of a module are grouped because they all contribute to a single well-defined task of the module.

Types of Coupling
Content coupling (high) is when one module modifies or relies on the internal workings of another module (e.g. accessing local data of another module). Therefore changing the way the second module produces data (location, type, timing) will lead to changing the dependent module.

Common coupling is when two modules share the same global data (e.g. a global variable). Changing the shared resource implies changing all the modules using it.

External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface.

Control coupling is one module controlling the logic of another, by passing it information on what to do (e.g. passing a what-to-do flag).

Stamp coupling (Data-structured coupling) is when modules share a composite data structure and use only a part of it, possibly a different part (e.g. passing a whole record to a function which only needs one field of it). This may lead to changing the way a module reads a record because a field, which the module doesn't need, has been modified.

Data coupling is when modules share data through, for example, parameters. Each datum is an elementary piece, and these are the only data which are shared (e.g. passing an integer to a function which computes a square root).

Message coupling (low) is the loosest type of coupling. Modules are not dependent on each other, instead they use a public interface to exchange parameter-less messages (or events, see Message passing).

No coupling [is when] modules do not communicate at all with one another.

What does it all mean?

The Wikipedia entries mention that "low coupling often correlates with high cohesion" and "high cohesion often correlates with loose coupling, and vice versa."

However, that's not the intuitive result of simple evaluation, especially on the part of someone who doesn't know in the first place.

In the context of the prototypical question about how to improve the structure of code, one does not lead to the other. By reducing coupling, on the face of it the programmer is going to merge unrelated units of code, which would also reduce cohesion. Likewise, removing unrelated functions from a class will introduce another class on which the original will need to depend, increasing coupling.

To understand how the relationships become inversely correlated requires a larger step in logic, where examples of the different types of coupling and cohesion would prove helpful.

Examples from each category of cohesion

Coincidental cohesion often looks like this:

class Helpers;  // grab bag of unrelated helper functions

class Util;     // more of the same

int main(void) {
  // almost all of your code goes here
  return 0;
}

In other words, the code is organized with no special thought as to how it should be organized. General helper and utility classes, God Objects, Big Balls of Mud, and other anti-patterns are epitomes of coincidental cohesion. You might think of it as the lack of cohesion: we normally talk about cohesion being a good thing, whereas we'd like to avoid this type as much as possible.

(However, one interesting property of coincidental cohesion is that even though the code in question should not be stuck together, it tends to remain in that state because programmers are too afraid to touch it.)

With logical cohesion, you start to have a bit of organization. The Wikipedia example mentions "grouping all I/O handling routines." You might think, "what's wrong with that? It makes perfect sense." Then consider that you may have one file:

IO.somelang
function diskIO();
function screenIO();
function gameControllerIO();

While logical cohesion is much better than coincidental cohesion, it doesn't necessarily go far enough in terms of organizing your code. For one, we've got all IO in the same folder in the same file, no matter what type of device is doing the inputting and outputting. On another level, we've got functions that handle both input and output, when separating them out would make for better design.

Temporal cohesion is one where you might be thinking "duh, of course code that's executed based on some other event is tied to that event." Especially considering the Wikipedia example:

"a function which is called after catching an exception which closes open files, creates an error log, and notifies the user."

But consider that we're not talking about simply the relationship in time. We're really interested in the code's structure. So to be temporally cohesive, your code in that error handling situation should keep the closeFile, logError, and notifyUser functions close to where they are used. That doesn't mean you'll always do the lowest-level implementation in the same file -- you can create small functions that take care of setting up the boilerplate needed to call the real ones.

It's also important to note that you'll almost never want to implement all of that directly in the catch block. That's sloppy, and the antithesis of good design. (I say "almost" because I am wary of absolutes, yet I cannot think of a situation where I would do so.) Doing so violates functional cohesion, which is what we're really striving for.
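
Here's a rough Python sketch of that idea. The three function names follow the Wikipedia example, but their bodies, the handle_failure and run_job wrappers, and the use of lists as stand-ins for a log and a notification queue are all assumptions made for illustration:

```python
import io

def close_files(open_files):
    """Close every file handle we still hold."""
    for handle in open_files:
        handle.close()

def log_error(error, log):
    """Append a one-line description of the error to the log."""
    log.append(f"ERROR: {error}")

def notify_user(error, outbox):
    """Queue a user-facing message (stand-in for a dialog or email)."""
    outbox.append(f"Something went wrong: {error}")

def handle_failure(error, open_files, log, outbox):
    """The temporally cohesive unit: everything that happens on failure."""
    close_files(open_files)
    log_error(error, log)
    notify_user(error, outbox)

def run_job(job, open_files, log, outbox):
    try:
        return job()
    except ValueError as error:
        # The except block only delegates; it does no real work itself.
        handle_failure(error, open_files, log, outbox)
        return None

# Usage: a job that fails while one file is still open.
scratch = io.StringIO("scratch data")
log, outbox = [], []
result = run_job(lambda: int("not a number"), [scratch], log, outbox)
```

The cleanup steps sit next to each other because they run at the same time, yet each one is still a small function with a single job.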

Procedural cohesion is similar to temporal cohesion, but instead of time-based it's sequence-based. These are similar because many things we do close together in time are also done in sequence, but that's not always the case. There's not much to say here. You want to keep the definitions of functions that do things together structurally close together in your code, assuming they have a reason to be close to begin with. For instance, you wouldn't put two modules of code together if they're not at least logically cohesive to begin with. Ideally, as in every other type of cohesion, you'll strive for functional cohesion first.

Communicational cohesion typically looks like this:

some lines of code;
data = new Data();
function1(Data d) {...};
function2(Data d) {...};
some more lines of code;

In other words, you're keeping functions together that work on the same data.

Sequential cohesion is much like procedural and temporal cohesion, except the reasoning behind it is that functions would chain together where the output of one feeds the input of another.
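
A small Python sketch of that assembly line might look like the following. The stage names and the "sum the numbers in some text" task are hypothetical, chosen only to show each stage consuming the previous stage's output:

```python
def read_lines(text):
    """Stage 1: split raw text into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def parse_numbers(lines):
    """Stage 2: turn each line from stage 1 into an integer."""
    return [int(line) for line in lines]

def total(numbers):
    """Stage 3: reduce the numbers from stage 2 to a single sum."""
    return sum(numbers)

def total_from_text(text):
    """The assembly line: each stage's output is the next stage's input."""
    return total(parse_numbers(read_lines(text)))
```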

Functional cohesion is the ultimate goal. It's the Single Responsibility Principle in action. Your methods are short and to the point. Ones that are related are grouped together locally in a file. Even files or classes contribute to one purpose and do it well. Using the IO example from above, you might have a directory structure for each device, and within it, a class for Input and one for Output. Those would be children of abstract I/O classes that implemented all but the device-specific pieces of code.

(I use inheritance terminology here only because I believe it communicates the idea to others. Of course, you don't even need inheritance available to you to achieve the goal of keeping device-agnostic code in one locale while keeping the device-specific code apart from it.)
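
To make the IO example a little more concrete, here's one way it could be sketched in Python. Everything here is an assumption for illustration: the class names, the in-memory lists standing in for a real disk, and the particular "shared" behavior (stripping whitespace, normalizing newlines) that the abstract classes take care of:

```python
from abc import ABC, abstractmethod

class Input(ABC):
    """Device-agnostic input logic lives here."""

    def read_line(self):
        # Shared behavior: normalize whatever the device hands back.
        return self._read_raw().strip()

    @abstractmethod
    def _read_raw(self):
        """Device-specific: return one raw string from the device."""

class Output(ABC):
    """Device-agnostic output logic lives here."""

    def write_line(self, text):
        # Shared behavior: every line ends with exactly one newline.
        self._write_raw(text.rstrip("\n") + "\n")

    @abstractmethod
    def _write_raw(self, data):
        """Device-specific: push a string to the device."""

# These two would live in something like disk/input.py and disk/output.py.
class DiskInput(Input):
    def __init__(self, lines):
        self._lines = list(lines)  # stand-in for a real file

    def _read_raw(self):
        return self._lines.pop(0)

class DiskOutput(Output):
    def __init__(self):
        self.written = []  # stand-in for a real file

    def _write_raw(self, data):
        self.written.append(data)
```

A screen or game controller would get its own pair of small classes, and none of them would need to know the others exist.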

Examples from each category of coupling

Content coupling is horrific. You see it all over the place. It's probably in a lot of your code, and you don't realize it. It's often referred to as a violation of encapsulation in OO-speak, and it looks like one piece of code reaching into another, without regard to any specified interfaces or respect for privacy. The problem with it is that when you rely on an internal implementation as opposed to an explicit interface, any time that module you rely on changes, you have to change too:

module A
  data_member = 10
end

module B
  10 * A->data_member
end

What if data_member was really called num_times_accessed? Well, now you're screwed since you're not calculating it.
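
Here's a hedged Python sketch of the difference between reaching into a module and going through its interface. The Counter and RenamedCounter classes and the two calling functions are hypothetical names invented for this example:

```python
class Counter:
    """Module A: keeps its state private and exposes an interface."""

    def __init__(self):
        self._count = 10  # internal detail; free to be renamed later

    def value(self):
        return self._count

class RenamedCounter:
    """A later version of A: same interface, different internals."""

    def __init__(self):
        self._num_times_accessed = 10

    def value(self):
        return self._num_times_accessed

def content_coupled(counter):
    # Reaches into A's internals; breaks if _count is renamed or removed.
    return 10 * counter._count

def interface_coupled(counter):
    # Depends only on the published method.
    return 10 * counter.value()

# The internal rename breaks one caller and leaves the other untouched.
try:
    content_coupled(RenamedCounter())
    broke = False
except AttributeError:
    broke = True
```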

Common coupling occurs all the time too. The Wikipedia article mentions global variables, but if you think about it, this could just as well be a member of a class that two or more functions rely on. It's not as bad when it's encapsulated behind an interface: instead of accessing the resource directly, you do so indirectly, which allows you to change internal behavior behind the wall and keeps your other units of code from having to change every time the shared resource changes.
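
As a minimal sketch of putting the shared resource behind that wall, assume a hypothetical Settings class that two otherwise unrelated functions both depend on (all names here are made up for illustration):

```python
class Settings:
    """Wraps the shared resource so callers never touch it directly."""

    def __init__(self):
        self._values = {"retries": 3}  # could become a file or a database later

    def retries(self):
        return self._values["retries"]

    def set_retries(self, count):
        if count < 0:
            raise ValueError("retries must be non-negative")
        self._values["retries"] = count

def plan_attempts(settings):
    # Unit 1: depends on the interface, not on how the data is stored.
    return list(range(1, settings.retries() + 1))

def describe(settings):
    # Unit 2: shares the same resource through the same interface.
    return f"will retry {settings.retries()} times"
```

Both units are still coupled to the same data, but if the storage behind Settings changes, neither of them has to.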

An example of external coupling is a program where one part of the code reads a specific file format that another part of the code wrote. Both pieces need to know the format so when one changes, the other must as well.

unit A
  write_csv_format();
end

unit B // in another file, probably
  read_csv_format();
end
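
You can't make this kind of coupling disappear, since both sides really do have to agree on the format, but you can at least keep the knowledge of the format in one place. A sketch in Python using the standard csv module follows; the FIELDS list and the function names are assumptions for the example:

```python
import csv
import io

FIELDS = ["name", "score"]  # the one place the column layout is defined

def write_records(records):
    """Unit A: serialize a list of dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def read_records(text):
    """Unit B: parse CSV text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))
```

Note that values come back as strings, because CSV itself carries no type information. That is part of the format both units have to agree on.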

Control coupling might look like:

// unit A
function do(what){
  if(what == 1) do_wop;
  else if (what == 2) ba_ba_da_da_da_do_wop;
}

// unit B
A.do(1);
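
One common way out is to give each branch its own name so the caller no longer steers the callee with a flag. A Python sketch, with hypothetical render, shout, and whisper functions standing in for real work:

```python
def render(text, mode):
    # Control-coupled: the caller steers the callee's logic with a flag.
    if mode == 1:
        return text.upper()
    elif mode == 2:
        return text.lower()
    raise ValueError(f"unknown mode: {mode}")

# Decoupled alternative: one function per behavior, no flag to interpret.
def shout(text):
    return text.upper()

def whisper(text):
    return text.lower()
```

With the second version, a caller who writes shout("Hi") can't pass an invalid mode, and a reader doesn't need to look up what 1 and 2 mean.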

Stamp coupling (Data-structured coupling) involves disparate pieces of code touching the same data structure in different ways. For example:

employee = { :age => 24, :compensation => 2000 }

def age_range(employee)
  range = 1 if employee[:age] < 10
  range = 2 if employee[:age] >= 10 && employee[:age] < 20
  # ...
  return range
end

def compensation_range(employee)
  # ... only relies on employee[:compensation] ...
end

The two functions don't need the employee structure, but they rely on it and if it changes, those two functions have to change. It's much better to just pass the values and let them operate on that.
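
In Python, the "just pass the values" version might look something like this. The catch-all third age range and the threshold in compensation_range are made up for illustration, since the example above leaves those details out:

```python
def age_range(age):
    """Needs only a number, so that's all it asks for."""
    if age < 10:
        return 1
    if age < 20:
        return 2
    return 3  # hypothetical catch-all; the remaining ranges are elided above

def compensation_range(compensation):
    """Hypothetical threshold, for illustration only."""
    return 1 if compensation < 1000 else 2

employee = {"age": 24, "compensation": 2000}

# The caller unpacks the record; the functions never see its shape.
employee_age_range = age_range(employee["age"])
employee_pay_range = compensation_range(employee["compensation"])
```

Now the employee structure can gain, lose, or rename fields and only the call sites that unpack it need to care.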

Data coupling is starting to get to where we need to be. One module depends on another for data. It's a typical function call with parameters:

// in module A
B.add(2, 4)

Message coupling looks like data coupling, but it's even looser because two modules communicate indirectly without ever passing each other data. Method calls have no parameters, in other words.
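
A tiny publish/subscribe hub is one way to picture this. The MessageBus and Cache classes below are hypothetical, and a real system would likely use an existing event library rather than rolling its own:

```python
class MessageBus:
    """Hypothetical minimal publish/subscribe hub."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, message, handler):
        self._handlers.setdefault(message, []).append(handler)

    def publish(self, message):
        # No payload: the message name is the only thing exchanged.
        for handler in self._handlers.get(message, []):
            handler()

class Cache:
    def __init__(self):
        self.entries = {"a": 1}

    def clear(self):
        self.entries = {}

bus = MessageBus()
cache = Cache()
bus.subscribe("settings_changed", cache.clear)

# The publisher knows nothing about Cache; it only announces the event.
bus.publish("settings_changed")
```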

No coupling, like Wikipedia says, is when "modules do not communicate at all with one another." There is no dependency from code A to code B.

Concluding Remarks

So how do we reconcile the thought that "if I separate code to increase functional cohesion, I introduce dependencies which is going to increase coupling" with the assertion that low coupling and high cohesion go hand in hand? To do that, you must recognize that the dependencies already exist. Perhaps not at the class level, but they do at the lines of code level. By pulling them out into related units, you remove the spaghetti structure (if you can call it that) and turn it into something much more manageable.

That increases cohesion and decreases coupling at the same time.

A system of code can never be completely de-coupled unless it does nothing. Cohesion is a different story. I can't claim that your code can never be perfectly cohesive, but I can't claim that it can be, either. My belief is it can be very close, but at some point you'll encounter diminishing returns on your quest to make it so.

The key takeaway is to start looking at your code and think about what you can do to improve it as you notice the relationships between each line you write start to take shape.

Comments and corrections(!) are encouraged. What are your thoughts?



Comments

Thanks for this Sam.

Just two questions ...

In your opinion ...
i). Is Control Coupling an acceptable form of Coupling ? State rationale.
ii). Is Communication Cohesion an acceptable form of Cohesion. State rationale.

Many thnx.

Posted by John Barrett on Jan 07, 2010 at 07:32 AM UTC - 6 hrs

@John: as with most things in programming, the answer resides somewhere between the extremes of yes and no.

In the case of control coupling, in general I would say it is unacceptable. If I have a function:

doThisOrThat(whichOne);

I'd generally rather see two functions:

doThis();
doThat();

However, what happens in the case that "this" and "that" are very similar? I wouldn't violate DRY to have two functions that are not control-coupled. So, if the this and that are very cohesive, I'd keep them together.

Regarding communication cohesion: It's acceptable in that I think it's a good thing to keep things like that together - if you're working with a change in the data object, often you might need to change the functions that work on it.

However, if you can see a better way to split it up, I definitely would. It's kind of lazy. It's better than no cohesion, but if you can drill down further, you definitely should.

Posted by Sammy Larbi on Jan 19, 2010 at 09:07 AM UTC - 6 hrs
