The DRY principle exists primarily to make enhancement and maintenance easier. The idea is that every duplication you introduce reduces maintainability and extensibility. Here are two examples:
Duplicated Code
    public class Dog {
        int barkCount = 0;

        public void bark() {
            System.out.println("bark");
            barkCount++;
        }

        public void defendHouse() {
            // Same two statements as bark(), copied by hand.
            System.out.println("bark");
            barkCount++;
            System.out.println("run in circles");
        }
    }
While this example is somewhat primitive, you'll see that the logic in bark is duplicated in defendHouse. This is undesirable for several reasons:
- The code is harder to read (longer, more parts, developer has to notice duplication)
- The code is harder to maintain (a developer may forget to update defendHouse() logic when they update bark() logic.)
Both of these bullet points are big considerations in long-lived software (hint: all software is long-lived) because they are recurring costs, incurred every time the code is read or changed. It gets worse as the distance between the duplicates grows: the duplicated logic may live in different files or packages, for example.
Duplicated Data
    import java.time.LocalDate;

    public class Person {
        String birthDay = null;       // day of week, derived from birthDate
        LocalDate birthDate = null;

        public void setBirthDate(LocalDate newDate) {
            birthDate = newDate;
            birthDay = newDate.getDayOfWeek().toString();
        }

        public void clearBirthDate() {
            birthDate = null;
            birthDay = null;
        }

        public String getBirthDay() {
            if (birthDate == null) {
                return null;
            } else {
                return birthDate.getDayOfWeek().toString();
            }
        }
    }
The issue here is that birthDay is derived entirely from birthDate, so storing both duplicates the same information. The biggest problems are:
- Data integrity: a developer may fail to update one field when another field changes. It can be difficult to guarantee consistency (for example, if newDate.getDayOfWeek() throws an exception then the fields may get out of sync).
- Readability: This code is harder to read because a developer has to notice that birthDay and birthDate are associated (but only by convention).
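The data-integrity point can be made concrete with a small sketch. It assumes birthDate is a java.time.LocalDate (so that getDayOfWeek() is a real method), and the class names DuplicatedPerson and OutOfSyncDemo are made up for this illustration. Passing null makes the setter fail after it has already overwritten the first field:

```java
import java.time.LocalDate;

// Hypothetical copy of the duplicated-data Person, reduced to the setter.
class DuplicatedPerson {
    String birthDay = null;
    LocalDate birthDate = null;

    void setBirthDate(LocalDate newDate) {
        birthDate = newDate;                           // first field is updated...
        birthDay = newDate.getDayOfWeek().toString();  // ...then this throws if newDate is null
    }
}

class OutOfSyncDemo {
    public static void main(String[] args) {
        DuplicatedPerson p = new DuplicatedPerson();
        p.setBirthDate(LocalDate.of(2000, 1, 1));      // a Saturday
        try {
            p.setBirthDate(null);
        } catch (NullPointerException e) {
            // The setter failed halfway through; the caller carries on.
        }
        // birthDate is now null, but birthDay still holds the old value.
        System.out.println(p.birthDate + " / " + p.birthDay);  // prints "null / SATURDAY"
    }
}
```

With only one field there is nothing left to get out of sync, which is the main argument for deriving the day on demand.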
For the sake of completeness, here are the two examples improved, followed by my thoughts on when to violate the DRY principle.
Cleaned up: Duplicated Code
    public class Dog {
        int barkCount = 0;

        public void bark() {
            System.out.println("bark");
            barkCount++;
        }

        public void defendHouse() {
            bark();  // reuse instead of copying the bark logic
            System.out.println("run in circles");
        }
    }
Cleaned up: Duplicated Data
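    import java.time.LocalDate;

    public class Person {
        LocalDate birthDate = null;   // the only stored field

        public void setBirthDate(LocalDate newDate) {
            birthDate = newDate;
        }

        public void clearBirthDate() {
            birthDate = null;
        }

        // The day of week is computed on demand, so it cannot go stale.
        public String getBirthDay() {
            if (birthDate == null) {
                return null;
            } else {
                return birthDate.getDayOfWeek().toString();
            }
        }
    }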
Additional Thoughts
So when is it okay to duplicate code/data? This section is going to be heavily based on my experiences/opinions, so be ready to disagree.
- For very simple code (like simple expressions) duplication may be acceptable. This is only true if the expression is trivial to read, hard to get wrong, and not easily grouped with some logical entity nearby.
- When the language doesn't support abstractions to remove the duplication. For example, because Java had no closures before Java 8 added lambdas, it can be wearying to remove duplication from comparators and other 'function objects' in older code. Certain kinds of duplication are common as a result.
- Once a performance issue has actually been experienced, data duplication (for example, storing a derived value instead of recomputing it) may be needed to speed things up.
- You don't have enough time to get it 'just right'. This point is really more about picking your battles. Some kinds of duplication are more hazardous than others. Often, a change in requirements forces duplication into a well-designed system, and the only fix may be expensive. In these circumstances it makes sense to talk to your team/managers and decide how important the fix is.
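To illustrate the comparator point from the list above, here is a rough sketch. Employee, ComparatorDemo and the constant names are hypothetical, invented only for this example. Without closures, each ordering repeats the same anonymous-class boilerplate. On Java 8 or later, Comparator.comparing and method references remove most of it:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class, used only for illustration.
class Employee {
    final String name;
    final int age;
    Employee(String name, int age) { this.name = name; this.age = age; }
    String getName() { return name; }
    int getAge() { return age; }
}

class ComparatorDemo {
    // Without closures: every ordering repeats the anonymous-class scaffolding.
    static final Comparator<Employee> BY_NAME_OLD = new Comparator<Employee>() {
        @Override
        public int compare(Employee a, Employee b) {
            return a.getName().compareTo(b.getName());
        }
    };
    static final Comparator<Employee> BY_AGE_OLD = new Comparator<Employee>() {
        @Override
        public int compare(Employee a, Employee b) {
            return Integer.compare(a.getAge(), b.getAge());
        }
    };

    // Java 8+: only the part that differs (the sort key) is written out.
    static final Comparator<Employee> BY_NAME = Comparator.comparing(Employee::getName);
    static final Comparator<Employee> BY_AGE = Comparator.comparingInt(Employee::getAge);

    public static void main(String[] args) {
        List<Employee> staff = new ArrayList<>();
        staff.add(new Employee("Bob", 25));
        staff.add(new Employee("Alice", 30));

        staff.sort(BY_NAME);
        System.out.println(staff.get(0).getName());  // prints "Alice"

        staff.sort(BY_AGE);
        System.out.println(staff.get(0).getName());  // prints "Bob"
    }
}
```

On a codebase stuck on an older Java version the first style may be the only option, which is the situation where tolerating some duplication can be reasonable.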