Clean Development Series: Part 3, Dirty Code (how to spot/smell it)
Lately I have been talking about clean application development, and how developers can do a better job of it. Due to the strong focus on this I decided to write some content from my latest talks into a series of blog posts. Enabling everyone to reap the benefits of the subject matter, and allow attendees of my talks to refresh the content covered at one of the events I spoke at.
In part 1 of this series we discussed some causes of dirty code, and then talked about the things that dirty code can cause. In part 2 we followed that by giving a possible scenario of how developers get caught in a situation where they end up with dirty code. Now in this part we will cover how to spot dirty code, and some possible situations that can be a bad “smell” and highlight potential problems.
Most experienced developers, and development “how to” books, have come to associate “smells” to help find dirty code. Meaning that most dirty code can be found by looking for certain clues/smells, which like a bad odor can tell us something rotten is nearby. When something smells we usually are forced to deal with the problem. (If milk smells bad we throw it out.) The same is true of smells in our code. Below are some common stenches we may encounter to help point out problems.
How to spot “common” dirty code
NOTE: All of the items below are not necessarily dirty code, but are merely indications that something may be wrong and require more investigation.
- Incomplete error handling – While the lack of good error handling doesn’t automatically mean the code is dirty, it is usually a good sign that the developer did not care enough to complete things. It’s only reasonable to think this carelessness probably washed over to other areas. End-users should never see “raw” error messages, and developers should be alerted to errors.
- Memory leaks – If PHP faces a memory limitation it is usually a good sign there is bad code lurking under the hood. We kick off an application and it is churning along, then BAM the browser goes all white. There is no indication of a problem, except we get no results when something was expected. Error reporting was turned on (in development), but returned nothing. This is typical of hitting a memory limit. In most cases this means memory was being used frivolously and not returned…poor coding. (Do not confuse this with the need to build large files or handle large chunks of data, though these can also be handled in a different manner in many cases.)
- Race conditions – Though this is rare, it is always a pain when it happens. There are many ways this can be caused. A simple explanation is the application execute, and expect an answer to be returned. However, the script continues along trying to use the return prior to it being ready. Perhaps the script is creating a file on the file-system, and tries to access the file before the file-system has finished creating it. The script failed, and when we check the file appeared to be there. So what happened? Or maybe we do a few AJAX calls back-to-back and our script is not sure which answer came first since AJAX doesn’t always return the results in the same order they were called.
- Inconsistent naming conventions – Whether it’s a class, a function, or a variable, we should ensure our application is consistent in naming convention. If we use StudlyCaps for the class name, do it everywhere. If we are using camelCase for functions, do it consistently. The same can be said for variableNames, that should use the same convention throughout the entire application to ensure those who follow behind us will be able to read our code. This will also help in certain situations where we may have named a function and a variable by a similar name. The convention will help us figure out what it actually is. Also we should ensure that our names are descriptive and meaningful, so we can pretty much tell what is happening simply by looking at the name. Doesn’t the function name ‘getFinanceCharge()’ make more sense than ‘charge()’?
- Inconsistent coding standard – Coding standards can encompass many areas, and there are a few different standards available (PEAR, Zend, PSR-2, etc.) for those who are not currently following one. (I will talk more about this in later parts.) The important thing is that we have a coding standard in place, and that our team is also using it on the same projects. There is nothing worse than starting to work on a new project and discovering that 3 different developers were working on it and they all used their own flavor of coding standard. We spend most of our time reading code versus writing code. A coding standard makes reading that much easier when it is consistent. For instance, putting the opening curly braces for a function on a new line throughout an entire class does make it easier to scroll through and pick out the functions starting point, if done consistently.
- Un-initialized variables – By allowing our application to run without initialized variables we are opening ourselves up to problems. In a loop an unset variable will retain the previous value it had from a previous iteration, causing our application to act in an inaccurate manner. This could also cause security issues where the application freely uses a variable that may have been set wrong early on, then did not initialize it again for later use.
- Code lacks clear purpose – If we look at our code and say, “Huh?”, it is usually a good sign that this “smell” is strong. Code within an application should be clear on what it is doing, rather than like a large puzzle to be solved. Often this is caused by poor planning, and “flying by the seat of our pants”, as we build the application.
- Large classes – Often if a class is very large it is usually because we have stuffed too many things into it. Perhaps we have combined two or more different types of things into it and we should have actually created other objects out of them. If a class is very long, and we find ourselves sifting through method names to figure out the one we need, we should start planning to break it up into more objects.
- Functions do too much – (more than one thing) A function should be “single minded” and do only one thing. For me this is a difficult one I often find myself fighting internally with, and often need to return to a function and refactor to break out new functions. But that’s OK. We should often refactor to simplify our code. This one goes hand in hand with unit testing (covered later), which should only test one thing. In order to make functions testable, and single minded, they should inherently be short. How short? We should be able to look at a function and immediately see that it does pretty much what we expected. Nothing fancy, or hard to follow. Functions should be short, to the point, and easy to understand at a glance. If a function is 100, 50 or even 30 lines long, how can we possibly tell what it is doing at a glance?
- Long parameter list – Referring back to the previous item, if a function/method is truly doing only one thing how many parameters do we really need? Long parameter lists are usually caused by trying to keep our application DRY (Don’t Repeat Yourself). We are working on a new part of our application and realize that we can reuse a function we created earlier, but it will require another bit of info to perform this new task. So we add another parameter to the list and alter the function to multi-task. Before we know it our list of parameters has grown to be 4, 5, maybe even 6. (I won’t go larger than that, because nobody should.) I would challenge that if a function’s parameter list is longer than 2 it is doing more than one thing and should be broken down into smaller functions. Agreed?
- Globals used – I will only say this once…Globals are dangerous, and should not be used. In a situation where we feel a Global must be used, we need to find another way…period! There are huge security risks, inconsistencies to be accounted for, not to mention it takes longer to code by using them. Remember, where do we spend most of our coding time? Yes, we spend most of our time reading code. If we need to track down a Global, it is hard to truly comprehend the amount of time we will spend tracking down where the Global may have been set. Imagine someone taking over an application from another developer who used Globals! Ouch!
- Too many comments – Yeah, yeah, everyone says that commenting code is good. So why do I feel it is a “smell”? I feel it can be a bad smell if our code is so complex that we need to explain it. If our functions/methods are only doing one thing, they tend to be small. And if we add well named variables and functions to a small amount of code doesn’t that mean the code can be “self documenting”? Instead of adding a comment, doesn’t it make more sense to simplify the code to be drop-dead simple and self documenting?
- Notices, Warnings, Fatal Errors – When I take over a project the first thing I do is turn on full error reporting and try to use the application. This is the best way to see how “responsible” the previous developer was, and will usually give a fast indication on what size of mess I got myself into. Developers should ALWAYS code with error reporting turn ON for their development environments. If I see a bunch of notices and errors I know there is no way the previous developer was doing this, and that means there are probably errors that were never fixed.
- Switch statements – A switch statement is not really a bad coding practice, but it could be a missed opportunity to use polymorphism. I have seen many applications in the past where the developer used switch statements in far too many places, and this made the application far less dynamic. If polymorphism had been used instead and broken down into functions it would have made the application far more flexible. A switch statement is also a sure sign we will need to return to the application at some point and edit the code when the customer requests that another criteria be added to the application. Properly done that can be alleviated.
- Temporary variables – We have all done it, and will continue to do it. Developers will use a temporary variable to carry a value for later use by the application. It is unavoidable in some places, but it can also be over-used and can lead to memory leaks if not handled properly. Temporary variable usage should be limited, and only used when absolutely necessary.
- No unit tests – Lack of unit tests are a sure sign that a developer was in a hurry. It is also a sure sign that every line of code was not necessarily analyzed to ensure it functioned as expected. It is also a good “smell” to highlight other potential issues, such as long functions, or functions that do too much, or large classes. Unit testing, done properly, tests only one thing and ensures that we pay closer attention to our classes and functions in order to make unit testing work best. If there are no tests it means we can get away with more, and be lazy.
While these are not the only “smells” that may be encountered, they are the most common indicators that ugly and spoiled things lurk inside. Like a spoiled package of meat, you don’t need to open the package to know something is very wrong. We should always analyze our code to look for these and other smells, then fix them quickly. Remember, one bad apple will ruin a bunch if left unattended. Don’t let these bad apples infect the rest of the code. Fix them, take them out before they multiply, leave the code better than when you arrived.
As I have said many times, we read code more than we write code. By eliminating smells we ensure that our reading, or other developers reading, of the code goes faster and enables creation of new code to be faster. Not to mention how this affects our reputation. If you write dirty code please believe that other developers WILL talk about you, and it can determine what jobs you get hired for in the future. So please, write clean code and clean up old code.