Monday, December 15, 2008

Open Your Eyes!

Here's a sweet piece of code (names have been changed to protect the guilty):


if (calculated_condition == some_test()) {
// Ancient comment.
// T intermediate = partial_value(val);
// calculated_val = more_calculations(intermediate);
// Merely elderly comment.
// calculated_val = better_calculations(val);
calculated_val = val;
}
else {
calculated_val = val;
}


Good lord! Maybe if there wasn't a compulsion to comment out old code, it would have been clearer what was happening here, and this would have been replaced with the one-liner. Instead, a few years later I have to spend a couple minutes reading the whole thing, maybe pausing right at the beginning to figure out when the condition is met, only to find that none of it matters.

It's even more fun when several lines of code are commented with a single "/*", or the hated "#if 0". Then as you're grepping to figure out where something decrepit like more_calculations() is used, it looks like it's used in hundreds of places.

Delete dead code!

Friday, December 12, 2008

C++ Coding Standard: Complete

Over the course of several weeks, I presented 12 sections of a C++ coding standard, along with some rationalizing verbiage. For your convenience, here is is in one shot, with no adornment.

The standard may be food for thought for you; or you might find it laughably pompous or incomplete. But I hope that you'll get your team to agree to some written standard, and use it to speed code reviews by letting reviewers make their points with little more than a chapter and verse -- something like "line 435 violates 3.8".
  • 1. Source files.
    • 1.1. Every class must have a header file called "{class}.hpp".
    • 1.2. Every class must have an implementation file called "{class}.cpp".
      • 1.2.1. Exception to 1.2: template class with only inline members.
      • 1.2.2. Exception to 1.1 and 1.2: class in anonymous namespace.
    • 1.3. {class}.hpp and {class}.cpp must be in the same directory.
    • 1.4. {class}.hpp must define one and only one class.
      • 1.4.1. Exception to 1.4: nested classes or structs.
    • 1.5. Filenames must match the class names exactly, including case.
  • 2. Header file layout.
    • 2.1. Header files must begin with the standard C++ file comment.
    • 2.2. Then comes the "#ifndef" of the include guard.
      • 2.2.1. Guard macro format _<DIR>_<CLASS>_H (all caps).
    • 2.3. Then include directives and external class declarations (optional).
    • 2.4. Then outside-the-class typedefs (optional).
    • 2.5. Then the class definition.
    • 2.6. Then other interface declarations (optional).
    • 2.7. Then inline function definitions (optional).
    • 2.8. Then the "#endif" of the include guard.
  • 3. Header file contents.
    • 3.1. A .hpp must never include an implementation file.
    • 3.2. A .hpp must include every file needed to compile itself.
    • 3.3. A .hpp must not include files not needed to compile itself.
    • 3.4. Class definitions must be preceded by a documentation comment.
    • 3.5. Function declarations must be preceded by a documentation comment.
      • 3.5.1. Document function pre- and postconditions when relevant.
    • 3.6. A .hpp must not declare variables or constants outside the class.
    • 3.7. A header file must not declare functions outside the class.
      • 3.7.1. Exception to 3.7: operators or template specializations may sometimes be declared outside the class.
    • 3.8. A header file must not use "using namespace" directives.
    • 3.9. Any functions declared in the .hpp which are to be inline, must be defined in the .hpp.
  • 4. Implementation file layout has only a few restrictions.
    • 4.1. Implementation files must begin with the standard C++ file comment.
    • 4.2. The first non-comment line of {class}.cpp must be: #include "{class}.hpp".
    • 4.3. {class}.cpp definitions should be in the same order as {class}.hpp declarations.
  • 5. Implementation file (.cpp) contents.
    • 5.1. A .cpp must never include an implementation file.
    • 5.2. A .cpp must not include files not needed to compile itself.
    • 5.3. Every non-inline function declared in {class}.hpp must be defined in {class}.cpp.
      • 5.3.1. Exception to 5.3: copy constructor and assignment operator may be declared private and not implemented, to override C++ defaults.
    • 5.4. Every static member variable declared in {class}.hpp must be defined in {class}.cpp.
    • 5.5. Anything defined in {class}.cpp which is not declared in {class}.hpp must be in the anonymous namespace.
    • 5.6. Every anonymous namespace function must be preceded by a documentation comment.
  • 6. Class design.
    • 6.1. Non-static data members must be private.
    • 6.2. A class must implement at least one public constructor.
    • 6.3. A class must declare its copy constructor and assignment operator.
      • 6.3.1. Exception to 6.3: not if declared private in base class.
    • 6.4. Member functions which do not change data members must be declared "const".
    • 6.5. Protected members should not be used.
    • 6.6. Consider changing private member functions to anonymous namespace functions in the implementation file.
    • 6.7. Classes must not declare "friend" classes.
    • 6.8. Classes should not declare "friend" functions for non-inline functions.
    • 6.9. Declare blank constructor if and only if a client needs it.
    • 6.10. Consider calculating data instead of storing it.
    • 6.11. Data members allocated on the heap must have type auto_ptr<T> or boost::shared_ptr<T>.
    • 6.12. Member functions marked "const" must be idempotent.
  • 7. Class layout.
    • 7.1. All public things first, then all protected, then all private.
    • 7.2. Within a public or private section, use the following order:
      • 7.2.1. First any nested type definitions or typedefs.
      • 7.2.2. Then constructors, destructors, and the assignment operator.
      • 7.2.3. Then any const member functions.
      • 7.2.4. Then any non-const member functions.
      • 7.2.5. Then any static member functions.
      • 7.2.6. Then any data members.
    • 7.3. Define inline functions outside of the class definition.
  • 8. Inheritance.
    • 8.1. Public inheritance must only be used to model the "is a" relation.
    • 8.2. Use private and protected inheritance sparingly, and only to model the "looks like a" relation.
    • 8.3. Only inherit publicly from an abstract base class.
    • 8.4. Destructors of public base classes must be pure virtual (but implemented).
    • 8.5. Don't use multiple inheritance.
  • 9. Preprocessor issues.
    • 9.1. Preprocessor macros should be rarely used.
      • 9.1.1. Obvious exception to 9.1: multiple-include guard.
      • 9.1.2. Use enum constants instead of constant macros.
      • 9.1.3. Use inline functions instead of function macros (when possible).
    • 9.2. Never use a macro to invent new syntax.
    • 9.3. Conditional compilation should be rarely used.
    • 9.4. Operating system #includes use angle brackets, e.g. <unistd.h>.
    • 9.5. Non-operating system #includes use double quotes, e.g. "Object.hpp".
    • 9.6. Use the standard C++ system header files, e.g. <string> not <string.h>.
  • 10. Naming conventions.
    • 10.1. Constant names must be in all caps.
    • 10.2. Private data and function member names must begin with '_'.
    • 10.3. Type names must begin with a capital letter.
    • 10.4. Function macro names must be in all caps.
    • 10.5. Names not covered above must begin with a lower-case letter.
    • 10.6. Do not use underscores and mixed-case in a single name.
    • 10.7. Typedefs must not obscure the fact that a type is a pointer, reference, or array type.
    • 10.8. All class definitions must be within a namespace.
  • 11. Function design.
    • 11.1. Constructors must initialize all data members in the initialization list, and in the order they're declared in the class definition.
    • 11.2. Class member functions which return a pointer or reference to a data member, must return it as "const T*" or "const T&".
    • 11.3. A class function must not deallocate memory that was allocated outside of the class implementation file.
    • 11.4. A class destructor must deallocate all of the memory allocated for data members of the instance.
    • 11.5. Function arguments which are pointer, array, or reference types, which are not changed by the function, must be declared "const".
    • 11.6. A function must contain at most one "return" statement.
    • 11.7. Do not use "goto" or "continue".
    • 11.8. Only use "break" in switch statements.
    • 11.9. Declare variables at the latest possible location.
    • 11.10. Use assert() to check for programming or interface errors.
    • 11.11. Functions with "true/false" semantics must return "bool".
    • 11.12. Declare default arguments in the .hpp file, not the .cpp file.
    • 11.13. Pass objects (especially containers) by (const) reference, not value.
    • 11.14. Return objects (especially containers) by value, unless it makes sense to return a const reference.
    • 11.15. Exceptions must be caught by reference, not by value or pointer.
    • 11.16. When re-throwing an exception, use "throw;", not "throw ex;".
    • 11.17. Destructors must catch all exceptions and throw none.
    • 11.18. Exception specifications must not be used.
      • 11.18.1. Exception to 11.18: a destructor which catches all exceptions can be marked with "throw()".
  • 12. Other stylistic issues.
    • 12.1. Mark 64-bit constants "LL" or "ULL", e.g. 0xffffffffffffffffULL.
    • 12.2. Mark 32-bit constants >= 0x80000000 "U", e.g. 0xffffffffU.
    • 12.3. Use C++-style typecasts instead of C-style typecasts.
    • 12.4. A source file must not include tab characters (only spaces).
    • 12.5. Lines must contain no more than 80 characters, including the (invisible) newline character at the end.
    • 12.6. A source line must not contain two statements.
    • 12.7. Use 2 spaces per level of indentation.
    • 12.8. Indent the body of a compound statement ("if", "while", etc.) one level more than the head of the statement.
    • 12.9. An "else" must have the same indentation as its "if".
    • 12.10. Two statements in a sequence must have the same indentation.
    • 12.11. A comment is indented the same as the statement following it.
    • 12.12. No character must appear to the right of a brace ("{" or "}").
    • 12.13. A closing right brace must be indented to the same level as the line containing its corresponding left brace.
    • 12.14. There must be no space between the function name and the left parenthesis of a function call (e.g. "printf(...)").
    • 12.15. There must be a space between a keyword and a left parenthesis (e.g. "if (x != NULL)").
    • 12.16. Do not use C-style comments (except for documentation comments).
    • 12.17. Do not comment out or "#if 0" dead code. Remove it.
    • 12.18. Do not use external include guards.

Thursday, December 4, 2008

Tcl is Still a Toy Language

Let's extend the chain of complaints about Tcl.

6. Expressions sometimes have to be wrapped in [expr {}], but not always:


if { $x == "a" || $x == "b" } {
return [expr {$x == "a"}]
}


7. Variable argument lists are handled with a special argument name: "args". It's not a keyword, just a variable treated differently from every other variable. That's just demented.

Monday, December 1, 2008

Stupid, stupid makedepend

One of the most frustrating, time-wasting debugging experiences you can have is when the code you thought you compiled didn't really get compiled. Or something you didn't even think about because someone else updated it didn't get compiled.

This is why the utility makedepend was introduced, so that compile dependencies get taken care of automatically, every time. So why is the Linux makedepend so stupid that it doesn't work every time?!?!

In the interest of speed, the Linux makedepend was written to parse include files only once per run (usually this is per directory). That's usually OK, but when the include file is preprocessed differently by different files (or multiple times in one file), it doesn't work. I thought the first rule of optimization was: make the common case fast, but make every case work.

Moral: use the dependency scheme described in the GNU make info pages. It goes like this:


# In the variables area:
sources = foo.cpp bar.cpp ...
OBJS = $(sources:.cpp=.o)

# In the targets area, somewhere after "all:"
ifneq ($(MAKECMDGOALS),clean)
include $(sources:.cpp=.d)
endif

%.d: %.cpp
@(set -o pipefail; \
$(CXX) -MM $(CXXFLAGS) $< \
| sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' \
> $@;) || $(RM) $@


Of course this assumes that CXX = g++. By the way, that little gem about not building the dependencies during "clean" is in the info pages, but not the page that describes the generic dependency rules. Yes, I have read every make info page.

The "pipefail" bit is my own little contribution; it saves you some headaches when the g++ command fails when you're not watching, because the sed command will succeed and allow make to continue ignorantly on its way. That's a bash option, so it might not work on platforms other than Linux.

Monday, November 10, 2008

Timestamped Shell History

Occasionally I save myself a lot of head-scratching by being able to refer to a timestamped history of my work in a particular shell session. You know the feeling, when you say either "I thought I already did that" or "Didn't I do X before Y?" Of course, usually you can piece together a timeline by looking at file modification times or revision control logs. Sometimes you absolutely cannot figure out the history -- a deleted file has no modification time -- and other times it would just be handy to have a transcript telling you when you did what.

Here are the ingredients to keeping a timestamped history:
  1. Emacs shell buffer: M-x shell.
  2. Timestamp script running in the background.
  3. Lisp code to remove shell output, leaving prompts and history.
Once you get used to running your shell within emacs, you'll never go back. It keeps a complete history for you, which you can search or navigate just like any emacs buffer. History keystrokes -- M-p, M-n, and M-r as opposed to most shells' C-p, C-n, C-r -- survive past subshells, su, ssh, and even things like ftp. If you need multiple shell buffers (say, on different hosts), use M-x rename-buffer.

The hitch is that after a few weeks in your emacs session, a shell buffer can grow very large. Here is a Lisp function to put in your .emacs which knocks it back down to size (make sure the emacs shell-prompt-pattern variable matches your shell prompt):


(defun engisneering-clean-shell-buffer ()
(interactive)
(let ((beg (point-min)))
(goto-char (point-min))
(if (> 10000 (buffer-size)) (buffer-disable-undo))
(while (re-search-forward shell-prompt-pattern nil t)
(if (not (= (point) (line-end-position)))
(progn
(forward-line 0) ;; beginning of line, ignore prompt boundary
(if (not (= (point) beg))
(delete-region beg (point)))
(forward-line 1)
(setq beg (point)))))
(buffer-enable-undo)
)
)


That will also be useful when you're ready to save your timestamped history. The timestamping is easy -- put this shell script into a file called "timestamp", and give it execute permission:


#!/bin/bash

while ((1)); do
echo -n "timestamp> "
date
echo -n "prompt> "
sleep 3600
done


There are two things to note about the script. First, when it outputs a timestamp, it will match the prompt pattern, so that it won't get deleted by our Lisp cleaning function. Second, it ends with "prompt> " so that the next time you hit return, emacs doesn't think the output from the timestamp is part of your next command.

So, whenever you start a shell buffer, run "timestamp&", to get the date and time printed out once an hour while you do other things.

Finally, here is a little script to help you save your shell histories. Create a directory to hold your histories; name the history files after the host the shell was on, and date them (including the year). Save them periodically, just like any of your work.


(defun engisneering-save-shell-buffer ()
(interactive)
(let ((name (buffer-name)))
(save-buffer 0)
(setq buffer-file-name nil)
(rename-buffer name)
(buffer-disable-undo)
(delete-region (point-min) (point-max))
(buffer-enable-undo)
)
)

Thursday, October 30, 2008

All About Emacs Macros

Emacs macros allow you to record a repetitive sequence of keystrokes, so they can be replayed. It's a pretty basic and useful emacs skill, but there are a couple of commands related to macros that can make them even more useful to you.

To record a macro, type "C-x (" (if you're new to emacs, C-x means "control-x"; M-x means "meta-x", usually entered by holding down the "Alt" key while pressing "x"). Then everything you type will be recorded until you type "C-x )". The string "Def" appears in the emacs status line while the macro is being recorded. Macro recording is aborted by any error operation -- basically anything that makes the bell ring.

Macros can contain any sequence of keystrokes, including control and meta characters (hence searches and other commands). After you type "C-x )", your macro can be replayed by typing "C-x e". If your macro is such that it leaves the cursor in a place where it makes sense to run the macro again, you can run many iterations of the macro by using a prefix argument. For example, to run the macro 100 times, type "C-u 1 0 0 C-x e".

Saving Macros

One good thing often leads to another, and sometimes you find yourself wanting to use two macros at the same time. No problem, the command "M-x name-last-kbd-macro" lets you give a name to a macro that you've entered. Then, instead of typing "C-x e" you type "M-x" and the name you gave your macro.

If you have a macro that you think will be useful to you again and again, you can save it in your .emacs file. Name the macro, then open your .emacs file, and run "M-x insert-kbd-macro". Lisp code which defines and names your macro is written into the file, so the command will be available to you forever.

Editing macros

Sometimes you go to a lot of effort to record a macro but you have one typo in it, or can make it more useful by tweaking it a little. In that case, there is a way to edit either a named or unnamed keyboard macro. Just use "M-x edit-named-kbd-macro" or "M-x edit-last-kbd-macro".

Wednesday, October 15, 2008

C++ Coding Standard: Other Stylistic Issues

I can't believe it! The last section of the C++ coding standard. Mostly these are issues of clarity, readability, or grep-ability.
  • 12.1. Mark 64-bit constants "LL" or "ULL", e.g. 0xffffffffffffffffULL.
  • 12.2. Mark 32-bit constants >= 0x80000000 "U", e.g. 0xffffffffU.
  • 12.3. Use C++-style typecasts instead of C-style typecasts.
  • 12.4. A source file must not include tab characters (only spaces).
  • 12.5. Lines must contain no more than 80 characters, including the (invisible) newline character at the end.
  • 12.6. A source line must not contain two statements.
  • 12.7. Use 2 spaces per level of indentation.
  • 12.8. Indent the body of a compound statement ("if", "while", etc.) one level more than the head of the statement.
  • 12.9. An "else" must have the same indentation as its "if".
  • 12.10. Two statements in a sequence must have the same indentation.
  • 12.11. A comment is indented the same as the statement following it.
  • 12.12. No character must appear to the right of a brace ("{" or "}").
  • 12.13. A closing right brace must be indented to the same level as the line containing its corresponding left brace.
  • 12.14. There must be no space between the function name and the left parenthesis of a function call (e.g. "printf(...)").
  • 12.15. There must be a space between a keyword and a left parenthesis (e.g. "if (x != NULL)").
  • 12.16. Do not use C-style comments (except for documentation comments).
  • 12.17. Do not comment out or "#if 0" dead code. Remove it.
  • 12.18. Do not use external include guards.
Everyone has an opinion about where the braces go on if, while, etc. I try to avoid religious battles like that, and allow you to put the opening brace on the same line or the next line, as you wish. However, 12.12 does prohibit the despicable Tcl style of "} else {". Code is much more readable if the else lines up with the if. The same reasoning leads to 12.13: line up the closing brace with the line that opened it. This works no matter where you put the opening brace.

Of course 12.7 can be tailored to suit your needs: I chose 2 because I think less is more when it comes to indentation. However, 12.4 is absolute. Tabs mess everything up.

Although this section is mostly about style, there is a practical reason for 12.6. The effectiveness of measuring line coverage is diminished if there can be more than one statement per line.

Friday, October 10, 2008

C++ Coding Standard: Function Design

We're almost done laying out the sections of the C++ coding standard. All of your C++ code lies in some function or other. Section 11 provides some rules on what those functions should look like.
  • 11.1. Constructors must initialize all data members in the initialization list, and in the order they're declared in the class definition.
  • 11.2. Class member functions which return a pointer or reference to a data member, must return it as "const T*" or "const T&".
  • 11.3. A class function must not deallocate memory that was allocated outside of the class implementation file.
  • 11.4. A class destructor must deallocate all of the memory allocated for data members of the instance.
  • 11.5. Function arguments which are pointer, array, or reference types, which are not changed by the function, must be declared "const".
  • 11.6. A function must contain at most one "return" statement.
  • 11.7. Do not use "goto" or "continue".
  • 11.8. Only use "break" in switch statements.
  • 11.9. Declare variables at the latest possible location.
  • 11.10. Use assert() to check for programming or interface errors.
  • 11.11. Functions with "true/false" semantics must return "bool".
  • 11.12. Declare default arguments in the .hpp file, not the .cpp file.
  • 11.13. Pass objects (especially containers) by (const) reference, not value.
  • 11.14. Return objects (especially containers) by value, unless it makes sense to return a const reference.
  • 11.15. Exceptions must be caught by reference, not by value or pointer.
  • 11.16. When re-throwing an exception, use "throw;", not "throw ex;".
  • 11.17. Destructors must catch all exceptions and throw none.
  • 11.18. Exception specifications must not be used.
    • 11.18.1. Exception to 11.18: a destructor which catches all exceptions can be marked with "throw()".
The first five rules are basic hygiene. Perhaps 11.3 needs a little explanation: it enforces symmetric deallocation by saying you shouldn't have to deallocate something allocated in another module.

The next three -- 11.6 - 11.8 -- impose structured-programming doctrine, which says that goto statements impede automated reasoning about functions (break, continue, and out-of-place return are synonyms for goto). Not many people are doing proofs of program correctness, but good structure also aids human reasoning about programs, as well as human debugging. Occasionally you can make a case for simplified functions by using an early return statement; or sometimes a goto is needed to improve loop performance. Make those be rare exceptions.

Passing objects by reference (11.13) avoids copy construction during function calls, which is usually a timesaver -- though you might pass by value if your first action inside the function is to make a copy anyway. Of course if an argument is to be modified within the function, the correct choice is a non-const reference. For return values (11.14), it's almost always right to return an object instead of a reference. You might be tempted to return const references for some trivial "get" members, but even then, if the internal data layout of the class changes sometime in the future, you'll have a lot of work to do chasing down the function calls because the return type will change. Better to just start off returning by value.

Finally, there are some things to keep in mind about exceptions. Scott Meyers points out that catching by value leaves you susceptible to slicing off data in polymorphic classes; catching by pointer brings up confusing issues of ownership and deallocation. Catching by reference is the only solution. For a similar reason, a simple "throw;" is preferred to throwing the exception by name: fewer chances to make a mistake or accidentally involve a copy constructor. As for exception specifications, it's tricky to get them correct, and they add little value. The best solution is to only allow them for the situation where we absolutely expect no exceptions: in a destructor.

Next: the final chapter! Stylistic issues.

Sunday, October 5, 2008

Obfuscated Directory Names

I have to rant about some confusing directory/file/object names in a project I inherited at work. The guy who did it is a competent, experienced engineer. This isn't an attack on him or his work habits. If there's a moral to this rant, it's that good developers can make this kind of mistake, so the rest of us have to be extra careful not to make it.

Now, on to my rant. This C++ project -- for purposes of anonymity let's call it the "zoo" project -- interfaces with a Tcl interpreter. One confusing thing about the project organization is that it doesn't follow Rules 1.1 and 1.2 of my C++ coding standard: each source file might define many different classes. That compounds some other class naming issues, like the presence of classes with ridiculously similar names, like "ZooObj" and "ZooObject", or naming files after an abstract base class, but changing the capitalization: zooObject.hpp.

But the real problem has to do with directory names that don't describe their contents well, and that all sound the same. Here's the directory structure:

...
zoo/
tclApi/
zooDb/
zooDb/
zooDbTcl/

The morpheme "db" here just means "I am defining a set of classes that model the zoo data". Well, no kidding, it's the zoo project. Things that go without saying should just not be said. In object-oriented design, responsibility for every function belongs to some class or another. You should never need special directories called "db" or "model" or "classes" or anything like that -- they don't add any information.

The next indication that something has gone wrong is that we have a directory called "zooDb/zooDb". That's one duplication that we could surely live without; you also have to wonder why there are two directories with "tcl" in the name. It turns out that one rationale for the directory structure is that there are a set of classes which will be compiled into a library with no dependencies on the larger project, so that it can provide a Tcl package to any interpreter. So maybe there's a method to this madness, we need the zoo/zooDb/zooDbTcl directory for that separate Tcl package code. Oops, turns out that's not the case, it's zoo/zooDb/zooDb that holds that. Arrrggggh!

What can we learn from this bad example? First, don't add directory structure where it's not needed. You probably need one directory for each object library you're building; any more than that just makes it confusing when someone is trying to walk through your code. Second, directory names and class names should not contain superfluous strings like "db", "model", or "class". Finally, don't add misleading strings -- like "tcl" in the example above -- that lead away from the directories or files that should be described with them.

If we apply all that, we end up with a much cleaner directory structure:

...
zoo/
zooTclPackage/

With that structure, when a new developer is looking for the classes that are in the dependency-free package, he knows just where to look. When he's looking for other zoo-project files, he knows where to look. It doesn't matter that some code interfacing with Tcl is mingled in a directory with stuff that knows nothing about Tcl -- like the food on your plate, it all gets mixed together when you eat it anyway. As an added benefit, the use of that flat hierarchy and putting every class in a separate header file will hopefully cut down on the temptation to create evil twins like zooObj and zooObject.

Saturday, October 4, 2008

C++ Coding Standard: Naming Conventions

Now the C++ coding standard takes on the contentious subject of naming conventions. Substitute your own conventions for these if you like, but conventions of some kind can make for more readable code.
  • 10.1. Constant names must be in all caps.
  • 10.2. Private data and function member names must begin with '_'.
  • 10.3. Type names must begin with a capital letter.
  • 10.4. Function macro names must be in all caps.
  • 10.5. Names not covered above must begin with a lower-case letter.
  • 10.6. Do not use underscores and mixed-case in a single name.
  • 10.7. Typedefs must not obscure the fact that a type is a pointer, reference, or array type.
  • 10.8. All class definitions must be within a namespace.
  • 10.9. Do not use negative names that begin with "no" or "not".
All-caps constants are such a common convention that you would be foolish to do otherwise -- remember to make your enum constants all-caps also. I extend the all-caps convention to preprocessor function macros also, because it is often useful to highlight the fact that you're using a macro instead of a real function.

I'm not crazy about Hungarian notation, like tacking a _p onto the end of pointer variables, but you might consider using a _t suffix for type names instead of the leading-capital convention of 10.3, since emacs will colorize the type for you. But only if the resulting names don't break 10.6.

Rule 10.6 only says not to have crazy names like underscore_MixedCase_name; there is no mention of whether to choose underscore_names or mixedCaseNames. You should choose one or the other so that you don't accidentally end up with homophone names like my_var and myVar. If your type names begin with a capital letter, you should probably go mixed-case; if they are denoted with _t, maybe underscore names are for you.

The prohibition on obscuring pointer/reference types is more than a simple stylistic issue. If you typedef Object *ObjectPtr, then const ObjectPtr does not mean const Object *, it means Object * const. The latter -- meaning an unchanging address to a changeable Object -- is of very little practical use. The former -- a pointer to an Object that can't be changed -- is useful both as a way to self-document your code and as a way to let the compiler catch silly mistakes when you try to change something you shouldn't.

Next: function design.

Wednesday, October 1, 2008

C++ Coding Standard: Preprocessor Issues

One of the strengths that C++ inherited from C is the preprocessor. Text macros and conditional compilation provide expressive power that can greatly simplify certain programming tasks. On the other hand, that expressive power can also lead to obfuscated code -- or worse, undebuggable code.

So our C++ coding standard guides you gently away from preprocessor abuse.
  • 9.1. Preprocessor macros should be rarely used.
    • 9.1.1. Obvious exception to 9.1: multiple-include guard.
    • 9.1.2. Use enum constants instead of constant macros.
    • 9.1.3. Use inline functions instead of function macros (when possible).
  • 9.2. Never use a macro to invent new syntax.
  • 9.3. Conditional compilation should be rarely used.
  • 9.4. Operating system #includes use angle brackets, e.g. <unistd.h>.
  • 9.5. Non-operating system #includes use double quotes, e.g. "Object.hpp".
  • 9.6. Use the standard C++ system header files, e.g. <string> not <string.h>.
Let's get the last three items out of the way first, since they're the easiest. The issue is that compilers do a slightly different search for header files depending on the form of the #include. So choose the correct form.

Section 9.1 suggests alternatives to preprocessor macros. What's the difference? For constants, the differences are that the enum name is visible from a debugger, and an accidental re-definition of a constant becomes a compile-time error, instead of -- if you're lucky -- a warning. For functions, again, you get visibility in the debugger if you use an inline function, but not if you use a function macro. Function macros can also be harder for humans to read, and you have to remember to put every instance of a macro argument into parenteses in the macro definition, to ensure that the text expansion has the same semantics as the macro reference.

When should you use preprocessor macros? One place that I find them useful is in specifying error messages. It would be handy to keep the message text together with the message name, and indeed to have a message enum in the program that also gets printed in the output. I'll describe the details of this in a future post, but you can accomplish the goal above by having a single macro call that puts the name of the message together with its text, but that accomplishes the different tasks by defining the macro differently.

Section 9.3 discourages conditional compilation. Almost any project that has to be compiled on more than one platform or compiler will require some conditional compilation. The key here is to isolate as much of it as possible into a single header file. That way most of your project remains readable, but each platform gets supported correctly.

Next: everyone's favorite topic, naming conventions.

Sunday, September 28, 2008

C++ Coding Standard: Inheritance

Since polymorphic behavior is one of the cornerstones of object-oriented design, our C++ coding standard has to provide some pointers on how to apply inheritance.
  • 8.1. Public inheritance must only be used to model the "is a" relation.
  • 8.2. Use private and protected inheritance sparingly, and only to model the "looks like a" relation.
  • 8.3. Only inherit publicly from an abstract base class.
  • 8.4. Destructors of public base classes must be pure virtual (but implemented).
  • 8.5. Don't use multiple inheritance.
Apart from Rule 8.1, which is an unbreachable commandment, you may from time to time find yourself wanting to break one of the other rules. Fine, but you better have a good reason for doing so.

Rule 8.2 used to read "Use only public inheritance".  The original rationale was this:  You might find private or protected inheritance useful to save a few lines of code somewhere. But I claim you should just duplicate the lines of code, rather than couple together classes which are so different conceptually that public inheritance can't be used. They might be structurally the same today, but if they are conceptually different, that structural similarity may change over time and give you a huge headache.

There's some truth to what's said above, so use private inheritance sparingly. But I softened my stance about private inheritance when I realized how useful the C++ using declaration is. You can re-use as many of the member functions of a private base class as you want with simple one-liners, without allowing polymorphic access to the base class. That way your class can expose all the useful parts of, say, std::map<A,B>, without the risk of someone confusing your class with the underlying map.

I know that the textbooks say private inheritance models "is implemented in terms of", and it's OK to think of it that way. But containment seems like a better model of "in terms of", and this "using" idiom seems to me to be the most useful part of private inheritance, so I'm going to start thinking of private inheritance modeling "looks like". It looks like std::map, for instance, but it isn't one.

Rule 8.3 springs from the same concern about coupling as 8.2. If you inherit from a non-abstract class, you're tying yourself to an interface that may change over time to meet the evolving needs of the parent class. Keeping base classes abstract means that you can change the behavior or implementation of any concrete class without shaking up some other concrete class.

It sounds odd to implement a pure virtual function, as Rule 8.4 demands, but it is in fact a legal thing to do. Legal and useful: there's little chance that someone will later remove the destructor from the class, so it will always be there to make the class abstract. When might you break this rule? If you have a memory-sensitive class where subclasses will not be used polymorphically (or do not require polymorphic destruction), you can disregard the rule and get rid of the virtual table pointer. But that's really weird: why didn't you use private inheritance if the classes are not to be used polymorphically? It sounds like you are not really modeling "is a", and you're probably skating on thin ice.

I suppose there are applications for multiple inheritance, but even if you avoid the diamond problem, you still often end up with extra complications in code that uses multiple inheritance. Before declaring a class public Fish, public Fowl, ask yourself if there is another solution, and if the derived class truly "is a" Fish and a Fowl.

Next: preprocessor issues.

Thursday, September 25, 2008

C++ Coding Standard: Class Layout

The previous section of the C++ coding standard dealt with important questions of class design. This section, dealing with how to lay out the class definition, is less important. But you should choose some organization for the declarations in a class, just for readability. Here's one way to do it.
  • 7.1. All public things first, then all protected, then all private.
  • 7.2. Within a public or private section, use the following order:
    • 7.2.1. First any nested type definitions or typedefs.
    • 7.2.2. Then constructors, destructors, and the assignment operator.
    • 7.2.3. Then any const member functions.
    • 7.2.4. Then any non-const member functions.
    • 7.2.5. Then any static member functions.
    • 7.2.6. Then any data members.
  • 7.3. Define inline functions outside of the class definition.
Although the default visibility for classes is private, most people agree that public information should go at the top, since that's the most interesting to a user of the class. Constructors are also of primary interest and should go near the top.

For a similar reason, I think inline definitions should be below the class definition. They are really an implementation detail that should not be of interest to a user of the class -- so put them at the bottom of the file. I do not agree with putting the inline definitions in a separate file. When you're trying to figure out what some function does, you usually start in the header file. If you need to look at the implementation, you might find it inline in the file you're already looking at; otherwise it's in the .cpp file. It's confusing to have a third file containing the inline functions -- just don't do that.

Next: inheritance.

Monday, September 22, 2008

C++ Coding Standard: Class Design

Now the C++ coding standard gets into interesting territory: rules for classes.
  • 6.1. Non-static data members must be private.
  • 6.2. A class must implement at least one public constructor.
  • 6.3. A class must declare its copy constructor and assignment operator.
    • 6.3.1. Exception to 6.3: not if declared private in base class.
  • 6.4. Member functions which do not change data members must be declared "const".
  • 6.5. Protected members should not be used.
  • 6.6. Consider changing private member functions to anonymous namespace functions in the implementation file.
  • 6.7. Classes must not declare "friend" classes.
  • 6.8. Classes should not declare "friend" functions for non-inline functions.
  • 6.9. Declare blank constructor if and only if a client needs it.
  • 6.10. Consider calculating data instead of storing it.
  • 6.11. Data members allocated on the heap must have type auto_ptr<T> or boost::shared_ptr<T>.
  • 6.12. Member functions marked "const" must be idempotent.
Why are protected and friend disallowed? This is an idea I first encountered in the Lakos book. I don't recommend that book because of wacky stuff in it like external include guards, but his argument in this area is convincing: if you allow friend classes, you might as well make everything in the class public, because someone can just declare their own class with the same name as your friend, and access anything in your class. Similarly, to get access to protected members, all someone has to do is derive from your class. Protected members should usually just be public.

Rule 6.11 is a way to leak-proof your code. Obviously, for some applications shared pointers are going to take too much memory. But you should think seriously whether each class has austere memory requirements; if not, use shared pointers.

Next: class layout.

Friday, September 19, 2008

CVS Is Dead, Long Live Subversion

Let's take a breather from C++ coding standards, and give a thought to version control. For years the best choice for a source code repository that anyone could make was CVS, which had these things going for it:
  1. It's free.
  2. You don't have to lock a file before working on it.
  3. It does pretty good merging.
  4. You can branch your repository.
  5. It doesn't require one full-time employee to manage it, like commercial systems do.
There were a few flies in the ointment. You could start a checkin and get some directories checked in, only to find that some other ones failed because of out-of-date files. Moving files essentially amounted to a delete and an add, losing the change history. You had to remember to check in binary files a special way. And tagging, branching, and locking were loaded with gotchas -- one wrong step and you were in the drink.

As luck would have it, the CVS developers didn't like flies in their own ointment, and some of them went out and developed a CVS-like revision-control system which takes care of a lot of the shortcomings of CVS: subversion. All the things that you like about CVS are there; many things that are tedious in CVS are simply no-ops in subversion.

The best example is branching. In CVS a branch tag will go on every file in the repository. When you make the tag, you have to be careful that you're tagging file versions that belong together -- which pretty much necessitates that you either branch off of -rHEAD or an exact date like -D2008-09-19 20:05. Hopefully that specification doesn't hit in the middle of someone's check-in and give you a mismatched set of files. There's some weirdness with adding files on a branch and later moving them to HEAD. And the version numbers of files on the branch are like 1.187.4.3 -- if you branch from a branch you get monstrosities like 1.148.24.1.2.2.

In subversion, branching is painless for the developer, and an extremely fast, no-gotcha operation on the repository. You just choose the numbered repository version that you want to branch from, and do an svn copy command to a different directory of your repository. Under the hood, the copy command is just a symbolic link, so it takes up almost no space. There's no danger of mismatched files, thanks to subversion's atomic checkins -- a commit either fails or succeeds, and the files aren't tagged with individual version numbers, there's just a single revision number for the entire repository.

Another beneficial side-effect of atomic commits is that when you're investigating why a change was made to a file sometime in the forgotten past, often you would like to see what other files changed at the same time -- or even just what other files looked like at that time. Of course both of those things can be discovered in CVS-land with a certain amount of effort. In SVN-land, they come for free: you can know what any file in the same revision looked like by checking out the file with that revision number; you can know what files were checked in at the same time as your file by running svn log -v on it.

Everytime there's some kind of forehead-slapping or head-scratching at work due to CVS/RCS, I have to say, "In the future, when we're using subversion, we'll look back at this and laugh". Ditch CVS as soon as you can and move up to subversion.

Wednesday, September 17, 2008

C++ Coding Standard: Implementation Files

Section 4 of the C++ coding standard is so short, that it makes sense to cover the next two sections in one post. We just covered the layout and contents of header files; these sections refer to the layout and contents of implementation files.
  • 4. Implementation file layout has only a few restrictions.
    • 4.1. Implementation files must begin with the standard C++ file comment.
    • 4.2. The first non-comment line of {class}.cpp must be: #include "{class}.hpp".
    • 4.3. {class}.cpp definitions should be in the same order as {class}.hpp declarations.
  • 5. Implementation file (.cpp) contents.
    • 5.1. A .cpp must never include an implementation file.
    • 5.2. A .cpp must not include files not needed to compile itself.
    • 5.3. Every non-inline function declared in {class}.hpp must be defined in {class}.cpp.
      • 5.3.1. Exception to 5.3: copy constructor and assignment operator may be declared private and not implemented, to override C++ defaults.
    • 5.4. Every static member variable declared in {class}.hpp must be defined in {class}.cpp.
    • 5.5. Anything defined in {class}.cpp which is not declared in {class}.hpp must be in the anonymous namespace.
    • 5.6. Every anonymous namespace function must be preceded by a documentation comment.
Rule 4.2 is a trick which ensures that {class}.hpp includes every file needed to compile it (see Rule 3.2). Note that the order of variables and functions in the .cpp file is just a "should" -- it makes for easier reference, but there may be some good reason why the order is different.

I've come to the conclusion that most classes should have a copy constructor and an assignment operator, and I would always implement them rather than let the compiler generate them, even if the naive bitwise copy is correct for the class. But sometimes you decide that an object must not be copyable. Then you use the trick in 5.3.1, you don't define the functions, but you do declare them:

private:
Object(const Object &orig);
Object &operator=(const Object &orig);

Then anyone outside of Object.cpp that tries to take a copy of an Object will get a compile error (any member of Object who takes a copy will get a link error).

Not everyone has yet heard of anonymous namespaces, but about 10 years ago they became the official approved way to declare things which aren't visible outside their source file. You used to declare them static, now do this:

namespace {

int globalForThisFileOnly = 1;
bool functionForThisFileOnly()
{
return true;
}

} // anonymous namespace

Next: Class design.

Monday, September 15, 2008

C++ Coding Standard: Header File Contents

Our C++ coding standard provides some easy rules about what does and does not go into a class header file.
  • 3.1. A .hpp must never include an implementation file.
  • 3.2. A .hpp must include every file needed to compile itself.
  • 3.3. A .hpp must not include files not needed to compile itself.
  • 3.4. Class definitions must be preceded by a documentation comment.
  • 3.5. Function declarations must be preceded by a documentation comment.
    • 3.5.1. Document function pre- and postconditions when relevant.
  • 3.6. A .hpp must not declare variables or constants outside the class.
  • 3.7. A header file must not declare functions outside the class.
    • 3.7.1. Exception to 3.7: operators or template specializations may sometimes be declared outside the class.
  • 3.8. A header file must not use "using namespace" directives.
  • 3.9. Any functions declared in the .hpp which are to be inline, must be defined in the .hpp.
By "documentation comment", I mean a javadoc-style comment formatted for Doxygen. At a minimum, this should provide a brief description of the class or function. Even better, for each function precondition, add to the function comment a @pre tag consisting of a C++ boolean expression that can be assert()ed at the beginning of the function. I have an emacs function that scans header files for @pre tags, then locates the function definition in the .cpp file and adds the assert()s -- I'll post it here some rainy day.

The standard seems to insist that everything in your program falls into a C++ class. I would say that is an ideal, but one that no one has yet attained. Still, we can interpret the rules to mean that if a header file defines a class, then it only defines that class. Perhaps there are some functions that don't make sense as member functions, but which have something to do with the class, and you're tempted to declare them in {class}.hpp. Well, don't. They should go into a different header/implementation file pair (and once you take that step, you might find that they really belong in some kind of singleton class).

Rule 3.8 is basic namespace hygiene; rule 3.1 seems like it goes without saying, but unfortunately it doesn't. There's a nifty trick to enforce rule 3.2 at compile time: it will be described in section 4.

Next: .cpp file layout/contents.

Saturday, September 13, 2008

C++ Coding Standard: Header File Layout

Section 2 of the C++ code standard specifies how the header file for a class is structured. As the specification of the class, the header file should be uncluttered and easy to navigate. Section 7 will go into more details about how to organize the class definition itself. This general outline can still be followed, for those few project headers which don't define a class.
  • 2.1. Header files must begin with the standard C++ file comment.
  • 2.2. Then comes the "#ifndef" of the include guard.
    • 2.2.1. Guard macro format _<DIR>_<CLASS>_H (all caps).
  • 2.3. Then include directives and external class declarations (optional).
  • 2.4. Then outside-the-class typedefs (optional).
  • 2.5. Then the class definition.
  • 2.6. Then other interface declarations (optional).
  • 2.7. Then inline function definitions (optional).
  • 2.8. Then the "#endif" of the include guard.
The standard file comment usually contains your company or organization name and copyright statement. Even if you don't require those items, at least you should have an ID string to be expanded by your version control system -- for subversion and RCS/CVS a good choice is $Id:$.

You can choose whatever format for the include guard makes sense for you; I suggest including the directory name, in case the unthinkable happens and your project ends up with two files with the same name. Incidentally, don't use external include guards -- they solve a problem that doesn't exist while cluttering your source files.

Next: header file contents.

Thursday, September 11, 2008

C++ Coding Standard: Source Files

Since all of your code is in files, and since a lot of C++ code consists of class definitions and class implementations, a good place for a C++ coding standard to start is by specifying some rules for putting classes into files.
  • 1.1. Every class must have a header file called "{class}.hpp".
  • 1.2. Every class must have an implementation file called "{class}.cpp".
    • 1.2.1. Exception to 1.2: template class with only inline members.
    • 1.2.2. Exception to 1.1 and 1.2: class in anonymous namespace.
  • 1.3. {class}.hpp and {class}.cpp must be in the same directory.
  • 1.4. {class}.hpp must define one and only one class.
    • 1.4.1. Exception to 1.4: nested classes or structs.
  • 1.5. Filenames must match the class names exactly, including case.
These rules haven't addressed directory structure at all, and there's a lot of advice that could be given about that. But those issues usually get tackled at the beginning of the project, and sit at the intersection of software architecture and build/release processes. The above rules help with an issue that comes up frequently: where to put a new class.

The rationale for the one class == one header + one implementation is simple: so that when I need to look at or change a given class later, I know exactly which file it is in.

Next up: header file layout.

Tuesday, September 9, 2008

C++ Coding Standard: Introduction

No matter what language you develop software in, some consistency will help make the code more readable for whoever ends up maintaining the code in the future -- even if that whoever is your future self. Furthermore, every language has strengths and weaknesses, and quirks that you can either learn yourself the hard way, or learn from others who have already done battle with the quirks.

A coding standard can help with both of these issues. Look at it as a set of guidelines that make for more maintainable code and help everyone avoid well-known language-specific pitfalls. I've done a lot of C++ work over the years, and I have a nice, concise coding standard that I've been carrying around for years. It is descended from a standard Kerry wrote when we worked together, that Brady and I distilled down and added to, and that Khalil later suggested changes to. I'm going to roll it out a section at a time in this blog.

The style is terse, consisting of numbered one-line items. Nearly every item is a "must" or a "never"; a few are "should"s; a few are exceptions to the preceding item. No rationale is given in the text of the standard itself, so as to preserve the terseness and make it easy to refer to the standard during code inspections or reviews. Of course, in blog-land I will provide some commentary for each section.

The sections are:
  1. Source files
  2. Header file layout
  3. Header file contents
  4. Implementation file layout
  5. Implementation file contents
  6. Class design
  7. Class layout
  8. Inheritance
  9. Preprocessor issues
  10. Naming conventions
  11. Function design
  12. Other stylistic issues
Finally, here is the complete standard, terse, with no commentary.

Thursday, September 4, 2008

Emacs File Mode Tricks

Usually emacs figures out what mode to be in -- such as which programming or script language -- based on the extension of the filename you're editing. It can also figure it out from a "pound-bang" line, like #!/bin/bash. You want the right file mode: it colorizes the text correctly, makes indentation easier, and lets you easily comment regions of text.

But sometimes you have a file with no extension and no pound-bang. You can still make emacs automatically choose the right file mode, by specifying it in a comment on the first line of the file, like:

# -*-tcl-*-

or

##############-*-makefile-*-##############

The -*- markers tell emacs to look for the mode name between them. You can have any other text you need in the line.

Another use for file-mode comments is if your C++ header files have the extension ".h", which emacs decides is a C file. Just make the first line be a comment containing -*-c++-*-.

Sometimes you do have a file extension which is meaningful to you, but emacs just doesn't recognize it. For example, if you're crazy enough to use the Inline Guard Macro idiom, emacs doesn't automatically recognize ".ipp" as a C++ extension.

Rather than put mode comments in every ".ipp" file, put this line in your .emacs file:

(if (null (assoc "\\.ipp\\'" auto-mode-alist))
(setq auto-mode-alist
(cons '("\\.ipp\\'" . c++-mode) auto-mode-alist)))

Sunday, August 31, 2008

Eschew Case Analysis

It ticks me off when I see code like this:
if (str != NULL && *str != '\0') {
exists = true;
}
else {
exists = false;
}

This is a pretty simplistic example, and most -- not all! -- programmers would get rid of the exists variable and the accompanying code, and replace it all with (str && *str). But it's all too common to find more subtle versions of this same kind of logic bloat. And when it's even more complicated and nested... can you say "cyclomatic complexity"?

This is a special case of what Dijkstra very cogently described as "spurious case analysis". That link is well worth reading, by the way.

Dijkstra was certainly on the science side of the engineering-science frontier. I once heard him say that the world would be a better place if all computer programs had to be chiseled in marble. I don't agree with that, but his commandment to avoid case analysis resonates with good software engineering practice: be concise, self-document, code for testability.

If you can avoid case analysis, do so. Otherwise....

Thursday, August 28, 2008

More Tcl Lunacy

Funny how when you start making a list, you might even leave off the thing that inspired you to make the list in the first place. The aggravation that led me to my previous rant didn't even make it into that post. So let's extend the "Tcl is a Toy" list a little:

4. Comments don't always begin with "#". At the beginning of a line, "#" is the comment character. But if you want to put a comment at the end of a line of code, you have to use ";#". I'm sorry, but that's ridiculous. So ridiculous that I had to start a blog just to rant about it, but when I went to post to the blog, I got sidetracked by three other ridiculous things.

5. Backslashes don't increase the line-number count. If you have some fancy scripts, or embedded Tcl, where you want to report the line number of certain actions, backslashes throw off the count. Why did you have backslashes in your script to begin with? Because Tcl is a toy language that would do something different if you started a new line there without the backslash. Argggh!

Tuesday, August 26, 2008

Tcl is a Toy Language

I just took over some Tcl code at work. What a toy language! The following are some things that are completely wrong about Tcl.

1. "else/elseif" has to be to the right of a brace. It's one thing if you like using this style of idiotic formatting for your if-else statements:

if { $x == 0 } {
do_zero()
} else {
do_non_zero()
}

which hides the structure by putting the "else" in a different column than the "if". But for the language to require you to do such a stupid thing is, well, stupid. It's also bad that it requires you to cleverly put the open brace of the "if" on the same line. It's what I'd usually do anyway, but bad requirement.

2. $x is assigned to as "x". How insane is that? You have two names for the same thing, and you can silently do the wrong thing if you set $x "abc".

3. Whitespace is significant in multi-dimensional arrays. Excuse me? $matrix(1, 1) (with a space) is different than $matrix(1,1)?

I could go on, but that's enough ranting for today.