Sunday, September 28, 2008

C++ Coding Standard: Inheritance

Since polymorphic behavior is one of the cornerstones of object-oriented design, our C++ coding standard has to provide some pointers on how to apply inheritance.
  • 8.1. Public inheritance must only be used to model the "is a" relation.
  • 8.2. Use private and protected inheritance sparingly, and only to model the "looks like a" relation.
  • 8.3. Only inherit publicly from an abstract base class.
  • 8.4. Destructors of public base classes must be pure virtual (but implemented).
  • 8.5. Don't use multiple inheritance.
Apart from Rule 8.1, which is an unbreachable commandment, you may from time to time find yourself wanting to break one of the other rules. Fine, but you had better have a good reason for doing so.

Rule 8.2 used to read "Use only public inheritance". The original rationale was this: you might find private or protected inheritance useful to save a few lines of code somewhere. But I claim you should just duplicate the lines of code, rather than couple together classes which are so different conceptually that public inheritance can't be used. They might be structurally the same today, but if they are conceptually different, that structural similarity may change over time and give you a huge headache.

There's some truth to what's said above, so use private inheritance sparingly. But I softened my stance about private inheritance when I realized how useful the C++ using declaration is. You can re-use as many of the member functions of a private base class as you want with simple one-liners, without allowing polymorphic access to the base class. That way your class can expose all the useful parts of, say, std::map<A,B>, without the risk of someone confusing your class with the underlying map.

I know that the textbooks say private inheritance models "is implemented in terms of", and it's OK to think of it that way. But containment seems like a better model of "implemented in terms of", and this "using" idiom seems to me to be the most useful part of private inheritance, so I'm going to start thinking of private inheritance as modeling "looks like". Such a class looks like std::map, for instance, but it isn't one.
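Here is a minimal sketch of the "looks like" idiom. It assumes a hypothetical PhoneBook class that privately inherits from std::map<std::string, std::string> and uses one-line using declarations to re-expose only the map members its clients need:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical class: it "looks like" a std::map, but is not one.
class PhoneBook : private std::map<std::string, std::string> {
    typedef std::map<std::string, std::string> Base;

public:
    // One-liners that re-expose only the parts of std::map clients need.
    using Base::operator[];
    using Base::count;
    using Base::size;
    using Base::empty;
};
```

Because the inheritance is private, code outside PhoneBook cannot bind a PhoneBook to a std::map reference or pass it where a map is expected. That conversion is a compile error, so nobody can confuse the class with the underlying map.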

Rule 8.3 springs from the same concern about coupling as 8.2. If you inherit from a non-abstract class, you're tying yourself to an interface that may change over time to meet the evolving needs of the parent class. Keeping base classes abstract means that you can change the behavior or implementation of any concrete class without shaking up some other concrete class.

It sounds odd to implement a pure virtual function, as Rule 8.4 demands, but it is in fact legal, and for a destructor it is required: every derived-class destructor implicitly calls the base-class destructor, so the base destructor must have a body or the program won't link. It's useful, too: there's little chance that someone will later remove the destructor from the class, so it will always be there to make the class abstract. When might you break this rule? If you have a memory-sensitive class where subclasses will not be used polymorphically (or do not require polymorphic destruction), you can disregard the rule and get rid of the virtual table pointer. But that's really weird: why didn't you use private inheritance if the classes are not to be used polymorphically? It sounds like you are not really modeling "is a", and you're probably skating on thin ice.
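Here is a sketch of Rules 8.3 and 8.4 together, using hypothetical Shape and Square classes. The base class declares its destructor pure virtual and still defines it out of line. The concrete class inherits publicly only from that abstract base:

```cpp
#include <cassert>

// Hypothetical abstract base class (Rules 8.3 and 8.4).
class Shape {
public:
    // Pure virtual: keeps Shape abstract even if every other virtual
    // function later gets a default implementation.
    virtual ~Shape() = 0;
    virtual double area() const = 0;
};

// The pure virtual destructor still needs a body, because every derived
// destructor calls it. (In a header this definition would need "inline",
// or it could live in Shape.cpp.)
Shape::~Shape() {}

// Hypothetical concrete class: inherits publicly from the abstract base only.
class Square : public Shape {
public:
    explicit Square(double side) : side_(side) { assert(side >= 0.0); }
    double area() const { return side_ * side_; }

private:
    double side_;
};
```

Deleting a Square through a Shape pointer is safe here, because the virtual destructor dispatches to Square's destructor, which in turn calls the implemented Shape::~Shape().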

I suppose there are applications for multiple inheritance, but even if you avoid the diamond problem, you still often end up with extra complications in code that uses multiple inheritance. Before declaring a class public Fish, public Fowl, ask yourself if there is another solution, and if the derived class truly "is a" Fish and a Fowl.

Next: preprocessor issues.

Thursday, September 25, 2008

C++ Coding Standard: Class Layout

The previous section of the C++ coding standard dealt with important questions of class design. This section, dealing with how to lay out the class definition, is less important. But you should choose some organization for the declarations in a class, just for readability. Here's one way to do it.
  • 7.1. All public things first, then all protected, then all private.
  • 7.2. Within a public or private section, use the following order:
    • 7.2.1. First any nested type definitions or typedefs.
    • 7.2.2. Then constructors, destructors, and the assignment operator.
    • 7.2.3. Then any const member functions.
    • 7.2.4. Then any non-const member functions.
    • 7.2.5. Then any static member functions.
    • 7.2.6. Then any data members.
  • 7.3. Define inline functions outside of the class definition.
Although the default visibility for classes is private, most people agree that public information should go at the top, since that's the most interesting to a user of the class. Constructors are also of primary interest and should go near the top.

For a similar reason, I think inline definitions should be below the class definition. They are really an implementation detail that should not be of interest to a user of the class -- so put them at the bottom of the file. I do not agree with putting the inline definitions in a separate file. When you're trying to figure out what some function does, you usually start in the header file. If you need to look at the implementation, you might find it inline in the file you're already looking at; otherwise it's in the .cpp file. It's confusing to have a third file containing the inline functions -- just don't do that.
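Here is a skeleton that follows Rules 7.1 through 7.3, using a hypothetical Counter class. The trailing-underscore member name is only an assumption, since naming conventions come later in the standard:

```cpp
#include <cassert>

// Hypothetical class laid out according to Rules 7.1-7.3.
class Counter {
public:
    // 7.2.1: nested types and typedefs
    typedef int ValueType;

    // 7.2.2: constructors, destructor, assignment operator
    Counter();
    Counter(const Counter &orig);
    ~Counter();
    Counter &operator=(const Counter &orig);

    // 7.2.3: const member functions
    ValueType value() const;

    // 7.2.4: non-const member functions
    void increment();

    // 7.2.5: static member functions
    static ValueType initialValue();

private:
    // 7.2.6: data members
    ValueType value_;
};

// 7.3: inline definitions go below the class definition, not inside it.
inline Counter::Counter() : value_(initialValue()) {}
inline Counter::Counter(const Counter &orig) : value_(orig.value_) {}
inline Counter::~Counter() {}

inline Counter &Counter::operator=(const Counter &orig)
{
    value_ = orig.value_;
    return *this;
}

inline Counter::ValueType Counter::value() const { return value_; }
inline void Counter::increment() { ++value_; }
inline Counter::ValueType Counter::initialValue() { return 0; }
```

A reader skimming the header sees the public interface first. The inline bodies, which are implementation details, sit at the bottom of the same file.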

Next: inheritance.

Monday, September 22, 2008

C++ Coding Standard: Class Design

Now the C++ coding standard gets into interesting territory: rules for classes.
  • 6.1. Non-static data members must be private.
  • 6.2. A class must implement at least one public constructor.
  • 6.3. A class must declare its copy constructor and assignment operator.
    • 6.3.1. Exception to 6.3: not if declared private in base class.
  • 6.4. Member functions which do not change data members must be declared "const".
  • 6.5. Protected members should not be used.
  • 6.6. Consider changing private member functions to anonymous namespace functions in the implementation file.
  • 6.7. Classes must not declare "friend" classes.
  • 6.8. Classes should not declare "friend" functions for non-inline functions.
  • 6.9. Declare blank constructor if and only if a client needs it.
  • 6.10. Consider calculating data instead of storing it.
  • 6.11. Data members allocated on the heap must have type auto_ptr<T> or boost::shared_ptr<T>.
  • 6.12. Member functions marked "const" must be idempotent.
Why are protected and friend disallowed? This is an idea I first encountered in the Lakos book. I don't recommend that book because of wacky stuff in it like external include guards, but his argument in this area is convincing: if you allow friend classes, you might as well make everything in the class public, because someone can just declare their own class with the same name as your friend, and access anything in your class. Similarly, to get access to protected members, all someone has to do is derive from your class. Protected members should usually just be public.
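To make the argument concrete, here is a sketch with hypothetical names. Secret befriends a class called Inspector, expecting a trusted colleague to write it, but any client can define a class with that name and read private data. Any client can also derive from Secret to reach its protected members. Note that the example deliberately violates Rules 6.5 and 6.7 to show why they exist:

```cpp
#include <cassert>

// Hypothetical class that trusts a friend it doesn't control.
class Secret {
public:
    explicit Secret(int value) : value_(value) {}

protected:
    int hint() const { return value_ + 1; }

private:
    friend class Inspector;   // meant for one trusted class...
    int value_;
};

// ...but anyone can write a class with that name and read private data.
class Inspector {
public:
    static int peek(const Secret &s) { return s.value_; }
};

// Likewise, deriving is all it takes to reach protected members.
class Snoop : public Secret {
public:
    explicit Snoop(int value) : Secret(value) {}
    int peekHint() const { return hint(); }
};
```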

Rule 6.11 is a way to leak-proof your code. Obviously, for some applications shared pointers are going to take too much memory. But you should think seriously about whether each class really has austere memory requirements; if not, use shared pointers.
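Here is a sketch of Rule 6.11 using hypothetical Buffer and Document classes. It swaps in std::shared_ptr, which C++11 standardized from boost::shared_ptr with essentially the same interface (auto_ptr was deprecated in C++11 and removed in C++17). A live-instance counter shows that the heap object is released without any hand-written delete:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical heap-allocated resource that counts live instances.
struct Buffer {
    static int live;
    std::string text;
    Buffer() { ++live; }
    ~Buffer() { --live; }
};
int Buffer::live = 0;

// Hypothetical class following Rule 6.11: the heap member is owned by a
// smart pointer, so no destructor or delete is needed to avoid a leak.
class Document {
public:
    Document() : buffer_(new Buffer) {}
    void append(const std::string &s) { buffer_->text += s; }
    const std::string &text() const { return buffer_->text; }

private:
    std::shared_ptr<Buffer> buffer_;
};
```

Copies of this Document share a single Buffer. If each copy needs its own, the hand-written copy constructor and assignment operator that Rule 6.3 asks for are the place to clone it.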

Next: class layout.

Friday, September 19, 2008

CVS Is Dead, Long Live Subversion

Let's take a breather from C++ coding standards and give a thought to version control. For years the best choice anyone could make for a source code repository was CVS, which had these things going for it:
  1. It's free.
  2. You don't have to lock a file before working on it.
  3. It does pretty good merging.
  4. You can branch your repository.
  5. It doesn't require one full-time employee to manage it, like commercial systems do.
There were a few flies in the ointment. You could start a checkin and get some directories checked in, only to find that some other ones failed because of out-of-date files. Moving files essentially amounted to a delete and an add, losing the change history. You had to remember to check in binary files a special way. And tagging, branching, and locking were loaded with gotchas -- one wrong step and you were in the drink.

As luck would have it, the CVS developers didn't like flies in their own ointment, and some of them went out and developed a CVS-like revision-control system which takes care of a lot of the shortcomings of CVS: subversion. All the things that you like about CVS are there; many things that are tedious in CVS are simply no-ops in subversion.

The best example is branching. In CVS a branch tag will go on every file in the repository. When you make the tag, you have to be careful that you're tagging file versions that belong together -- which pretty much necessitates that you either branch off of -rHEAD or an exact date like -D "2008-09-19 20:05". Hopefully that specification doesn't hit in the middle of someone's check-in and give you a mismatched set of files. There's some weirdness with adding files on a branch and later moving them to HEAD. And the version numbers of files on the branch are like 1.187.4.3 -- if you branch from a branch you get monstrosities like 1.148.24.1.2.2.

In subversion, branching is painless for the developer, and an extremely fast, no-gotcha operation on the repository. You just choose the numbered repository version that you want to branch from, and do an svn copy command to a different directory of your repository. Under the hood, the copy is a "cheap copy": the repository records a reference to the source revision rather than duplicating the files, much like a symbolic link, so it takes up almost no space. There's no danger of mismatched files, thanks to subversion's atomic check-ins: a commit either fails or succeeds as a whole. And the files aren't tagged with individual version numbers; there's just a single revision number for the entire repository.

Another beneficial side-effect of atomic commits is that when you're investigating why a change was made to a file sometime in the forgotten past, often you would like to see what other files changed at the same time -- or even just what other files looked like at that time. Of course both of those things can be discovered in CVS-land with a certain amount of effort. In SVN-land, they come for free: you can know what any file in the same revision looked like by checking out the file with that revision number; you can know what files were checked in at the same time as your file by running svn log -v on it.

Every time there's some kind of forehead-slapping or head-scratching at work due to CVS/RCS, I have to say, "In the future, when we're using subversion, we'll look back at this and laugh". Ditch CVS as soon as you can and move up to subversion.

Wednesday, September 17, 2008

C++ Coding Standard: Implementation Files

Section 4 of the C++ coding standard is so short that it makes sense to cover the next two sections in one post. We just covered the layout and contents of header files; these sections refer to the layout and contents of implementation files.
  • 4. Implementation file layout has only a few restrictions.
    • 4.1. Implementation files must begin with the standard C++ file comment.
    • 4.2. The first non-comment line of {class}.cpp must be: #include "{class}.hpp".
    • 4.3. {class}.cpp definitions should be in the same order as {class}.hpp declarations.
  • 5. Implementation file (.cpp) contents.
    • 5.1. A .cpp must never include an implementation file.
    • 5.2. A .cpp must not include files not needed to compile itself.
    • 5.3. Every non-inline function declared in {class}.hpp must be defined in {class}.cpp.
      • 5.3.1. Exception to 5.3: copy constructor and assignment operator may be declared private and not implemented, to override C++ defaults.
    • 5.4. Every static member variable declared in {class}.hpp must be defined in {class}.cpp.
    • 5.5. Anything defined in {class}.cpp which is not declared in {class}.hpp must be in the anonymous namespace.
    • 5.6. Every anonymous namespace function must be preceded by a documentation comment.
Rule 4.2 is a trick which ensures that {class}.hpp includes every file needed to compile it (see Rule 3.2). Note that the order of variables and functions in the .cpp file is just a "should" -- it makes for easier reference, but there may be some good reason why the order is different.

I've come to the conclusion that most classes should have a copy constructor and an assignment operator, and I would always implement them rather than let the compiler generate them, even if the compiler's memberwise copy is correct for the class. But sometimes you decide that an object must not be copyable. Then you use the trick in 5.3.1: declare the functions private, but don't define them:

private:
    Object(const Object &orig);
    Object &operator=(const Object &orig);

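Then any code outside class Object that tries to take a copy of an Object will get a compile error, because the functions are private. A member function (or friend) of Object that takes a copy will compile, but will get a link error, because the functions are never defined.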
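Here is a self-contained version of the trick, assuming a hypothetical Connection class. The static_assert lines use C++11 type traits, which postdate this post, to confirm at compile time that the class really is non-copyable. In C++11 and later you can also write "= delete" on the two declarations, which turns every copy attempt, including one inside the class, into a compile error:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class that must not be copied (Rule 5.3.1).
class Connection {
public:
    explicit Connection(int id) : id_(id) {}
    int id() const { return id_; }

private:
    // Declared but deliberately never defined: outside code gets a compile
    // error (access), member code gets a link error (no definition).
    Connection(const Connection &orig);
    Connection &operator=(const Connection &orig);

    int id_;
};

// C++11 and later: check the effect without trying to copy anything.
static_assert(!std::is_copy_constructible<Connection>::value,
              "Connection must not be copyable");
static_assert(!std::is_copy_assignable<Connection>::value,
              "Connection must not be assignable");
```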

Not everyone has heard of anonymous namespaces yet, but about 10 years ago, with the 1998 C++ standard, they became the officially approved way to declare things which aren't visible outside their source file. You used to declare such things static; now do this:

namespace {

int globalForThisFileOnly = 1;
bool functionForThisFileOnly()
{
    return true;
}

} // anonymous namespace

Next: Class design.

Monday, September 15, 2008

C++ Coding Standard: Header File Contents

Our C++ coding standard provides some easy rules about what does and does not go into a class header file.
  • 3.1. A .hpp must never include an implementation file.
  • 3.2. A .hpp must include every file needed to compile itself.
  • 3.3. A .hpp must not include files not needed to compile itself.
  • 3.4. Class definitions must be preceded by a documentation comment.
  • 3.5. Function declarations must be preceded by a documentation comment.
    • 3.5.1. Document function pre- and postconditions when relevant.
  • 3.6. A .hpp must not declare variables or constants outside the class.
  • 3.7. A header file must not declare functions outside the class.
    • 3.7.1. Exception to 3.7: operators or template specializations may sometimes be declared outside the class.
  • 3.8. A header file must not use "using namespace" directives.
  • 3.9. Any functions declared in the .hpp which are to be inline, must be defined in the .hpp.
By "documentation comment", I mean a javadoc-style comment formatted for Doxygen. At a minimum, this should provide a brief description of the class or function. Even better, for each function precondition, add to the function comment a @pre tag consisting of a C++ boolean expression that can be assert()ed at the beginning of the function. I have an emacs function that scans header files for @pre tags, then locates the function definition in the .cpp file and adds the assert()s -- I'll post it here some rainy day.
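Here is a sketch of what that looks like, using a hypothetical averageOfFirst function. The header-style declaration carries a Doxygen comment with @pre tags. The definition begins with one assert() per tag. The asserts are written by hand here and stand in for what the emacs function would generate:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/**
 * @brief Returns the average of the first @p count elements of @p values.
 * (Hypothetical example function.)
 * @pre count > 0
 * @pre count <= values.size()
 */
double averageOfFirst(const std::vector<double> &values, std::size_t count);

// In the .cpp file, each @pre expression becomes an assert() at the top.
double averageOfFirst(const std::vector<double> &values, std::size_t count)
{
    assert(count > 0);
    assert(count <= values.size());

    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum / count;
}
```

Because each @pre tag is a valid C++ boolean expression, the same text serves as human-readable documentation and as a runtime check in debug builds.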

The standard seems to insist that everything in your program falls into a C++ class. I would say that is an ideal, but one that no one has yet attained. Still, we can interpret the rules to mean that if a header file defines a class, then it only defines that class. Perhaps there are some functions that don't make sense as member functions, but which have something to do with the class, and you're tempted to declare them in {class}.hpp. Well, don't. They should go into a different header/implementation file pair (and once you take that step, you might find that they really belong in some kind of singleton class).

Rule 3.8 is basic namespace hygiene; rule 3.1 seems like it goes without saying, but unfortunately it doesn't. There's a nifty trick to enforce rule 3.2 at compile time: it will be described in section 4.

Next: .cpp file layout/contents.

Saturday, September 13, 2008

C++ Coding Standard: Header File Layout

Section 2 of the C++ coding standard specifies how the header file for a class is structured. As the specification of the class, the header file should be uncluttered and easy to navigate. Section 7 will go into more details about how to organize the class definition itself. This general outline can still be followed, for those few project headers which don't define a class.
  • 2.1. Header files must begin with the standard C++ file comment.
  • 2.2. Then comes the "#ifndef" of the include guard.
    • 2.2.1. Guard macro format _<DIR>_<CLASS>_H (all caps).
  • 2.3. Then include directives and external class declarations (optional).
  • 2.4. Then outside-the-class typedefs (optional).
  • 2.5. Then the class definition.
  • 2.6. Then other interface declarations (optional).
  • 2.7. Then inline function definitions (optional).
  • 2.8. Then the "#endif" of the include guard.
The standard file comment usually contains your company or organization name and copyright statement. Even if you don't require those items, at least you should have an ID string to be expanded by your version control system. For subversion and RCS/CVS a good choice is $Id$ (subversion only expands it if the file's svn:keywords property includes Id).

You can choose whatever format for the include guard makes sense for you; I suggest including the directory name, in case the unthinkable happens and your project ends up with two files with the same name. Incidentally, don't use external include guards -- they solve a problem that doesn't exist while cluttering your source files.
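Here is a sketch of a header laid out according to Rules 2.1 through 2.8, for a hypothetical Point class in a hypothetical geometry directory. One assumption departs from Rule 2.2.1: the guard macro drops the leading underscore, because C++ reserves identifiers that begin with an underscore followed by an uppercase letter for the implementation:

```cpp
/*
 * Hypothetical standard file comment (Rule 2.1): organization name,
 * copyright statement, and a version-control ID string.
 * $Id$
 */
#ifndef GEOMETRY_POINT_H          // 2.2: include guard opens
#define GEOMETRY_POINT_H

#include <cassert>                // 2.3: include directives

typedef double Coordinate;        // 2.4: outside-the-class typedefs

/**
 * 2.5: the class definition. A point in the plane.
 */
class Point {
public:
    /// Constructs the point (x, y).
    Point(Coordinate x, Coordinate y);

    /// Returns x for axis 0, y for axis 1.
    Coordinate coordinate(int axis) const;

private:
    Coordinate x_;
    Coordinate y_;
};

/// 2.6: other interface declarations, e.g. a non-member operator (3.7.1).
bool operator==(const Point &a, const Point &b);

// 2.7: inline function definitions.
inline Point::Point(Coordinate x, Coordinate y) : x_(x), y_(y) {}

inline Coordinate Point::coordinate(int axis) const
{
    assert(axis == 0 || axis == 1);
    return axis == 0 ? x_ : y_;
}

inline bool operator==(const Point &a, const Point &b)
{
    return a.coordinate(0) == b.coordinate(0) &&
           a.coordinate(1) == b.coordinate(1);
}

#endif  // GEOMETRY_POINT_H (2.8: include guard closes)
```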

Next: header file contents.

Thursday, September 11, 2008

C++ Coding Standard: Source Files

Since all of your code is in files, and since a lot of C++ code consists of class definitions and class implementations, a good place for a C++ coding standard to start is by specifying some rules for putting classes into files.
  • 1.1. Every class must have a header file called "{class}.hpp".
  • 1.2. Every class must have an implementation file called "{class}.cpp".
    • 1.2.1. Exception to 1.2: template class with only inline members.
    • 1.2.2. Exception to 1.1 and 1.2: class in anonymous namespace.
  • 1.3. {class}.hpp and {class}.cpp must be in the same directory.
  • 1.4. {class}.hpp must define one and only one class.
    • 1.4.1. Exception to 1.4: nested classes or structs.
  • 1.5. Filenames must match the class names exactly, including case.
These rules haven't addressed directory structure at all, and there's a lot of advice that could be given about that. But those issues usually get tackled at the beginning of the project, and sit at the intersection of software architecture and build/release processes. The above rules help with an issue that comes up frequently: where to put a new class.

The rationale for the one class == one header + one implementation is simple: so that when I need to look at or change a given class later, I know exactly which file it is in.

Next up: header file layout.

Tuesday, September 9, 2008

C++ Coding Standard: Introduction

No matter what language you develop software in, some consistency will help make the code more readable for whoever ends up maintaining the code in the future -- even if that whoever is your future self. Furthermore, every language has strengths and weaknesses, and quirks that you can either learn yourself the hard way, or learn from others who have already done battle with the quirks.

A coding standard can help with both of these issues. Look at it as a set of guidelines that make for more maintainable code and help everyone avoid well-known language-specific pitfalls. I've done a lot of C++ work over the years, and I have a nice, concise coding standard that I've been carrying around for a long time. It is descended from a standard Kerry wrote when we worked together, which Brady and I distilled and added to, and which Khalil later suggested changes to. I'm going to roll it out a section at a time in this blog.

The style is terse, consisting of numbered one-line items. Nearly every item is a "must" or a "never"; a few are "should"s; a few are exceptions to the preceding item. No rationale is given in the text of the standard itself, so as to preserve the terseness and make it easy to refer to the standard during code inspections or reviews. Of course, in blog-land I will provide some commentary for each section.

The sections are:
  1. Source files
  2. Header file layout
  3. Header file contents
  4. Implementation file layout
  5. Implementation file contents
  6. Class design
  7. Class layout
  8. Inheritance
  9. Preprocessor issues
  10. Naming conventions
  11. Function design
  12. Other stylistic issues
Finally, here is the complete standard, terse, with no commentary.

Thursday, September 4, 2008

Emacs File Mode Tricks

Usually emacs figures out what mode to be in -- such as which programming or script language -- based on the extension of the filename you're editing. It can also figure it out from a "pound-bang" line, like #!/bin/bash. You want the right file mode: it colorizes the text correctly, makes indentation easier, and lets you easily comment regions of text.

But sometimes you have a file with no extension and no pound-bang. You can still make emacs automatically choose the right file mode, by specifying it in a comment on the first line of the file, like:

# -*-tcl-*-

or

##############-*-makefile-*-##############

The -*- markers tell emacs to look for the mode name between them. You can have any other text you need in the line.

Another use for file-mode comments is if your C++ header files have the extension ".h", which emacs decides is a C file. Just make the first line be a comment containing -*-c++-*-.

Sometimes you do have a file extension which is meaningful to you, but emacs just doesn't recognize it. For example, if you're crazy enough to use the Inline Guard Macro idiom, emacs doesn't automatically recognize ".ipp" as a C++ extension.

Rather than put mode comments in every ".ipp" file, put this line in your .emacs file:

(if (null (assoc "\\.ipp\\'" auto-mode-alist))
    (setq auto-mode-alist
          (cons '("\\.ipp\\'" . c++-mode) auto-mode-alist)))