Software Engineering: October 2008

Thursday, October 30, 2008

All About Emacs Macros

Emacs macros allow you to record a repetitive sequence of keystrokes, so they can be replayed. It's a pretty basic and useful emacs skill, but there are a couple of commands related to macros that can make them even more useful to you.

To record a macro, type "C-x (" (if you're new to emacs, C-x means "control-x"; M-x means "meta-x", usually entered by holding down the "Alt" key while pressing "x"). Then everything you type will be recorded until you type "C-x )". The string "Def" appears in the emacs status line while the macro is being recorded. Macro recording is aborted by any error operation -- basically anything that makes the bell ring.

Macros can contain any sequence of keystrokes, including control and meta characters (hence searches and other commands). After you type "C-x )", your macro can be replayed by typing "C-x e". If your macro is such that it leaves the cursor in a place where it makes sense to run the macro again, you can run many iterations of the macro by using a prefix argument. For example, to run the macro 100 times, type "C-u 1 0 0 C-x e".

Saving Macros

One good thing often leads to another, and sometimes you find yourself wanting to use two macros at the same time. No problem: the command "M-x name-last-kbd-macro" lets you give a name to the macro that you've just entered. Then, instead of typing "C-x e", you type "M-x" and the name you gave your macro.

If you have a macro that you think will be useful to you again and again, you can save it in your .emacs file. Name the macro, then open your .emacs file, and run "M-x insert-kbd-macro". Lisp code which defines and names your macro is written into the file, so the command will be available to you forever.

Editing macros

Sometimes you go to a lot of effort to record a macro but you have one typo in it, or can make it more useful by tweaking it a little. In that case, there is a way to edit either a named or unnamed keyboard macro. Just use "M-x edit-named-kbd-macro" or "M-x edit-last-kbd-macro".

Wednesday, October 15, 2008

C++ Coding Standard: Other Stylistic Issues

I can't believe it! The last section of the C++ coding standard. Mostly these are issues of clarity, readability, or grep-ability.

12.1. Mark 64-bit constants "LL" or "ULL", e.g. 0xffffffffffffffffULL.
12.2. Mark 32-bit constants >= 0x80000000 "U", e.g. 0xffffffffU.
12.3. Use C++-style typecasts instead of C-style typecasts.
12.4. A source file must not include tab characters (only spaces).
12.5. Lines must contain no more than 80 characters, including the (invisible) newline character at the end.
12.6. A source line must not contain two statements.
12.7. Use 2 spaces per level of indentation.
12.8. Indent the body of a compound statement ("if", "while", etc.) one level more than the head of the statement.
12.9. An "else" must have the same indentation as its "if".
12.10. Two statements in a sequence must have the same indentation.
12.11. A comment is indented the same as the statement following it.
12.12. No character must appear to the right of a brace ("{" or "}").
12.13. A closing right brace must be indented to the same level as the line containing its corresponding left brace.
12.14. There must be no space between the function name and the left parenthesis of a function call (e.g. "printf(...)").
12.15. There must be a space between a keyword and a left parenthesis (e.g. "if (x != NULL)").
12.16. Do not use C-style comments (except for documentation comments).
12.17. Do not comment out or "#if 0" dead code. Remove it.
12.18. Do not use external include guards.

Everyone has an opinion about where the braces go on if, while, etc. I try to avoid religious battles like that, and allow you to put the opening brace on the same line or the next line, as you wish. However, 12.12 does prohibit the despicable Tcl style of "} else {". Code is much more readable if the else lines up with the if. The same reasoning leads to 12.13: line up the closing brace with the line that opened it. This works no matter where you put the opening brace.
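Here is a small sketch of what 12.9, 12.12 and 12.13 allow, shown with the opening brace on the same line. The function name clampToByte is made up for illustration; only the layout matters.

```cpp
// Each "else" starts its own line, lined up with its "if" (12.9), and
// nothing follows a brace on its line (12.12).
int clampToByte(int value) {
  int result = value;
  if (value < 0) {
    result = 0;
  }
  else if (value > 255) {
    result = 255;
  }
  return result;
}
```

Moving each opening brace down to its own line would satisfy the same rules, because every closing brace still lines up with the line that opened its block.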

Of course 12.7 can be tailored to suit your needs: I chose 2 because I think less is more when it comes to indentation. However, 12.4 is absolute. Tabs mess everything up.

Although this section is mostly about style, there is a practical reason for 12.6. The effectiveness of measuring line coverage is diminished if there can be more than one statement per line.

Friday, October 10, 2008

C++ Coding Standard: Function Design

We're almost done laying out the sections of the C++ coding standard. All of your C++ code lies in some function or other. Section 11 provides some rules on what those functions should look like.

11.1. Constructors must initialize all data members in the initialization list, and in the order they're declared in the class definition.
11.2. Class member functions which return a pointer or reference to a data member, must return it as "const T*" or "const T&".
11.3. A class function must not deallocate memory that was allocated outside of the class implementation file.
11.4. A class destructor must deallocate all of the memory allocated for data members of the instance.
11.5. Function arguments which are pointer, array, or reference types, which are not changed by the function, must be declared "const".
11.6. A function must contain at most one "return" statement.
11.7. Do not use "goto" or "continue".
11.8. Only use "break" in switch statements.
11.9. Declare variables at the latest possible location.
11.10. Use assert() to check for programming or interface errors.
11.11. Functions with "true/false" semantics must return "bool".
11.12. Declare default arguments in the .hpp file, not the .cpp file.
11.13. Pass objects (especially containers) by (const) reference, not value.
11.14. Return objects (especially containers) by value, unless it makes sense to return a const reference.
11.15. Exceptions must be caught by reference, not by value or pointer.
11.16. When re-throwing an exception, use "throw;", not "throw ex;".
11.17. Destructors must catch all exceptions and throw none.
11.18. Exception specifications must not be used.
- 11.18.1. Exception to 11.18: a destructor which catches all exceptions can be marked with "throw()".

The first five rules are basic hygiene. Perhaps 11.3 needs a little explanation: it enforces symmetric deallocation by saying you shouldn't have to deallocate something allocated in another module.
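The ordering half of 11.1 reflects how C++ actually behaves: data members are initialized in the order they are declared in the class, whatever order the initialization list uses. A minimal sketch, assuming a hypothetical Range class:

```cpp
#include <cassert>

class Range {
public:
  // _low is declared before _high, so it is initialized first and the
  // initializer for _high may safely read it.
  Range(int low, int width)
    : _low(low), _high(_low + width) {
    assert(width >= 0);
  }
  int low() const {
    return _low;
  }
  int high() const {
    return _high;
  }
private:
  int _low;
  int _high;
};
```

If the two declarations were swapped, _high would be initialized first and would read _low before it had a value. Keeping the list in declaration order makes that kind of dependency visible.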

The next three -- 11.6 through 11.8 -- impose structured-programming doctrine, which says that goto statements impede automated reasoning about functions (break, continue, and out-of-place return are synonyms for goto). Not many people are doing proofs of program correctness, but good structure also aids human reasoning about programs, as well as human debugging. Occasionally you can make a case for simplifying a function with an early return statement, or a goto is needed to improve loop performance. Keep those as rare exceptions.
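As a sketch of what 11.6 through 11.8 look like in practice, here is a search loop written without "break" or an early "return". The function name findFirst is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to target, or
// values.size() if there is none. The loop condition carries the whole
// exit test, so the function has one exit point.
std::size_t findFirst(const std::vector<int> &values, int target) {
  std::size_t i = 0;
  while (i < values.size() && values[i] != target) {
    ++i;
  }
  return i;
}
```

Everything that can end the loop is stated in one place, which is the property the structured-programming argument cares about.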

Passing objects by reference (11.13) avoids copy construction during function calls, which is usually a timesaver -- though you might pass by value if your first action inside the function is to make a copy anyway. Of course if an argument is to be modified within the function, the correct choice is a non-const reference. For return values (11.14), it's almost always right to return an object instead of a reference. You might be tempted to return const references for some trivial "get" members, but even then, if the internal data layout of the class changes sometime in the future, you'll have a lot of work to do chasing down the function calls because the return type will change. Better to just start off returning by value.
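A minimal sketch of 11.13 and 11.14 together, assuming a hypothetical Roster class:

```cpp
#include <string>
#include <vector>

class Roster {
public:
  // 11.13: the container is passed by const reference, so the call
  // itself makes no copy.
  void addAll(const std::vector<std::string> &names) {
    _names.insert(_names.end(), names.begin(), names.end());
  }
  // 11.14: the result is returned by value, so callers never hold a
  // reference into the internals of the class.
  std::vector<std::string> names() const {
    return _names;
  }
private:
  std::vector<std::string> _names;
};
```

Because names() hands back a copy, Roster could later store its data in some other container and build the vector on demand, and no caller would need to change.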

Finally, there are some things to keep in mind about exceptions. Scott Meyers points out that catching by value leaves you susceptible to slicing off data in polymorphic classes; catching by pointer brings up confusing issues of ownership and deallocation. Catching by reference is the only solution. For a similar reason, a simple "throw;" is preferred to throwing the exception by name: fewer chances to make a mistake or accidentally involve a copy constructor. As for exception specifications, it's tricky to get them correct, and they add little value. The best solution is to only allow them for the situation where we absolutely expect no exceptions: in a destructor.
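The difference between "throw;" and "throw ex;" can be seen in a short sketch. ParseError and the three functions below are hypothetical names invented for the illustration.

```cpp
#include <stdexcept>
#include <string>

// A derived exception that carries extra data.
class ParseError : public std::runtime_error {
public:
  ParseError(const std::string &message, int line)
    : std::runtime_error(message), _line(line) {
  }
  int line() const {
    return _line;
  }
private:
  int _line;
};

// "throw ex;" throws a copy whose type is the static type of ex, so the
// ParseError part is sliced off.
void rethrowByName() {
  try {
    throw ParseError("unexpected token", 7);
  }
  catch (const std::runtime_error &ex) {
    throw ex;
  }
}

// "throw;" re-throws the original exception object, still a ParseError.
void rethrowOriginal() {
  try {
    throw ParseError("unexpected token", 7);
  }
  catch (const std::runtime_error &) {
    throw;
  }
}

// Reports whether the exception leaving f() is still a ParseError.
bool arrivesAsParseError(void (*f)()) {
  bool result = false;
  try {
    f();
  }
  catch (const ParseError &) {
    result = true;
  }
  catch (const std::runtime_error &) {
    result = false;
  }
  return result;
}
```

Calling arrivesAsParseError(rethrowOriginal) yields true, while arrivesAsParseError(rethrowByName) yields false. Catching by value in the handler would slice in the same way at the catch site.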

Next: the final chapter! Stylistic issues.

Sunday, October 5, 2008

Obfuscated Directory Names

I have to rant about some confusing directory/file/object names in a project I inherited at work. The guy who did it is a competent, experienced engineer. This isn't an attack on him or his work habits. If there's a moral to this rant, it's that good developers can make this kind of mistake, so the rest of us have to be extra careful not to make it.

Now, on to my rant. This C++ project -- for purposes of anonymity let's call it the "zoo" project -- interfaces with a Tcl interpreter. One confusing thing about the project organization is that it doesn't follow Rules 1.1 and 1.2 of my C++ coding standard: each source file might define many different classes. That compounds some other naming issues, such as classes with ridiculously similar names ("ZooObj" and "ZooObject"), or files named after an abstract base class but with different capitalization: zooObject.hpp.

But the real problem has to do with directory names that don't describe their contents well, and that all sound the same. Here's the directory structure:


 ...
   zoo/
     tclApi/
     zooDb/
       zooDb/
       zooDbTcl/

The morpheme "db" here just means "I am defining a set of classes that model the zoo data". Well, no kidding, it's the zoo project. Things that go without saying should just not be said. In object-oriented design, responsibility for every function belongs to some class or another. You should never need special directories called "db" or "model" or "classes" or anything like that -- they don't add any information.

The next indication that something has gone wrong is that we have a directory called "zooDb/zooDb". That's one duplication that we could surely live without; you also have to wonder why there are two directories with "tcl" in the name. It turns out that one rationale for the directory structure is that there is a set of classes which will be compiled into a library with no dependencies on the larger project, so that it can provide a Tcl package to any interpreter. So maybe there's a method to this madness: we need the zoo/zooDb/zooDbTcl directory for that separate Tcl package code. Oops, turns out that's not the case; it's zoo/zooDb/zooDb that holds that. Arrrggggh!

What can we learn from this bad example? First, don't add directory structure where it's not needed. You probably need one directory for each object library you're building; any more than that just makes it confusing when someone is trying to walk through your code. Second, directory names and class names should not contain superfluous strings like "db", "model", or "class". Finally, don't add misleading strings -- like "tcl" in the example above -- that lead away from the directories or files that should be described with them.

If we apply all that, we end up with a much cleaner directory structure:


 ...
   zoo/
   zooTclPackage/

With that structure, when a new developer is looking for the classes that are in the dependency-free package, he knows just where to look. When he's looking for other zoo-project files, he knows where to look. It doesn't matter that some code interfacing with Tcl is mingled in a directory with stuff that knows nothing about Tcl -- like the food on your plate, it all gets mixed together when you eat it anyway. As an added benefit, the use of that flat hierarchy and putting every class in a separate header file will hopefully cut down on the temptation to create evil twins like zooObj and zooObject.

Saturday, October 4, 2008

C++ Coding Standard: Naming Conventions

Now the C++ coding standard takes on the contentious subject of naming conventions. Substitute your own conventions for these if you like, but conventions of some kind can make for more readable code.

10.1. Constant names must be in all caps.
10.2. Private data and function member names must begin with '_'.
10.3. Type names must begin with a capital letter.
10.4. Function macro names must be in all caps.
10.5. Names not covered above must begin with a lower-case letter.
10.6. Do not use underscores and mixed-case in a single name.
10.7. Typedefs must not obscure the fact that a type is a pointer, reference, or array type.
10.8. All class definitions must be within a namespace.
10.9. Do not use negative names that begin with "no" or "not".

All-caps constants are such a common convention that you would be foolish to do otherwise -- remember to make your enum constants all-caps too. I extend the all-caps convention to preprocessor function macros as well, because it is often useful to highlight the fact that you're using a macro instead of a real function.

I'm not crazy about Hungarian notation, like tacking a _p onto the end of pointer variables, but you might consider using a _t suffix for type names instead of the leading-capital convention of 10.3, since emacs will colorize the type for you. But only if the resulting names don't break 10.6.

Rule 10.6 only says not to have crazy names like underscore_MixedCase_name; there is no mention of whether to choose underscore_names or mixedCaseNames. You should choose one or the other so that you don't accidentally end up with homophone names like my_var and myVar. If your type names begin with a capital letter, you should probably go mixed-case; if they are denoted with _t, maybe underscore names are for you.
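To make the conventions concrete, here is a sketch that follows 10.1 through 10.8 in the mixed-case flavor. Every name in it (zoo, MAX_CAGES, Keeper and its members) is made up for the example.

```cpp
// 10.8: the class lives inside a namespace.
namespace zoo {

// 10.1: constant names are all caps.
const int MAX_CAGES = 8;

// 10.3: type names begin with a capital letter.
class Keeper {
public:
  Keeper() : _cageCount(0) {
  }
  // 10.5: other names begin with a lower-case letter.
  bool canTakeCage() const {
    return _cageCount < MAX_CAGES;
  }
  void takeCage() {
    if (canTakeCage()) {
      ++_cageCount;
    }
  }
  int cageCount() const {
    return _cageCount;
  }
private:
  // 10.2: private member names begin with '_'.
  int _cageCount;
};

}
```

The same class in the underscore flavor would use names like can_take_cage and _cage_count throughout. What 10.6 rules out is mixing the two styles inside one name.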

The prohibition on obscuring pointer/reference types is more than a simple stylistic issue. If you typedef Object *ObjectPtr, then const ObjectPtr does not mean const Object *, it means Object * const. The latter -- meaning an unchanging address to a changeable Object -- is of very little practical use. The former -- a pointer to an Object that can't be changed -- is useful both as a way to self-document your code and as a way to let the compiler catch silly mistakes when you try to change something you shouldn't.
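The compiler can confirm this reading. The sketch below uses a hypothetical Object struct and a hypothetical function named bump, and the static_assert lines need C++11 or later.

```cpp
#include <type_traits>

struct Object {
  int value;
};

// The kind of typedef that 10.7 prohibits.
typedef Object *ObjectPtr;

// const applies to the pointer itself, not to the Object it points to.
static_assert(std::is_same<const ObjectPtr, Object *const>::value,
              "const ObjectPtr is a constant pointer to a mutable Object");
static_assert(!std::is_same<const ObjectPtr, const Object *>::value,
              "const ObjectPtr is not a pointer to a constant Object");

// This compiles even though the parameter looks read-only at a glance.
int bump(const ObjectPtr p) {
  p->value += 1;
  return p->value;
}
```

Had the parameter been written as "const Object *p", the increment inside bump would have been rejected at compile time.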

Next: function design.

Wednesday, October 1, 2008

C++ Coding Standard: Preprocessor Issues

One of the strengths that C++ inherited from C is the preprocessor. Text macros and conditional compilation provide expressive power that can greatly simplify certain programming tasks. On the other hand, that expressive power can also lead to obfuscated code -- or worse, undebuggable code.

So our C++ coding standard guides you gently away from preprocessor abuse.

9.1. Preprocessor macros should be rarely used.
- 9.1.1. Obvious exception to 9.1: multiple-include guard.
- 9.1.2. Use enum constants instead of constant macros.
- 9.1.3. Use inline functions instead of function macros (when possible).
9.2. Never use a macro to invent new syntax.
9.3. Conditional compilation should be rarely used.
9.4. Operating system #includes use angle brackets, e.g. <unistd.h>.
9.5. Non-operating system #includes use double quotes, e.g. "Object.hpp".
9.6. Use the standard C++ system header files, e.g. <string> not <string.h>.

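Let's get the last three items out of the way first, since they're the easiest. The issue is that compilers search for header files slightly differently depending on the form of the #include: the quoted form typically looks in the directory of the including file first, and then falls back to the search used for the angle-bracket form. So choose the correct form.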

Section 9.1 suggests alternatives to preprocessor macros. What's the difference? For constants, the differences are that the enum name is visible from a debugger, and an accidental re-definition of a constant becomes a compile-time error, instead of -- if you're lucky -- a warning. For functions, again, you get visibility in the debugger if you use an inline function, but not if you use a function macro. Function macros can also be harder for humans to read, and you have to remember to put every instance of a macro argument into parentheses in the macro definition, to ensure that the text expansion has the same semantics as the macro reference.
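The parenthesization problem is easy to reproduce. In this sketch, SQUARE_MACRO, square, MAX_ITEMS and the two wrapper functions are all hypothetical names.

```cpp
// A function macro whose argument is not parenthesized.
#define SQUARE_MACRO(x) x * x

// The inline-function alternative suggested by 9.1.3.
inline int square(int x) {
  return x * x;
}

// An enum constant in place of "#define MAX_ITEMS 16" (9.1.2).
enum { MAX_ITEMS = 16 };

// The macro call expands to 1 + 2 * 1 + 2, which is 5.
int macroResult() {
  return SQUARE_MACRO(1 + 2);
}

// The function evaluates its argument once and then squares it: 9.
int inlineResult() {
  return square(1 + 2);
}
```

Writing the macro body as ((x) * (x)) fixes the arithmetic, but the argument text is still pasted in twice, so a call like SQUARE_MACRO(i++) would still misbehave. The inline function has neither problem.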

When should you use preprocessor macros? One place that I find them useful is in specifying error messages. It would be handy to keep the message text together with the message name, and indeed to have a message enum in the program that also gets printed in the output. I'll describe the details of this in a future post, but the idea is to list each message once, in a single macro call that pairs the message name with its text, and then to define that macro differently for each task that needs the list.
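One common way to get this effect is the technique often called an "X macro". The sketch below is only a guess at the general shape, not the scheme promised for the future post, and the names MESSAGE_LIST, MESSAGE, MessageId and messageText are all invented here.

```cpp
#include <cassert>
#include <string>

// Each entry pairs a message name with its text, in one place.
#define MESSAGE_LIST \
  MESSAGE(ERR_FILE_NOT_FOUND, "file not found") \
  MESSAGE(ERR_OUT_OF_MEMORY, "out of memory") \
  MESSAGE(ERR_BAD_SYNTAX, "bad syntax")

// First definition: each entry becomes an enum constant.
#define MESSAGE(name, text) name,
enum MessageId {
  MESSAGE_LIST
  MESSAGE_COUNT
};
#undef MESSAGE

// Second definition: each entry becomes the string "NAME: text".
#define MESSAGE(name, text) #name ": " text,
const char *const MESSAGE_TEXT[] = {
  MESSAGE_LIST
};
#undef MESSAGE

// Looks up the printable form of a message.
std::string messageText(MessageId id) {
  assert(id < MESSAGE_COUNT);
  return MESSAGE_TEXT[id];
}
```

Adding a message means adding one line to MESSAGE_LIST. The enum and the text table stay in step because both are generated from the same list.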

Section 9.3 discourages conditional compilation. Almost any project that has to be compiled on more than one platform or compiler will require some conditional compilation. The key here is to isolate as much of it as possible into a single header file. That way most of your project remains readable, but each platform gets supported correctly.
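A sketch of that isolation follows. It assumes a hypothetical header called Platform.hpp and a hypothetical helper called joinPath. The _WIN32 macro is the one that compilers targeting Windows predefine. Both pieces are shown in one listing for brevity.

```cpp
#include <string>

// Platform.hpp (hypothetical): the only place that tests the platform.
#ifndef PLATFORM_HPP
#define PLATFORM_HPP

#if defined(_WIN32)
const char PATH_SEPARATOR = '\\';
#else
const char PATH_SEPARATOR = '/';
#endif

#endif

// Code elsewhere in the project includes Platform.hpp and needs no #if.
std::string joinPath(const std::string &dir, const std::string &file) {
  return dir + PATH_SEPARATOR + file;
}
```

Supporting another platform then means editing one header rather than hunting for #if blocks scattered through the project.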

Next: everyone's favorite topic, naming conventions.