Monday, May 18, 2009

Emacs Registers

A few weeks ago when I was talking about tricks to get back to your editing location in emacs, I neglected to mention the jump-to-register command, by default bound to C-x r j. Actually, the first step is to save a location to a register: C-x r SPC (point-to-register). You'll be prompted to enter a character to refer to the location.

The nice thing is, when you jump-to-register, you don't have to be in the same buffer or file as the saved location. So you can mark places that are interesting, go explore some other stuff, and then immediately leap back to the buffer and line that you need.

Do you use emacs registers much? There are a few things you can do with them, the most common being point-to-register/jump-to-register and copy-to-register/insert-register. Whereas the kill-ring is like a text clipboard vector, copy-to- and insert-register are like a clipboard associative array.

I find it useful to keep register identifiers (the single character that names a register) partitioned into namespaces:
  • punctuation marks: canned useful strings defined with set-register that my .emacs loads at startup
  • numbers: locations for jump-to-register
  • letters: temporary copy-to-register strings for ad hoc cut/paste operations
It was only a few weeks ago that I realized that control-keystrokes or meta-keystrokes can be register names. It probably makes more sense to let the letter namespace represent canned strings along with punctuation marks, and use control characters for the quotidian things that letters represent for me right now. I might gradually shift to that way of doing business.

Sunday, May 3, 2009

Handy C++ Visitor Macros

This is the fifth post on the subject of the Visitor design pattern. First I tried to explain the motivation for Visitor, then presented the textbook implementation, what I consider to be some improvements on that implementation, and finally an example of the power of the Default Visitor variant of the design pattern. In this post, I want to show some preprocessor macros that make Visitor even handier. Brace yourself, this is going to be a long post. There's a lot to discuss.

The C preprocessor is a dangerous weapon when used indiscriminately. In the case of our C++ Visitors, however, some preprocessor magic will provide us with the following benefits:
  1. Compiler errors (not warnings) when a concrete Visitor class has not been defined correctly.
  2. Automatic generation of tedious parts of the Default Visitor implementation.
  3. Linker errors when Visitors which must not implement Default Visitor are not updated to visit a newly-added class.
We need to add the Statement hierarchy described in the last post to the Visitor class we declared earlier. Instead of hand-coding everything, we are going to write a header file that uses some preprocessor macros to describe the hierarchy of visited classes. If we also add a header file containing #include directives for each concrete visited class, then we won't have to edit Visitor.hpp or Visitor.cpp when we add classes to the hierarchy in the future. Nifty!

Let's call the first header file VisitorSpecs.hpp. It looks like this:

// NOTE: No include guard for this header!

#ifndef VISITED_ABSTRACT_CLASS
#define VISITED_ABSTRACT_CLASS(cls,parent)
#endif

#ifndef VISITED_ROOT_CLASS
#define VISITED_ROOT_CLASS(cls) \
VISITED_ABSTRACT_CLASS(cls,0)
#endif

VISITED_ROOT_CLASS(Expression)
VISITED_CLASS(Variable, Expression)
VISITED_CLASS(Number, Expression)
VISITED_CLASS(Operation, Expression)

VISITED_ROOT_CLASS(Statement)
VISITED_CLASS(Assignment, Statement)
VISITED_ABSTRACT_CLASS(MultiStatement, Statement)
VISITED_CLASS(Block, MultiStatement)
VISITED_CLASS(IfElse, MultiStatement)

#undef VISITED_ROOT_CLASS
#undef VISITED_ABSTRACT_CLASS
#undef VISITED_CLASS

Hang on a second... This defines a couple of macros as no-ops, and leaves another macro undefined. What gives? The answer is that Visitor.hpp defines the macros one way before including VisitorSpecs.hpp, and Visitor.cpp defines the macros another way, and also includes VisitorSpecs.hpp. That's why the header must not have an include guard: it will get included into Visitor.cpp multiple times. You may even find other consumers for VisitorSpecs.hpp, as you'll see a little later.

The macros are going to end up a little bit complicated, to provide some of the wonderful compiler fail-safes I've been hyping. But to make things clear, let's present the basic implementation first. Visitor.hpp has to include VisitorSpecs.hpp two separate times. First to declare the class names:

#define VISITED_CLASS(cls,parent) class cls;
#define VISITED_ABSTRACT_CLASS(cls, parent) \
VISITED_CLASS(cls,parent)
#include "VisitorSpecs.hpp"


Then to declare the member functions, Visitor.hpp defines the macros as follows:

#define VISITED_CLASS(cls,parent) \
virtual bool enter(const cls &); \
virtual void exit(const cls &);
#define VISITED_ABSTRACT_CLASS(cls, parent) \
VISITED_CLASS(cls,parent)
#include "VisitorSpecs.hpp"

Remember, the "root" classes (those with no parent) are defined in terms of abstract classes. So in fact, Visitor.hpp declares enter() and exit() for every visited class.

Visitor.cpp defines default implementations for the member functions by defining the macros like this:

// No parent default: don't recurse, do nothing.
#define VISITED_ROOT_CLASS(cls) \
bool Visitor::enter(const cls &) \
{ \
VISITOR_CATCH_MISSING_HEADER(cls); \
return false; \
} \
void Visitor::exit(const cls &) \
{ \
}

// Child classes: do what the parent would do.
#define VISITED_CLASS(cls,parent) \
bool Visitor::enter(const cls &c) \
{ \
VISITOR_CATCH_MISSING_HEADER(cls); \
const parent &p = c; \
return enter(p); \
} \
void Visitor::exit(const cls &c) \
{ \
const parent &p = c; \
exit(p); \
}

// Abstract classes: same as for concrete classes.
#define VISITED_ABSTRACT_CLASS(cls, parent) \
VISITED_CLASS(cls,parent)

#include "VisitorSpecs.hpp"

The macro definitions for Visitor.cpp implement the Default Visitor scheme: if a concrete Visitor doesn't define enter() for a given type, the default is picked up from the base class, and looks in the virtual table for the enter() belonging to the parent class. We have Visitor.cpp include the header file VisitedClasses.hpp, which includes the .hpp for every concrete visited class. Thus, when a new class is added to the hierarchy, Visitor.hpp and Visitor.cpp do not require any change whatsoever.

Notice the lines in the function definitions that say VISITOR_CATCH_MISSING_HEADER(cls). The reason for these lines, is that if you add a new class to the hierarchy, but forget to add an include to VisitedClasses.hpp, most compilers give you a confusing error message about initializing the variable p. On the versions of g++ and Solaris CC I've looked at, the message says nothing about an undefined class, let alone a missing header file. So, in Visitor.cpp we define a macro which will give you an error message that points in the direction of the problem, if you have forgotten that include. The macro is:

#ifndef NDEBUG
#define VISITOR_CATCH_MISSING_HEADER(cls) \
typedef cls INCLUDE_ ## cls ## \
_hpp_IN_VisitedClasses_hpp; \
struct Dummy : \
public INCLUDE_ ## cls ## \
_hpp_IN_VisitedClasses_hpp { \
Dummy(); \
}
#else
#define VISITOR_CATCH_MISSING_HEADER(cls) /* nothing */
#endif

We make the macro a no-op for non-debug builds, though it shouldn't really affect the code even if you enable it for release builds. Compare the messages in g++ 4.1.2. Suppose you omitted #include "IfElse.hpp" from VisitedClasses.hpp. Without the macro, the message says VisitorSpecs.hpp:38: error: invalid initialization of reference of type 'const MultiStatement&' from expression of type 'const IfElse'; with the macro, you still get that message, but before it you get: VisitorSpecs.hpp:38: error: invalid use of undefined type 'struct Visitor::enter(const IfElse&)::INCLUDE_IfElse_hpp_IN_VisitedClasses_hpp'. A strange message, but it gets the point across better.

Speaking of compiler errors, we will beef up the macro definitions in Visitor.hpp to catch a common coding error at compile time instead of having to debug strange behavior at runtime. The issue is this: since the enter()/exit() functions in our Visitor are not pure virtual, if you mistakenly add a concrete Visitor with one or more incorrect prototypes for those functions, the code will compile, link, and run -- silently skipping the functions with the incorrect prototypes, because the abstract base Visitor implementations will be found in the virtual table.

For example, if your concrete Visitor declared bool enter(const MultiStatement *) -- with a pointer argument instead of a reference -- then you wouldn't know anything was wrong until you ran the program. In the worst case, you would not even notice that whatever behavior was supposed to happen in that MultiStatement enter() didn't happen; even if you did notice it, you would have to step through in the debugger to watch the visit go into the wrong function. To preclude some of the more common silent-but-deadly typos, we bulk up the Visitor.hpp macros like so:

// The "real" prototypes.
#define VISITED_CLASS_ENTER_EXIT(cls,parent) \
virtual bool enter(const cls &); \
virtual void exit(const cls &);

// Fakes that turn typos into compile errors.
#define FORBID_PTR_ARG(cls,parent) \
virtual _USE_REFERENCE_ARG *enter(const cls *) \
{ return 0; } \
virtual _USE_REFERENCE_ARG *exit(const cls *) \
{ return 0; }
#define FORBID_NON_CONST_ARG(cls,parent) \
virtual _USE_CONST_REF *enter(cls &) \
{ return 0; } \
virtual _USE_CONST_REF *exit(cls &) \
{ return 0; }
#define FORBID_CONST_FUNC(cls,parent) \
virtual _USE_NON_CONST_FUNC *enter(const cls &) \
const { return 0; } \
virtual _USE_NON_CONST_FUNC *exit(const cls &) \
const { return 0; }

// Final macro definitions.
#ifdef NDEBUG
#define VISITED_CLASS(cls,parent) \
VISITED_CLASS_ENTER_EXIT(cls,parent)
#else
class _USE_REFERENCE_ARG;
class _USE_CONST_REF;
class _USE_NON_CONST_FUNC;
#define VISITED_CLASS(cls,parent) \
VISITED_CLASS_ENTER_EXIT(cls,parent) \
FORBID_PTR_ARG(cls,parent) \
FORBID_NON_CONST_ARG(cls,parent) \
FORBID_CONST_FUNC(cls,parent)
#endif

#define VISITED_ABSTRACT_CLASS(cls, parent) \
VISITED_CLASS(cls,parent)

#include "VisitorSpecs.hpp"

Without the FORBID* macros, the incorrect pointer-arg prototype above would prevent our FindLhsVars visitor in our previous example from finding assigned-to variables in complex statements such as blocks or if-else statements. With the macros in place, we get the following compile-time error:
VisitorSpecs.hpp:36: error: overriding 'virtual Visitor::_USE_REFERENCE_ARG* Visitor::enter(const MultiStatement*)'. That is money in the bank.

Are you still with me? Then there's one more feature of this setup that I'd like to point out. Notice that VISITED_CLASS and its related macros can be defined in any file that includes VisitorSpecs.hpp. As useful as Default Visitor is, there are some concrete Visitors that must be updated whenever a new class is added to the hierarchy -- for example, the pretty-printer Visitor from the earlier post. In that case, instead of hard-coding the enter() function declarations in the concrete Visitor class definition, do this:

#define VISITED_CLASS(cls,parent) \
virtual bool enter(const cls &);
#include "VisitorSpecs.hpp"

Now, if you forget to define the enter function for a newly-added class, you will get a link error when the linker tries to build the virtual table. From g++ 4.1.2: Print.cpp:61: undefined reference to `Print::enter(IfElse const&)'. Again, money in the bank. You might also find that there is some boilerplate code you would like to generate for every visited class -- a thought that comes to mind is an enum representing class types in the hierarchy. Simply extend the macro to include a third argument for that extra information -- if it can't be created with a '##' concatenation -- and include VisitorSpecs.hpp in the appropriate place.

Whew! This is a huge post, but when you apply these macro definitions to the improved Visitor in the earlier post, you have a very powerful idiom to work with. I can see a couple more posts coming up in this series: 1. a listing of the final Statement, Expression, and Visitor code presented here (I've been compiling it as I wrote these posts, to make sure it all worked), and an example main() to exercise it; and 2. discussion of a few Visitor issues I've glossed over to get this thing out.

Friday, May 1, 2009

The Default Visitor Idiom

The text of this post was originally in the next post, about our preprocessor framework for C++ Visitors. But I realized that it was a huge distraction from that material, and deserved its own post. To illustrate some points, it builds on the toy Expression hierarchy described here, but using the improved Visitor idiom described here.

Recall that in the Default Visitor idiom, concrete Visitor classes need not implement every enter() function. Whichever classes do not have an enter() will call the enter() of their parent class (the same goes for exit() of course). It's hard to imagine the usefulness in the toy Expression hierarchy of our earlier Visitor posts. To make the example a little more complicated, we'll add some simple Statement classes to our toy visited hierarchy, including an abstract class which is not entirely realistic: MultiStatement, representing a Statement that itself contains one or more other Statements.

Here are the definitions of our new Statement classes (as before, with the disclaimer that it's a toy and not too cleverly designed):

class Statement {
public:
Statement();
virtual ~Statement() throw() = 0;
virtual void accept(Visitor &v) const = 0;
};

class Assignment : public Statement {
public:
Assignment(Expression *lhs, Expression *rhs);
virtual ~Assignment() throw();

const Expression &lhs() const;
const Expression &rhs() const;
void accept(Visitor &v) const;

private:
Expression *_lhs;
Expression *_rhs;
};

class MultiStatement : public Statement {
public:
MultiStatement();
virtual ~MultiStatement() throw();
};

class Block : public MultiStatement {
public:
Block();
virtual ~Block() throw();

const std::vector<Statement *> &stmts() const;
void accept(Visitor &v) const;

void addStmt(Statement *stmt);

private:
std::vector<Statement *> _stmts;
};

class IfElse : public MultiStatement {
public:
IfElse(Expression *condition, Statement *ifBranch,
Statement *elseBranch = NULL);
virtual ~IfElse() throw();

const Expression &condition() const;
const Statement &ifBranch() const;
const Statement *elseBranch() const;
void accept(Visitor &v) const;

private:
Expression *_condition;
Statement *_ifBranch;
Statement *_elseBranch;
};

The Statement::accept() functions look like this:

void Assignment::accept(Visitor &v) const
{
if (v.enter(*this)) {
v.visit(_lhs);
v.visit(_rhs);
v.exit(*this);
}
}

void Block::accept(Visitor &v) const
{
if (v.enter(*this)) {
std::vector<Statement *>::const_iterator it;
for (it = stmts().begin();
it != stmts().end(); ++it) {
const Statement *s = *it;
v.visit(s);
}
v.exit(*this);
}
}

void IfElse::accept(Visitor &v) const
{
if (v.enter(*this)) {
v.visit(*_condition);
v.visit(*_ifBranch);
v.visit(_elseBranch);
v.exit(*this);
}
}

Note that Visitor pushes polymorphism a little farther than C++ inheritance does. As shown in IfElse::accept(), it can visit Statements or Expressions, despite the fact that they do not have a common ancestor! Depending on your project, you may find it convenient to group various independent hierarchies into common Visitor classes. Minor detail: we do need to add Visitor::visit() inline functions for Statement pointers and references, as we did for Expressions.

Here's an example that shows the expressive power of Default Visitor: a class that collects the set of Variables assigned to in a piece of code. For brevity I'll just show the class function definitions, not the class declaration:

FindLhsVars::FindLhsVars()
: Visitor(), _vars(), _lhs(false)
{
}

bool FindLhsVars::enter(const MultiStatement &)
{
return true;
}

bool FindLhsVars::enter(const Assignment &a)
{
_lhs = true;
visit(a.lhs());
_lhs = false;
return false;
}

bool FindLhsVars::enter(const Variable &v)
{
if (_lhs) {
_vars.insert(v.name());
}
return false;
}

const std::set<std::string> &FindLhsVars::
vars() const
{
return _vars;
}

It might not seem like much in our toy example, but consider if you had a realistic language hierarchy, and your Visitor was not a Default Visitor. You would have to tediously define an enter() function for every class in the hierarchy, instead of for three classes. Furthermore, unlike the textbook concrete Visitor, FindLhsVars is likely to still work correctly without any changes if you add a class to the hierarchy. I can't say enough good things about Default Visitor -- it takes one of the sourest Visitor lemons and makes it into a very sweet lemonade.

The next post provides some preprocessor tricks that take a lot of the grunt-work out of adding classes to a Visitor, while also bulletproofing your derived concrete Visitors.