Software EngiSneering: 2009

Friday, December 11, 2009

Emacs vs. Nicer dirs

In the last post I described a nicer version of the shell dirs command that labels each directory with its position, so that you can pushd directly to the one you want.

One problem with it is that emacs shell-mode generally uses dirs to track which directory you're in. And you do want it to track directories for you. Our nicer version of dirs produces output that isn't readable to emacs.

Of course the natural solution is:


 (setq shell-dirstack-query "command dirs")

to pick up the shell's builtin dirs. Sadly, you can't just throw that line into your .emacs file, because shell-mode sets shell-dirstack-query every time you start a new shell. It sets it unconditionally and globally, which is a ridiculous design flaw, but that's how it goes. There is no hook that runs after it sets the value, either.

So you're stuck with needing a workaround:

Name your command something other than dirs .
Use a flag on your new dirs to get the nicer behavior.
Write your own emacs function to start a shell:


  (defun engisneering-shell ()
    (interactive)
    (shell)
    (setq shell-dirstack-query "command dirs"))

Tuesday, November 17, 2009

Nicer pushd, popd, and dirs

Writing up the safer, friendlier bash which command reminded me of another wrapper function that's handy to have. If you use pushd/popd to keep a stack of directories in your shell window, sometimes you want to switch to a directory far down the stack. If you want to switch to the third directory down, you do pushd +3.

The problem is, you have to count along the list of directories, since dirs, pushd, and popd all just spit out a one-line list of the stack, like this:


  $ dirs
  /usr/local/share ~ ~/work ~/tmp

Not an impossible task, but it can get annoying if you have some long paths in there, especially once your stack gets past two or three deep. Why not have dirs print one directory per line, and label them with their depths? To do so, add this function to your .bashrc file:


  function dirs {
      ds=(`command dirs`)
      i=0
      while [ "${ds[$i]}" != "" ]; do
          echo $i: ${ds[$i]};
          i=$((i+1));
      done
  }

Now you get more readable output:


  $ dirs
  0: /usr/local/share
  1: ~
  2: ~/work
  3: ~/tmp

If you want to remove ~/work from the stack, just do popd +2.

Since pushd and popd also print the directory stack, let's add the same style of output to them:


  function pushd {
      if command pushd $@ > /dev/null; then
          dirs
      fi
  }

  function popd {
      if command popd $@ > /dev/null; then
          dirs
      fi
  }

One final note. In case some wacky systemwide file has aliased these commands to something else, add the following unaliasing code to the top of your .bashrc:


  for func in dirs pushd popd; do
      if alias $func > /dev/null 2>&1 ; then unalias $func; fi
  done

Otherwise, your functions will be hidden (aliases take precedence).

PS: emacs shell-mode has a conflict with this dirs: here's how to fix it.

Monday, November 2, 2009

Keep from Killing Emacs Shell Buffers

If you're someone who lives in emacs like I do, then you probably rely on shell buffers (M-x shell) to interact with the operating system. That way you can search through output and edit history commands in a natural way that a mere terminal doesn't allow.

But occasionally I've been left kicking the wall after accidentally killing a shell buffer that had lots of work in it that I still needed. It really hurts when you accidentally kill a buffer where you're waiting for a long-running process to finish. To prevent industrial accidents like that, add a guard to kill-buffer-hook in your .emacs file:


  (add-hook 'kill-buffer-hook
    '(lambda()
       (and
         (string= "Shell" mode-name)
         (or
           (yes-or-no-p
             (format "Kill buffer `%s'? " (buffer-name)))
           (error "Aborted")))))

If you really do want to kill the shell buffer, type "yes", and you're done. But it will save you some grief if you just hit C-x k in the wrong buffer.

What's more common for me is that my eyes are on a file that I want to refresh or replace with C-x C-v (find-alternate-file), but the cursor is in a shell buffer. In that case, emacs will indeed load the alternate file, and you'll be asked "Kill buffer ` **lose**'?" Say no, then you'll have C-x b to " **lose**" (note the leading space), and rename the shell buffer to "*shell*" or whatever you had previously named it.

Monday, October 26, 2009

Bash vs. Which

The Unix which command is useful for searching your $PATH to find which version of an executable is going to run. But did you know that it doesn't always tell you the truth? For instance, it doesn't tell you when the command you're asking about is hidden by a shell builtin (or function, keyword, or alias):


  $ which echo
  /usr/bin/echo

which found /usr/bin/echo on your path, but echo is a shell builtin (at least in bash), so that's what will get run, not the one on the path. The GNU man page for which presents workarounds that will pick up functions and aliases, but shell builtins still slip through the cracks. The situation is even weirder on Solaris, where which is a csh shell-script, that sources your .cshrc file as part of its operation -- very strange if your shell is bash.

Clearly, which is not to be trusted. The easiest workaround, if you use bash, is the type builtin:


  $ type -a echo
  echo is a shell builtin
  echo is /bin/echo

A slightly nicer workaround is to make a bash function for which, that uses type to do the right thing. I also like which to do an ls -l where applicable. Here's what I have in my .bashrc:


  if alias which > /dev/null 2>&1; then unalias which; fi

  function which {
      hash -r
      t=`type -t $@`

      if [ -z "${t%%file*}" ]; then
          for p in `type -p $@`; do
              ls -l ${p};
          done
      else
          type $@;
      fi
  }

Now which gives you much more information:


  $ which -a wish
  lrwxrwxrwx  1 root  root  13 Dec 23 2008 /usr/local/bin/wish -> /usr/bin/wish*
  lrwxrwxrwx  1 root  root   7 Jan  8 2007 /usr/bin/wish -> wish8.3*

The unalias protects you in case some systemwide .bashrc or .profile aliases which in some way (unlikely, but good hygiene anyway). The call to hash clears the hashed function list, so that any changes to your PATH are reflected. The reason for the if statement in the function, is that we certainly want to report builtins, functions, etc., but if there are none, we can use type -p to make it easier to pass the filenames to ls -l. If there are builtins or such, we just punt and give the unaltered type output.

You can pass -a to this which if you want to see all the matches for the command, in order of precedence.

Note that this has to be a function, not a script, because running the script will source your .bashrc, which might change the PATH (all PATH changes should be in .profile, but sometimes people do it in .bashrc).

Friday, October 2, 2009

Csh Stupidity

One of the great things about the Linux era, is that bash seems to be taking over as the preferred shell to use, instead of csh/tcsh. The projects at my current job began during the SunOS era, so when I started there a couple years ago I went with the flow and let the sysadmin set up my default as tcsh just like everyone else there uses. When I was getting started, no one had a reasonable .profile or .bashrc I could use, so I went with tcsh so I could use the tribal .cshrc.

In terminals, the first command I run is "bash"; most of my work is in an emacs shell window, so my .emacs says (setq shell-file-name "/bin/bash") -- actually there's an (if) in there to choose cygwin's bash if I happen to be on Windows. I do a lot of shell programming at the command line, and I simply can't live with csh-style programming.

A few days ago I was working on my .cshrc file, and I noticed how dim-witted csh's if-else handling is. The classic anti-csh diatribe is Csh Programming Considered Harmful, which is an entertaining read, but there's some insanity that isn't even covered in that worthy rant. Let me start by presenting a correct csh script:

setenv FOO hello
if ($FOO == hello) then
echo hi
else if ($FOO == goodbye) then
echo see ya
endif

Source that script, and the output is:

hi

Various typos can have effects on this script that range from annoying to silent but deadly. Here's an annoying one: put the second if on a different line from the else. The output is:

hi
else: endif not found.

Well, at least you got an error message telling you something is up, although requiring two keywords to be on the same line is a level of stupidity worthy of Tcl. What if you do the opposite, and have too many endifs? Well, the output is just "hi", with no error message. I suppose that doesn't bite too much, until later when you nest another if into the script, and it suddenly gets closed by one of your stealth extra endifs.

But here's a bad one. Take the original script. Add an else at the beginning, and an endif at the end. Seems like an else without an accompanying if should get a syntax error, but it doesn't. The extra endif -- which might have been there because we always had an extra one, but never got an error about it -- closes the weird else. And guess what? The script runs with no errors, and no output. The else is kind of a "if not true". How can this be a good thing?

[Update 2013/11/21: Ha! Now this has actually happened to me, in almost the same way described here. I had to add some functionality to an existing script at work. It was a branch added near the top of an if with several branches. I mistakenly added an endif at the end of my new branch. The new functionality (and the branch above it) tested fine, no error reported, so I checked it in. A few days later someone noticed that the later branches weren't running (this script gets launched from a cron job, so the problem wasn't something that would get immediately noticed). The later branches -- starting with an else for which there was no active if -- were silently skipped. Grrrr...]

I did these experiments with csh and tcsh on RedHat. Avoid csh. Long live bash!

Monday, September 21, 2009

Makedepend: Error: Failed to Read ...

I complained a few months ago about makedepend not picking up dependencies, and gave a recipe for avoiding makedepend.

I'm stuck with it for a project, and a couple of times I've gotten a makedepend failure, with only this useless warning:


makedepend: error:  failed to read [some directory name]

The problem ended up being an #include somewhere, where the opening double-quote (") was missing. One time I accidentally had a single-quote at the beginning, closed with a double-quote; the other time a co-worker left off the opening quote, but did have the closing quote.

No help from makedepend on which file has the faulty include: it just gives up on the whole directory. Thanks, makedepend. Notice that if you leave off both quotes, or have both as single-quotes, makedepend is kind enough to give you a meaningful message. Or, if you open with a double-quote, and forget the closing quote, it exits without an error! Though I'm not sure if it gets the dependencies right in that case.

Do yourself a favor, use gcc's dependency-generating scheme, and never use makedepend again.

Wednesday, July 15, 2009

Perl Cwd Performance

The Camel book says -- in the Efficiency section of Chapter 24 -- to use the Cwd module instead of calling pwd repeatedly.

Boy, it's not kidding. I wrote a test script that does almost nothing but change directories and print the current directory, repeating that about 7500 times. Using `pwd` to get the directory took an average wall-clock time of 11.7 seconds. If instead you import Cwd's version of chdir(), the script can use $ENV{PWD}, which took an average wall-clock time of 0.3 seconds.

Oddly, using Cwd::cwd() took an average of 15.4 seconds. This wasn't a very scientific test, but those numbers bore up under repeated tries (doing several runs in a row, throwing out the time for the first run). I used wall-clock time, because the sys/user time reported by Unix time was quite a bit lower, even though it was an unloaded system. Maybe because the directories were NFS mounts.

So here's a good strategy for using Cwd:


  use Cwd qw(chdir);
  sub cwd {
    return $ENV{PWD};
  }

Note: the Cwd documentation points out that $ENV{PWD} is only kept up to date if every module used in the script uses Cwd::chdir to change directories.

Monday, July 6, 2009

gcc Warning Options

A few years ago I was in a job interview, and the development manager was gloating because he had squashed all compiler warnings in the code, and then added the gcc -Werror switch to the build, so that warnings now caused the build to fail. At the time, I thought "That's annoying overkill". My feeling was that compiler warnings didn't live very long -- after someone saw one a few times, they would fix it.

But I've changed my mind. Where I work, the team had even been doing without -Wall for many years. They had been using -pedantic. In other words, the warnings were all noise, and no signal, since the pedantic warnings don't tell you about potential bugs in your code, just about adherence to language standards. Moreover, they were completely happy to ignore all compiler warnings -- even those for things so dangerous that gcc warns about them without any warning switches, like 64-bit size issues. I now understand why someone would insist on -Werror.

Really, the best set of warnings for g++ is -Wall -Wextra -Werror. [If you have an older gcc, like the 3.2.3 that's stock on RedHat 3, use -W instead of -Wextra.] When we switched these on recently, there were thousands of unique warnings. Many of them were harmless in the end of the day, but there were dozens that indicated serious code bugs and/or crashes waiting to happen.

Of course, you can disable some of the -Wextra warnings. On our RedHat 3 build, we find it convenient to add these disabling flags:

-Wno-parentheses: too picky
-Wno-unused-parameter: provoked by <fstream>
-Wno-deprecated: provoked by <strstream>

In the best possible world, we would use <sstream> and wouldn't need -Wno-deprecated. But that's a battle for another day. For today, I'm just very happy that we have real warnings turned on, and that they get attention due to -Werror.

Monday, May 18, 2009

Emacs Registers

A few weeks ago when I was talking about tricks to get back to your editing location in emacs, I neglected to mention the jump-to-register command, by default bound to C-x r j. Actually, the first step is to save a location to a register: C-x r SPC (point-to-register). You'll be prompted to enter a character to refer to the location.

The nice thing is, when you jump-to-register, you don't have to be in the same buffer or file as the saved location. So you can mark places that are interesting, go explore some other stuff, and then immediately leap back to the buffer and line that you need.

Do you use emacs registers much? There are a few things you can do with them, the most common being point-to-register/jump-to-register and copy-to-register/insert-register. Whereas the kill-ring is like a text clipboard vector, copy-to- and insert-register are like a clipboard associative array.

I find it useful to keep register identifiers (the single character that names a register) partitioned into namespaces:

punctuation marks: canned useful strings defined with set-register that my .emacs loads at startup
numbers: locations for jump-to-register
letters: temporary copy-to-register strings for ad hoc cut/paste operations

It was only a few weeks ago that I realized that control-keystrokes or meta-keystrokes can be register names. It probably makes more sense to let the letter namespace represent canned strings along with punctuation marks, and use control characters for the quotidian things that letters represent for me right now. I might gradually shift to that way of doing business.

Sunday, May 3, 2009

Handy C++ Visitor Macros

This is the fifth post on the subject of the Visitor design pattern. First I tried to explain the motivation for Visitor, then presented the textbook implementation, what I consider to be some improvements on that implementation, and finally an example of the power of the Default Visitor variant of the design pattern. In this post, I want to show some preprocessor macros that make Visitor even handier. Brace yourself, this is going to be a long post. There's a lot to discuss.

The C preprocessor is a dangerous weapon when used indiscriminately. In the case of our C++ Visitors, however, some preprocessor magic will provide us with the following benefits:

Compiler errors (not warnings) when a concrete Visitor class has not been defined correctly.
Automatic generation of tedious parts of the Default Visitor implementation.
Linker errors when Visitors which must not implement Default Visitor are not updated to visit a newly-added class.

We need to add the Statement hierarchy described in the last post to the Visitor class we declared earlier. Instead of hand-coding everything, we are going to write a header file that uses some preprocessor macros to describe the hierarchy of visited classes. If we also add a header file containing #include directives for each concrete visited class, then we won't have to edit Visitor.hpp or Visitor.cpp when we add classes to the hierarchy in the future. Nifty!

Let's call the first header file VisitorSpecs.hpp. It looks like this:


// NOTE: No include guard for this header!

#ifndef VISITED_ABSTRACT_CLASS
#define VISITED_ABSTRACT_CLASS(cls,parent)
#endif

#ifndef VISITED_ROOT_CLASS
#define VISITED_ROOT_CLASS(cls) \
  VISITED_ABSTRACT_CLASS(cls,0)
#endif

VISITED_ROOT_CLASS(Expression)
VISITED_CLASS(Variable, Expression)
VISITED_CLASS(Number, Expression)
VISITED_CLASS(Operation, Expression)

VISITED_ROOT_CLASS(Statement)
VISITED_CLASS(Assignment, Statement)
VISITED_ABSTRACT_CLASS(MultiStatement, Statement)
VISITED_CLASS(Block, MultiStatement)
VISITED_CLASS(IfElse, MultiStatement)

#undef VISITED_ROOT_CLASS
#undef VISITED_ABSTRACT_CLASS
#undef VISITED_CLASS

Hang on a second... This defines a couple of macros as no-ops, and leaves another macro undefined. What gives? The answer is that Visitor.hpp defines the macros one way before including VisitorSpecs.hpp, and Visitor.cpp defines the macros another way, and also includes VisitorSpecs.hpp. That's why the header must not have an include guard: it will get included into Visitor.cpp multiple times. You may even find other consumers for VisitorSpecs.hpp, as you'll see a little later.

The macros are going to end up a little bit complicated, to provide some of the wonderful compiler fail-safes I've been hyping. But to make things clear, let's present the basic implementation first. Visitor.hpp has to include VisitorSpecs.hpp two separate times. First to declare the class names:


#define VISITED_CLASS(cls,parent) class cls;
#define VISITED_ABSTRACT_CLASS(cls, parent) \
  VISITED_CLASS(cls,parent)
#include "VisitorSpecs.hpp"

Then to declare the member functions, Visitor.hpp defines the macros as follows:


#define VISITED_CLASS(cls,parent) \
  virtual bool enter(const cls &); \
  virtual void exit(const cls &);
#define VISITED_ABSTRACT_CLASS(cls, parent) \
  VISITED_CLASS(cls,parent)
#include "VisitorSpecs.hpp"

Remember, the "root" classes (those with no parent) are defined in terms of abstract classes. So in fact, Visitor.hpp declares enter() and exit() for every visited class.

Visitor.cpp defines default implementations for the member functions by defining the macros like this:


// No parent default: don't recurse, do nothing.
#define VISITED_ROOT_CLASS(cls) \
bool Visitor::enter(const cls &) \
{ \
  VISITOR_CATCH_MISSING_HEADER(cls); \
  return false; \
} \
void Visitor::exit(const cls &) \
{ \
}

// Child classes: do what the parent would do.
#define VISITED_CLASS(cls,parent) \
bool Visitor::enter(const cls &c) \
{ \
  VISITOR_CATCH_MISSING_HEADER(cls); \
  const parent &p = c; \
  return enter(p); \
} \
void Visitor::exit(const cls &c) \
{ \
  const parent &p = c; \
  exit(p); \
}

// Abstract classes: same as for concrete classes.
#define VISITED_ABSTRACT_CLASS(cls, parent) \
  VISITED_CLASS(cls,parent)

#include "VisitorSpecs.hpp"

The macro definitions for Visitor.cpp implement the Default Visitor scheme: if a concrete Visitor doesn't define enter() for a given type, the default is picked up from the base class, and looks in the virtual table for the enter() belonging to the parent class. We have Visitor.cpp include the header file VisitedClasses.hpp, which includes the .hpp for every concrete visited class. Thus, when a new class is added to the hierarchy, Visitor.hpp and Visitor.cpp do not require any change whatsoever.

Notice the lines in the function definitions that say VISITOR_CATCH_MISSING_HEADER(cls). The reason for these lines, is that if you add a new class to the hierarchy, but forget to add an include to VisitedClasses.hpp, most compilers give you a confusing error message about initializing the variable p. On the versions of g++ and Solaris CC I've looked at, the message says nothing about an undefined class, let alone a missing header file. So, in Visitor.cpp we define a macro which will give you an error message that points in the direction of the problem, if you have forgotten that include. The macro is:


#ifndef NDEBUG
#define VISITOR_CATCH_MISSING_HEADER(cls) \
  typedef cls INCLUDE_ ## cls ## \
    _hpp_IN_VisitedClasses_hpp; \
  struct Dummy : \
    public INCLUDE_ ## cls ## \
      _hpp_IN_VisitedClasses_hpp { \
        Dummy(); \
  }
#else
#define VISITOR_CATCH_MISSING_HEADER(cls) /* nothing */
#endif

We make the macro a no-op for non-debug builds, though it shouldn't really affect the code even if you enable it for release builds. Compare the messages in g++ 4.1.2. Suppose you omitted #include "IfElse.hpp" from VisitedClasses.hpp. Without the macro, the message says VisitorSpecs.hpp:38: error: invalid initialization of reference of type 'const MultiStatement&' from expression of type 'const IfElse'; with the macro, you still get that message, but before it you get: VisitorSpecs.hpp:38: error: invalid use of undefined type 'struct Visitor::enter(const IfElse&)::INCLUDE_IfElse_hpp_IN_VisitedClasses_hpp'. A strange message, but it gets the point across better.

Speaking of compiler errors, we will beef up the macro definitions in Visitor.hpp to catch a common coding error at compile time instead of having to debug strange behavior at runtime. The issue is this: since the enter()/exit() functions in our Visitor are not pure virtual, if you mistakenly add a concrete Visitor with one or more incorrect prototypes for those functions, the code will compile, link, and run -- silently skipping the functions with the incorrect prototypes, because the abstract base Visitor implementations will be found in the virtual table.

For example, if your concrete Visitor declared bool enter(const MultiStatement *) -- with a pointer argument instead of a reference -- then you wouldn't know anything was wrong until you ran the program. In the worst case, you would not even notice that whatever behavior was supposed to happen in that MultiStatement enter() didn't happen; even if you did notice it, you would have to step through in the debugger to watch the visit go into the wrong function. To preclude some of the more common silent-but-deadly typos, we bulk up the Visitor.hpp macros like so:


// The "real" prototypes.
#define VISITED_CLASS_ENTER_EXIT(cls,parent) \
  virtual bool enter(const cls &); \
  virtual void exit(const cls &);

// Fakes that turn typos into compile errors.
#define FORBID_PTR_ARG(cls,parent) \
  virtual _USE_REFERENCE_ARG *enter(const cls *) \
    { return 0; } \
  virtual _USE_REFERENCE_ARG *exit(const cls *) \
    { return 0; }
#define FORBID_NON_CONST_ARG(cls,parent) \
  virtual _USE_CONST_REF *enter(cls &) \
    { return 0; } \
  virtual _USE_CONST_REF *exit(cls &) \
    { return 0; }
#define FORBID_CONST_FUNC(cls,parent) \
  virtual _USE_NON_CONST_FUNC *enter(const cls &) \
    const { return 0; } \
  virtual _USE_NON_CONST_FUNC *exit(const cls &) \
    const { return 0; }

// Final macro definitions.
#ifdef NDEBUG
#define VISITED_CLASS(cls,parent) \
  VISITED_CLASS_ENTER_EXIT(cls,parent)
#else
  class _USE_REFERENCE_ARG;
  class _USE_CONST_REF;
  class _USE_NON_CONST_FUNC;
#define VISITED_CLASS(cls,parent) \
  VISITED_CLASS_ENTER_EXIT(cls,parent) \
  FORBID_PTR_ARG(cls,parent) \
  FORBID_NON_CONST_ARG(cls,parent) \
  FORBID_CONST_FUNC(cls,parent)
#endif

#define VISITED_ABSTRACT_CLASS(cls, parent) \
  VISITED_CLASS(cls,parent)

#include "VisitorSpecs.hpp"

Without the FORBID* macros, the incorrect pointer-arg prototype above would prevent our FindLhsVars visitor in our previous example from finding assigned-to variables in complex statements such as blocks or if-else statements. With the macros in place, we get the following compile-time error:
VisitorSpecs.hpp:36: error: overriding 'virtual Visitor::_USE_REFERENCE_ARG* Visitor::enter(const MultiStatement*)'. That is money in the bank.

Are you still with me? Then there's one more feature of this setup that I'd like to point out. Notice that VISITED_CLASS and its related macros can be defined in any file that includes VisitorSpecs.hpp. As useful as Default Visitor is, there are some concrete Visitors that must be updated whenever a new class is added to the hierarchy -- for example, the pretty-printer Visitor from the earlier post. In that case, instead of hard-coding the enter() function declarations in the concrete Visitor class definition, do this:


#define VISITED_CLASS(cls,parent) \
  virtual bool enter(const cls &);
#include "VisitorSpecs.hpp"

Now, if you forget to define the enter function for a newly-added class, you will get a link error when the linker tries to build the virtual table. From g++ 4.1.2: Print.cpp:61: undefined reference to `Print::enter(IfElse const&)'. Again, money in the bank. You might also find that there is some boilerplate code you would like to generate for every visited class -- a thought that comes to mind is an enum representing class types in the hierarchy. Simply extend the macro to include a third argument for that extra information -- if it can't be created with a '##' concatenation -- and include VisitorSpecs.hpp in the appropriate place.

Whew! This is a huge post, but when you apply these macro definitions to the improved Visitor in the earlier post, you have a very powerful idiom to work with. I can see a couple more posts coming up in this series: 1. a listing of the final Statement, Expression, and Visitor code presented here (I've been compiling it as I wrote these posts, to make sure it all worked), and an example main() to exercise it; and 2. discussion of a few Visitor issues I've glossed over to get this thing out.

Friday, May 1, 2009

The Default Visitor Idiom

The text of this post was originally in the next post, about our preprocessor framework for C++ Visitors. But I realized that it was a huge distraction from that material, and deserved its own post. To illustrate some points, it builds on the toy Expression hierarchy described here, but using the improved Visitor idiom described here.

Recall that in the Default Visitor idiom, concrete Visitor classes need not implement every enter() function. Whichever classes do not have an enter() will call the enter() of their parent class (the same goes for exit() of course). It's hard to imagine the usefulness in the toy Expression hierarchy of our earlier Visitor posts. To make the example a little more complicated, we'll add some simple Statement classes to our toy visited hierarchy, including an abstract class which is not entirely realistic: MultiStatement, representing a Statement that itself contains one or more other Statements.

Here are the definitions of our new Statement classes (as before, with the disclaimer that it's a toy and not too cleverly designed):


class Statement {
public:
  Statement();
  virtual ~Statement() throw() = 0;
  virtual void accept(Visitor &v) const = 0;
};

class Assignment : public Statement {
public:
  Assignment(Expression *lhs, Expression *rhs);
  virtual ~Assignment() throw();

  const Expression &lhs() const;
  const Expression &rhs() const;
  void accept(Visitor &v) const;

private:
  Expression *_lhs;
  Expression *_rhs;
};

class MultiStatement : public Statement {
public:
  MultiStatement();
  virtual ~MultiStatement() throw();
};

class Block : public MultiStatement {
public:
  Block();
  virtual ~Block() throw();

  const std::vector<Statement *> &stmts() const;
  void accept(Visitor &v) const;

  void addStmt(Statement *stmt);

private:
  std::vector<Statement *> _stmts;
};

class IfElse : public MultiStatement {
public:
  IfElse(Expression *condition, Statement *ifBranch,
         Statement *elseBranch = NULL);
  virtual ~IfElse() throw();

  const Expression &condition() const;
  const Statement &ifBranch() const;
  const Statement *elseBranch() const;
  void accept(Visitor &v) const;

private:
  Expression *_condition;
  Statement *_ifBranch;
  Statement *_elseBranch;
};

The Statement::accept() functions look like this:


void Assignment::accept(Visitor &v) const
{
  if (v.enter(*this)) {
    v.visit(_lhs);
    v.visit(_rhs);
    v.exit(*this);
  }
}

void Block::accept(Visitor &v) const
{
  if (v.enter(*this)) {
    std::vector<Statement *>::const_iterator it;
    for (it = stmts().begin();
         it != stmts().end(); ++it) {
      const Statement *s = *it;
      v.visit(s);
    }
    v.exit(*this);
  }
}

void IfElse::accept(Visitor &v) const
{
  if (v.enter(*this)) {
    v.visit(*_condition);
    v.visit(*_ifBranch);
    v.visit(_elseBranch);
    v.exit(*this);
  }
}

Note that Visitor pushes polymorphism a little farther than C++ inheritance does. As shown in IfElse::accept(), it can visit Statements or Expressions, despite the fact that they do not have a common ancestor! Depending on your project, you may find it convenient to group various independent hierarchies into common Visitor classes. Minor detail: we do need to add Visitor::visit() inline functions for Statement pointers and references, as we did for Expressions.

Here's an example that shows the expressive power of Default Visitor: a class that collects the set of Variables assigned to in a piece of code. For brevity I'll just show the class function definitions, not the class declaration:


FindLhsVars::FindLhsVars()
  : Visitor(), _vars(), _lhs(false)
{
}

bool FindLhsVars::enter(const MultiStatement &)
{
  return true;
}

bool FindLhsVars::enter(const Assignment &a)
{
  _lhs = true;
  visit(a.lhs());
  _lhs = false;
  return false;
}

bool FindLhsVars::enter(const Variable &v)
{
  if (_lhs) {
    _vars.insert(v.name());
  }
  return false;
}

const std::set<std::string> &FindLhsVars::
vars() const
{
  return _vars;
}

It might not seem like much in our toy example, but consider if you had a realistic language hierarchy, and your Visitor was not a Default Visitor. You would have to tediously define an enter() function for every class in the hierarchy, instead of for three classes. Furthermore, unlike the textbook concrete Visitor, FindLhsVars is likely to still work correctly without any changes if you add a class to the hierarchy. I can't say enough good things about Default Visitor -- it takes one of the sourest Visitor lemons and makes it into a very sweet lemonade.

The next post provides some preprocessor tricks that take a lot of the grunt-work out of adding classes to a Visitor, while also bulletproofing your derived concrete Visitors.

Wednesday, April 15, 2009

Gdb set input-radix

Today I messed up a debugging session I had going, when I forgot the gdb syntax for setting the value of a variable in the code. It wouldn't have been a problem for most variables, but I wanted to set the variable i. I typed:

  (gdb) set i 100

the surprising response to that -- from gdb version 6.3.0.0-1.132.EL3rh -- was:

  Input radix now set to decimal 100, hex 64, octal 144.

Say what? Of course the correct syntax for setting a variable in the debug program is:

  (gdb) set var i = 100

You see, gdb was trying to be nice by interpreting "i" as shorthand for "input-radix", a sub-command of "set". If my variable had some other name that didn't trigger the shorthand, I would have gotten a friendly error message and no ill effects:

  (gdb) set flags 100
  A syntax error in expression, near `100'.

Having numbers you enter interpreted as base-100 is not very useful, and I ended up killing my session and restarting because I didn't figure out the trick for setting the input-radix back to 10 until I had gotten gdb into what seemed to be a useless state. The actual sequence of events was like this:

  (gdb) set i 100
  Input radix now set to decimal 100, hex 64, octal 144.
  (gdb) set i 10
  Input radix now set to decimal 100, hex 64, octal 144.
  (gdb) # Huh? Oh yeah, base100("10") == base10("100").
  (gdb) set i 1
  Nonsense input radix ``decimal 1''; input radix unchanged.
  (gdb) # D'oh! base100("1") isn't 10, it's 1.
  (gdb) set i 10
  Invalid number "10".
  (gdb) # Oops!

Notice that even though it said the radix was unchanged, gdb really did change the radix to base-1! At that point, I thought I was toast, since it couldn't recognize any numbers except 0. I killed my session, losing my breakpoints and history. Later I realized that there is a way out: use a hex number:

  (gdb) set i 0xa
  Input radix now set to decimal 10, hex a, octal 12.

Now, set input-radix could be a handy little thing, if it were restricted to values you might actually want to use as the base for numbers you type in to the debugger. Say, 2, 8, 10, and 16. But 100? You can't even enter numbers between 16 and 100: it recognizes "F" as 15 -- as long as there's no variable called "F" in the debug context -- but it doesn't recognize "G" as 16. You can even set the input radix to 0! Watch this:

  (gdb) set i 0x0
  Input radix now set to decimal 4294967295, hex ffffffff, octal 37777777777.

The corresponding set output-radix command is restricted to 8, 10, and 16. Base 2 would have been nice, but whatever. It's ridiculous that set input-radix allows anything including 0 and 1.

Wednesday, April 8, 2009

Extra Characters After Close-Brace

As noted previously, Tcl forces you to put open-braces "{" on the same line as the if or proc that they belong to, and forces you to use the atrocious "} else {" style.

That's appalling, but get this: it also requires a space to the right of the close-brace "}". That's just plain stupid.

Compounding the problem, the error message isn't something like "a space is required after a close-brace". Oh no. It says "extra characters after close-brace". And if the body of your if is very long, like this:

if {$tcl_is_stupid} {
  # ... many lines of code ...
  # Next line is the error
}else{
  i_eat_my_hat
}

then the error message is truncated so that you only see the part of the command that isn't broken, along with the line number of the "if", not the offending "else". As icing on the cake, in the case I was trying to help someone figure out, the error was at the top of one of Tcl's Icelandic Saga stacktraces, and the line number reported to be in error was with respect to the proc it was in, not the file.

Thankfully, Google led me to the answer, because Tcl sure didn't. I love the subtitle of that Tcl wiki page: Purpose: to discuss one of the few 'gotchas' in Tcl. Ha! What a crock!

Sunday, April 5, 2009

Easier-to-Use C++ Visitors

A recent post described the basic structure of the Visitor design pattern for C++. We used the visitFoo()/accept() idiom presented in the GoF book.

Now let's tweak the design a bit, to make things a little prettier, and to make it easier to derive classes from Visitor. After that, I'll need to do another post on preprocessor tricks that generate boilerplate code and set up compile-time errors for common Visitor coding mistakes.

Cosmetic Changes

It's a little awkward to invoke the Visitor code by calling a member function of the visited base class -- obj.accept(v) -- so let's add inline functions to Visitor to provide the more natural idiom of v.visit(obj):


  void visit(const Expression &e) {
    e.accept(*this);
  }
  void visit(const Expression *e) {
    if (e) {
      e->accept(*this);
    }
  }

Note: this means that Visitor.hpp includes Expression.hpp. That is the correct dependency -- Visitors know more about Expressions than vice versa. Thus, Expression.hpp will declare class Visitor instead of including Visitor.hpp.

Don't be tempted to use operator()(const Expression &e) instead of visit(). At first it sounds clever that you will be able to write Print print(stream); print(obj);, but it's bewildering to new readers of your code, and it will even make you scratch your own head when you revisit that code a year into the future. It's much easier to read Print printer(stream); printer.visit(obj);. Trust me, I've been there, trying to figure out what my own program was doing.

Next, I disagree with encoding the type names into the Visitor function names -- it sounds silly to read the declarations aloud: visitOperation(Operation &) -- is there an echo in here? C++ has function overloading -- let's use it, and give all the Visitor members the same short name. We just used up visit(), so we'll have to think of something else. For reasons that will make more sense in a moment, let's use the name enter():


  virtual void enter(const Variable &);
  virtual void enter(const Number &);
  virtual void enter(const Operation &);

A pleasant side effect of this is that there is less room for error when you change the name of a class. If the type name is encoded in the function name, and you forget to change the function name in a concrete visitor, you have just disabled the visitor for that type.

Ease-of-use

Let's fix some usability defects in the basic Visitor implementation. The worst one is that a recursive operation has to know the structure of the visited objects -- see Print::visitOperation() in the earlier post. That is a breakdown of encapsulation, or responsibility, or both.

On the other hand, some concrete Visitors do not want to recurse into subobjects -- they just want to operate on the one visited object. To accomodate both kinds of Visitor, we move the recursiveness into accept() -- makes sense, it's the object's responsibility -- but change the signature of enter() to allow the derived Visitor to choose whether to recurse or not, by returning either true (recurse) or false (don't).

Now the enter() prototypes look like this:


  virtual bool enter(const Variable &);
  virtual bool enter(const Number &);
  virtual bool enter(const Operation &);

The no-op defaults in the base class return false. I went around and around with myself on the question of whether to make recursion the default, but I think the answer is, If your Visitor works harder by recursing, it should require more code to get it done -- overriding the defaults to recurse where needed.

Now that responsibility for recursion has returned to the visited hierarchy, Operation::accept() looks like this:


void Operation::accept(Visitor &v) const
{
  if (v.enter(*this)) {
    std::vector<Expression *>::const_iterator it;
    for (it = operands().begin();
         it != operands().end();
         ++it) {
      const Expression *e = *it;
      v.visit(e);
    }
    v.exit(*this);
  }
}

Handling recursion in the object does the trick most of the time, but the sad thing is, it still doesn't fix the encapsulation issue with our pretty-printer example. To print an infix expression, the Print Visitor has to know the structure of Operator. That's just a fact of life, but if we change the Visitor interface just a tiny bit, we can at least print expressions in prefix and postfix notations without loss of encapsulation. All that's needed is an exit() function to correspond to the enter() function, which will close parentheses for prefix expressions, or print the operator for postfix expressions.

As you can guess, the exit() function returns void, and the default implementation in abstract base Visitor does nothing. If you have sharp eyes, you'll notice that I already added a call to it in Operation::accept() above. It only gets called if the Visitor chose to recurse. Now a postfix pretty-printer is much simpler than the infix version presented in the earlier post:


bool Postfixer::enter(const Variable &v)
{
  _str << v.name() << " ";
  return false;
}

bool Postfixer::enter(const Number &n)
{
  _str << n.value() << " ";
  return false;
}

bool Postfixer::enter(const Operation &o)
{
  return true;
}

void Postfixer::exit(const Operation &o)
{
  _str << o.symbol() << " ";
}

Most importantly, it knows nothing about the structure of Operation.

This post has dragged on long enough, but there is one last ease-of-use improvement I want to make to our Visitor. It must implement the Default Visitor idiom, which makes the double-dispatch polymorphic in both parameters. In simpler terms, if the concrete visitor does not implement, say, enter(Foo &), the default action is not "do nothing", it is enter(ParentOfFoo &). It's hard to visualize the benefit of this in my tiny toy example, but in a realistic hierarchy -- for example, a language hierarchy with abstract Declarations, Statements, and Expressions -- it often saves a lot of coding. Note that in our example, it means we have to add a function to Visitor that hasn't been there before: enter(Expression &).

Next up, some preprocessor magic which generates a lot of the code for these Visitors -- including the Default Visitor fallbacks -- and which also gives you distant early warning of certain common typos.

Friday, April 3, 2009

Last Element in a Tcl List

Oh, thank you Tcl, for providing the special string end. That way I can get the last element in a list with:


  [lindex $mylist end]

What a relief! I was about to type:


  [lindex $mylist [expr {[llength $mylist] - 1}]]

And if I had had to type that, my head would have exploded. Fortunately, I got away with just a pulsing vein on my temple at the thought of a pseudo-keyword like end.

Sunday, March 15, 2009

C++ Visitors: Basic Implementation

In the last post, I described the pros and cons of the Visitor design pattern, and promised some sample C++. This post has a basic implementation of Visitor, to illustrate the concepts. In later posts, we'll build a more powerful and yet more user-friendly Visitor, and use some preprocessor tricks to partially foolproof the use of Visitor.

One place that I've found Visitor useful is in class hierarchies that describe a language -- in my case, usually a hardware description language, but it would apply equally to classes representing a programming or scripting language. In a language project, you often find yourself traversing a parse tree of the input code, performing functions which can't properly be said to be the responsibility of the language classes. Even if you could make a case for adding virtual functions to achieve the task, it makes sense to keep the related functionality together in one file. But you want to dispatch on the node type, so that your code doesn't devolve into a forest of switch statements. This is exactly the place to use Visitor.

Since language parse trees are a common place to apply Visitor, let's start our sample C++ code with a toy class hierarchy for expressions. There are only three concrete classes: Variable, Number, and Operation, all derived from abstract Expression. I kept these classes very simple for brevity's sake, hopefully more care would go into their design in real life:


class Expression {
public:
 Expression();
 virtual ~Expression() throw() = 0;
 virtual void accept(Visitor &v) const = 0;
};

class Number : public Expression {
public:
 explicit Number(int value);
 virtual ~Number() throw();
 int value() const;
 virtual void accept(Visitor &v) const;

private:
 int _value;
};

class Variable : public Expression {
public:
 explicit Variable(const std::string &name);
 virtual ~Variable() throw();
 std::string name() const;
 virtual void accept(Visitor &v) const;

private:
 std::string _name;
};

class Operation : public Expression {
public:
 enum Operator {
   PLUS, MINUS, MAX_OPERATOR
 };
 Operation(const Operator &op,
           Expression *lhsOrOnly,
           Expression *rhs = NULL);
 virtual ~Operation() throw();
 Operator op() const;
 const char *symbol() const;
 const std::vector<Expression *> &operands() const;
 virtual void accept(Visitor &v) const;

private:
 Operator _op;
 std::vector<Expression *> _operands;
};

Notice that Expression has a pure virtual accept() function. We need that in order to bounce into the correct function in Visitor classes -- we'll discuss that in a moment. Now let's see what the abstract base Visitor class looks like. It's just an interface, any real visiting work will be done by concrete visitors derived from this. As I said, we'll be making the interface more sophisticated in the next post, but here's our first crack at it:


class Visitor {
public:
 Visitor();
 virtual ~Visitor();

 virtual void visitVariable(const Variable &);
 virtual void visitNumber(const Number &);
 virtual void visitOperation(const Operation &);
};

Our first implementation of accept() is quite simple, so I'll only show one example:


void Operation::accept(Visitor &v) const
{
 v.visitOperation(*this);
}

C++ Visitors are coded with this accept/visitFoo idiom to work around C++'s lack of multiple dispatch. Calling visitor.visit(*expr) would pick visit()'s signature at compile time based on the static type of expr. So we have to call expr->accept(visitor), which picks the accept() function at run time based on expr's actual type. Inside accept(), *this has the correct static type and can go to the correct member of the Visitor.

Some things to note about this code:

It's usually reasonable for accept() to be a const function and visit*() to take const references. But you may need it to be otherwise. Remember, Visitors are like external virtual functions. Will the virtual functions you'll add be const or non-const?
On the other hand, you definitely want visit*() to be non-const, and for accept to take a non-const reference. Concrete Visitor classes may need to store some state.
Make ~Visitor() virtual but not pure, and don't declare private copy/assign members (see 5.3.1) for Visitor, so that derived Visitors can rely on compiler defaults if they wish.
For the same reason, it seems better to define no-op defaults for the visit*() functions instead of making them pure virtual.
As written, Operation::accept() does not traverse the Expressions contained in the Operation. That means that concrete Visitors have to do so themselves if they want to. Really such traversal should be Operation's responsibility; we'll add that capability in our next pass.

So how do you use this thing? A simple example is class Print : public Visitor, which will print one of our Expressions. The interface is self-explanatory -- a constructor and inherited virtual visit*() functions -- so I'll just give the implementation:


Print::Print(std::ostream &str)
 : Visitor()
 , _str(str)
{
}

void Print::visitVariable(const Variable &v)
{
 _str << v.name();
}

void Print::visitNumber(const Number &n)
{
 _str << n.value();
}

void Print::visitOperation(const Operation &o)
{
 _str << "(";
 std::vector<Expression *>::const_iterator it =
   o.operands().begin();

 while (it != o.operands().end()) {
   const Expression *e = *it;
   ++it;
   if (it == o.operands().end()) {
     // Print unary or binary operator.
     _str << o.symbol();
   }
   e->accept(*this);
 }
 _str << ")";
}

The Print class is suitable for framing in an ostream shift operator:


std::ostream &operator<<(
 std::ostream &str, const Expression &expr)
{
 Print printer(str);
 expr.accept(printer);
 return str;
}

There you go. The basics of C++ Visitors. The toy example doesn't really do justice to how useful they are: Visitors are a big help when you have a big hierarchy to deal with. But the simple design presented here runs into some difficulties when you have a big hierarchy. Coming up, I'll show you some tricks to get the power of Visitors while softening some of the difficulties with using them.

Here is the complete list of posts in this series:

Sunday, March 8, 2009

C++ Visitor Pattern: Motivation

The Visitor design pattern is perhaps not the best-loved of the GoF patterns. One indication of its unpopularity is its position as the last pattern in the GoF Design Patterns book. But I have found it useful in several projects, and would like to share some Visitor tricks I've learned along the way.

In C++, Visitor is an abstract base class, with a set of one-argument virtual functions -- one for each class in a forest of hierarchies -- thereby providing "multiple-dispatch" of some piece of functionality. Virtual functions are an example of single-dispatch: based on the type of an object, the correct virtual function is called. Multiple-dispatch means that a function is selected based on more than one type: in this case, the type of the visitor, and the type of the class it visits.

Have you ever been writing some code, and thought: "I wish I could add a virtual function to these classes, without changing the class interfaces"? If so, that is exactly what Visitor allows. Of course, you have to have designed the class to accept a Visitor first, but after that, you have this "external virtual function" capability.

Another way in which Visitor is helpful in big C++ projects, is that it allows you to have good source file hygiene -- as described in our coding standard -- but still keep related functionality in one place. An example is pretty-printing of an Expression hierarchy: to make consistency easier to maintain, you would like all the printing functions to be in a single file, which would violate our source file standards if implemented by virtual functions in the classes.

Sounds good, why is Visitor unpopular? There are a couple of valid complaints about the pattern. First, it can make for difficulties when adding a new class to a visited hierarchy. This is somewhat mitigated by the Default Visitor variant, and we can also use some preprocessor tricks to create a compile- or link-time error for Visitors which must be extended for some or all new classes. It's also worth mentioning that if the Visitor functionality were instead implemented as a pure virtual function, the same amount of work is required (except for tracking down all the Visitors).

Secondly, effective use of Visitors may imply a loss of encapsulation. If a Visitor is to do meaningful work on an object, it may need access to implementation details that -- all other things being equal -- we would prefer to hide inside the class.

Nevertheless, Visitor can be a very useful design pattern. In coming posts I will present some practical C++ examples for getting the most out of it.

Here is the complete list of posts in this series:

Wednesday, February 25, 2009

Getting Back to Where You Were in Emacs

When you do an interactive search in emacs (e.g.: C-s or C-r), the mark is set at the place you began your search. So when you're ready to go back to the editing you were doing when you started searching, a simple C-x C-x (exchange-point-and-mark) gets you back to where you were.

That's so second-nature to me that I am always a little surprised when C-x C-x doesn't take me where I expect, as can happen when you light one search off the end of another, or find yourself in another file. So here are a couple of tricks to try when C-x C-x fails you:

Double Undo: Usually the place you want to get back to is the place you've just been typing in the current file. So an undo -- C-x u -- will snap the window back to that spot as it undoes your recent typing. But you didn't want to undo that, so you have to "undo the undo". Successive C-x u commands just undo more and more, so first break the cycle by issuing some innocuous command -- I like to use C-n (next-line) -- followed by a second C-x u. In case that's confusing, the sequence is: C-x u C-n C-x u. Of course, that only works if you didn't change files while you were off searching.

Pop Global Mark: The command pop-global-mark -- C-x C-SPC -- is a power tool to center the window at a recent mark, no matter what file it was in. This is especially handy if your searching and grepping takes you across several files, or if you can't remember which file you were working in. Just keep popping marks until the file and location you want shows up. This command sometimes fails you, if the place you're trying to return to never had a mark set because you never started a search from there. If you know you're about to embark on a scavenger hunt and will want to return, set the mark explicitly -- C-SPC -- before you begin.

Oh yeah, don't forget point-to-register!

Wednesday, January 21, 2009

Emacs Repeat Command

One of the handiest commands in emacs is repeat-complex-command (shortcut: M-x ESC ESC). It opens in the minibuffer the lisp formulation of the last command you ran, for example:


Redo: (query-replace "j" "k" nil nil nil)

That way, if you just want to redo that thing you did, hit return. If you want to slightly modify it -- change "k" to "m", for example -- just edit the command before pressing return. Not very interesting in the "k" to "m" case, but if you run query-replace-regexp and then realize you got one tiny thing wrong in a huge regular expression, well, you can see that would be useful.

Sometimes the thing that comes up in the minibuffer isn't something worth repeating:


Redo: (find-file-other-window "~/.emacs" 1)

In that case, just use the minibuffer history (M-p, M-n) to get to the command you're looking for.

Gotcha: I scratched my head for a few minutes today wondering why nothing happened when I re-ran a replace-string. If your replace command is restricted to a region, the bounds of that region are present in the lisp command. So when you try to redo it, it redoes it for your old region -- not the current one. Best case for that is a no-op; worst case is changing a region you didn't want to. Be careful when redoing any command that might apply only to the region.