Friday, 11 September 2015

Law of Interface Responsiveness

Changes in response time between 0 and 1 seconds don't matter, because they're too small to notice.

Changes in response time between 1 and 10 seconds matter a lot, because the user will get bored waiting for it to finish.

Changes in response times above 10 seconds again don't matter, because the user is less likely to be sitting there waiting for it to finish, and will have gone and done something else.

Thursday, 3 September 2015

Mental arithmetic tips

For a sum like 8 * 26, imagine 26 -> 16 48 (that is, 8 * 2 and 8 * 6), and when you have that clear in your head, smash them together to get 208.
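The smash-together step is just scaling the first partial product by ten before adding. As a sketch (the function name is mine, not standard):

```python
def smash(digit, n):
    """Multiply a one-digit number by a two-digit number by
    handling each digit of n separately, then combining the
    two partial products."""
    tens, units = divmod(n, 10)
    # 8 * 26: the partial products are 16 (tens) and 48 (units)
    return digit * tens * 10 + digit * units

print(smash(8, 26))  # 208
```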

To factorize a number less than 1000, you only need to test divisibility by primes up to 31, since the next prime, 37, already has a square above 1000. There aren't that many of them: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 and 31.

To test for divisibility by 2, check if the last digit is divisible by 2. To check 3, check if the sum of the digits is divisible by 3. To check 5, check if the last digit is 5 or 0. To check 11, alternately add and subtract the digits: for example, 374 = 11 * 34, and 3 - 7 + 4 = 0.
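The digit tests are easy to state in code. A sketch of the digit-sum and alternating-sum rules (assuming non-negative integers):

```python
def divisible_by_3(n):
    # 3 divides n exactly when 3 divides the sum of its digits.
    return sum(int(d) for d in str(n)) % 3 == 0

def divisible_by_11(n):
    # 11 divides n exactly when the alternating sum of its digits
    # is divisible by 11.  For 374: 3 - 7 + 4 = 0.
    alt = sum(int(d) * (-1) ** i for i, d in enumerate(str(n)))
    return alt % 11 == 0

print(divisible_by_11(374))  # True
```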

The others are not so easy. For some of them, think of subtracting a multiple of the prime, for example

301 = 280 + 21 = 7 * 40 + 7 * 3 = 7 * 43.

Test the lower primes first, because they are more likely to succeed.

For the larger primes, you can memorize the composite numbers they are involved in:

23 * 23 = 529
23 * 29 = 667
23 * 31 = 713
29 * 29 = 841
29 * 31 = 899
31 * 31 = 961
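All of the above can be folded into one trial-division routine. A sketch, valid for numbers below 1000 only (for larger inputs you would extend the prime list):

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

def factorize(n):
    """Prime factorization of n for 2 <= n < 1000 by trial
    division.  Primes up to 31 suffice: any composite below
    1000 has a prime factor at most sqrt(999) < 37."""
    factors = []
    for p in PRIMES:              # test the lower primes first
        while n % p == 0:
            factors.append(p)
            n //= p
    if n > 1:
        factors.append(n)         # the leftover is itself prime
    return factors

print(factorize(301))  # [7, 43]
```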

Wednesday, 19 August 2015

TeX - the worst programming language ever

Here are some observations I've had on programming TeX, or as I've taken to calling it recently, "Knuth's hunk a' junk".

You can't tell how many arguments a function will take by looking at its definition, because even if it takes no arguments, it can call another function that does take arguments.

All lines need to be ended with comment characters, except the ones that don't. A stray line with no comment at the end can cause an extra blank line to appear in the output, or break completely.

You can give numbers as a series of digits, which is what you expect. This will work most of the time, but occasionally a macro call immediately following the number will be expanded sooner than you might think. This can cause problems, for example in \ifnum\number=0\noexpand\foo\fi. If \number is equal to 0, you would expect this to result in "\noexpand\foo", but the \noexpand disappears, and you only get "\foo".

Likewise if you did \ifnum\number=1\twentythree\fi, and \twentythree was "23", this doesn't check if \number is 1: no, it actually checks if \number is 123. Not kidding. The solution is to always follow a number with a space, i.e. "\ifnum\number=1 \twentythree\fi".

Processing is split into multiple stages, including "expansion" and "execution". If you expand without executing, you can mess up the syntax for the execution. The error messages you get when the execution happens don't tell you why it's messed up: it just throws the whole dog's breakfast in your face.

This can happen when you define a macro that defines another macro, and then you use the first macro in the definition of a third macro whose body is expanded at the time of definition. Hence a macro doesn't stand alone - you have to account for the context in which it is used.

TeX is a weak language, and sometimes you have even less of it. For example, suppose you are writing out to an auxiliary file, and you want to expand some control sequences, removing spaces following them. You think you're nearly a TeX wizard, and know about \futurelet, so you think, a-ha, I'll do a \futurelet at the end of this control sequence, check if the next character is a space, and if so, remove it.

This fails because \futurelet belongs not to the expansion phase, but to the execution phase. (Learn that by heart.) What you have at this stage is string concatenation and string splitting. It's possible to remove a space even with just these two. I'm not sure of the details, but it's something like this:

* Get the next character (splitting). Call it X.
* Append a space, followed by a marker character ("@") (concatenation). String now looks like "X @".
* Split the string at the first space. If X was not a space, this gives us X; if X was a space, it gives us an empty string. Also split the rest of the input at the marker character, and discard everything before it.

This doesn't work, however, because the first step doesn't work. You can't get the next character if it's a space, you get the first non-space instead.

And after I wrote that I discovered the existence of \ignorespaces, so none of that should be necessary. This allows me to make another point, which is that it's hard to learn TeX, because you can't tell which control sequences are primitives and which are user-defined just by looking at them. This wouldn't be so bad, except there are hundreds of primitives, and that's not even counting the macros that plain TeX adds on top of them. I went looking for a definition of \ignorespaces in the file I saw it in and couldn't find one.

Error messages bear no relationship to the actual problem that caused them. You have to learn by experience what error message means what: for example "missing control sequence inserted (\inaccessible)" actually means: you tried to give a character an active definition when the character was not active (at the time of definition).

When you don't recognize the problem, you have to stare at the log files for half-an-hour or so, avoiding conscious thought, before you understand it.

Your input goes through a stage of interpretation called the "cat codes". (They're called that because TeX is personified by a lion.) You can tell TeX to make changes to the cat codes, but you also have to remember the cat codes from before the changes, in case something that was said in the past comes up again.

Your code is read from left to right, strictly. (Anyone who's learned about lambda calculus will see a similarity.) It's possible to expand the token after next with a construct like "\expandafter\next\token". But if you want to expand the token after that first, you have to do

\expandafter\expandafter\expandafter\next\expandafter\token\third

To expand three tokens in advance and then two tokens in advance and then one in advance, it's:

\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\next\expandafter\expandafter\expandafter\token\expandafter\third\fourth

Each time you need to expand a token coming from further down the line, the number of \expandafter's you need doubles. This comes up in practice: for example, to expand a token inside a macro definition you need to jump over at least "\def", the name of the macro being defined, and an open brace ("{").

I keep on thinking that it must get easier at some point.

Monday, 10 August 2015


A compiler is

Happiest when compiling

Another compiler

How many cross-compilers could a cross-compiler compile if a cross-compiler could compile cross-compilers?

Monday, 9 March 2015

Vim tips

The vim text editor has many features, and I have often felt that my use of it is suboptimal. Here are some useful features I found out about that I wish I had known earlier.

The ":scriptnames" command. This shows a list of the scripts that vim read when starting up. Useful for debugging vim configurations.

To show where a particular vim variable got its value, use ":verb set <option>?". This is useful when a vim setting fails to have the value you thought you gave it, because it has been overridden somewhere else.

Sometimes when wanting to open a new line above the one I was on, I would type ESC O, and this would break in a confusing way when I tried typing the contents of the new line. I didn't even know how I got into this state when it happened. Eventually, I realised this could be fixed by using "set ttm=20" in my "vimrc" file. (the ttm setting is also known as ttimeoutlen).

The "a" flag of "formatoptions" is very useful for automatically formatting text as you type it. Turn it off for filetypes where newlines are important, for example with "autocmd FileType sh setl fo-=a" for shell scripts.

The "\zs" and "\ze" sequences in regexps are very useful for inserting text before or after certain places. For example, ":%s/\zebanana/the /g" inserts the text "the " before any occurrence of the text "banana". Without this feature you would have to type "%s/\(banana\)/the \1/g", which is harder.
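For comparison, Python's re module expresses the same idea with zero-width lookarounds: "(?=...)" plays roughly the role of "\ze" here (a rough analogy, not an exact correspondence):

```python
import re

# Insert "the " before every "banana" by matching the empty
# string at positions where a lookahead sees "banana".
text = "a banana split"
print(re.sub(r"(?=banana)", "the ", text))  # a the banana split
```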

"gv" to reselect the last visual selection - for example, to perform a search and replace on it, or a shift.

Remapping the Caps Lock key to Esc is very helpful - before this I was starting to get a sore little finger by constantly pressing "Control-C".

I have the following to turn off "comment leader" insertion, (like "*" for "/* ... */" comments in C), which I found very difficult to turn off and keep off.
    set fo-=c
    set fo-=r
    set fo-=o

    " Disable "comment leader" garbage
    set com=

I have the following to disable the gaudy bright yellow search matching:

    set hlsearch
    set hl=ls

I find the following useful for reformatting lines:

    map <CR> i<CR><Esc>

This is useful for editing numbers with leading zeroes in dates:
    set nrformats-=octal

Wednesday, 22 October 2014

Port of Last Eichhof game to Allegro library

There was a game called "Last Eichhof" released in 1993 for MS-DOS. The source code was released by the original coder (Danny Schoch) - I don't know exactly when - and is available from various places on the Internet.

Earlier this year I ported this code to the Allegro games programming library. The source is available from here. The ported version smooths the motion of sprites on the screen and makes the controls more responsive (so the minimum amount you can move your ship with a tap of a key is smaller).

There were several points of interest. The assembly routines had to be rewritten completely, of course, which included the main game loop and the graphics routines. I had to learn about the parts of the VGA programming interface that the original used - but I feel I only scratched the surface of it, as it is quite horribly complicated. I managed to avoid having to read much about Sound Blaster programming - the only complication was how to decode the sound samples in the data files, which were in something called 8-to-4 bit ADPCM. Fortunately I found an algorithm on a website for decoding it.

Another unusual feature of the source was the format of its data files, and how they were read. There are lines like this in the original source code (from GAMEPLAY.C):

    // Load weapon sprite library.
       ptr = loadfile(datapool, "weapons.sli");
       weapon.nsprites = *ptr;
       weapon.sprite = (struct sTableEntry *)(ptr+1);
       for (i = 0; i < weapon.nsprites; i++) {
          (long)weapon.sprite[i].sprite += (long)weapon.sprite;
       }

The call to 'loadfile' gives a pointer to the data. This starts off as a number of sprites, followed by offsets into the data, followed by the data itself. 'weapon.sprite' is set to point directly into this data area. This is an array of 'struct sTableEntry', with the exception that the 'sprite' fields of each 'struct sTableEntry' are a kind of offset, and not pointers. The loop converts these offsets into true pointers (of type 'struct sprstrc * far'), each a pointer to a structure with information about the sprite in it.

There are a few tricky points here. The first is that the compiler has to give 'struct sTableEntry' exactly the right layout for this to work, with the right padding, widths and endianness. This kind of "deserialization" is not portable between compilers or architectures.

Converting the "long" fields into far pointers relies on a "long" type being the same width as a far pointer. (Note that C programming for MS-DOS had two kinds of pointer, called "near" and "far". The "far" pointer was 32 bits and could access a wider address space than the 16-bit near pointer.) Note that the "offsets" are not simple numbers giving the number of bytes into the data area: they are a very strange type of value, being the numeric difference between two far pointers considered as numbers. This would appear very dodgy - whether we get a pointer to the right place could depend on the address of 'weapon.sprite'.
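A portable version of the same fixup would keep the offsets as plain byte indices and resolve them explicitly. A sketch of the idea (the layout here is invented for illustration, not the game's actual format):

```python
import struct

def load_sprite_table(blob):
    """Parse an invented sprite-library layout: a 2-byte count,
    then one 4-byte offset per sprite (relative to the start of
    the offset table), then the sprite data itself."""
    (nsprites,) = struct.unpack_from("<H", blob, 0)
    table_start = 2
    offsets = struct.unpack_from("<%dI" % nsprites, blob, table_start)
    # The equivalent of the pointer-fixup loop: turn each
    # table-relative offset into an absolute index into blob.
    return [table_start + off for off in offsets]

blob = struct.pack("<H2I", 2, 8, 12) + b"spritedata"
print(load_sprite_table(blob))  # [10, 14]
```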

It took me some time to understand what is going on in these lines, especially since a cast on the left-hand side of an assignment statement is not supported by modern C compilers. I believe it means:

    weapon.sprite[i].sprite = (struct sprstrc * far) ((long) weapon.sprite + (long) weapon.sprite[i].sprite);

When I was playing this game under MS-DOS, it took me a long time to beat, and I only beat it once. I have beaten the original once or twice in DosBox, after practising on the Allegro version. I think the Allegro version is slightly easier because of the motion smoothing making it easier to see where enemies are moving on the screen, as well as the more responsive controls.

Thursday, 31 July 2014

Notes on compatibility

There are two types of compatibility in software: new versions of programs reading old data (backward compatibility), and old versions of programs reading new data (forward compatibility).

Backward compatibility is the easier of the two to achieve, as it does not require predicting the future. How can we ensure forward compatibility, then? By imagining reasonable ways to extend the data format and making sure that the program works with these - e.g., doesn't break on unknown options. One suggestion has been to use a version number in your data format. However, I read that this failed for MIME, because no-one knew what the correct behaviour was if they got a different version number.
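The "doesn't break on unknown options" rule can be made concrete. A sketch using an invented key=value format, where the parser skips keys it doesn't recognize so that files written by newer versions still load:

```python
KNOWN_KEYS = {"name", "size"}

def parse_config(text):
    """Forward-compatible parse of a made-up key=value format:
    unknown keys are ignored rather than treated as errors."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in KNOWN_KEYS:
            config[key] = value
    return config

# "colour" is a newer key this version doesn't know about.
print(parse_config("name=demo\nsize=4\ncolour=red"))
# {'name': 'demo', 'size': '4'}
```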

A related problem is interoperability. This refers to reading data produced by a program whose creation or maintenance is bureaucratically or socially separated from your control, and producing data to be read by this program.

How can we ensure interoperability? One way: if you are inventing a file format or network protocol, make sure you define it thoroughly. Otherwise, other people will implement it in slightly different, incompatible ways, and it will be impossible to be compatible with them all. (I heard that CSV was an example of this.) A similar way is to make your data format as easy to parse as possible, otherwise other people may do it wrong. (See this page about different termcap versions for an example of a format with varying implementations.)

One way to encourage interoperability and backward compatibility is intolerance of malformed syntax. This is a herbicide on the proliferation of degenerate data, and means that new implementations do not have the burden of supporting degenerate data that people are relying on. How much of this you can get away with depends on the share of usage of your program.