## Wednesday, 19 August 2015

### TeX - the worst programming language ever

Here are some observations I've had on programming TeX, or as I've taken to calling it recently, "Knuth's hunk a' junk".

You can't tell how many arguments a function will take by looking at its definition, because even if it takes no arguments, it can call another function that does take arguments.

All lines need to be ended with comment characters, except the ones that don't. A stray line with no comment at the end can cause an extra blank line to appear in the output, or break completely.

You can give numbers as a series of digits, which is what you expect. This will work most of the time, but occasionally a macro call immediately following the number will be expanded sooner than you might think. This can cause problems, for example in \ifnum\number=0\noexpand\foo\fi. If \number is equal to 0, you would expect this to result in "\noexpand\foo", but the \noexpand disappears, and you only get "\foo".

Likewise if you did \ifnum\number=1\twentythree\fi, and \twentythree was "23", this doesn't check if \number is 1: no, it actually checks if \number is 123. Not kidding. The solution is to always follow a number with a space, i.e. "\ifnum\number=1 \twentythree\fi".
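
The digit-gluing behaviour can be mimicked with a toy model in C. This is only a sketch of the rule "keep reading digits, then swallow one optional space", applied to text that has already been expanded; `scan_number` is a made-up helper and says nothing about how TeX is actually implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy model: read a decimal number from already-expanded text.
   Digits are consumed greedily; a single space directly after the
   digits ends the number and is swallowed. *rest is set to the
   unread remainder of the text. */
static long scan_number(const char *text, const char **rest)
{
    long value = 0;

    assert(text != NULL && rest != NULL);
    while (isdigit((unsigned char)*text)) {
        value = value * 10 + (*text - '0');
        text++;
    }
    if (*text == ' ')
        text++;                 /* the terminating space is not output */
    *rest = text;
    return value;
}
```

Feeding it "123" (the "1" followed by the expansion of \twentythree) gives 123 with nothing left over, while "1 23" gives 1 and leaves "23" behind, which is the effect of the extra space.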

Execution is split into multiple stages, including "expansion" and "execution". If you expand without executing, you can mess up the syntax for the execution. The error messages you get when the execution happens don't tell you why it's messed up: it just throws the whole dog's breakfast in your face.

This can happen when you define a macro that defines another macro, and then you use the first macro in the definition of a third macro whose body is expanded at the time of definition. Hence a macro doesn't stand alone - you have to account for the context in which it is used.

TeX is a weak language, and sometimes you have even less of it. For example, suppose you are writing out to an auxiliary file, and you want to expand some control sequences, removing spaces following them. You think you're nearly a TeX wizard, and know about \futurelet, so you think, a-ha, I'll do a \futurelet at the end of this control sequence, check if the next character is a space, and if so, remove it.

This fails because \futurelet belongs, not to the expansion phase, but the execution phase. (Learn that by heart.) What you have at this stage are string concatenation, and string splitting. It's possible to remove a space even with these two. I'm not sure of the details, but it's something like this:

* Get the next character (splitting). Call it X.
* Append a space, followed by a marker character ("@") (concatenation). String now looks like "X @".
* Split the string at the first space. If X was not a space, this gives you X; if X was a space, this gives you an empty string. Also split the rest of the input at the marker character, and discard everything before it.

This doesn't work, however, because the first step doesn't work. You can't get the next character if it's a space, you get the first non-space instead.

And after I wrote that I discovered the existence of \ignorespaces, so none of that should be necessary. This allows me to make another point, which is that it's hard to learn TeX, because you can't tell which control sequences are primitives, and which are user-defined, just by looking at them. This wouldn't be so bad, except there are hundreds of primitives, not even including the macros that plain TeX adds. I went looking for a definition of \ignorespaces in the file I saw it in and couldn't find one.

Error messages bear no relationship to the actual problem that caused them. You have to learn by experience what error message means what: for example "missing control sequence inserted (\inaccessible)" actually means: you tried to give a character an active definition when the character was not active (at the time of definition).

When you don't recognize the problem, you have to stare at the log files for half-an-hour or so, avoiding conscious thought, before you understand it.

Your input goes through a stage of interpretation called the "cat codes". (They're called that because TeX is personified by a lion.) You can tell TeX to make changes to the cat codes, but you also have to remember the cat codes from before the changes, in case something that was said in the past comes up again.

Your code is read from left to right, strictly. (Anyone who's learned about lambda calculus will see a similarity.) It's possible to expand the token after next with a construct like "\expandafter\next\token". But if you want to expand the token after that first, you have to do

"\expandafter\expandafter\expandafter\next\expandafter\token1\token2".

To expand three tokens in advance and then two tokens in advance and then one in advance, it's:

"\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\next\expandafter\expandafter\expandafter\token1\expandafter\token2\token3".

Each time you need to expand a token coming from further down the line, the number of \expandafter's you need doubles. This is quite possible in practice, for example to expand a token inside a macro definition you need to jump over at least "\def", the name of the macro being defined, and an open brace ("{").
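
The counts in the two examples above follow a pattern that can be written down as arithmetic. The sketch below is in C; the function names are made up for illustration. Position 0 is the leading token ("\next" in the examples) and n is the number of later tokens that have to be expanded first.

```c
#include <assert.h>

/* Number of \expandafter tokens in front of the token at position i
   when n later tokens must be expanded first, last one first.
   This is 2^(n-i) - 1, matching the 7, 3, 1 of the longer example. */
static unsigned long expandafters_before(unsigned n, unsigned i)
{
    assert(i <= n && n < 32);
    return (1UL << (n - i)) - 1;
}

/* Total number of \expandafter tokens in the whole construct. */
static unsigned long expandafters_total(unsigned n)
{
    unsigned long total = 0;
    unsigned i;

    for (i = 0; i < n; i++)
        total += expandafters_before(n, i);
    return total;
}
```

For n = 3 this gives 7, 3 and 1, so 11 in total; each extra token roughly doubles the number needed in front of the leading token.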

I keep on thinking that it must get easier at some point.

This post is a work in progress.

## Monday, 10 August 2015

### Haiku

A compiler is

Happiest when compiling

Another compiler

How many cross-compilers could a cross-compiler compile if a cross-compiler could compile cross-compilers?

## Monday, 9 March 2015

### Vim tips

The vim text editor has many features, and I have often felt that my use of it is suboptimal. Here are some features I found out about that are useful which I wish I knew about earlier.

The ":scriptnames" command. This shows a list of the scripts that vim read when starting up. Useful for debugging vim configurations.

To show where a particular vim option got its value, use ":verb set" followed by the option name and a question mark. This is useful when a vim setting fails to have the value you thought you gave it, because it has been overridden somewhere else.

Sometimes when wanting to open a new line above the one I was on, I would type ESC O, and this would break in a confusing way when I tried typing the contents of the new line. I didn't even know how I got into this state when it happened. Eventually, I realised this could be fixed by using "set ttm=20" in my "vimrc" file. (The ttm setting is also known as ttimeoutlen.)

The "a" flag of "formatoptions" is very useful for automatically formatting text as you type it. Turn off for filetypes where newlines are important, for example with "autocmd FileType sh setl fo-=a" for shell scripts.

The "\zs" and "\ze" sequences in regexps are very useful for inserting text before or after certain places. For example, ":%s/\zebanana/the /g" inserts the text "the " before any occurrence of the text "banana". Without this feature you would have to type ":%s/\(banana\)/the \1/g", which is harder.

"gv" to reselect the last visual selection - for example, to perform a search and replace on it, or a shift.

Remapping the Caps Lock key to Esc is very helpful - before this I was starting to get a sore little finger by constantly pressing "Control-C".

I have the following to turn off "comment leader" insertion (like "*" for "/* ... */" comments in C), which I found very difficult to turn off and keep off.

set fo-=c
set fo-=r
set fo-=o

set com=

I have the following to disable the gaudy bright yellow search matching:

set hlsearch
set hl=ls

I find the following useful for reformatting lines:

map <CR> i<CR><Esc>

This is useful for editing numbers with leading zeroes in dates, since it stops Ctrl-A and Ctrl-X from treating them as octal:

set nrformats-=octal

## Wednesday, 22 October 2014

### Port of Last Eichhof game to Allegro library

There was a game called "Last Eichhof" released in 1993 for MS-DOS. The source code was released by the original coder (Danny Schoch) - I don't know exactly when - and is available from various places on the Internet.

Earlier this year I ported this code to the Allegro games programming library (http://alleg.sourceforge.net/). The source is available from here. The ported version smooths the motion of sprites on the screen and makes the controls more responsive (so the minimum amount you can move your ship with a tap of a key is smaller).

There were several points of interest. The assembly routines had to be rewritten completely, of course, which included the main game loop and the graphics routines. I had to learn about the parts of the VGA programming interface that the original used - but I feel I only scratched the surface of it, as it is quite horribly complicated. I managed to avoid having to read much about Sound Blaster programming - the only complication was how to decode the sound samples in the data files, which were in something called 8-to-4 bit ADPCM. Fortunately I found an algorithm on a website for decoding it.

Another unusual feature of the source was the format of its data files, and how they were read. There are lines like this in the original source code (from GAMEPLAY.C):

weapon.nsprites = *ptr;
weapon.sprite = (struct sTableEntry *)(ptr+1);
for (i = 0; i < weapon.nsprites; i++) {
(long)weapon.sprite[i].sprite += (long)weapon.sprite;
}

The call to 'loadfile' gives a pointer to the data. This starts off as a number of sprites, followed by offsets into the data, followed by the data itself. 'weapon.sprite' is set to point directly into this data area. This is an array of 'struct sTableEntry', with the exception that the 'sprite' fields of each 'struct sTableEntry' are a kind of offset, and not pointers. The loop converts these offsets into true pointers (of type 'struct sprstrc * far'), each a pointer to a structure with information about the sprite in it.

There are a few tricky points here. The first is that the compiler has to give 'struct sTableEntry' exactly the right layout for this to work, with the right padding, widths and endianness. This kind of "deserialization" is not portable between compilers or architectures.
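
A more portable way to read such data is to decode each field from bytes explicitly instead of overlaying a struct on the buffer. The sketch below is in C and uses an invented layout (a 16-bit little-endian count followed by that many 32-bit little-endian offsets); it is not the actual Last Eichhof file format, and all the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode little-endian integers byte by byte, so the result does not
   depend on the host's endianness, padding or type widths. */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical layout, for illustration only: a 16-bit count followed
   by that many 32-bit offsets. Returns the number of offsets stored in
   out, or -1 if the buffer is too short or out is too small. */
static int parse_offset_table(const unsigned char *buf, size_t len,
                              uint32_t *out, size_t max_out)
{
    size_t count, i;

    assert(buf != NULL && out != NULL);
    if (len < 2)
        return -1;
    count = read_u16_le(buf);
    if (count > max_out || len < 2 + 4 * count)
        return -1;
    for (i = 0; i < count; i++)
        out[i] = read_u32_le(buf + 2 + 4 * i);
    return (int)count;
}
```

The cost is a copying step and a little more code; the gain is that the same bytes decode to the same values whatever the compiler does with struct layout.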

Converting the "long" fields into far pointers relies on a "long" type being the same width as a far pointer. (Note C programming for MS-DOS had two kinds of pointer, called "near" and "far". The "far" pointer was 32 bits and could access a wider address space than the 16-bit near pointer.) Note that the "offsets" are not simple numbers giving the number of bytes into the data area: they are a very strange type of value, being the numeric difference between two far pointers considered as numbers. This would appear very dodgy - whether we get a pointer to the right place could depend on the address of 'weapon.sprite'.
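
For comparison, here is a sketch in C of how the same fix-up could be written with plain byte offsets from the start of the data area, so that the result does not depend on where the data was loaded. This is not what the original code does: the struct names and fields are stand-ins invented for illustration, and it assumes the data area is suitably aligned for the sprite structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the game's 'struct sprstrc'. */
struct sprite_info {
    int width;
    int height;
};

/* Hypothetical table entry: the offset and the resolved pointer are
   kept in separate fields rather than sharing one. */
struct table_entry {
    size_t offset;               /* bytes from the start of the data area */
    struct sprite_info *sprite;  /* filled in by resolve_offsets() */
};

/* Turn each byte offset into a real pointer into the data area. */
static void resolve_offsets(unsigned char *base, size_t size,
                            struct table_entry *table, size_t n)
{
    size_t i;

    assert(base != NULL && table != NULL);
    for (i = 0; i < n; i++) {
        /* Refuse offsets that would point outside the data area. */
        assert(table[i].offset <= size &&
               size - table[i].offset >= sizeof(struct sprite_info));
        table[i].sprite =
            (struct sprite_info *)(void *)(base + table[i].offset);
    }
}
```

Keeping the offset and the pointer in separate fields avoids reinterpreting one as the other, at the price of a slightly larger table.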

It took me some time to understand what these lines do, particularly as a cast on the left-hand side of an assignment statement is not supported by modern C compilers. I believe the statement inside the loop means:

weapon.sprite[i].sprite = (struct sprstrc * far) ((long) weapon.sprite + (long) weapon.sprite[i].sprite);

When I was playing this game under MS-DOS, it took me a long time to beat, and I only beat it once. I have beaten the original once or twice in DosBox, after practising on the Allegro version. I think the Allegro version is slightly easier because of the motion smoothing making it easier to see where enemies are moving on the screen, as well as the more responsive controls.

## Thursday, 31 July 2014

### Notes on compatibility

There are two types of compatibility in software: new versions of programs reading old data (backward compatibility), and old versions of programs reading new data (forward compatibility).

Backward compatibility is the easiest to achieve, as it does not require predicting the future. How can we ensure forward compatibility then? By imagining reasonable ways to extend the data format and making sure that the program works with these - e.g., doesn't break on unknown options. One suggestion has been to use a version number in your data format. However, I read that this failed for MIME, because no-one knew what the correct behaviour was if they got a different version number.
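
As a small illustration of not breaking on unknown options, here is a sketch in C of a reader for "key=value" lines. The settings structure, the option names and the function are all invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings for an imaginary program. */
struct settings {
    int width;
    int height;
    int unknown_seen;   /* how many unrecognised options were skipped */
};

/* Apply one "key=value" line. Keys this version does not know about
   are counted and skipped rather than treated as errors, so files
   written by a newer version still load. Returns 0 on success, or -1
   if the line has no '=' at all. */
static int apply_option(struct settings *s, const char *line)
{
    const char *eq;
    size_t keylen;

    assert(s != NULL && line != NULL);
    eq = strchr(line, '=');
    if (eq == NULL)
        return -1;
    keylen = (size_t)(eq - line);
    if (keylen == 5 && strncmp(line, "width", 5) == 0)
        s->width = atoi(eq + 1);
    else if (keylen == 6 && strncmp(line, "height", 6) == 0)
        s->height = atoi(eq + 1);
    else
        s->unknown_seen++;
    return 0;
}
```

A line such as "colour=blue" is accepted and ignored here, whereas a line with no "=" is still rejected, because it is not a valid option of any version.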

A related problem is interoperability. This refers to reading data produced by a program whose creation or maintenance is bureaucratically or socially separated from your control, and producing data to be read by this program.

How can we ensure interoperability? One way is that if you are inventing a file format or network protocol, make sure you define it thoroughly. Otherwise, other people will implement it all in slightly different, incompatible ways, and it will be impossible to be compatible with them all. (I heard that CSV was an example of this.) A similar way is to make your data format as easy to parse as possible, otherwise other people may do it wrong. (See this page about different termcap versions for an example of a format with varying implementations.)

One way to encourage interoperability and backward compatibility is intolerance of malformed syntax. This is a herbicide on the proliferation of degenerate data and means that new implementations do not have the burden of supporting degenerate data that people are relying on. How much you can get away with this depends on the share of usage of your program.
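
Intolerance can be as simple as refusing input that a lenient parser would accept in part. The following C sketch shows a strict integer reader; the function name is made up, and the particular rules (no leading whitespace, no trailing characters) are one possible choice rather than a standard.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Strictly parse a decimal integer. The whole string must be a
   number: no leading whitespace, no trailing junk, no overflow.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
static int parse_int_strict(const char *text, int *out)
{
    char *end;
    long v;

    assert(text != NULL && out != NULL);
    /* strtol() would silently skip leading whitespace, so reject it. */
    if (*text == '\0' || isspace((unsigned char)*text))
        return 0;
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 0;               /* not a number, or junk after it */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* out of range for an int */
    *out = (int)v;
    return 1;
}
```

By contrast, atoi("42abc") quietly returns 42, which is exactly the kind of leniency that lets degenerate data spread.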

## Saturday, 22 March 2014

### Principles of GUIs

I had a few thoughts a while back about GUIs like Microsoft Windows and what principles they should follow.

* Ease of switching between applications
* Each user "command" should only have one interpretation and be read by only one program. For example, if you are scrolling a page by clicking the middle mouse button, another application shouldn't use that mouse click to do something else.
* Lack of excess flexibility and useless information. For example, you do not need to be able to position desktop or folder icons at arbitrary pixel locations. This could apply to the window paradigm itself - moving non-maximized windows around is not that fun. Another example is the dotted outline around the active widget, which could be an icon or a command button. It is only used for keyboard input, and it makes the user worry that they will accidentally activate a command.
* Uninterruptability - I.e. one program should not be able to become more prominent, taking up the screen and redirecting input commands to itself (maybe interrupting a sequence of commands intended to be received by another program/interface). An example is "splash screens" for programs. Another is the Windows Start Menu - several times when Windows has been starting up the Start Menu has disappeared while I'm trying to use it.
* Non-persistence of "high-energy" states. For example, desktop icons selection. Desktop icons should only be highlighted when the user is immediately about to do something with them. Otherwise there is a constant worry that they will accidentally rename or delete something with a single mouse click or key press.
* Lack of pointless distractions ("Unused icons in your system tray have been hidden.").
* Appearance of expensiveness of operations - for example, opening an elementary GUI feature like a menu or clicking on a tab in a tabbed interface shouldn't perform expensive calculations, load other programs or load data from slow external storage or network connections.

Updated 3rd June 2014.

### Keyboard auto-repeat

I figure that keyboard auto-repeat isn't always desirable.

Take a web browser for example. You press and hold the down button, and the page scrolls down, and then stops, and then starts again and continues scrolling until you lift the button.

It should be more like a video game: start scrolling down, smoothly, as soon as the key is depressed, and cease scrolling as soon as it is lifted.

In X11, the server sends events to programs - KeyPress and KeyRelease. If autorepeat is on, many of these events are generated for one (physical) keypress. You can turn it off with "xset -r". I'd like to try only turning it on for the arrow keys, but there isn't an easy way to do this - you have to turn on autorepeat, and then turn it off for all keys except the ones you want to leave it on for.