Wednesday 19 August 2015

TeX - the worst programming language ever

Here are some observations I've had on programming TeX, or as I've taken to calling it recently, "Knuth's hunk a' junk".

You can't tell how many arguments a function will take by looking at its definition, because even if it takes no arguments, it can call another function that does take arguments.
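A minimal sketch of that, assuming plain TeX; the names \pair and \wrapper are made up for illustration:

```tex
% \pair is declared with two parameters.
\def\pair#1#2{(#1, #2)}
% \wrapper is declared with none, but its body ends with \pair,
% so in use it still swallows the two groups that follow it.
\def\wrapper{Result: \pair}
\message{\wrapper{a}{b}}% writes "Result: (a, b)" to the terminal
\bye
```

Nothing in the definition of \wrapper says that it needs two arguments; you only find out by reading the definition of \pair as well.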

All lines need to be ended with comment characters, except the ones that don't. A stray line end with no comment character after it is read as a space, which can put unwanted whitespace in the output, or break things completely.
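Here is a small sketch of the difference, assuming plain TeX; \loose and \tight are hypothetical names:

```tex
% Each uncommented line end inside the braces becomes a space token.
\def\loose{
  text
}
% Ending the lines with % hides the line ends from TeX.
\def\tight{%
  text%
}
\message{[\loose]}% I would expect "[ text ]"
\message{[\tight]}% I would expect "[text]"
\bye
```

The spaces at the start of a line are ignored either way; it is only the line ends that leak into the macro body.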

You can give numbers as a series of digits, which is what you expect. This will work most of the time, but occasionally a macro call immediately following the number will be expanded sooner than you might think, because TeX keeps expanding tokens while it looks for more digits. This can cause problems, for example in \ifnum\number=0\noexpand\foo\fi. (Here \number stands for a counter of your own, not the primitive of the same name.) If \number is equal to 0, you would expect this to result in "\noexpand\foo", but the \noexpand disappears, and you only get "\foo".
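To see this happen, here is a sketch in plain TeX. I have assumed the scratch register \count255 in place of the counter, and \early and \late are made-up names:

```tex
\def\foo{FOO}
\count255=0
% No space after the 0: \noexpand is used up while TeX looks for more digits.
\edef\early{\ifnum\count255=0\noexpand\foo\fi}
% A space after the 0 ends the number before \noexpand is reached.
\edef\late{\ifnum\count255=0 \noexpand\foo\fi}
\message{\meaning\early}% I would expect "macro:->FOO"
\message{\meaning\late}% I would expect "macro:->\foo "
\bye
```

In the first definition \foo ends up unprotected inside the \edef, so it gets expanded after all.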

Likewise if you did \ifnum\number=1\twentythree\fi, and \twentythree was "23", this doesn't check if \number is 1: no, it actually checks if \number is 123. Not kidding. The solution is to always follow a number with a space, i.e. "\ifnum\number=1 \twentythree\fi".
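The same thing as a sketch you can run, assuming plain TeX and again using \count255 where the text has a counter:

```tex
\def\twentythree{23}
\count255=1
% No space: \twentythree is expanded into the number, so 1 is compared with 123.
\message{[\ifnum\count255=1\twentythree\fi]}% I would expect "[]"
% With a space the number stops at 1, and the test is true.
\message{[\ifnum\count255=1 \twentythree\fi]}% I would expect "[23]"
\bye
```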

Execution is split into multiple stages, including "expansion" and "execution". If you expand without executing, you can mess up the syntax for the execution. The error messages you get when the execution happens don't tell you why it's messed up: it just throws the whole dog's breakfast in your face.

This can happen when you define a macro that defines another macro, and then you use the first macro in the definition of a third macro whose body is expanded at the time of definition. Hence a macro doesn't stand alone - you have to account for the context in which it is used.
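A sketch of how this goes wrong, assuming plain TeX, with the invented names \maker, \inner and \third:

```tex
% \maker is a macro that defines another macro, \inner.
\def\maker{\def\inner{new}}
% \inner happens to have a meaning already.
\def\inner{old}
% \edef expands \maker and then \inner, but cannot execute \def.
\edef\third{\maker}
\message{\meaning\third}% I would expect "macro:->\def old{new}"
% \third % would stop with "! Missing control sequence inserted."
\bye
```

The body of \third is no longer valid syntax for \def, and the error only shows up later, when \third is executed.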

TeX is a weak language, and sometimes you have even less of it. For example, suppose you are writing out to an auxiliary file, and you want to expand some control sequences, removing spaces following them. You think you're nearly a TeX wizard, and know about \futurelet, so you think, a-ha, I'll do a \futurelet at the end of this control sequence, check if the next character is a space, and if so, remove it.

This fails because \futurelet belongs, not to the expansion phase, but to the execution phase. (Learn that by heart.) What you have at this stage is string concatenation and string splitting. It's possible to remove a space even with these two. I'm not sure of the details, but it's something like this:

* Get the next character (splitting). Call it X.
* Append a space, followed by a marker character ("@") (concatenation). String now looks like "X @".
* Split the string at the first space. If X was not a space, this gives you X; if X was a space, this gives you an empty string. Also split the rest of the input at the marker character, and discard everything before it.

This doesn't work, however, because the first step doesn't work. You can't get the next character if it's a space, you get the first non-space instead.
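Both halves of that can be sketched in plain TeX. The macro names are hypothetical, and I am assuming "@" has its usual plain TeX category (other):

```tex
% Splitting: #1 ends at the first space, #2 ends at the marker @.
\def\beforespace#1 #2@{[#1]}
\message{\beforespace X @}% I would expect "[X]"

% The failing first step: an undelimited parameter skips space tokens,
% so \grab never receives the space that \space expands to.
\def\grab#1{(#1)}
\message{\expandafter\grab\space x}% I would expect "(x)"
\bye
```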

And after I wrote that I discovered the existence of \ignorespaces, so none of that should be necessary. This allows me to make another point, which is that it's hard to learn TeX, because you can't tell which control sequences are primitives, and which are user-defined, just by looking at them. This wouldn't be so bad, except there are hundreds of primitives, not even including the macros that plain TeX adds. I went looking for a definition of \ignorespaces in the file I saw it in and couldn't find one.
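One way to check, sketched here for plain TeX, is to ask for the \meaning of a control sequence:

```tex
% A primitive reports its own name as its meaning.
\message{\meaning\ignorespaces}% I would expect "\ignorespaces"
% A macro, such as plain TeX's \quad, reports "macro:->" followed by its body.
\message{\meaning\quad}
\bye
```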

Error messages bear no relationship to the actual problem that caused them. You have to learn by experience what error message means what: for example "Missing control sequence inserted" (with \inaccessible shown as the inserted text) actually means: you tried to give a character an active definition when the character was not active (at the time of definition).
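A sketch of that particular case, assuming plain TeX, where "|" and "+" both start out as ordinary characters:

```tex
\catcode`\|=\active
\def|{bar}% fine: | is already active when \def reads it
\message{|}% I would expect "bar"
% \def+{plus}% + is not active, so this line would stop with
%            % "! Missing control sequence inserted."
\bye
```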

When you don't recognize the problem, you have to stare at the log files for half an hour or so, avoiding conscious thought, before you understand it.

Your input goes through a stage of interpretation called the "cat codes" (category codes). (They're called that because TeX is personified by a lion.) You can tell TeX to make changes to the cat codes, but you also have to remember the cat codes from before the changes, in case something that was said in the past comes up again.
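For instance, in this plain TeX sketch (\my@helper and \user are invented names), the body of \user keeps the cat codes that were in force when it was read:

```tex
\catcode`\@=11 % @ is now a letter
\def\my@helper{ok}
\def\user{\my@helper}% read while @ is a letter: one token, \my@helper
\catcode`\@=12 % @ is back to being an ordinary character
\message{\user}% still works; I would expect "ok"
% \my@helper % typed now, this reads as \my followed by @helper,
%            % and \my is undefined
\bye
```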

Your code is read from left to right, strictly. (Anyone who's learned about lambda calculus will see a similarity.) It's possible to expand the token after next with a construct like "\expandafter\next\token". But if you also want to expand the token after that, and expand it first, you have to do

"\expandafter\expandafter\expandafter\next\expandafter\token1\token2".
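Here is that pattern as a plain TeX sketch, with made-up macros \three, \wrap and \letter standing in for \next, \token1 and \token2:

```tex
\def\letter{b}
\def\wrap#1{(#1)}
% \three reports the three tokens it receives as arguments.
\def\three#1#2#3{\message{#1|#2|#3}}
% First \letter becomes b, then \wrap b becomes (b), and only then
% does \three read its arguments.
\expandafter\expandafter\expandafter\three\expandafter\wrap\letter
% I would expect "(|b|)" on the terminal
\bye
```

Without the \expandafter's, \three would take \wrap and \letter themselves as its first two arguments.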

To expand three tokens in advance and then two tokens in advance and then one in advance, it's:

"\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\expandafter\next\expandafter\expandafter\expandafter\token1\expandafter\token2\token3".

Each time you need to expand a token coming from further down the line, the number of \expandafter's you need in front of \next roughly doubles (1, then 3, then 7). This comes up in practice: for example, to expand a token inside a macro definition you need to jump over at least "\def", the name of the macro being defined, and an open brace ("{").
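The macro definition case looks something like this in plain TeX; \value and \snapshot are hypothetical names:

```tex
\def\value{42}
% One \expandafter each to jump over \def, \snapshot and {,
% so that \value is expanded before the definition is made.
\expandafter\def\expandafter\snapshot\expandafter{\value}
\def\value{7}
\message{\meaning\snapshot}% I would expect "macro:->42"
\bye
```

This only expands \value once. If it expanded to another macro that also needed expanding, each of those \expandafter's would turn into three.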

I keep on thinking that it must get easier at some point.

Monday 10 August 2015

Haiku

A compiler is

Happiest when compiling

Another compiler


How many cross-compilers could a cross-compiler compile if a cross-compiler could compile cross-compilers?