TXR: an Original, New
Programming Language for
Convenient Data Munging

Kaz Kylheku <kaz@kylheku.com>

What is it?

TXR is a pragmatic, convenient tool ready to take on your daily hacking challenges with its dual personality: its whole-document pattern matching and extraction language for scraping information from arbitrary text sources, and its powerful data-processing language to slice through problems like a hot knife through butter. Many tasks can be accomplished with TXR "one liners" directly from your system prompt. TXR is relatively new: the project started in 2009.

It is difficult to give a short introduction to TXR because it is no longer a small language. The PDF rendition of the reference manual, which takes the form of a large Unix man page, is 951 pages long, excluding any index or table of contents. There are many ways to solve a given data processing problem with TXR: many skills and techniques can be used.

Random Testimonials

"The best lisp for text processing is TXR Lisp. [...] But [it is] a rabbit hole that goes so deep, you'd better have nothing but free time if you want to grok it all"
(anonymous comment on 4Chan)

"So far, txr-lisp felt most ergonomic to me when I tested various languages for their suitability to implement MAL in"
(comment by user wasamasa in a #Lisp IRC channel)

"My bank statements are pdf's which can be converted to ascii using pdftotext, however this destroys the structure of the documents which makes extracting data using regexps (even pcre's) very difficult, but much easier using txr."
(comment by user vcdimension on HackerNews)

Small, efficient; low dependencies

TXR Lisp supports compilation to code for a register-based virtual machine. Individual functions can be compiled, as well as files (.tl to .tlo). Compiled files may be catenated together for loading. Individual compiled files, as well as catenated files, may be compressed with gzip.
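As a rough illustration of compiling an individual function, here is a minimal TXR Lisp sketch. The function add-pair and the file names demo.tl and app.tl are hypothetical, chosen only for this example; compile and compile-file are the library functions assumed to do the work.

```txr
;; TXR Lisp; could be saved as demo.tl (hypothetical name) and run with: txr demo.tl
(defun add-pair (x y)
  (+ x y))

;; Ask the compiler to compile just this one function.
(compile 'add-pair)

;; Call the function and print the result followed by a newline.
(prinl (add-pair 2 3))

;; Compiling a whole source file instead (app.tl is a hypothetical file):
;;   (compile-file "app.tl")
```

The function behaves the same way before and after compilation; only its representation changes.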

Though not native, the compiler is optimizing: it performs optimizations like jump threading, dead code elimination, constant folding, and data flow optimizations. The compiler is actively being improved.

Application deployment is possible. The save-exe function creates a copy of the TXR executable, under a name of your choosing, and containing an expression that is executed at startup, typically used to load the rest of the application relative to the same directory. The executable just has to be accompanied by all the needed library modules; the details are in the reference manual.

TXR is light in terms of resources. The executable is some 1.7 megabytes of code (a little more than twice the size of GNU Awk), and the satellite library modules add up to another megabyte and a half. It has no external dependencies other than libffi. The entire project is easy to build; it just requires GNU Make and GCC or Clang, and a few shell utilities needed by the configure script. Some sources are generated, but they are shipped with the distribution, so the tools that produce them are not required.

TXR has a small memory footprint. When the executable is started up to its interactive prompt, the memory use is similar to GNU Bash. When compiling its standard library of TXR Lisp code, including complex files such as the compiler, TXR requires a peak of only around 18 megabytes.

Innovative, but with traditional roots

TXR is a fusion of many different ideas, a few of which are original, and it is influenced by many languages, such as Common Lisp, Scheme, Awk, M4, POSIX Shell, Prolog, Ruby, Python, Arc, Clojure, S-Lang and others.

TXR consists of two languages, which can be used separately or tangled together: the TXR Pattern Language, and TXR Lisp.

A comparison may be drawn between the TXR Pattern Language and the Unix utility Awk. Both provide an implicit, convenient way of scanning input. Whereas Awk implicitly reads a file, breaking it into records and fields which are accessible as positional variables, TXR has quite a different way of making input handling implicit: namely via a nested, recursive pattern matching notation which binds variables. This approach still handles delimited fields with relative convenience, but generalizes to handling messy, loosely structured data, or data which exhibits different regularities in different sections. Constructs in TXR (the pattern language) aren't imperative statements, but rather pattern-matching directives: each construct terminates by matching, failing, or throwing an exception. Searching and backtracking behaviors are implicit. The language provides structured named blocks with nonlocal exits, structured exception handling, named pattern matching functions, and numerous other features. TXR's pattern language is powerful enough to parse grammars, yet simple to use in an ad-hoc way on trivial tasks. Speaking of Awk, TXR in fact contains an implementation of Awk, in the form of a Lisp macro, which brings us to the next topic.
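To give a feel for the pattern-matching style described above, here is a minimal sketch of a query that extracts user names and home directories from colon-delimited, passwd-style records. The file name passwd.txr and the variable names are assumptions chosen for this example; the query relies only on the collect, output, repeat and end directives.

```txr
@# passwd.txr (hypothetical file name); run as: txr passwd.txr /etc/passwd
@(collect)
@name:@pass:@uid:@gid:@gecos:@home:@shell
@(end)
@(output)
@(repeat)
@name @home
@(end)
@(end)
```

The second line is the pattern: each variable binds to the text between the literal colons. The collect scans the input for lines of that shape and gathers the bindings into lists, which the repeat inside the output block then walks, printing one name and home directory per matching record. Lines that do not fit the pattern are passed over without any explicit loop or conditional.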

The other language in TXR is TXR Lisp. This is not an implementation of an existing Common Lisp or Scheme, but a new dialect, which contains many new ideas. TXR Lisp is feature-rich, and oriented toward succinct, convenient expressiveness. While staying completely true to the Lisp heritage, it takes cues from new scripting and functional languages.

Users of mainstream Lisps will find that skills transfer well to and from TXR. There will be features of TXR Lisp that users miss when using other Lisps, and likely vice versa.

The TXR project values brevity in programs: programs should be short and clear. If you're struggling to come up with a nice solution to a problem, the TXR project wants to hear from you; give a shout to the mailing list. If a program is significantly clearer and shorter in another language, that is of interest to the project; TXR may be able to absorb the technique or something equivalent.

Help Needed

The TXR project is looking for hackers to develop features.

TXR has clean, easy to understand and maintain internals that are a pleasure to work with. Be sure to read the HACKING guide.

Examples

Here is a collection of TXR Solutions to a number of problems from Rosetta Code.

Make a Donation

TXR is truly free software because it is distributed under the two-clause BSD license, which allows every conceivable use, commercial and non-commercial.

If you find TXR to be a valuable tool in your arsenal, here is one way to show your appreciation and support! Developing stuff like this takes countless hours.

Warning: Regarding Homebrew

WARNING: Do not use TXR packaged by Homebrew! The TXR project has received reports that Homebrew packages of TXR contain an unstable, unreliable executable. Some users report that they are not even able to build the Homebrew TXR package themselves in order to investigate the problem. The Homebrew build formula txr.rb does not run make tests, so when the build does succeed, it is not verified to be good. It is suspected that Homebrew may be using compiler code generation features that cause instability.