1.0 Document id
1.1 General
Copyright © 1996-2008 Jari Aalto
License: This material may be distributed only subject to
the terms and conditions set forth in GNU General Public
License v2 or later; or, at your option, distributed under the
terms of GNU Free Documentation License version 1.2 or later
(GNU FDL).
1.2 T2html program features
Writing text documents is different from writing messages to
Usenet or to your fellow workers. There already exists several
tools to convert email messages into HTML, like MHonArc,
Email hyper archiver, but for regular text documents, like for
memos, FAQs, help pages and for other papers, there wasn't
any suitable HTML converter couple of years back. The author
wanted a simple HTML tool which would read pure plain
text documents, like guides, tips pages, documentation,
book mark pages etc. and convert them into HTML. Here you will
find the specification how to format your text documents for
Few arguments, why plain text is the best source document format:
- It is readable by all, without any extra software
- Deliverable by email, as is.
- Most easily kept in version control
- Most easily patched ( when someone sends a diff -u ...)
- Most easily handed to someone else when author no longer
maintain it. (If you have specialized tools, people
need to learn them in order to maintain your FAQ.)
1.2.1 Overview of features:
- Requires Perl 5.004 or never
- 500K text document takes 70 seconds to convert to HTML.
- TF to Perl POD conversion may be in a future plan.
- Better linking of multiple files planned
- Configuration file for individual file options planned.
1.2.2 HTML conversion
- minimal mark up: rendering is based on indentation level.
Written text document looks like a "Natural Document", and is
suitable for reading as such.
- Text layout with indentation rules is called Technical Format
(TF) and document must be formatted according to it before it
is suitable for HTML generation.
- Rules are simple: place heading to the left and text at column 8.
- Program generates META tags for search engines.
- Colored html page: <EM> <STRONG> <PRE> ...
- Hyperlinks and email addresses are automatically detected.
No mark up is needed.
1.2.3 HTML 4.01
- Make a single html (1 file) or Framed version (3 files)
- Sample CSS2 (Cascading Style Sheet) included in HTML code for
document rendering. User can import his own CSS2.
1.2.4 Link check for the text file
- You need LWP module in order to use this feature. (Comes with
latest Perl)
- Program has switches to run Link check on your text file
to find out any broken or moved link. Currently you
have to manually fix the links, nut an Emacs mode to do this
automatically is planned. The output from Link check is standard
grep style: *FILE:NBR:Error-Description*
1.2.5 Splitting the text file to pieces
- You can split very large document into pieces, e.g. according
to top level headings and convert each piece to HTML. This is
also handy if you're planning to print Slides for a class:
Split on Headings to individual files: raise the font point
and print each file separately.
1.3 How to convert text files into HTML?
The TF specification can be found from the Manual
The command used to generate this page was:
--author "Jari Aalto" \
--title "Conversion for text files" \
--html-body LANG=en \
--Out \
index.txt |
1.4 Writing a text document
You need nothing else but a text editor where the current column
number is displayed or editor can be configured to advance your
TAB by 4 spaces. That's it.
An Emacs minor mode (See package
tinytf.el) can
make the writing documents easy. The mode will help formatting
paragraphs, filling bullets numbering headings and keeping TOC
up to date.
1.5 Ripping program documentation
1.5.1 Documentation tools in programming languages
Perl is an exception within programmin languages, because it
includes internal documentation syntax called POD (Plain Old
Syntax), with which you can embed documentation right into the
program source. Deriving the documentation from perl programs
is a straightforward job. Another well known language
(invented long after Perl) is Java, which calls the embedded
documentation javadoc. fro all others, there is need to
write separate documentation.
1.5.2 Other programming languages
But it is possible to embed documentation inside any
programming language: directly into the code. A small Perl
utility can be used to extract the documentation provided it
was written in TF format. Documentation is put at the
beginning of the file and updated there. Program ripdoc.pl
extracts the documentation which follows TF guidelines. The
idea is that you can generate HTML documents from the embedded
'TF pod'. The conversion goes like this:
Suitable for awk, shell, sh, ksh, C++, Java, Lisp, python,
Tcl etc. programming languages. The only criteria is that the language
supports one-comment-starter and that the documentation has
been written by using it. Languages that have comment-start
and comment-end, like C that has /* and */, are not suitable for
ripdoc.pl.
2.0 Other converters
2.1 Postscript
2.2 Texinfo
- See page http://www.fido.de/kama/texinfo/texinfo-en.html
where you can find C-program html2texinfo program
- Perl program html2texi.pl
http://www.cs.washington.edu/homes/mernst/software/#html2texi
html2texi converts HTML documentation trees into Texinfo
format. Texinfo format can be easily converted to Info format
(for browsing in Emacs or the stand alone Info browser), to a
printed manual, or to HTML. Thus, html2texi.pl permits
conversion of HTML files to Info format, and secondarily
enables producing printed versions of Web page
hierarchies. Unlike HTML, Info format is searchable. Since Info
is integrated into Emacs, one can read documentation without
starting a separate Web browser. Additionally, Info browsers
(including Emacs) contain convenient features missing from Web
browsers, such as easy index lookup and mouse-free browsing.
2.3 Other text to HTML tools
2.4 Other Utilities
- DocBook - SGML online book
- Texi2html
Perl script.
- HTML tidy
remove extra markup.
- FTL
Latex like Perl formatting
- Hyperlatex
"Hyperlatex is a package that allows you to prepare documents
in HTML, and, at the same time, to produce a neatly printed
document from your input. Unlike some other systems that you
may have seen, Hyperlatex is not a general LaTeX-to-HTML
converter. In my eyes, conversion is not a solution to HTML
authoring. A well written HTML document must differ from a
printed copy in a number of rather subtle ways. I doubt that
these differences can be recognized mechanically, and I
believe that converted LaTeX can never be as readable as a
document written in HTML. The basic idea of Hyperlatex is to
make it possible to write a document that will look like a
flawless LaTeX document when printed and like a handwritten
HTML document when viewed with an HTML browser."
- html2texi
"html2texi converts HTML documentation trees into Texinfo format.
Texinfo format can be easily converted to Info format (for browsing
in Emacs or the stand alone Info browser), to a printed manual, or
to HTML. Thus, html2texi.pl permits conversion of HTML files to
Info format, and secondarily enables producing printed versions of
Web page hierarchies. Unlike HTML, Info format is searchable. Since
Info is integrated into Emacs, one can read documentation without
starting a separate Web browser. Additionally, Info browsers
(including Emacs) contain convenient features missing from Web
browsers, such as easy index lookup and mouse-free browsing."
- RTF in PC
- catdoc
Viewing MS WORD files.
Catdoc is simple, one C source file, compiles in any system (DOS;
Unix). Feed MS word file to it and it gives 7bit text out of it.
- word2x
Viewing MS WORD files.
- MSWordView
"MSWordView is a program that can understand the microsofts word
8 binary file format (office97), it currently converts word into
html, which can then be read with a browser."
- Laola
Viewing MS WORD files.
"Laola(perl) does a respectable job of taking MSWord files to text
...LAOLA is giving access to the raw document streams of any program
using "structured storage" technology to save its documents.
ELSER is dealing especially with these streams as they are present
in Word 6 and Word 7 documents."
2.5 General Document Maintenance tools