A. Example Programs
This section describes some of the demonstration programs which are
distributed with Libann. They do not form part of the library, but are
a useful source of reference for anyone wanting to implement similar
functions.
The examples are distributed in the `demos' directory of the
distribution. These examples are deliberately oversimplified. In
real-life applications, a better choice of feature vector and a more
elaborate selection of training parameters would be required.
A.1 Natural Language Selection Using a Kohonen Network
The Kohonen network is useful when classifying data for which you do not
have prior knowledge of the classes or the distribution of features.
The wordFreq example program shows how a Kohonen network can
classify written texts according to language, based upon their word
frequencies.
In the directory `demos/data/texts' there are several files downloaded from
http://promo.net/pg. These are text files written in
- English
- French
- German
- Spanish
- Latin
One obvious way a Kohonen network might classify these is
according to their language.
The output from the wordFreq program confirms this
expectation.
Note that all of the texts, regardless of their primary language,
contain approximately 1800 words of copyright information written in
English. One of the advantages of neural network classifiers is their
tolerance to this sort of `noisy' data.
The first step in classifying the texts is to define some sort of
feature vector. In this case, the vector is the relative frequency of
words in the texts.
The program first examines all the texts and identifies the most common
words among them.
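As an illustration of this step, the following sketch counts word
occurrences across a set of files and returns the most frequent ones.
It is plain C++, independent of the Libann API, and the function name
commonWords is hypothetical.
| // Sketch: count word occurrences over a set of text files and report
// the most frequent ones.  Plain C++; not part of the Libann API.
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> commonWords(const std::set<std::string>& files,
                                     std::size_t n)
{
    std::map<std::string, long> counts;
    for (std::set<std::string>::const_iterator f = files.begin();
         f != files.end(); ++f) {
        std::ifstream in(f->c_str());
        std::string word;
        while (in >> word)
            ++counts[word];
    }
    // Sort the words by descending frequency
    std::vector<std::pair<long, std::string> > byCount;
    for (std::map<std::string, long>::const_iterator c = counts.begin();
         c != counts.end(); ++c)
        byCount.push_back(std::make_pair(c->second, c->first));
    std::sort(byCount.rbegin(), byCount.rend());

    std::vector<std::string> result;
    for (std::size_t i = 0; i < n && i < byCount.size(); ++i)
        result.push_back(byCount[i].second);
    return result;
}
|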
We have a priori knowledge of the languages used, and so for best
results, the feature vector would have at least as many elements as
there are classes (languages). However, the Kohonen network is used
most commonly where this information is not known. The program uses a
less-than-optimal vector size of 3.
Running the program displays the following output:
| Creating word vector
Using file /tmp/libann1.2.D007/demos/data/texts//1drll10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//81agt10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8bern11.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8cinq10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8cnci07.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8fau110.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8hrmr10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8trdi10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//alad10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//auglg10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//civil10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//lazae11.txt
The most common words are: the, de, und
.
.
.
|
The next thing the program does is to take each file individually and
calculate the occurrence of each of the words `the', `de' and
`und'. The frequencies are normalised relative to the total number of
words in the text; otherwise the network would be sensitive to the
length of the text.
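A minimal sketch of this normalisation, in plain C++ and independent of
the library (the helper relativeFrequencies is hypothetical):
| // Sketch: compute the relative frequency of a few chosen words in one
// text file.  Dividing by the total word count removes the dependence
// on the length of the text.
#include <fstream>
#include <map>
#include <string>
#include <vector>

std::vector<double> relativeFrequencies(const std::string& path,
                                        const std::vector<std::string>& words)
{
    std::map<std::string, long> counts;
    long total = 0;
    std::ifstream in(path.c_str());
    std::string w;
    while (in >> w) {
        ++total;
        ++counts[w];
    }
    std::vector<double> freq;
    for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i)
        freq.push_back(total ? double(counts[words[i]]) / total : 0.0);
    return freq;
}
|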
The program uses a C++ class called WordFreq which is inherited
from ann::ExtInput.
This makes it easy to create the feature vectors and to train the network.
| // Create frequency counts and put them into a set
typedef set<string>::const_iterator CI;
for (CI ci = files.begin(); ci != files.end() ; ci++) {
FreqCount fc(*ci,wv);
trainingData.insert(fc);
}
// Create the network
ann::Kohonen net(vectorSize,7);
// Train the network
net.train(trainingData,0.3,0.8,0.1,0.40);
|
After training, each feature vector is presented to the network. The
program creates a directory for each class it detects, and copies the
text into it.
| bash-2.05a$ ls 1*
1111111111111100111001111110111111111111011111010:
1drll10.txt alad10.txt civil10.txt
1111111111111101111011111111111110011111011111001:
auglg10.txt
1111111111111101111011111111111110011111011111011:
81agt10.txt 8bern11.txt 8fau110.txt 8hrmr10.txt
1111111111111111111010111100111110111111011111100:
8cinq10.txt
1111111111111111111010111100111111111111011111100:
8cnci07.txt 8trdi10.txt lazae11.txt
|
There are several interesting points about this result:
- The first directory contains only English texts.
- The second directory contains a single file which has both Latin
and German text (and the English copyright information).
- The third directory contains only German texts.
- The network has been unable to clearly discriminate between French
and Spanish text. This
is a result of too few dimensions in the feature vector. The word `de'
is common in both languages, whereas `the' and `und' are not common in
either of them. The best it could do was to create two classes, one
containing both French and Spanish texts, the other containing only
French.
A.2 Character Recognition using a Multi-Layer Perceptron
Optical character recognition is a common application for neural
networks. This example program demonstrates how a multi-layer
perceptron can be used to recognise and classify printed characters.
In a character recognition application, we know what classes to expect
[a--z] and we can manually classify some of the samples. This
situation makes the problem suitable for supervised learning using a
multi-layer perceptron network.
The mlp-char program uses a multi-layer perceptron network to
classify bitmap glyphs. The glyphs concerned are in the directory
`demos/data/glyphs'. There are 6 instances of glyphs representing
the characters [a--e]. The mlp-char program uses a C++ class
called Glyph inherited from ann::ExtInput. A
Glyph is a feature vector of the same length as the number of
pixels in the bitmap. A black pixel is represented by 1 and a white
pixel by 0.
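The demo's Glyph class handles this conversion; the following sketch
shows the general idea, assuming a hypothetical text bitmap format in
which `#' marks a black pixel (the actual `.char' file layout may
differ):
| // Sketch: turn a text bitmap into a 0/1 feature vector.  The format
// assumed here ('#' = black pixel, anything else = white) is
// hypothetical; the demo's `.char' files may use a different layout.
#include <fstream>
#include <string>
#include <vector>

std::vector<double> glyphVector(const std::string& path)
{
    std::vector<double> pixels;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line))
        for (std::string::size_type i = 0; i < line.size(); ++i)
            pixels.push_back(line[i] == '#' ? 1.0 : 0.0);
    return pixels;   // 64 elements for an 8x8 glyph
}
|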
The first thing the program does therefore is to create a feature map
from the glyphs and their respective classes.
| // Populate the feature map from the files
ann::FeatureMap fm;
for (CI ci = de.begin() ; ci != de.end() ; ++ci ) {
const string filename(*ci);
// Reserve files with a 6 in them for recall
// Don't train with them
const string::size_type len = filename.length();
if ( "6.char" == filename.substr(len - 6) )
continue;
// The classname is the first letter of the filename
const string className(filename.substr(0,1));
// Create the glyph and add it to the map
const Glyph g(filename);
fm.addFeature(className,g);
}
|
Note that one sample from each class is not put into the feature map and
will therefore not be used for training.
The glyphs happen to be of resolution 8x8, and therefore the feature
vector (and hence the input layer of the network) is of size 64.
There are 5 classes, which can be represented by a network output layer
of size 3, since three binary outputs can distinguish up to 2^3 = 8
classes. The next task therefore is to train the network.
| // Set up the network and train it.
const int outputSize=3;
const int inputSize=fm.featureSize();
ann::Mlp net(inputSize,outputSize,1,0);
net.train(fm);
|
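The following purely illustrative sketch shows one possible binary
encoding of the five classes over three outputs; Libann's actual target
encoding is not documented here and may differ.
| // Illustrative only: a binary code assigning each of the five classes
// a distinct pattern over three outputs (2^3 = 8 possible patterns).
#include <bitset>
#include <iostream>
#include <string>

int main()
{
    const std::string classes[] = { "a", "b", "c", "d", "e" };
    for (int i = 0; i < 5; ++i)
        std::cout << classes[i] << " -> " << std::bitset<3>(i) << std::endl;
    return 0;
}
|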
Finally, we want to use the recall method to classify glyphs.
The program does this with a loop, recalling all glyphs (including those
used for training).
| // Recall all the glyphs
for (CI ci = de.begin() ; ci != de.end() ; ++ci ) {
const string pathname(*ci);
const Glyph g(pathname);
cout << pathname << " has class " << net.recall(g) << endl;
}
|
The following shows the results of running the program, filtered to
show only the samples ending in `6.char' (the ones not used for training).
| bash-2.05a$ ./mlp-char ../data/glyphs/ | grep 6.char
../data/glyphs//a6.char has class a
../data/glyphs//b6.char has class b
../data/glyphs//c6.char has class c
../data/glyphs//d6.char has class d
../data/glyphs//e6.char has class e
bash-2.05a$
|
All of these happen to be correctly classified.
Running the program again, this time without the filter, showed 2 samples
(out of a total of 30) incorrectly classified. The ratio of correctly
classified samples to the total number of samples (in our case 28/30, or
about 0.93) is called the precision of the classifier.
A precision of 95% is a reasonable figure for most applications.
Adjustment of the training parameters, and increasing the size of the
hidden layer can improve the precision.
A.3 Style Classification using a Multi-Layer Perceptron
This program is a slightly more ambitious application of a multi-layer
perceptron. It attempts to classify different types of document
according to their grammatical style. To do this, it uses the
style program published by the Free Software Foundation (http://www.gnu.org/software/diction/diction.html). This program takes
an English text file and produces 9 different metrics about the author's
grammatical style.
Our program takes a set of files, runs the style program over
each of them, postprocesses the output and then reads that output to
create objects of class DictStyle which is inherited from
ann::ExtInput.
Each DictStyle is then entered into a FeatureMap as
before.
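The following sketch shows one way the style program might be invoked
and its output captured for post-processing. It uses POSIX popen, and
the helper runStyle is hypothetical; the demo's actual invocation and
parsing of the nine metrics are not shown here.
| // Sketch: run the external style program over a file and capture its
// output line by line (POSIX popen).  The runStyle helper is
// hypothetical; the demo's actual mechanism may differ.
#include <cstdio>
#include <string>
#include <vector>

std::vector<std::string> runStyle(const std::string& path)
{
    std::vector<std::string> lines;
    const std::string cmd = "style " + path;
    FILE* p = popen(cmd.c_str(), "r");
    if (!p)
        return lines;
    char buf[256];
    while (fgets(buf, sizeof buf, p))
        lines.push_back(buf);
    pclose(p);
    return lines;
}
|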
In this program, the first part of the filename is assumed to be the
name of the class for training purposes.
|
// Create a feature map from the files given on the command line
for ( int i = 3; i < argc ; ++i ) {
const string pathname(argv[i]);
// extract the filename from the full pathname
const string filename(pathname.substr(pathname.rfind("/")+1,
string::npos));
// The classname is the filename up to the first number
const string className(filename.substr(0,filename.find_first_of("0123456789")));
// Create a DictStyle object from the text file
DictStyle ds(pathname);
fm.addFeature(className,ds);
}
|
The Libann source comes with some examples of text files which
can be used to test this classifier. These are located in
`demo/data/text/style' and comprise five extracts from each of:
- Novels.
- User Manuals.
- Legal Documents.
We would expect these types of document to have quite different styles
of language, and therefore expect to be able to classify them accordingly.
Training the classifier and recalling from it is simple:
|
ann::Mlp classifier(9,2,0.4,6,0.45,1,3);
cout << "Training the classifier\n";
classifier.train(fm);
cout << "Writing to " << netfile << endl;
.
.
.
// Recall files
for ( int i = 3; i < argc ; ++i ) {
const string pathname(argv[i]);
DictStyle ds(pathname);
cout << classifier.recall(ds) << endl;
}
|
Investigating the precision of this classifier is left as an exercise
for the reader.
A.4 Hopfield Network
In the directory `demos/data/glyphs/numerals' there are 5 files
each containing a bit map pattern of the numerals from 1 to 5.
This program demonstrates how a Hopfield network can learn these
patterns, and then how a noisy pattern can be presented to the network
and be identified as the original.
The network is created from a set of all the patterns it is to hold, as
described in section 4.4, Hopfield Networks.
| // Create a training set
set<ExtInput> inset;
for ( CI ci = filenames.begin() ; ci != filenames.end() ; ++ci ) {
Glyph g(*ci, true);
inset.insert(g);
}
// Instantiate a hopfield net trained with our patterns
Hopfield h(inset);
|
Having done this, the program mutates a few of the bits. The program
uses a special function called mutate for this purpose.
| for ( CI ci = recallNames.begin() ; ci != recallNames.end() ; ++ci ) {
Glyph g(*ci);
mutate(g);
ann::vector result = h.recall(g);
}
|
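The mutate function itself is not shown above. A minimal sketch of one
possible implementation, flipping a few randomly chosen elements of a
raw 0/1 feature vector, is given below; it is hypothetical and not the
demo's actual code.
| // Hypothetical sketch of a mutate function: flip a few randomly chosen
// elements of a 0/1 feature vector.  Not the demo's actual code.
#include <cstdlib>
#include <vector>

void mutate(std::vector<double>& v, int nFlips = 4)
{
    for (int i = 0; i < nFlips && !v.empty(); ++i) {
        const std::vector<double>::size_type pos = std::rand() % v.size();
        v[pos] = (v[pos] > 0.5) ? 0.0 : 1.0;   // toggle the bit
    }
}
|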
Results of running the program show correct recall for the numerals
1--4.
However, the network has problems identifying a noisy number 5. This is
because of the similarity between a `3' and a `5', and because
of the overlap in patterns when too many are given to the network to
learn. The Boltzmann machine can overcome these limitations.
A.5 The Boltzmann Machine as a Classifier
One application of a Boltzmann machine is as a classifier.
It is not as fast as the Multi-Layer Perceptron, but this example
shows how a simple classification task can be achieved.
This demonstration program is located in
`demos/boltzmann/boltzmann-char.cc' and the data we'll use are found in
`demos/data/glyphs/xo': a number of bitmaps representing
the characters `+', `o' and `x'.
The program creates a Boltzmann machine which learns what each of
these characters looks like, and then presents it with another, similar
glyph from each class.
Like the Multi-Layer Perceptron example, the program creates a
ann::FeatureMap with the glyphs it is to be trained with.
|
for (CRI ci = de.rbegin() ; ci != de.rend() ; ++ci ) {
const string pathname(*ci);
// Get the filename from the pathname
const string filename(pathname.substr(pathname.rfind("/")+1,
string::npos));
// Save these ones for recall purposes
if ( filename.find_first_of("23") != std::string::npos)
continue;
// The classname is the first letter of the filename
const string className(filename.substr(0,1));
// Create the glyph and add it to the map
const Glyph g(pathname,true);
fm.addFeature(className,g);
}
|
Note that files which have a `2' or a `3' in their name are not added
to the feature map; they are reserved for recall rather than training.
Now the Boltzmann machine itself is created:
|
ann::Boltzmann net(fm,10,10,0.9);
|
The parameters after fm are the number of hidden units, the
initial temperature and the cooling rate respectively.
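Libann's exact annealing schedule is not documented here, but as a
rough illustration of how an initial temperature and a cooling rate
typically interact, a geometric schedule multiplies the temperature by
the cooling rate after each annealing pass; the stopping threshold
below is hypothetical.
| // Rough illustration of a geometric cooling schedule: starting from the
// initial temperature, multiply by the cooling rate after every
// annealing pass.  The stopping threshold here is hypothetical.
#include <iostream>

int main()
{
    double temperature = 10.0;        // initial temperature, as passed above
    const double coolingRate = 0.9;   // cooling rate, as passed above
    while (temperature > 0.1) {
        // ... one annealing pass at this temperature would go here ...
        temperature *= coolingRate;
    }
    std::cout << "final temperature " << temperature << std::endl;
    return 0;
}
|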
Looking up values in the Boltzmann machine is simply a matter of
presenting the value to the recall method.
This method will return a string representing the class to which the
feature belongs.
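A recall loop modelled on the mlp-char example might look like the
following sketch; the container name recallFiles is hypothetical and
the demo's actual loop may differ.
| // Sketch of a recall loop (modelled on the mlp-char example; the
// container recallFiles is hypothetical).
for ( CI ci = recallFiles.begin() ; ci != recallFiles.end() ; ++ci ) {
    const Glyph g(*ci, true);
    cout << *ci << " has class " << net.recall(g) << endl;
}
|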
In this case, all 9 glyphs in `demos/data/glyphs/xo' are correctly
classified, despite the network having been trained with only one from
each class.