Numdiff

by Ivano Primi <ivprimi (a) libero (dot) it>
Last Update: 2013-09-15

About

Numdiff (which I will also write numdiff) is a little program that can be used to compare putatively similar files line by line and field by field, ignoring small numeric differences and/or different numeric formats. In other words, Numdiff is a program able to compare appropriately files that contain numerical fields, possibly mixed with non-numerical ones.

Whenever you compare two such files, what you usually want to obtain is a list of the numerical fields in the second file which numerically differ from the corresponding fields in the first file. Well-known tools like diff, cmp or wdiff cannot be used for this purpose: they cannot recognize whether a difference between two numerical fields is only due to the notation or is an actual difference of numerical values. In addition, you may also want to ignore differences in numerical values as long as they do not exceed a certain threshold. Programs like diff and wdiff cannot do that either, since they do not even know what a numerical difference is. That is why I decided to implement Numdiff.

In writing this program I was inspired by ndiff, a GPL'ed program by Nelson H. F. Beebe of the University of Utah in Salt Lake City, see

http://www.math.utah.edu/~beebe/software/ndiff

ndiff is a good tool and I used it for a while, but I did not completely like the way it works, and so numdiff was born. Although ndiff inspired numdiff, the two programs share no source code: numdiff has been written entirely from scratch, with the addition of source code from GNU bc, GNU diff and Gnulib.

When comparing files, Numdiff assumes by default that the fields are separated by white-space characters (spaces, horizontal tabs and newlines), but the user can also specify a custom list of separators through the option -s, see the User Manual.
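
The default splitting rule can be pictured with a few lines of Python. This is only a sketch of the idea, not Numdiff's code: the function name split_fields is hypothetical, and treating a run of consecutive separators as a single delimiter is an assumption of mine.

```python
def split_fields(line, separators=" \t\n"):
    """Split a line into fields (hypothetical helper, not part of Numdiff).

    Assumption: a run of consecutive separator characters counts as
    one delimiter, so empty fields are never produced.
    """
    fields, current = [], ""
    for ch in line:
        if ch in separators:
            if current:              # close the field collected so far
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:                      # last field, if the line does not end with a separator
        fields.append(current)
    return fields


if __name__ == "__main__":
    print(split_fields("  1.25\t-3.45\t\t1.23456789E-2   100.00"))
    print(split_fields("1;2;;3", separators=";"))
```

Passing a different string as separators mimics the effect of choosing a custom list of delimiters.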

Numdiff has many features that ndiff lacks: for instance, it recognizes complex numbers and allows you to specify different sets of field delimiters for the two files to compare. In addition, starting from version 5 Numdiff includes a filter which prevents it from getting confused when one file contains one or more lines that have no corresponding lines in the other file. This feature is also missing in ndiff.

I know that many people could find Numdiff simply useless. But people working in Scientific Computing or in Numerical Analysis could find it useful for their job. For example, one might compare a file containing the output produced by a given numerical program running in a certain environment with another file containing the output produced by the same program in a different environment. By different environment I mean e.g. a different operating system or a different compiler on the same system. Moreover, sometimes one has to compare the output of a numerical program, which is made to solve a certain problem, with the one produced by another program, which solves the same problem but using a different algorithm. Finally, one might compare the output of a numerical program with a sample file containing a list of expected data (which could have been computed theoretically or come from experiments in a laboratory). In all these situations Numdiff could turn out to be very helpful, since it also lets the user specify a tolerance for absolute and/or relative differences and then reports only the fields which differ enough to exceed these tolerances.

To end this presentation, I have to say that Numdiff is a console application, i.e. a computer program designed to be used via a text-only computer interface, such as a text terminal or the command line interface of some operating systems. This means no mouse, no windows, no buttons, no silly icons. All modern operating systems provide, together with the Graphical User Interface (GUI), a program to emulate a text terminal. This program has different names depending on the operating system you are using: console, terminal emulator, xterm, rxvt, and so on. To use Numdiff you have to open the console/terminal emulator, type some strange commands there, and then press the Enter key to execute them :) If you do not know how to start with a terminal emulator, search the web for a user guide and, after reading it carefully, come back here.

Sample Output

Since a single example is often more useful than many words... Let us suppose that file1 contains the list of numbers:

  1.25	-3.45		1.23456789E-2   -5.98765432e+5  100.00

and file2 the following one:

  1.250001  -3.450003	1.23456788E-2   -5.98765431e+5  100.000022

We can compare these two files by calling numdiff (the name of the program must be written lower case!) and passing it file1 and file2 as arguments:

  numdiff file1 file2

The output of this command will be:

  ----------------
  ##1       #:1   <== 1.25
  ##1       #:1   ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7
  ##1       #:2   <== -3.45
  ##1       #:2   ==> -3.450003
  @ Absolute error = 3.0000000000e-6, Relative error = 8.6956521739e-7
  ##1       #:3   <== 1.23456789E-2
  ##1       #:3   ==> 1.23456788E-2
  @ Absolute error = 1.0000000000e-10, Relative error = 8.1000001393e-9
  ##1       #:4   <== -5.98765432e+5
  ##1       #:4   ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
  ##1       #:5   ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"

This text should be self-explanatory. The tags ##l and #:f, where l and f are integer numbers, refer respectively to the line number and to the position of the field within the line. Then

  ##1       #:1   <== 1.25
  ##1       #:1   ==> 1.250001
  @ Absolute error = 1.0000000000e-6, Relative error = 8.0000000000e-7

means that the first field of the first line is given by 1.25 in the first file, 1.250001 in the second one. The absolute difference between these two numbers is 1.0000000000e-6, while the relative difference is given by 8.0000000000e-7.
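
The two figures can be reproduced with a short Python sketch. It is not taken from the source code of Numdiff: the function name field_errors is hypothetical, and dividing the absolute difference by the smaller of the two magnitudes is an assumption, chosen because it agrees with all the sample outputs shown on this page.

```python
from decimal import Decimal


def field_errors(a, b):
    """Return (absolute error, relative error) for two numeric fields.

    Hypothetical helper. Assumption: the relative error is the absolute
    difference divided by the smaller of |a| and |b|; it is reported as
    None when that magnitude is zero.
    """
    x, y = Decimal(a), Decimal(b)
    abs_err = abs(x - y)
    denom = min(abs(x), abs(y))
    rel_err = abs_err / denom if denom != 0 else None
    return abs_err, rel_err


if __name__ == "__main__":
    abs_err, rel_err = field_errors("1.25", "1.250001")
    print(f"Absolute error = {abs_err:.10e}, Relative error = {rel_err:.10e}")
```

Decimal is used here instead of float so that fields such as 1.25 and 1.250001 are read exactly as they are written in the files.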

Numdiff can also print a sort of statistical report about the numerical differences discovered in the two files. To this end it is sufficient to specify the option -S. If you are interested only in the statistical report and you want to remove the detailed list of all differences from the output, then you have to specify the option -q as well. The output of the command numdiff -S -q file1 file2 is:

  
  5 numeric comparisons have been done, all of them
  have produced an outcome beyond the tolerance threshold
  
  Largest absolute error in the set of relevant numerical differences:
  1.0000000000e-3
  Corresponding relative error:
  1.6701030958e-9
  Largest relative error in the set of relevant numerical differences:
  8.6956521739e-7
  Corresponding absolute error:
  3.0000000000e-6
  
  Sum of all absolute errors:
  1.0260001000e-3
  Sum of the relevant absolute errors:
  1.0260001000e-3
  Arithmetic mean of all absolute errors:
  2.0520002000e-4
  Arithmetic mean of the relevant absolute errors:
  2.0520002000e-4
  Square root of the sum of the squares of all absolute errors:
  1.0002469695e-3
  Quadratic mean of all absolute errors:
  4.4732404362e-4
  Square root of the sum of the squares
  of the relevant absolute errors:
  1.0002469695e-3
  Quadratic mean of the relevant absolute errors:
  4.4732404362e-4
  
  +++  File "file1" differs from file "file2"
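
For readers who want to check the figures of this report, here is a small Python sketch that recomputes them from the five absolute errors listed in the first example. The function name summarize and the dictionary keys are hypothetical; the formulas are the usual definitions of sum, arithmetic mean, Euclidean norm and quadratic mean, which I assume match what the report calls by these names.

```python
from decimal import Decimal

# Absolute errors of the five fields of file1/file2 (taken from the first example)
ABS_ERRORS = [Decimal(e) for e in ("1e-6", "3e-6", "1e-10", "1e-3", "2.2e-5")]


def summarize(errors):
    """Recompute the summary figures for a non-empty list of Decimal errors."""
    n = len(errors)
    total = sum(errors)
    # Square root of the sum of the squares (Euclidean norm)
    two_norm = sum(e * e for e in errors).sqrt()
    return {
        "largest": max(errors),
        "sum": total,
        "mean": total / n,
        "two_norm": two_norm,
        # Quadratic mean = sqrt(sum of squares / n)
        "quadratic_mean": two_norm / Decimal(n).sqrt(),
    }


if __name__ == "__main__":
    for name, value in summarize(ABS_ERRORS).items():
        print(f"{name}: {value:.10e}")
```

Since no tolerance was given in this run, every difference is relevant, which is why the figures for "all" and for "the relevant" absolute errors coincide in the report.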

You can specify an absolute error tolerance (or a relative error tolerance) through the option -a (respectively -r). If the user specifies an absolute error tolerance, numdiff only reports the fields whose absolute difference exceeds that tolerance. For instance, the output of numdiff -a 1.0e-5 file1 file2 will be:

  ----------------
  ##1       #:4   <== -5.98765432e+5
  ##1       #:4   ==> -5.98765431e+5
  @ Absolute error = 1.0000000000e-3, Relative error = 1.6701030958e-9
  ##1       #:5   <== 100.00
  ##1       #:5   ==> 100.000022
  @ Absolute error = 2.2000000000e-5, Relative error = 2.2000000000e-7
  
  +++  File "file1" differs from file "file2"
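
The selection made by the absolute tolerance can be sketched in Python as follows. The function name beyond_tolerance is hypothetical, and the strict comparison (a difference exactly equal to the tolerance is not reported) is an assumption; the sketch only illustrates the idea and says nothing about how Numdiff is implemented.

```python
from decimal import Decimal

FILE1 = ["1.25", "-3.45", "1.23456789E-2", "-5.98765432e+5", "100.00"]
FILE2 = ["1.250001", "-3.450003", "1.23456788E-2", "-5.98765431e+5", "100.000022"]


def beyond_tolerance(fields1, fields2, abs_tol):
    """Return the 1-based positions of the fields whose absolute
    difference exceeds abs_tol (hypothetical helper)."""
    tol = Decimal(abs_tol)
    positions = []
    for pos, (a, b) in enumerate(zip(fields1, fields2), start=1):
        if abs(Decimal(a) - Decimal(b)) > tol:
            positions.append(pos)
    return positions


if __name__ == "__main__":
    print(beyond_tolerance(FILE1, FILE2, "1.0e-5"))
```

With the tolerance 1.0e-5 only the fourth and the fifth field are selected, in agreement with the output above.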

Numdiff can also recognize non-numerical differences between the files passed to it as arguments. If a certain field in either of the two files is of non-numerical type, then, instead of performing a numeric comparison, Numdiff will simply perform a literal (character by character) comparison. If the file example1 contains the line

  1.0     xyz     3.0     x       y

and the file example2 the line

  abc     1.1     3.3     x       z

then numdiff example1 example2 displays

  ----------------
  ##1       #:1   <== 1.0
  ##1       #:1   ==> abc
  @                                                     @@
  ##1       #:2   <== xyz
  ##1       #:2   ==> 1.1
  @                                                     @@
  ##1       #:3   <== 3.0
  ##1       #:3   ==> 3.3
  @ Absolute error = 3.0000000000e-1, Relative error = 1.0000000000e-1
  ##1       #:5   <== y
  ##1       #:5   ==> z
  @                                                     @@
  
  +++  File "example1" differs from file "example2"
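
The rule "numeric comparison if both fields are numbers, literal comparison otherwise" can be illustrated by this Python sketch. The function name compare_field and the returned labels are hypothetical, and the test for "is a number" relies on Python's Decimal parser, which is only an approximation of what Numdiff accepts (for example, it knows nothing about complex numbers).

```python
from decimal import Decimal, InvalidOperation


def compare_field(a, b):
    """Classify a pair of fields (hypothetical helper).

    If both fields parse as numbers they are compared by value,
    otherwise they are compared character by character.
    """
    try:
        x, y = Decimal(a), Decimal(b)
    except InvalidOperation:
        # At least one field is not a number: literal comparison
        return "equal" if a == b else "literal difference"
    return "equal" if x == y else "numeric difference"


if __name__ == "__main__":
    example1 = "1.0     xyz     3.0     x       y".split()
    example2 = "abc     1.1     3.3     x       z".split()
    for pos, (a, b) in enumerate(zip(example1, example2), start=1):
        print(pos, compare_field(a, b))
```

Note that compare_field("100.00", "1.0e2") returns "equal": two notations of the same value are not a difference.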

The most appealing feature of Numdiff is the ability to detect insertions/deletions of lines, similarly to what diff does, through activation of a filter. Let us suppose that the files list1 and list2 contain the data

  Additional_line_which_creates_confusion
  Additional_line_which_creates_confusion
   +1.000
   +2.510
  +10.022

and

   +1.003
   +2.500
  +10.000
  Final_line_which_creates_confusion

respectively. What you would expect to find in the report displayed by Numdiff is that list1 contains two lines at the beginning which are not present in list2, that the last line of list2 is not present in list1, and finally that the three numerical values in list2 differ from the corresponding values in list1, together with an indication of the absolute and relative errors. But the output of the command numdiff list1 list2, namely

  ----------------
  ##1       #:1   <== Additional_line_which_creates_confusion
  ##1       #:1   ==> +1.003
  @                                                     @@
  ----------------
  ##2       #:1   <== Additional_line_which_creates_confusion
  ##2       #:1   ==> +2.500
  @                                                     @@
  ----------------
  ##3       #:1   <== +1.000
  ##3       #:1   ==> +10.000
  @ Absolute error = 9.0000000000e+0, Relative error = 9.0000000000e+0
  ----------------
  ##4       #:1   <== +2.510
  ##4       #:1   ==> Final_line_which_creates_confusion
  @                                                     @@
  ----------------
  ##5       <== +10.022
            ==>
  
  
  ***  End of file "list2" reached
       Likely the files "list1" and "list2" do not have the same number of lines !
  
  +++  File "list1" differs from file "list2"

differs from your expectations. By default Numdiff simply compares the first, second, third line of the first file (in this case list1) with the first, second, third line of the second file (list2), and so on. If one of the two files to compare contains one or more lines for which there are no corresponding lines in the other file, then Numdiff gets confused and displays misleading output.

The filtering mechanism implemented in Numdiff since version 5 can detect such situations and re-synchronize the two files to obtain the final expected result. For instance, the command numdiff -z @ list1 list2, which activates the filter through the option -z @, outputs

  ----------------
  ##1       <== Additional_line_which_creates_confusion
            ==>
  
  ----------------
  ##2       <== Additional_line_which_creates_confusion
            ==>
  
  ----------------
  ##3       #:1   <== +1.000
  ##1       #:1   ==> +1.003
  @ Absolute error = 3.0000000000e-3, Relative error = 3.0000000000e-3
  ----------------
  ##4       #:1   <== +2.510
  ##2       #:1   ==> +2.500
  @ Absolute error = 1.0000000000e-2, Relative error = 4.0000000000e-3
  ----------------
  ##5       #:1   <== +10.022
  ##3       #:1   ==> +10.000
  @ Absolute error = 2.2000000000e-2, Relative error = 2.2000000000e-3
  ----------------
            <==
  ##4       ==> Final_line_which_creates_confusion
  
  
  +++  File "list1" differs from file "list2"

The use of the filter can sometimes be tricky; see the User Manual for more examples and additional explanations.
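
To give an idea of what re-synchronizing two files means, here is a Python sketch built on the standard difflib module. It is not the algorithm used by Numdiff, which I do not reproduce here: the function names line_key and align are hypothetical, and the trick of replacing every numeric field with a placeholder before matching the lines is an assumption made only for this illustration.

```python
import difflib
from decimal import Decimal, InvalidOperation

LIST1 = ["Additional_line_which_creates_confusion",
         "Additional_line_which_creates_confusion",
         " +1.000", " +2.510", "+10.022"]
LIST2 = [" +1.003", " +2.500", "+10.000",
         "Final_line_which_creates_confusion"]


def line_key(line):
    """Replace each numeric field with a placeholder, so that lines
    differing only in numeric values get the same key."""
    key = []
    for field in line.split():
        try:
            Decimal(field)
            key.append("<num>")
        except InvalidOperation:
            key.append(field)
    return " ".join(key)


def align(lines1, lines2):
    """Pair up 1-based line numbers; None marks a line with no partner."""
    matcher = difflib.SequenceMatcher(
        a=[line_key(l) for l in lines1],
        b=[line_key(l) for l in lines2],
        autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs += [(i + 1, j + 1) for i, j in zip(range(i1, i2), range(j1, j2))]
        else:
            pairs += [(i + 1, None) for i in range(i1, i2)]
            pairs += [(None, j + 1) for j in range(j1, j2)]
    return pairs


if __name__ == "__main__":
    for left, right in align(LIST1, LIST2):
        print(left, right)
```

For list1 and list2 the sketch pairs line 3 with line 1, line 4 with line 2 and line 5 with line 3, and leaves the three confusing lines without a partner, which is the same pairing shown in the filtered output above.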

Numdiff has many more options and features. In the User Manual you can find a detailed description of them.

Installation

On Unix(R) and GNU systems, like GNU/Linux, configuration, building and installation of Numdiff can be performed through the standard three steps:

          ./configure
          make
          make install

provided that the system supplies an ANSI C compiler, a POSIX implementation of the make utility and an sh-compatible shell. The compiler should at least accept the option -o to write its output to a specified file, the option -D to predefine macros, the option -l to search for a specified library, and the options -I and -L to add a given directory to the search path for include and library files respectively. If you also want to install the documentation in the GNU Info format, then you additionally need a proper installation of GNU Texinfo. Finally, a proper installation of GNU Gettext is needed if you care about support for languages other than English (at the moment only the Italian localization is available). If you leave the Natural Language Support enabled and you also want to install the localization files, then, after make, you will have to type and run

          make install-nls

By default, make install will install all the files in /usr/local/bin, /usr/local/info, etc. You can specify an installation prefix other than /usr/local using the option --prefix in the configure step, for instance --prefix=$HOME:

          ./configure --prefix=$HOME

Type ./configure --help to obtain the complete list of all available options.

Once Numdiff has been installed, you can remove all the installed files with a simple make uninstall. If you have also installed the localization files through make install-nls, then, in order to remove them too, use make uninstall-nls in place of make uninstall.

Look at chapter 4 of the User Manual if you need more information on how to compile, build and install Numdiff.

Known issues

If you use gcc 4.7.2 or clang 3.3 to compile and build Numdiff (but other versions of these compilers could also be affected by this issue), do not run configure with --enable-optimization and do not set the environment variable CFLAGS to -O, -O1, -O2, etc., unless you also use --disable-gmp. If you enable the optimized build of the code while keeping GMP-based multiple precision arithmetic active, then the executable of Numdiff you get will not run properly (long execution time with overloading of the processor).

License

Numdiff (also written numdiff) is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Numdiff is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contact and Bug reports

Bug reports have to be sent to the address ivprimi@libero.it. Please put Numdiff in the subject and indicate the version of the operating system you are running (in particular, do not forget to specify whether it is a 32- or a 64-bit system) and, if you know it, the version of the compiler used to build Numdiff. Please also state whether your version of Numdiff uses the GNU MP library or not. Before writing an email, be sure to run the latest stable version of Numdiff; I do not provide support for older versions.

Download and Documentation

The tar-gzipped archive with the source code of Numdiff can be downloaded from

http://savannah.nongnu.org/download/numdiff

The latest stable release of Numdiff is version 5.8.0. Together with the source code, the archive contains a very detailed user manual (in English). The manual, which has been written by using GNU Texinfo, is available in the following formats:

Permission is granted to copy, distribute and/or modify this manual under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation. A copy of the license is always included in the section entitled "GNU Free Documentation License". You can also obtain a copy of the GNU Free Documentation License from http://www.gnu.org/copyleft/.

The manual of Numdiff can also be browsed online here.

Acknowledgments

First I want to thank all the people involved so far in the Free Software community, starting from those directly involved in the GNU project (http://www.gnu.org). Without their great work, this little one would never have been done.

I also have to thank Aurelio Marinho Jargas (verde@aurelio.net), author of txt2tags (http://txt2tags.sf.net), a free (GPL'ed) and wonderful text formatting and conversion tool, which I used in writing this web page.

I want to thank Mr. Norman Clerman of Opcon Associates, Inc. for several suggestions he gave me to improve the readability and the effectiveness of the output produced by Numdiff. He also pointed out the need to implement a filter to resynchronize the lines between two files in case of addition or deletion of one or more lines. I have to give him credit for the urge to prepare the versions 4.x and 5.x of Numdiff.

Finally, I want to thank my friends Mariapia Palombaro, who removed some errors while reviewing the first version of this document, and Paolo Caramanica, who suggested that I add more information to the output of the option -S of Numdiff.