Myer - Semantic Highlighting for C source
Jonathan Yavner <jyavner@member.fsf.org>

Links:  [Download area]  [Savannah homepage]  [Freshmeat homepage]

Freshmeat spiel

Myer supports contemplative review of C source code.  It is for maintainers who know their program "too well" and need to see it from a different angle.  It colorizes identifiers and constants to show their marginal cost to the program's coupling and cohesion metrics.  Myer is based on gcc and accepts the same C dialect.  It "runs the preprocessor in reverse", propagating info from the parse tree back to spots in the .c and .h files.  Output is HTML.

Version 20031129 replaced some hacks with solid code (scoping of like-named global variables, handling of type names).
Version 20031031 now supports gcc-3.3.2, uses much less swap, and fails less often when merging header files.
Version 20031017 first release.

Please send your opinions of this project, so I can prioritize the to-do list.

Example output

Coupling is indicated by color: low coupling = blue, high coupling = cyan.  Module-local items = purple.  Function-local items = green.

Cohesion is indicated by intensity: low cohesion = bright, high cohesion = dim.  Cohesion does not apply to function-locals.

Compatible systems

Installation

  1. Download and unpack a copy of the gcc "core" compiler.  gcc-3.3.2 is recommended; gcc-3.2 also works.  To use other gcc revs, you'll have to interpolate the patch files.
  2. For a minimal installation of gcc for Myer, execute in the gcc download directory:
  3. Download and unpack a copy of Myer (see link at beginning of this document).
  4. Alter the Makefile for Myer:
    1. Change GCC_SRC to point to your downloaded copy of gcc.
    2. Change GCC_PATCH if you're using gcc-3.2 instead of gcc-3.3.2.  Patch files are supplied only for those two revs!
    3. [Optional] change GCC_SYSINCLUDE to point to your installed gcc library, but only if your installed gcc is version 3.0 or later and you intend to delete your gcc download directory after installing Myer.
    4. [Optional] change OPT if you want Myer to run even faster.
  5. Run make in Myer's directory.
  6. [Optional] To install Myer, copy ./myer to the desired directory, then copy ./gcc_patch/cc1 to the same directory and rename it myer_cc1.
Here are the major steps performed by Myer's Makefile:
  1. Creates subdirectory ./gcc_patch containing soft links to most of gcc's files.
  2. Copies some gcc source files to gcc_patch and patches them. These patches add a new compiler option -d@ which prints (line,col) and UID for each identifier and constant.
  3. Compiles and links the phase-1 compiler ./gcc_patch/cc1.
  4. Creates the soft link ./myer_cc1 which points to ./gcc_patch/cc1.
  5. Creates myerenv.c with a list of the standard predefined cpp macros for your system.
  6. Compiles and links Myer to produce program ./myer.

Invocation

For simple projects:

     /path/myer *.[ch]
This creates a subdirectory "Myer" containing colorized HTML versions of your files.  It works best when all the source files in the directory are part of one program.  "/path/" is where you have put your Myer executable.

For projects with Makefiles:


     make
     rm *.o
     make CC=/path/myer
This creates subdirectories "xxx_Myer" for each program xxx that the Makefile creates.  This technique has three requirements: your Makefile must always use "$(CC)", never "cc" or "gcc", to invoke the compiler; each program must contain at least one module that is separately compiled to .o before being linked as an executable; and you must not combine object files into .a archives before linking.  Myer contains hacks for dealing with ".o" file extensions to make this technique work.  This invocation mode colorizes only the .c files and not the .h files; sorry.

The full monty:

     /path/myer [options] files
Myer accepts a few options of its own, then passes the rest to gcc.  Files with ".c" extensions are passed to gcc for parsing (using the new -d@ option in gcc).  Files with ".myerN" extensions are parsed internally (see -Pn and -o below).  For a filename ending with ".o", Myer will instead parse the corresponding file with ".myer2" extension.  Other filenames are assumed to be headers and are processed only if some parsed file referred to them via #include.
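
The filename rules above can be sketched in C as follows.  This is only an illustration of the dispatch described in the text, not Myer's actual code: the enum, its member names, and classify_input() are hypothetical, and accepting any single digit after ".myer" is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical categories; these names are not taken from Myer's source. */
enum input_kind {
    INPUT_C_SOURCE,   /* ".c": passed to gcc for parsing (with -d@) */
    INPUT_PHASE_FILE, /* ".myerN": parsed internally */
    INPUT_OBJECT,     /* ".o": the matching ".myer2" file is parsed instead */
    INPUT_HEADER      /* anything else: processed only if #included */
};

/* Classify one command-line filename by its extension. */
enum input_kind classify_input(const char *name)
{
    const char *dot;
    const char *slash;

    assert(name != NULL);
    dot = strrchr(name, '.');
    slash = strrchr(name, '/');

    /* No extension, or the last dot belongs to a directory component. */
    if (dot == NULL || (slash != NULL && dot < slash))
        return INPUT_HEADER;
    if (strcmp(dot, ".c") == 0)
        return INPUT_C_SOURCE;
    if (strcmp(dot, ".o") == 0)
        return INPUT_OBJECT;
    /* Assumption: N is a single decimal digit. */
    if (strncmp(dot, ".myer", 5) == 0 &&
        isdigit((unsigned char)dot[5]) && dot[6] == '\0')
        return INPUT_PHASE_FILE;
    return INPUT_HEADER;
}
```

Under these assumptions, "main.c" is a C source, "main.o" stands in for "main.myer2", "prog.myer3" is a phase file, and "defs.h" is treated as a header.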

Myer options:

-Pn: Produce output for phase n.  There are six phases (see discussion below).  The most useful phases are -P6 (final HTML output) and -P2 (the last phase where each .c file can be processed separately).  Default is -P6 unless -o specifies a filename with a ".o" extension; in that case the default is -P2.  After you create a phase file with a .myerN extension, you can send the file back into Myer on a later run to continue processing from that point; if N = 3, 4, or 5 (which describe entire programs), you can't send in any other files on that run.

-o name: Send output to file name.  Default is "Myer/" for phase 6 (which needs an output *directory*); or stdout for other phases.  If name ends with ".o", it is changed to ".myerN" where N is the selected phase; if no phase is selected, -P2 is assumed.  If name has no final slash and phase 6 is selected, "_Myer/" is appended to name to make the output directory.
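
The interaction between -Pn and -o can be easier to follow as code.  The sketch below is one reading of the two option descriptions above, not Myer's implementation: resolve_output() and SKETCH_OUTPUT_DIRECTORY are hypothetical names, a phase of 0 stands for "no -Pn given", a NULL name stands for "no -o given", and an empty result stands for stdout.  The text does not say what happens when a ".o" name is combined with an explicit -P6, so the choice made here (just rename to ".myer6") is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the documented default of DEFAULT_OUTPUT_DIRECTORY. */
#define SKETCH_OUTPUT_DIRECTORY "Myer/"

/*
 * Work out the effective phase and output name.
 *   name  : argument of -o, or NULL if -o was not given
 *   phase : argument of -P, or 0 if -P was not given
 *   out   : receives the output path; "" means stdout
 * Returns the effective phase.
 */
int resolve_output(const char *name, int phase, char *out, size_t outsize)
{
    size_t len = name ? strlen(name) : 0;
    int is_object = len >= 2 && strcmp(name + len - 2, ".o") == 0;

    assert(out != NULL && outsize > 0);

    /* Default is -P6, unless -o names a ".o" file; then it is -P2. */
    if (phase == 0)
        phase = is_object ? 2 : 6;

    if (name == NULL) {
        /* Phase 6 needs a directory; other phases write to stdout. */
        snprintf(out, outsize, "%s", phase == 6 ? SKETCH_OUTPUT_DIRECTORY : "");
    } else if (is_object) {
        /* foo.o becomes foo.myerN */
        snprintf(out, outsize, "%.*s.myer%d", (int)(len - 2), name, phase);
    } else if (phase == 6 && (len == 0 || name[len - 1] != '/')) {
        /* No final slash: append "_Myer/" to form the directory name. */
        snprintf(out, outsize, "%s_%s", name, SKETCH_OUTPUT_DIRECTORY);
    } else {
        snprintf(out, outsize, "%s", name);
    }
    return phase;
}
```

With this reading, "-o foo.o" selects phase 2 and writes foo.myer2, while "-o prog" with no -P option selects phase 6 and writes into prog_Myer/.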

-v: Print verbose progress reports during execution.

-c: Ignored for compatibility with Makefiles.  Myer does not run ld.

Myer's phases

Each phase is listed with the function it performs and its limiting goal.

Phase 1 "parse"
  Function performed: Parse C into a stream of tokens that associate identifiers and constants with both their text position (line,col) in the compilation unit and their semantic units (= UIDs) in gcc's parse tree.
  Limiting goal: Still recognizable as a stream of C code.

Phase 2 "token"
  Function performed: Fix up the token stream: sort by (line,col), merge duplicate tokens in macro definitions, deal with generated identifiers from macro calls.  Split the token stream by file of origin and convert compilation-unit line numbers back to input-file line numbers.
  Limiting goal: Still processing each compilation unit separately.

Phase 3 "merge"
  Function performed: Combine compilation units.  Merge the token streams for header files mentioned in several units.
  Limiting goal: Still generic C processing.

Phase 4 "sum"
  Function performed: Produce summary counts of various things that will be used by phase 5.
  Limiting goal: Still no meat.

Phase 5 "calc"
  Function performed: Calculate marginal costs to coupling/cohesion for each identifier and constant.
  Limiting goal: Still no HTML.

Phase 6 "html"
  Function performed: Reread input files, annotate with marginal-cost info.
  Limiting goal: (Done.)

For further info on the contents of .myerN files, see README-phases.
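
As a rough illustration of the first part of phase 2 (sort by (line,col), then merge duplicates), here is a small C sketch.  The struct token layout and the function names are hypothetical, and "duplicate" is assumed to mean an identical (line, col, uid) triple; the real .myerN formats are described in README-phases and may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token record; the field names are assumptions. */
struct token {
    unsigned line; /* line within the compilation unit */
    unsigned col;  /* column within that line */
    unsigned uid;  /* semantic unit in gcc's parse tree */
};

/* Order tokens by (line, col), using uid only to break ties. */
static int cmp_token(const void *a, const void *b)
{
    const struct token *x = a;
    const struct token *y = b;

    if (x->line != y->line) return x->line < y->line ? -1 : 1;
    if (x->col != y->col)   return x->col < y->col ? -1 : 1;
    if (x->uid != y->uid)   return x->uid < y->uid ? -1 : 1;
    return 0;
}

/* Sort in place and drop exact duplicates; returns the new count. */
size_t sort_and_merge(struct token *t, size_t n)
{
    size_t in, out = 0;

    assert(t != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(t, n, sizeof *t, cmp_token);
    for (in = 1; in < n; in++) {
        /* Keep a token only if it differs from the last one kept. */
        if (cmp_token(&t[out], &t[in]) != 0)
            t[++out] = t[in];
    }
    return out + 1;
}
```

Sorting first means duplicates end up adjacent, so a single pass is enough to remove them.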

Customization of parameters

This is an alpha rev!  For now, you'll have to edit the program's header files.  The parameters all have names starting with "DEFAULT_", in case the program someday acquires command-line options to override these defaults.

Parameters in file  myer.h:

DEFAULT_SCALE_COUPLING: Specifies the numerators in the three formulas for the marginal coupling cost of an item.  Denominators: how many modules refer to its defining module, how many functions in this module refer to items in its module, how many items from its module are referenced from this function.  Total coupling is the sum of these three fractions.  The default numerators (35%, 50%, 15%) need more research, as does the sum-of-fractions formula.

DEFAULT_SCALE_COHESION: Specifies the numerators in the three formulas for the marginal cohesion cost of an identifier.  Denominators: total references from all modules, total references from this module, total references from this function.  Total cohesion is the sum of these three fractions.  A *high* value for the total indicates *low* cohesion cost.  The default numerators (35%, 50%, 15%) need more research, as does the sum-of-fractions formula.
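
Both scale parameters above share the same sum-of-fractions shape, which the following C sketch spells out.  It assumes that each denominator is a count of at least 1 and that the three numerators are stored as fractions of 1.0; the names sketch_scale and sum_of_fractions() are hypothetical and do not come from myer.h.

```c
#include <assert.h>

/* The default numerators quoted in the text: 35%, 50%, 15%. */
const double sketch_scale[3] = { 0.35, 0.50, 0.15 };

/*
 * Total = numer[0]/denom[0] + numer[1]/denom[1] + numer[2]/denom[2].
 * For coupling the denominators are the three counts listed under
 * DEFAULT_SCALE_COUPLING; for cohesion, the three reference totals
 * listed under DEFAULT_SCALE_COHESION.
 */
double sum_of_fractions(const double numer[3], const unsigned denom[3])
{
    double total = 0.0;
    int i;

    for (i = 0; i < 3; i++) {
        assert(denom[i] >= 1); /* assumption: counts are never zero */
        total += numer[i] / denom[i];
    }
    return total;
}
```

With all three denominators equal to 1 the total is 1.0, its maximum; with denominators 2, 5, and 3 it is 0.175 + 0.1 + 0.05 = 0.325.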

DEFAULT_OUTPUT_DIRECTORY: Name of output directory when no -o option is used.  This is also the suffix used (with a preceding underscore) when -o specifies a name without a final slash.  Default value is "Myer/".

Parameters in file  myerhtml.h:

DEFAULT_GAMMA_COUPLING: Specifies the exponent in the equation
          html_coupling = orig_coupling ^ (1/gamma)
A gamma > 1 reduces the visual differences between identifiers that have high coupling cost, while magnifying the differences among low-coupling items.  This correction is needed partly because of the logarithmic response of the eye, but mostly because of the sum-of-fractions nature of the marginal-cost formulas.  Likely values for marginal cost of coupling include 1, ½, 1/3, ¼, etc., where the denominator is the number of module items referenced.  It seems wasteful to assign half the color space to the difference between the first two likely values!  The default coupling gamma (1.75, needs more research) changes the likely-value steps to 1.0, 0.67, 0.53, 0.45, etc.

DEFAULT_GAMMA_COHESION: Ditto, but for cohesion.  The default cohesion gamma (1.25, needs more research) gives likely-value steps of 1.0, 0.57, 0.41, 0.33, etc.

DEFAULT_COLOR_BG: Basic background color.  Default is pale pink, to improve contrast with the bright cyan colors.

DEFAULT_COLOR_FG: Basic foreground color for stuff other than identifiers and constants.  Default = medium gray.

DEFAULT_COLOR_IFDEF: Foreground color for code that has been #ifdef'ed out.  Default = light gray.

DEFAULT_COLOR_LOCAL: Foreground color for function-local items.  Default = light green.

DEFAULT_COLOR_GLOBAL: How to compute a foreground color from coupling and cohesion values.  The default is a simplistic formula where color varies from blue to cyan as coupling varies from 0.0 to 1.0, while brightness varies from 40% to 100% as cohesion varies from 0.0 to 1.0.  Notes:
     [1] The eye perceives a cyan color as much brighter than a blue at the same rate of photon emission.
     [2] At equal RGB brightness, a cyan color (= blue + green phosphors) involves twice as many photons as a blue (= just blue).
     Maybe the formula should use the CIELuv colorspace instead of RGB to make the color and brightness formulas independent?

DEFAULT_COLOR_MODLOCAL: How to compute a foreground color for a module-local item, which has only cohesion and no coupling.  The default is a simplistic formula that uses varying brightness of magenta (equal amounts of blue and red).
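
The two color formulas described above might look roughly like this in C.  This is a guess at the shape of the "simplistic" defaults, not the code in myerhtml.h: struct rgb and the function names are hypothetical, the blue-to-cyan ramp is assumed to be linear in the green channel, and the module-local formula is assumed to reuse the same 40% to 100% brightness range.

```c
#include <assert.h>

struct rgb { unsigned char r, g, b; };

/* Convert a 0.0..1.0 level to a 0..255 channel value, with clamping. */
static unsigned char to_byte(double v)
{
    if (v < 0.0) v = 0.0;
    if (v > 1.0) v = 1.0;
    return (unsigned char)(v * 255.0 + 0.5);
}

/* Brightness runs from 40% to 100% as cohesion runs from 0.0 to 1.0. */
static double brightness(double cohesion)
{
    return 0.4 + 0.6 * cohesion;
}

/* Global item: blue at coupling 0.0, cyan (blue + green) at coupling 1.0. */
struct rgb color_global(double coupling, double cohesion)
{
    double level = brightness(cohesion);
    struct rgb c;

    assert(coupling >= 0.0 && coupling <= 1.0);
    c.r = 0;
    c.g = to_byte(level * coupling); /* green fades in with coupling */
    c.b = to_byte(level);
    return c;
}

/* Module-local item: magenta (equal red and blue), cohesion only. */
struct rgb color_modlocal(double cohesion)
{
    double level = brightness(cohesion);
    struct rgb c;

    c.r = to_byte(level);
    c.g = 0;
    c.b = to_byte(level);
    return c;
}
```

Because the hue and brightness are both computed in RGB, the perceived brightness still shifts with coupling, which is the concern raised in notes [1] and [2].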

DEFAULT_HTML_STARTCONST and DEFAULT_HTML_ENDCONST: Some HTML code to emit before and after literal constants to distinguish them from identifiers.  The default gives them a peach-colored background.

C dialect limitations

Myer should accept any C program that gcc will accept, but doesn't.  Here are the gcc features known to be unsupported by Myer:

To-do list

Best viewed with mammalian neocortex ON