
/*
//paragraph [1] Title: [{\Large \bf] [}]
//[ue] [\"{u}]
//[us] [\_]
//[_] [\_]


[1] The PD System: Integrating Programs and Documentation

Ralf Hartmut G[ue]ting

May 1995

2002-2004, Markus Spiekermann. Changes to makefiles and shell scripts.


1 Overview

The purpose of ~PDSystem~ is to allow a programmer to write ASCII program files with embedded documentation (~PD~ stands for ~Programs with Documentation~). Such files are called ~PD files~. Essentially, a PD file consists of alternating ~documentation sections~ and ~program sections~. Within documentation sections, one can specify a number of paragraph formats (such as headings, displayed material, etc.), character formats (e.g. italics), and special characters (e.g. ``[ue]''). How this is done is described in the document ``Integrating Programs and Documentation'' [G[ue]95].

The main component of PDSystem is an executable program ~maketex~ which converts a PD file into a LaTeX file. More precisely, given a file ~pdfile~, a LaTeX file ~pdfile.tex~ is created as follows: First, a standard header for LaTeX is copied from a file ~pd.header~ into ~pdfile.tex~. Then, ~pdfile~ is run through a filter program called ~pdtabs~, which converts tabulator symbols into corresponding sequences of blanks. The output of this filter is fed into ~maketex~ (which can therefore be sure not to encounter any tab symbols). ~Maketex~ converts material in documentation sections into LaTeX code and encloses program sections in ``verbatim'' commands, which force TeX to typeset them exactly as they were written.
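
The tab conversion performed by ~pdtabs~ can be sketched as follows. This is only an illustration and not the program ~pdtabs.c~ shown in Section 7: the function name ~expand[_]tabs~, its parameters, and the tab stop width are assumptions made for this example.

```c
#include <stddef.h>

// Hypothetical helper, not the actual pdtabs.c: copy `in` to `out`,
// replacing each tab by blanks up to the next tab stop.
// The tab stop distance `tabwidth` is an assumption of this sketch.
// Returns the number of characters written (without the terminator),
// or (size_t)-1 if `out` is too small.
size_t expand_tabs(const char *in, char *out, size_t outsize, int tabwidth)
{
    size_t n = 0;   // characters written so far
    int col = 0;    // current column within the line

    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            // Number of blanks needed to reach the next tab stop.
            int blanks = tabwidth - (col % tabwidth);
            for (int i = 0; i < blanks; i++) {
                if (n + 1 >= outsize) return (size_t)-1;
                out[n++] = ' ';
            }
            col += blanks;
        } else {
            if (n + 1 >= outsize) return (size_t)-1;
            out[n++] = *in;
            // A newline starts a new line, so the column count restarts.
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    if (outsize == 0) return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Because every tab is replaced before the text reaches ~maketex~, later stages only need to handle blanks when they determine the indentation of a line.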

A PD file may contain very long lines, because CR (end of line) symbols need only be present to delimit paragraphs. To make the output file ~pdfile.tex~ easily printable, the output of ~maketex~ is run through a further filter called ~linebreaks~, which inserts CR symbols so that no line in the output has more than 80 characters. In total, we have the processing steps illustrated in Figure 1.

		Figure 1: Construction of a TeX file from a PD file
				[pddocu.Figure1.eps]
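
The effect of the ~linebreaks~ filter can be sketched as follows. This is an illustration under assumptions, not the program ~linebreaks.c~ shown in Section 7: the function name ~break[_]lines~ is hypothetical, and the sketch assumes that lines are broken only at blanks.

```c
#include <stddef.h>

// Hypothetical helper, not the actual linebreaks.c: break `text` in place
// so that no line is longer than `width` characters, by turning suitable
// blanks into newlines. A single word longer than `width` cannot be
// broken this way and is left as it is.
void break_lines(char *text, int width)
{
    int col = 0;              // length of the current line so far
    char *last_blank = NULL;  // most recent blank on the current line

    for (char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            // An existing line end restarts the count.
            col = 0;
            last_blank = NULL;
            continue;
        }
        if (*p == ' ')
            last_blank = p;
        col++;
        if (col > width && last_blank != NULL) {
            *last_blank = '\n';           // end the line at that blank
            col = (int)(p - last_blank);  // characters already on the new line
            last_blank = NULL;
        }
    }
}
```

Called with a width of 80, this yields output in which the only overlong lines are those consisting of a single unbreakable word.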

The file ~pd.header~ is shown in Section 6.1. ~Cat~ is a UNIX system command. Executables ~pdtabs~ and ~linebreaks~ are created from the corresponding C programs ~pdtabs.c~ and ~linebreaks.c~ shown in Section 7. The programs leading to ~maketex~ are described below. The complete process shown in Figure 1 is executed by a command procedure called ~pd2tex~ given in Section 8.4.

There are further command procedures:

  * ~pdview~ allows one to view a PD file under UNIX using the ~xdvi~ viewer (after it has been processed by LaTeX). On MS-Windows the previewer is called ~yap~. The viewer is defined in the environment variable ~PD[us]DVI[us]VIEWER~.

  * ~pdshow~ shows a PD file using a PostScript viewer defined in the environment variable ~PD[us]PS[us]VIEWER~. This shows embedded figures. However, the quality of the display is not as good as with ~xdvi~.

  * ~pd2pdf~ converts a PD file into the Portable Document Format (PDF) using the program ~dvipdfm~.

At a lower level there are three scripts which convert PD to other formats:

  * ~pd2tex~ is called by every script to convert PD into a LaTeX file.

  * ~pd2dvi~ is called by ~pdview~ to create a DVI file.

  * ~pd2ps~ is called by ~pdshow~ and creates a PostScript file.

These scripts may be useful to recreate a DVI file while previewing an older version. The previewer automatically detects that a newer DVI file exists and reloads it. Another application is to use Adobe Distiller to create a PDF file from a PostScript file.

As an example, the processing steps used in ~pdshow~ are illustrated in Figure 2.

		Figure 2: Steps of ~pdshow~ [pddocu.Figure2.eps]

We now consider the construction of ~maketex~, the central component of the PD system. ~Maketex~ depends on the components shown in Figure 3.

		Figure 3: Components for building ~maketex~
			[pddocu.Figure3.eps]

These components play the following roles:

  * ~Lex.l~ is a ~lex~ specification of a lexical analyser (which is transformed by the UNIX tool ~lex~ into a C program ~lex.yy.c~). The lexical analyser reads the input PD file and produces a stream of tokens for the parser.

  * ~Parser.y~ is a ~yacc~ specification of a parser (transformed by the UNIX tool ~yacc~ into a C program ~y.tab.c~). The parser consumes the tokens produced by the lexical analyser. On recognizing parts of the structure of a PD file, it writes corresponding LaTeX code to the output file.

  * ~NestedText.c~ is a ``module'' in C providing a data structure for ``nested text'' together with a number of operations. This is needed because text for the output file cannot always be created sequentially. Sometimes it is necessary, for example, to collect a piece of text from the source file into a data structure and then to create enclosing pieces of LaTeX code before and after it. The ~NestedText~ data structure corresponds to a LISP ``list expression'' and is a binary tree with character strings in its leaves. There are operations available to create a leaf from a string, to concatenate two trees, or to write the contents of a tree in tree order to the output.

  * ~ParserDS.c~ contains a number of data structures needed by the parser. These data structures are used to keep definitions of special paragraph formats, special character formats, and special characters, which can be defined in header documentation sections of PD files.

  * ~Maketex.c~ is the main program. It does almost nothing. It just calls the parser; after completion of parsing (which includes translation to LaTeX), a final piece of LaTeX code is written to the output.
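
The idea behind the ~NestedText~ data structure can be sketched as follows. This is a minimal illustration and not the code of ~NestedText.c~ given in Section 2: the pointer-based node type and the names ~NtNode~, ~nt[_]leaf~, ~nt[_]concat~, and ~nt[_]write~ are assumptions made for this example, and the actual module may represent the tree differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Illustrative node type (an assumption, not the declarations of
// NestedText.h): a leaf holds a string, an inner node holds two subtrees.
typedef struct NtNode {
    char *text;                   // non-NULL only in leaves
    struct NtNode *left, *right;  // non-NULL only in inner nodes
} NtNode;

// Create a leaf from a string (the string is copied).
NtNode *nt_leaf(const char *s)
{
    NtNode *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->text = malloc(strlen(s) + 1);
    if (n->text == NULL) { free(n); return NULL; }
    strcpy(n->text, s);
    n->left = n->right = NULL;
    return n;
}

// Concatenate two trees under a new inner node.
NtNode *nt_concat(NtNode *a, NtNode *b)
{
    NtNode *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->text = NULL;
    n->left = a;
    n->right = b;
    return n;
}

// Write the leaf strings in tree order (left subtree before right).
void nt_write(const NtNode *n, FILE *out)
{
    if (n == NULL) return;
    if (n->text != NULL) {
        fputs(n->text, out);
    } else {
        nt_write(n->left, out);
        nt_write(n->right, out);
    }
}
```

With such a structure, a piece of text collected from the source file can be enclosed in LaTeX code after the fact. For example, if ~body~ is a tree holding the text of a heading, then the tree built by nt_concat(nt_leaf("\\section{"), nt_concat(body, nt_leaf("}"))) is written out as the heading text enclosed in a section command. Memory management is omitted from the sketch.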

Figure 3 describes the dependencies among these components at a logical level. An edge describes the ``uses'' relationship. For example, the ~NestedText~ module is used by the lexical analyser, the parser, and the parser data structure component. Figure 4 shows these dependencies at a more technical level, as they are reflected in the ~makefile~ (see Section 9).

		Figure 4: Technical dependencies in the construction of the executable ~maketex~ [pddocu.Figure4.eps]

Here each box corresponds to a file. An unlabeled arrow means the ``include'' relationship (for example, ~NestedText.h~ is included into ~Lex.l~, ~Parser.y~, and ~ParserDS.c~). Edges labeled with ~lex~, ~yacc~, and ~cc~ mean that the tools ~lex~ and ~yacc~ or the C compiler, respectively, produce the result files. Fat edges connecting files indicate that these files are compiled together.

The rest of this document is structured as follows: Section 2 describes the ~NestedText~ module (files ~NestedText.h~ and ~NestedText.c~), Section 3 the lexical analyser (~Lex.l~), Section 4 ~ParserDS.c~, and Section 5 the parser itself (~Parser.y~). Section 6 shows the header file for LaTeX and the rather trivial program ~Maketex.c~. Section 7 contains the auxiliary programs ~pdtabs.c~ and ~linebreaks.c~. Section 8 gives the command procedures ~pdview~, ~pdshow~, etc. Finally, Section 9 contains the ~makefile~.

*/