Glossary

  • The text: an entire work, or any part of it; e.g., Hobbes’ Leviathan.
  • Markup: XML or other markup that identifies, in machine-readable form, the parts, chapters, paragraphs, sentences, etc., within a text. Other kinds of markup may be added later.
  • Chunks: approximately paragraph-sized bits of the text, distinguished by their function (definition, argument, explanation, description, etc.), together with other information about these bits of text (see below).
  • The outline: we will usually use this term in a restricted sense, meaning the outline itself and not the chunks that are placed in it. In this restricted sense, the outline consists exclusively of its nodes and their ordering.
  • Nodes: parts or lines of the outline, or what lives across from a number or bullet point. For example, “The ends of inquiry” and “The lawfulness of private organizations” are headings of two nodes. Note, however, that nodes are to be distinguished by numerical or other unique identifiers, not by their headings, since the same node can have different headings in different languages, and headings are editable.
  • Headings: the words, in some specific language, assigned to a node.  As the same node can exist in multiple languages, it can have multiple headings.
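
One way to picture the relationship between the outline, nodes, and headings is the minimal Python sketch below. It is only an illustration under assumptions: the names Node, node_id, headings, and children are hypothetical, the French heading is an illustrative translation, and placing the two example nodes side by side in one outline is assumed for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One line of the outline, identified by node_id rather than by heading."""
    node_id: int  # unique identifier (hypothetical field name)
    headings: Dict[str, str] = field(default_factory=dict)  # language code -> heading
    children: List["Node"] = field(default_factory=list)    # ordered sub-nodes

    def heading(self, lang: str) -> Optional[str]:
        """Return the heading in the given language, or None if there is none."""
        return self.headings.get(lang)


# Two nodes from the glossary, placed under an assumed root node.
ends = Node(node_id=1, headings={"en": "The ends of inquiry"})
lawfulness = Node(node_id=2, headings={"en": "The lawfulness of private organizations"})
outline = Node(node_id=0, children=[ends, lawfulness])

# Adding or editing a heading does not change the identity of the node.
ends.headings["fr"] = "Les fins de l'enquête"  # illustrative translation
ends.headings["en"] = "The aims of inquiry"    # illustrative edit of a heading
```

Because the identifier, not the heading, carries the identity of a node, anything attached to node 1 stays attached however its headings are reworded or translated.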

Chunks are made up of:

  • Function: a single-word description of the linguistic function of the chunk (definition, argument, explanation, description, etc.).
  • Summary: a summary of the content of the chunk.
  • The sentences: the words actually taken from the text; often, a paragraph.  (A more technically useful concept might be the language-independent identifiers of sentences, stated in terms of the markup of the text.)
  • Reference: a human-friendly, automatically-generated pointer to where the sentences may be found in the text.  E.g., “Hobbes, Lev XIV 1”.
  • Chunk metadata: the above components of a chunk.  Project participants may decide to add other required metadata fields.
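
As a rough sketch of how these components might fit together, a chunk could be modeled as a small record, with the reference generated from a chapter and paragraph number. Everything named here (Chunk, to_roman, make_reference, and the idea that a reference is built from an author, a work abbreviation, a chapter, and a paragraph) is an assumption made for illustration, not a settled design.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    function: str   # single-word linguistic function, e.g. "argument"
    summary: str    # summary of the content of the chunk
    sentences: str  # the words actually taken from the text
    reference: str  # human-friendly pointer into the text

    def __post_init__(self) -> None:
        # The glossary asks for a single-word function.
        if len(self.function.split()) != 1:
            raise ValueError("function must be a single word")


def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def make_reference(author: str, work: str, chapter: int, paragraph: int) -> str:
    """Build a reference string in the assumed 'Author, Work CHAPTER PARAGRAPH' form."""
    return f"{author}, {work} {to_roman(chapter)} {paragraph}"


chunk = Chunk(
    function="argument",
    summary="A very small and relatively powerless state is ineffective "
            "to remove the state of nature.",
    sentences="Nor is it the joining together of a small number of men "
              "that gives them this security; ...",
    reference=make_reference("Hobbes", "Lev", 17, 3),
)
```

Under these assumptions, make_reference("Hobbes", "Lev", 14, 1) yields the string “Hobbes, Lev XIV 1” used as the example above.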

Example of a chunk:

Argument [function]
A very small and relatively powerless state is ineffective at removing the state of nature. [summary]

Nor is it the joining together of a small number of men that gives them this security; because in small numbers, small additions on the one side or the other make the advantage of strength so great as is sufficient to carry the victory, and therefore gives encouragement to an invasion. The multitude sufficient to confide in for our security is not determined by any certain number, but by comparison with the enemy we fear; and is then sufficient when the odds of the enemy is not of so visible and conspicuous moment to determine the event of war, as to move him to attempt. [the sentences]

Hobbes, Lev XVII 3 [reference]

Finally, we will use a verb, “to chunk,” to mean dividing a text up into chunks.