The Text Outline Project: how internationalization is handled

We aspire to make Textop international and multilingual from its initial launch: it will be possible to upload texts and summarize them in their original languages.  The subprojects, when they eventually launch, will also be launched in several languages at the same time.  But, as we argue on this page, certain features inherent in the project will make it necessary for there to be a common language, a lingua franca; and this language should be English.

Launching with multiple languages

Since it is so essential to the scholarly quality of this enterprise that the original versions of texts be used (i.e., in their original languages), and since it is so beneficial that the same outline and text chunks be displayed in multiple languages, it is crucial that internationalization be deeply built into the software from the beginning.

The Text Outline Project will also launch with subject area mailing lists (e.g., philosophy, law, history) in multiple languages.  Furthermore, as long as volunteer translators can be found to do so, the present organizational website, including its various policy and planning documents, will be translated and maintained in the various languages of the project as a whole, to make it as easy as possible for people from around the world to get involved.  The difficult ongoing translation and maintenance work might require its own special tools, however: eventually we might replace the current pages with wiki pages.

The need for a common language for the Textop outline

It will probably be necessary for there to be one "master version" of the outline headers (but not necessarily of other content), in English.  (For definitions of jargon like "headers," please see the glossary.)  The reason for this is that, without a master version, there will be one outline per language, which will not achieve the remarkable and unprecedented benefits of tearing down language barriers.  As explained in the requirements discussion, outline nodes will have unique identifiers that are independent of the relative positions of nodes and of the wording of nodes' headers.  This means that node headers can be assigned multiple translations, one for each language, and that the reordering and renaming of nodes in the English master version is technically consistent with having versions in other languages.  In other words, if a node's header is rewritten in English and the node is placed under a different parent node, the Polish version of the outline will also show the corresponding node under the new parent node, even if the header text has not yet been re-translated.  Of course, as the requirements discussion makes clear, there need to be robust tools and processes in place so that headers and other text can be translated efficiently.
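
To make the idea of position-independent node identifiers concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions, not the project's actual design: the class names (Node, Outline), the identifiers ("n1", "n2", "n3"), the two-letter language codes, and the "stale" marker for translations that lag behind the master wording are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    node_id: str                 # stable identifier, independent of position and wording
    parent_id: Optional[str]     # None for the root node
    headers: Dict[str, str] = field(default_factory=dict)  # language code -> header text
    stale: Set[str] = field(default_factory=set)           # languages awaiting re-translation


class Outline:
    MASTER = "en"  # assumption of this sketch: the master version is English

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node_id: str, parent_id: Optional[str], master_header: str) -> None:
        self.nodes[node_id] = Node(node_id, parent_id, {self.MASTER: master_header})

    def translate(self, node_id: str, lang: str, text: str) -> None:
        node = self.nodes[node_id]
        node.headers[lang] = text
        node.stale.discard(lang)

    def rename(self, node_id: str, new_master_header: str) -> None:
        node = self.nodes[node_id]
        node.headers[self.MASTER] = new_master_header
        # Existing translations now lag behind the master wording.
        node.stale = {lang for lang in node.headers if lang != self.MASTER}

    def move(self, node_id: str, new_parent_id: Optional[str]) -> None:
        # Structure is stored once, so a move is visible in every language.
        self.nodes[node_id].parent_id = new_parent_id

    def header(self, node_id: str, lang: str) -> str:
        headers = self.nodes[node_id].headers
        return headers.get(lang, headers[self.MASTER])  # fall back to the master header

    def path(self, node_id: str, lang: str) -> List[str]:
        parts: List[str] = []
        current: Optional[str] = node_id
        while current is not None:
            parts.append(self.header(current, lang))
            current = self.nodes[current].parent_id
        return list(reversed(parts))


outline = Outline()
outline.add("n1", None, "Philosophy")
outline.add("n2", "n1", "Perception")
outline.add("n3", "n1", "Sight")
outline.translate("n2", "pl", "Percepcja")
outline.translate("n3", "pl", "Wzrok")

# Rewrite the header in English and move the node under a different parent.
outline.rename("n3", "Vision")
outline.move("n3", "n2")

print(outline.path("n3", "en"))  # ['Philosophy', 'Perception', 'Vision']
print(outline.path("n3", "pl"))  # ['Philosophy', 'Percepcja', 'Wzrok']
print(outline.nodes["n3"].stale)  # {'pl'}
```

Because the parent link is stored on the node rather than in any one language's version, the Polish path follows the move immediately, while the old Polish header remains visible and is merely flagged for re-translation.  The root header, which has no Polish translation in this example, falls back to the English master.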

Note that if there is to be a single master outline of the entire project, then editing of the outline can only be done in a lingua franca, of which English is the most plausible candidate.  But it seems very likely that each new text will require additions and changes to the outline.  In that case, since work on the outline can only be done in English, a person must be able and willing to work in English in order to do significant work on the Collation Project.  One might well argue that this will make it possible for people of many languages to work together--but only by making them work in English.  That seems to be a suboptimal solution.  Notice, though, that while we propose that Textop have a lingua franca for contributors, the result for users will be a fully consistent, international system that displays no particular favoritism to English.  If, by contrast, we were to have different outlines in different languages, this would make contribution convenient for non-English speakers, but it would be truly suboptimal for all users, regardless of language, because there would be no single text resource shared by and uniting the entire educated world--a shared resource that we are now in a unique position to create.

If we are committed to having a single outline for all languages, the only way we can avoid having a master outline in a lingua franca seems to be to allow speakers of every language to edit the outline in their own language.  Some might find this suggestion incoherent at first glance, while others will find it feasible, so long as translation is rapid enough that any change to the outline in any one language propagates quickly across outline versions in all other languages.  The main difficulty with this suggestion is the inevitable difficulty of coordinating work and resolving disagreements among people who do not speak a common language.  Negotiation will be necessary if the parallel work in different languages is to be coordinated into a single outline.  In that case, it seems likely that English will frequently be the language in which negotiation takes place, especially since there may be many more than two people who are constantly working together--and then there will need to be a convenient assumption about what language to use.  This will no doubt be English, because, like it or not, English is the language best known by scholars around the world.

If English will be used to negotiate disputes among persons working on the outline, it is a small step to suggest that--purely for everyone's convenience--there be a single version of the outline, in English, on which all work on the outline takes place.  After all, consider that there might well be people working on the same area of the outline, collating different works in different languages.  It would be very easy for there to be edit conflicts when translators did not keep up with editors.  A more fundamental problem is that, without a canonical version of the outline in one language, translators will not know which version to translate.  Without agreement upon what thought a node's title is supposed to convey--which is impossible to propagate without speaking a specific, agreed-upon language--a node's headers in various languages may well be irreconcilable and lead to nasty interlinguistic disputes.  Even worse, observe that the information contained in the nodes' headers is crucial to making sense of the hierarchical ordering of nodes: "Vision" is a plausible child node of "Perception," and not vice versa, but we can tell that only by actually reading and agreeing on the node headers.

The need for a common language of policy development and project management

There is another, separate reason to resign ourselves to adopting English as Textop's lingua franca.  It is that, if the outline is to be coherent, there has to be a shared understanding of the principles behind its construction.  It is certain that the principles will be complex, evolving, and not easy to understand.  It will be very important, as the outline grows through collaborative effort, that there be a very widespread and inter-language understanding of what the outlining principles are.  Such a mutual understanding is best generated through discussion and very probably it can only be generated that way.  But then, again, a lingua franca is needed and, speaking practically, English is the best such language we have now.

Consequently, the Textop and Textop-OSS mailing lists will be conducted mainly in English, as will most advisory or governance discussions.

It is to be hoped that Anglophobes and, for example, Francophiles can take some solace in the fact that chunking and summarizing are best done in the language of the original--and, after all, even if English has become a lingua franca, only a small fraction of the great literature of the world is in English.  There will still be a great deal of work to be done in original languages.

Chunking of a text is uniform across languages, and takes place in the language of the original

The chunking of translations should probably be identical to the chunking of the original-language text, because there is to be a single master outline.  If the same outline is used differently from language to language, that is, if chunks and the placement of chunks in the outline are permitted to differ, it is likely that only the English-language outline will fully make sense.  In outlining the Leviathan, it was discovered that outline headings frequently depended on the chunks chosen for some text.

So chunking and outlining decisions should go hand in hand: either there should be one outline, a unique set of chunks for each text (across languages), and a unique mapping of those chunks to the outline, or else all of these things, both outline and chunking, should vary from language to language.  The world will be greatly enriched if a way can be found to create a central, uniform meeting point, centered around the analysis of classic texts, for all languages.

An even more decisive consideration is the fact that we will no doubt frequently come across the following situation.  A text will be chunked and summarized in one language but not in others; now, if chunking is always supposed to take place in a single language, then, to make this text available in other languages, all that contributors need to do is to mark up a translation consistently with the original.  Then, even if the chunks in other languages are not translated to a target language, users will be able to see the sentences of the chunks rendered in the target language.  This would be efficient.  Matters would be very different if each translation were chunked differently: then readers of other languages would have to wait until translations were separately chunked to see the text collated into the version of the outline in their languages.

So the language of the original should be used to chunk a text; in other words, we should not chunk translations.  Instead, we should mark up translations so that we can import the chunks created for an original text in its original language into the translation; all that is left to do is to create summaries for the chunks, or to translate the summaries created for the original-language version.
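
The import step described above can be sketched roughly as follows.  This is a hypothetical illustration, not a specification: it assumes that "marking up" a translation means tagging each of its sentences with the identifier of the corresponding sentence in the original, and the names Chunk, render_chunk and missing_sentences, the identifiers, and the placeholder sentences are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    node_id: str                   # outline node the chunk is collated under
    sentence_ids: Tuple[str, ...]  # sentences of the ORIGINAL text that make up the chunk


def render_chunk(chunk: Chunk, marked_up_text: Dict[str, str]) -> str:
    """Render a chunk using any text marked up with the original's sentence ids."""
    return " ".join(marked_up_text[sid] for sid in chunk.sentence_ids)


def missing_sentences(chunks: List[Chunk], marked_up_text: Dict[str, str]) -> List[str]:
    """List sentence ids that the chunks need but the markup does not yet cover."""
    needed = [sid for chunk in chunks for sid in chunk.sentence_ids]
    return [sid for sid in needed if sid not in marked_up_text]


# Chunking is done once, against the original-language text.
chunks = [
    Chunk("c1", "n2", ("s1", "s2")),
    Chunk("c2", "n3", ("s3",)),
]

# Placeholder sentences standing in for an original and its translation.
original = {"s1": "Phrase un.", "s2": "Phrase deux.", "s3": "Phrase trois."}
translation = {"s1": "Sentence one.", "s2": "Sentence two.", "s3": "Sentence three."}

print(render_chunk(chunks[0], original))     # Phrase un. Phrase deux.
print(render_chunk(chunks[0], translation))  # Sentence one. Sentence two.
print(missing_sentences(chunks, {"s1": "Sentence one."}))  # ['s2', 's3']
```

Under these assumptions, the chunk boundaries and their placement in the outline never have to be redone for a translation; only the sentence-level markup and, separately, the summaries remain to be supplied.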

By the way, it is not so obvious, as far as we can see, that chunk summaries should be translations of the summary in the original.

There is a question whether dead languages should be adopted as the language of the chunks--whether, for example, when outlining Plato's Republic, Attic Greek should be used as the canonical language of the chunks for that text.  This assumes that there needs to be a canonical language for the chunks of a given text; but at first glance, there does not seem to be any reason for that.  Perhaps it should be permissible for someone to do in a living language the chunking of a text originally written in a dead language.  But on second glance, there is one concern in that case, which is that it would be arbitrary which living language would be used; and the decisions about chunking the text would then be left to the speakers of that (arbitrarily chosen) living language.  Besides, Textop will be so devoted to leadership by expertise that it is hardly a stretch to insist that only scholars who are able to work in the dead language be given the right to chunk the text.  So the proposed rule will be that for texts in all languages, including dead ones, chunking should be done in the language of the original.
