Text and Collaboration
A personal manifesto for the Text Outline Project

The project will be put in a "back burner" state until 2007 or 2008 by project director Larry Sanger, who has decided that his energies are better spent on a new project.

Larry Sanger

April 2006 (revised occasionally since then)



Part I: Why and how to collaborate


        Strong collaboration and its many possibilities

        The culture of collaboration


Part II: How to pursue some new collaborative projects

        The Collation Project

        The benefits of the Collation Project

        Three more collaborative text projects

                Analytical Dictionary Project

                Debate Guide Project

                Event Summary Project

        A single outline of human knowledge?

        The way forward

Bear in mind that as the Text Outline Project proceeds, this manifesto (especially Part II) will become dated.  The project summary and many other documents will be kept up-to-date.




It is widely held that the Internet is about to undergo a revolution made possible by a new kind of collaboration.  I hold this position myself, strongly, but I take an unusual approach to it.  Part I makes two broad points.  (1) Strong collaboration is importantly different from traditional collaboration; but it is, or should be, something more than just social software like wikis and open methods like Wikipedia's.  We need to think much more creatively, designing collaborative systems to solve very particular problems elegantly and efficiently.  (2) The strong collaborations that will really make a difference in the future will combine the culture of openness found among the open source/open content crowd with the culture of respect for expertise found among traditional information producers (including academics).


In Part II, I propose a new "Collation Project" which has the lofty aim of enlisting large numbers of scholars to take scholarly public domain texts, analyze them into paragraph-sized chunks, and shuffle the chunks into a single massive outline--to construct, eventually, The Book of the World.  This will have revolutionary implications by making knowledge more easily accessible and smashing interdisciplinary and language barriers.  At the same time I propose to start at least three other projects: (1) an Analytical Dictionary Project, which sorts lexicographical data by concept, not word, and puts the result into the same outline as the Collation Project; (2) a Debate Guide Project, which summarizes arguments and positions on all manner of controversy; and (3) an Event Summary Project, which will locate detailed summaries of current events as part of a chronology that is continuous with a historical chronology. 


These projects will be associated with the Digital Universe Foundation but under my direction.  They will be built from the ground up by a massive body of volunteer scholars led by subject area and other experts.  They will be highly participatory and open from the beginning--but run as a meritocracy with enforceable rules.



Part I: Why and how to collaborate


The educated world, and everything touched by it, is about to undergo an alteration of the same order of consequence as the scientific method, the printing press, the Industrial Revolution, and the transistor.  So I will suggest in this essay.  It will be a revolution in collaboration--just as, not coincidentally, these previous revolutions made it possible for people to work more closely together in various new ways.  To a great extent, as has often been observed, the increasingly evident power of collaboration is really just the fulfillment of the promise of the Internet.  So perhaps a better descriptor is simply "the Internet Revolution."  If so, the Internet Revolution has parts, and we are only now entering into the next part--the collaborative part.


I do not claim any originality in forecasting this revolution, nor do I really expect to persuade anyone that it is coming.  I hope this essay will at least evince my own sincerity in avowing a belief in this impending revolution.  I hope it will also help explain why I, and a growing number of people online, believe this revolution is coming.  And for those who are not yet convinced, I hope it will at least clarify and plant the seed of a very, very fruitful idea.


This essay is intended more as a manifesto than as a project proposal or an academic paper.  I plan to describe an idea, "strong collaboration," and why creativity is needed to make it work; explain the cultural elements also needed for strong collaboration to work; provide a number of illustrations of new sorts of collaborative projects, and explain why these are possible only now and why they are indeed revolutionary; and conclude with some notes about how to prosecute the new projects, which I invite you to join.


One other introductory remark is in order.  Some people will read an inaccurate summary of this essay, or this introduction, and think that it's old hat--that I am just singing the praises of wikis and blogs and folksonomies all over again.  "Power to the cyber-people" and all that: that was still new about two years ago, but not now.  As an interpretation of this essay, that would be a mistake.  My take on the future of collaboratively developed content is importantly different from that which prevails on Wikipedia, Slashdot, Kuro5hin, and most of the Blogosphere that comments on this stuff--i.e., the websites and Internet projects that most closely track developments in online collaboration.  I boldly submit that my take is not only different, it is more mature and better-developed than the prevailing view of online collaboration, according to which, as far as I can tell, collaboration is best done when anarchy prevails, anonymity is expected, and hard-won expertise is at best ignored and at worst an object of sophomoric contempt.  Not only do I have these philosophical or policy disagreements, I also think that focus on tools like wikis, to the exclusion of projects and their special requirements, has led to widespread indifference to whole classes of potential new kinds or systems of collaboration.  At any rate, I think and hope that those who theorize about online collaboration will find something new and worth thinking about here.


Strong collaboration and its many possibilities

Strong, or radical, collaboration is crucially different from old-fashioned collaboration.  Many people who have not worked much with open source software, or with Wikipedia, do not realize this.  Old-fashioned collaboration generally involves two or more people working serially on a single work, or each on a different part of a work, and the work is then put together by an editor and perhaps approved by committee.  This frequently produces boring, unadventurous, and confusing work, as everybody knows; the phrase "written by committee" stands for "stitched together incoherently like a Frankenstein monster."


Strongly collaborative works are not written by committee in this way.  Anyone who tries to replicate the success of Wikipedia, for example, by using committees just has not got the concept of strong collaboration.


Instead, strong collaboration involves a constantly changing roster of interchangeable people, and changing mainly at the whim of the participants themselves.  For the most part at least, collaborators are not pre-assigned to play special roles in the project.  There is just one main role--that of collaborator.  And anyone who shows up and fits the requirements (bear in mind that some projects have almost no requirements at all) can play that role.  Moreover, to the extent to which work is strongly collaborative, everyone has equal rights over the product.  Everyone feels equal ownership and feels equally emboldened to make changes.


The justification for this odd and perhaps frightening methodology has nothing to do with egalitarian ideology, as some have portrayed it.  At least, it need not have anything to do with egalitarian ideology.  Some have portrayed Wikipedia and open source software as "communist" because their results are free, and everyone is free to participate under roughly equal terms (if they are able, that is).  But this is silly, of course.  People of every political stripe can and do support strongly collaborative methods.  The fact that the collaborative work is free is not a result of some strictly "communist" desire for "no ownership" and "free stuff for everybody."


Instead, open source and open content stuff is free for the simple reason that an arbitrarily large number of people can and do work on it.  So who's going to own it?  Nobody can own it, or else you couldn't get that large number of people to work on it so hard and take personal responsibility for it.  Similarly, you can't get an arbitrarily large number of people to volunteer to work on something that a much smaller number of individuals own, or have special authority over.  If a work is said to be "owned" by John Doe, Jane Roe will think it's up to John Doe to call the shots.  She doesn't want to step on Doe's toes, or she might feel that she has no right to do so.  So she and others just don't have the same incentive to work together on something when she is told the content has owners.  When a work is free and ownerless, everyone enjoys more or less equal rights to contribute, and motivation to get to work, and to work together, is much stronger.


It is not anything magic about wiki software in particular that makes Wikipedia work as well as it does.  Wikipedia's success is due more to the fact that it is strongly collaborative than to the fact that it is a wiki.  Wikis and the Wikipedia model are one way to enable strong collaboration, but they are not the only way.  I think that the Wikipedia community made a mistake when it decided that it was the wiki part that explained Wikipedia's success.  They proceeded to apply the same software and content development system, which happened to work (more or less) for an encyclopedia, to develop very different kinds of projects: a dictionary, news articles, editing public domain books, writing new books from scratch, and several more things.  It seems they found they had a whopping good hammer and suddenly everything looked like a nail.  A lot of other people have got in on the act, with "wiki farms" sprouting up all over.  Even Amazon.com has recently decided to add wiki pages to its pages about books.  I predict that this experiment, like the Los Angeles Times Wikitorial experiment, will fail miserably.  The revolutionary thing about Wikipedia, the thing worth replicating, is not the fact that it's a wiki.  It's the fact that it's strongly collaborative.  And collaborations do not happen just because someone puts up a wiki.  As the zillions of dead or stillborn wikis attest, with wikis, it is not the case that if you build it, they will come.


The fact is that there is no substitute for carefully thinking through the details of a collaborative system and how it ought to work in order to produce a particular kind of information.  Just appropriating the Wikipedia system will not solve all content development problems.  Different kinds of information--different content editing systems.  No one would have suggested that Second Life, which is a collaboratively developed virtual space (like a video game that the participants work on together), should be a wiki.  A wiki is inappropriate for that because wikis aren't (strictly speaking) 3D, they're mainly text.  Why should every collaborative text project be a wiki, and share the Wikipedia model, just because it's text?  For a lot of people who love innovation, there seems to be far too much in-the-box thinking going on.  Wikis, and the very specific way that Wikipedia uses the wiki tool, are just one game.  There are many other games.  One only needs to do creative and intelligent problem-solving to think of many more; and the best "Web 2.0" innovators realize this very well.


What relatively few seem to realize is what an absolutely huge assortment of collaborative text development systems is possible.  We haven't even begun to scratch the surface.  The names of tools--"wikis," "blogs," "tagging," etc.--seem to define the creative boundaries for too many people.  Often, one has only to change a single rule, and the whole paradigm associated with a tool changes.  Take the humble mailing list (Internet forum), which can count as a collaborative content development system (just not strongly so, because people can't edit each other's posts).  The difference between moderated and unmoderated lists is well known, and everybody familiar with mailing lists has an opinion about moderation.  But it seems few people think much about other variants--other ways to get people to interact and work together.  Think of mailing lists as games.  What are all the games you could play?  What rules could you impose, that everyone playing could agree to, so that the result would be interestingly different?  Here's just one possibility.  Suppose:

  1. The list moderator posts a general question, about war, free trade, global warming, or whatever, and invites people to write reasoned answers to the question.

  2. All the answers (or maybe the half-dozen the moderator thinks best) are posted at once.

  3. Anyone who wants to can write a reply to exactly one of the respondents.

  4. The moderator collates the replies so that all the replies to answer #1 are grouped in one post, all the replies to answer #2 are grouped in a second, and so forth.

  5. Finally, the initial answerers write responses to the replies, and these are posted.
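The collation in step 4 is the only mechanical part of this game, and it can be sketched in a few lines of Python. This is a minimal illustration, not part of any existing mailing-list software: the names `Post` and `collate_replies` are hypothetical, and the rule that each person may send only one reply is one possible reading of step 3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str


def collate_replies(answers, replies):
    """Step 4: group the replies by the answer they respond to.

    answers: list of Post, the answers published together in step 2.
    replies: list of (answer_index, Post) pairs from step 3.
    Returns one list of reply Posts per answer, in answer order.
    """
    seen_authors = set()
    grouped = defaultdict(list)
    for answer_index, reply in replies:
        if not 0 <= answer_index < len(answers):
            raise ValueError(f"no such answer: {answer_index}")
        # Assumed reading of step 3: one reply per participant.
        if reply.author in seen_authors:
            raise ValueError(f"{reply.author} has already replied")
        seen_authors.add(reply.author)
        grouped[answer_index].append(reply)
    # One digest per answer, even when an answer drew no replies.
    return [grouped[i] for i in range(len(answers))]
```

Changing a single rule here, for example dropping the one-reply check or letting the moderator cap the size of each digest, yields a different game played with the same tool.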


It is trivially easy to imagine a zillion other "games" one could try with a mailing list.  Most of the possibilities have not even been tried; instead, when people use mailing lists, they tend to use them only in very specific, simple ways.


Why haven't the possibilities been explored in practice?  I think it's the same reason that wikis, blogs, and other collaborative software each tend to be conducted in just a few different ways: not because there are only a few good ways to use these tools and we have settled upon the best uses, but because people are generally conformists, and because games require shared understandings that must be conveyed deliberately and self-consciously.  In other words, for new social games to be played online, there has to be someone who declares, "This is how I propose that we play the game," and then enough other people, a quorum, actually have to play it that way.  Since people are such conformists, it's hard to get people to be the first to play new collaborative content games, and even harder for people to do what it takes to organize new sorts of games.


My point, then, is that Wikipedia's content editing system is just one game, and there are a zillion other games one can imagine using wikis to play.  Out of so many possibilities, it would be astonishing if we hit upon the best one for creating an encyclopedia collaboratively right out of the gate; it would be even more astonishing if exactly that game were also the best one for other kinds of information, like books or dictionaries.


When, with practically endless variations on tool design and rules of operation, there is such a huge space of possibilities to explore, it becomes abundantly clear that a great deal of intelligent, careful, and above all creative thought needs to go into the design of collaborative systems.  The requirements of an ideal resource of a specific type, as well as the needs and culture of the collaborators, need to be very carefully considered.


In fact, it is best if the collaborators themselves think through together the requirements of a project.  There must be leadership, granted--that is, some way of actually arriving at a decision when one is needed, that the collaborators can view as legitimate--but a large group of people thinking creatively about many possibilities can produce more ideas, and more interesting ideas, than just one or a few people working alone.  In fact, when it comes to deciding on a particular "game" to "play," what collaborative content system to adopt, considerations of smart, creative design are not the only reason to engage the collaborators.  In addition, to have maximal participant buy-in, they must feel that they have a significant role in designing the system, i.e., that the decisions do not come from the top down.


This brings me to the next major topic: while collaborative systems should be designed with the needs and values of participants in mind, I think that a certain culture, or set of values, is necessary in order to make collaboration work.  The principle that collaborators should participate in system design is an example, but only one example.


The culture of collaboration

What makes strong collaboration work?  It's a whole set of things.  Everyone involved should understand what the project's aim is and what the rules are for getting there.  They should also feel empowered to get to work, and (if they're qualified) they should be able to work on any part, or almost any, of the project, whenever the desire arises.  Assignments are not made; participants assign themselves, and choosing one assignment does not prevent others from choosing the same assignment.  There are, of course, variations on these very general principles, but to the extent to which a project is strongly collaborative, these principles will hold true.  Contrary to these principles are the notions of ownership and top-down assignment.  All this is now very well understood by the open source community and Wikipedia and other such projects.  Shared ownership and self-direction are essential to the success of strong collaboration.


So I maintain that, in order to work, any system of strong collaboration requires something like egalitarianism; but not necessarily the total absence of leadership and authority.  Many people who spend much time on collaborative projects might well disagree with this, or at least anarchists or borderline anarchists would.  It turns out that this is culturally very important for understanding collaborative projects, so I need to spend some time on it.


Political and legal theorists know very well that, if we put political systems on a line, a continuum based on their commitment to equality, anarchy is at one end of the continuum, because every assertion of authority illustrates the difference, after all, between those with authority and those without it.  In a perfectly equal society--strictly impossible, no doubt--everyone would be equal in authority.  And, without often getting quite honest and explicit about it, theorists about online collaboration frequently commit themselves to something approaching anarchism.  Intellectually speaking, it's an easy position to take, because it simply amounts to consistency with an appealing ideal.  If your most fundamental, baseline principle is egalitarianism, it requires work, and more complexity, to justify adding authority to social systems, including collaborative systems, precisely because adding authority entails creating an inequality.


Why is this position so popular among collaborative content producers?  Actually, it might seem reasonable in the context of collaboration.  If shared ownership and self-direction are essential to strong collaboration, as I claim, then adding an authority to a collaborative system is to raise up a person who has some special right to control content and direct effort.  So it is easy for someone committed to very strong, or radical, collaboration to claim that a critic of collaboration does not "get it" if he suggests that there needs to be minimal oversight or control of some sort.  In this way, the culture of collaboration naturally lends itself to a relatively radical position on the question of authority.


Still, the most radical position is, of course, only one policy position, and not necessarily the one that will actually work best--that is, unless success itself is measured by the degree to which a content creation project is itself egalitarian.  There are some people who seem to believe that: it is not really the quality of the software, or the tagging system, or the encyclopedia, or the dictionary that matters.  It could be garbage, and some people really wouldn't care that much; what matters is simply that the process that produces these is maximally ownerless and self-directed.  It's the social experiment more than anything else that explains why some people are drawn to collaborative content production.


The notion that strong collaboration is valuable mainly or only as a social experiment barely warrants consideration, much less refutation.  Information (software, categorization, etc.) has its own internal requirements, and it is practically certain that those requirements cannot be met as well by a radically egalitarian system as they can by a system that contains some rules and controls and, as a practical necessity, persons in authority to enforce the rules and controls.


So far, this is only one side of the problem of making strong collaboration work well.  The other side is that the most radically collaborative system is apt to strike most traditional information producers--publishers, editors, producers, journalists, college professors, corporate software engineers, etc.--as an obvious nonstarter, and the egalitarian or anarchical ideology that appears to be behind it is bound to seem puzzling and foreign at best.  For academics, strong collaboration seems to have produced bewildering amounts of information more quickly and efficiently than they have ever seen; but the information is typically of self-evidently low quality.  Traditional information producers are intrigued by and envious of collaborative content production, but they cannot see how to make it work to produce information that meets their own requirements.


The main stumbling blocks that academics and (more generally) traditional information producers cannot get past are the two above-named essential features of strong collaboration: shared ownership and self-direction.  Shared ownership requires free information--which is where open source and open content enter the picture--and thus requires a radical reconsideration of business models, something that information producers are understandably loath to do.  Self-direction, or the rejection of top-down assignments, is also exactly contrary to the usual corporate methods.  For all the talk of flattened corporate cultures, self-assignment at the level illustrated by Wikipedia is apt to make managers very nervous.


When it comes to signed content, including journalistic and academic writing, matters are complicated further by the fact that academic and professional advancement seems to require clear authorship.  One impediment that keeps academics from using wikis to develop encyclopedias (notwithstanding the academics who contribute to Wikipedia) is that they do not see, in the Wikipedia sort of model, any way to get personal credit for their work.  There is a tendency, therefore, to suggest that wikis be adopted with an apparently slight variation on the Wikipedia model: that articles be put under the control of specific authors.  But this suggestion would reject a precise requirement for the Wikipedia model to work.  The suggestion is made under the impression, debunked earlier, that it is wikis themselves that somehow magically produced all that content.  This is not the case: the quantity of content is a result of the fact that the wiki software was used to pursue strong collaboration, so that everyone felt equally motivated to take responsibility for and improve content throughout the project.  Wiki software is the tool, but strong collaboration is the method; praise the method, not the tool.


But the suggestion that academics (or journalists, or corporate software engineers, etc.) work without assigned authorship is still, in 2006, apt to be dismissed out of hand.  Until they consider it more seriously, traditional information producers will continue to watch from the sidelines with mixed amazement and contempt.


I propose, though I will have to develop this thought elsewhere, that academics and journalists think about the possibility of distinguishing assignment and credit.  It is possible to be granted a byline or other credit after doing certain kinds of work.  That is the notion behind a system I have helped to design for the Encyclopedia of Earth and the Digital Universe encyclopedia: there can still be topic editors, lead authors, and contributing authors listed, but they are listed only after they have actually contributed to or taken responsibility for an article.  The assignments do not come from the top down; people do work where they feel they can help.


So to sum up, there are very different difficulties on two sides when it comes to making strong collaboration work.  On the one hand, as we have just seen, traditional information producers must learn to work together and get credit without ownership or assignment.  On the other hand, the online public who are such fans of collaboration must make room for and reconcile themselves to the notion of at least minimal authority and control.  The best collaborative projects of the future will walk the line and combine the best of both worlds.  But is that possible?


Ever since I articulated the systems and rules that ran Nupedia and Wikipedia, I have been thinking about the most difficult problem for the future of strong collaboration, namely, to combine the largely incompatible cultures of anarchical collaborationists and of traditional information producers.  Though many people from both sides will be reluctant to admit it, the best, most productive culture will combine elements of both.  When it comes to information that can be jointly authored to good effect (original scholarship and some artwork might not be in this category), if it is developed using strong collaboration, the results will be superior in terms of quantity and efficiency.  But the information needs to be reliable as well, and there is simply no substitute for the involvement of people with mature and relatively reliable judgment--for expert involvement.  And experts will not get involved in large numbers, I think, unless they are granted some degree of control and oversight.


Open source software serves as an excellent example here.  The persons leading the most successful open source projects are in general, regardless of what credentials they may or may not have, competent and reliable in making judgments about what new code should or should not get into a new release of the software.  Still, in a certain way, software is relatively easy because either it does what it is intended to do, or it does not.  This is simply not the case with text.  An encyclopedia article, tag, discussion post, dictionary entry, etc., just is; it is not a piece of code that compiles and does what it's supposed to.  So with collaboratively developed content, it's often difficult to tell whether it's substandard--because participants do not know or perhaps do not care what the standards are, or in some cases they reject the notion of standards altogether.  Good anarchists that they are, they are laws unto themselves.  By contrast, code is nice in that it carries its standards with it, so to speak, in large part anyway.


Sometimes Wikipedia and other such projects are called "open source," and it is said that, like real open source (software) projects, they follow the maxims, "publish, then filter" and "publish early and publish often."  After previous essays I have occasionally been accused of taking a step backward, i.e., of advocating that strongly collaborative projects "filter, then publish."  That's a step backward because it's the old content development model.  But this is not what I recommend.  In fact--and I am not the first to make this observation--as a maxim, "publish, then filter" does not really get at all relevant aspects of the dynamic.  Instead, what is true of open source software is what I recommend: "post widely, but internally among collaborators; then filter; then publish."  In other words, code is immediately published internally, among fellow coders; then the project managers decide what additions are in and what additions are out; then the result is published externally, i.e., compiled and released as a new version.  Collaborative open content development projects should work like this, but they don't.  The best of future projects will, though, or so I maintain.
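The three-stage maxim "post widely, but internally among collaborators; then filter; then publish" can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions: the `Project` class, its method names, and the choice to keep rejected contributions in the internal pool for further work are all hypothetical, not a description of how any particular project operates.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    internal: list = field(default_factory=list)  # seen by collaborators only
    releases: list = field(default_factory=list)  # seen by the public

    def post(self, author, contribution):
        """Post widely, but internally: collaborators see it at once."""
        self.internal.append((author, contribution))

    def publish(self, accept):
        """Filter the internal pool with accept(author, contribution),
        then publish whatever passes as a new release."""
        accepted = [item for item in self.internal if accept(*item)]
        # Assumption: rejected items stay internal so they can be improved.
        self.internal = [item for item in self.internal if item not in accepted]
        release = [contribution for _, contribution in accepted]
        self.releases.append(release)
        return release
```

The `accept` function stands in for whoever holds editorial authority in the project; the point of the sketch is only that filtering sits between internal posting and external release, rather than before the first or after the second.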


On this theme I want to convey two messages.  First, speaking to the open source and open content community: I ask you to imagine if the Establishment were to use the methods and principles (including shared ownership and freedom) that you champion.  Just imagine what fantastic results would come of that.  Imagine that, and then ask yourselves what you can do, perhaps what in your processes and attitudes you can change, to help see to it that traditional information producers adopt the really productive parts of your culture.  And bear in mind that they love the efficiency collaborative systems display, and they aren't in principle opposed to freedom and openness.


Second, speaking to traditional information producers (including academics): imagine a world, after a new collaborative revolution, in which massive amounts of reliable information, nothing like today's Internet, are available free for all.  Isn't that something you would want to use your influence to get behind, if it were possible?  If such incredibly useful information resources might very well be created with low overhead, then isn't it worth it, at least as an experiment, to jettison top-down assignment and individual authorship, and to explore the creative possibilities of modest business models necessary to support the modest overhead?  It may or may not make you rich; but it might well make the world rich in a way it has never been before.



Part II: How to pursue some new collaborative projects

The Collation Project

Next I'll shift gears, so to speak, and illustrate the two themes developed so far in this essay by describing four collaborative text projects that I personally will be starting, in association with the Digital Universe, in the coming days and weeks.  We will be looking for participants and so I hope you will read this as a potential participant.  The projects will, I hope, illustrate the necessity of creative thinking in project design (not just reusing old tools and methods), of strong collaboration, and of expert oversight and participation.  The latter I maintain because these projects just could not be carried out in any credible way without expert oversight.


The four projects I have in mind are: a text outlining or collation project, an analytical dictionary project, an event summary project, and a debate guide project.  I do not expect you to find these brief descriptions at all enlightening, because these are, to my knowledge, brand new kinds of reference works, and they are, all of them, projects that simply could not be carried out in any credible way except by engaging experts to lead strong collaborations.  But experts are for the most part not familiar with strong collaboration (although they are rapidly becoming more so).  That is why these references do not exist yet.  But, as I will also argue, these projects could provide dramatic, unprecedented benefits to the scholarly community and the world at large, and so I think we will be able to get the needed quorum to work on them.


Let me begin by explaining how I stumbled upon the idea of the Collation Project.  Since early 2005 I've been working with the Digital Universe Foundation, which among many other things aims to organize information taxonomically.  That must have been part of my inspiration to start a personal side-project: placing the philosophical parts of Hobbes' Leviathan into an outline.  I have gone through it paragraph by paragraph (approximately: text chunks are broken up by function, such as "argument," "explanation," "definition," "description," etc.), summarized each chunk in a sentence or less, and then put each paragraph and its summary into a topical, hierarchical outline.   There are not more than a half-dozen text chunks under a given heading; if there are any more than that, I create subheadings.  You would be amazed at the enormous variety of topics Hobbes covers in the Leviathan.  I believe I created something over 250 outline headings.
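The outlining procedure just described amounts to a simple tree structure, which the following Python sketch tries to make concrete. It is an illustration only: the class and field names (`Chunk`, `Heading`, `overfull`) are hypothetical, and the limit of six is taken from the "half-dozen" rule of thumb above rather than from any project specification.

```python
from dataclasses import dataclass, field

MAX_CHUNKS = 6  # "not more than a half-dozen text chunks under a given heading"


@dataclass
class Chunk:
    summary: str                 # one sentence or less
    text: str                    # the original passage
    source: str                  # citation, e.g. "Hobbes, Lev XVII 4"
    function: str = "argument"   # or "explanation", "definition", "description"


@dataclass
class Heading:
    title: str
    chunks: list = field(default_factory=list)
    subheadings: dict = field(default_factory=dict)

    def add(self, chunk, path=()):
        """File a chunk under the heading named by path, creating headings as needed."""
        node = self
        for title in path:
            node = node.subheadings.setdefault(title, Heading(title))
        node.chunks.append(chunk)
        return node

    def overfull(self):
        """Yield every heading holding more than MAX_CHUNKS chunks."""
        if len(self.chunks) > MAX_CHUNKS:
            yield self
        for child in self.subheadings.values():
            yield from child.overfull()


outline = Heading("Leviathan")
outline.add(
    Chunk(
        summary="Anarchy is undesirable because it allows people to be subjugated very easily.",
        text="And be there never so great a multitude; ...",  # passage abbreviated here
        source="Hobbes, Lev XVII 4",
    ),
    path=("Desirability of anarchy",),
)
```

Any heading reported by `overfull()` is a candidate for splitting into subheadings; deciding what those subheadings should be remains a matter of editorial judgment, which the sketch does not attempt to automate.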


An example will help.  A small part of the outline I created for the Leviathan looked like this:



Under the heading "Desirability of anarchy" there was this text chunk:



Anarchy is undesirable because it allows people to be subjugated very easily.


And be there never so great a multitude; yet if their actions be directed according to their particular judgements, and particular appetites, they can expect thereby no defence, nor protection, neither against a common enemy, nor against the injuries of one another. For being distracted in opinions concerning the best use and application of their strength, they do not help, but hinder one another, and reduce their strength by mutual opposition to nothing: whereby they are easily, not only subdued by a very few that agree together, but also, when there is no common enemy, they make war upon each other for their particular interests. For if we could suppose a great multitude of men to consent in the observation of justice, and other laws of nature, without a common power to keep them all in awe, we might as well suppose all mankind to do the same; and then there neither would be, nor need to be, any civil government or Commonwealth at all, because there would be peace without subjection.


Hobbes, Lev XVII 4


Under the heading "Minimal States" there was this:



A very small and relatively powerless state is ineffective to remove the state of nature.


Nor is it the joining together of a small number of men that gives them this security; because in small numbers, small additions on the one side or the other make the advantage of strength so great as is sufficient to carry the victory, and therefore gives encouragement to an invasion. The multitude sufficient to confide in for our security is not determined by any certain number, but by comparison with the enemy we fear; and is then sufficient when the odds of the enemy is not of so visible and conspicuous moment to determine the event of war, as to move him to attempt.


Hobbes, Lev XVII 3
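
As an illustration only, the two entries above could be stored in a structure like the following. This is a minimal sketch, not the format of the original Word outline: the class names `Chunk` and `Heading`, the `needs_subheadings` check, and the "argument" function labels are all assumptions introduced here, and the passages are abbreviated.

```python
from dataclasses import dataclass, field

# "Not more than a half-dozen text chunks under a given heading."
MAX_CHUNKS_PER_HEADING = 6


@dataclass
class Chunk:
    function: str   # "argument", "explanation", "definition", "description", ...
    summary: str    # one sentence or less
    text: str       # the passage itself
    reference: str  # e.g. "Hobbes, Lev XVII 4"


@dataclass
class Heading:
    title: str
    chunks: list = field(default_factory=list)
    subheadings: list = field(default_factory=list)

    def needs_subheadings(self) -> bool:
        """True once the heading holds more chunks than the half-dozen limit."""
        return len(self.chunks) > MAX_CHUNKS_PER_HEADING


anarchy = Heading("Desirability of anarchy")
anarchy.chunks.append(Chunk(
    function="argument",  # assumed label; the essay does not say how this chunk was tagged
    summary="Anarchy is undesirable because it allows people to be subjugated very easily.",
    text="And be there never so great a multitude; ...",  # passage abbreviated here
    reference="Hobbes, Lev XVII 4",
))

minimal = Heading("Minimal States")
minimal.chunks.append(Chunk(
    function="argument",  # assumed label, as above
    summary="A very small and relatively powerless state is ineffective to remove the state of nature.",
    text="Nor is it the joining together of a small number of men ...",  # abbreviated
    reference="Hobbes, Lev XVII 3",
))
```

A heading that reports `needs_subheadings()` as true would be the signal to split its chunks among new subheadings.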


Imagine, if you can, an entire philosophical text broken apart and placed into outline in this way.  But in fact, I conceived of this as just the first text in a long series of texts to outline.  I have been planning next to put Locke's Essay Concerning Human Understanding into the same outline.  The goal has been to go through a list of some fifty of the most seminal texts in the history of philosophy and put them all into the same outline.  The result would be an unprecedented reference that would allow historians of philosophy to see texts on related topics at a glance.


Of course, I have encountered a few problems.  Now, the basic idea of this sort of text outlining, or "collation," has proven to me increasingly exciting and sound, and very worthwhile.  But one problem is that I don't think I have enough time to do a good job of this while I work full-time (and very hard) on the Digital Universe.  Another problem is that I know I am wasting enormous amounts of time by using a substandard tool--Microsoft Word's outlining feature--when what I really need is a very special sort of tool.  A final problem is that, while I would like to take exclusive credit for this, and retain total control over it, I know that what I can do by myself will not be nearly as impressive, useful, or high-quality as what many scholars can do together.


So, in some idle moments on a flight to Purdue, I asked myself: how might things go if I were to invite a lot of people to work with me, collaboratively, on the outline?  This set my mind racing.  I had already determined, through an inventory of 46 texts I wanted to include, that if I spent 10 hours per week doing outlining, I could finish the whole job in just 4.3 years.  It is easy to imagine how much more quickly the work could be done by using a strong collaboration.  If there were 20 people working on average 4 hours per week, it could be done in just a half a year.
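
The arithmetic behind that estimate can be checked in a few lines. This sketch assumes 52 working weeks per year and assumes that the work divides evenly among collaborators with no coordination overhead; neither assumption is stated in the estimate itself.

```python
WEEKS_PER_YEAR = 52  # assumption: no weeks off

# One person, 10 hours per week, for 4.3 years.
solo_hours_per_week = 10
solo_years = 4.3
total_hours = solo_hours_per_week * WEEKS_PER_YEAR * solo_years  # roughly 2,236 hours

# Twenty people averaging 4 hours per week each.
team_size = 20
team_hours_each = 4
team_hours_per_week = team_size * team_hours_each  # 80 hours per week

team_weeks = total_hours / team_hours_per_week     # roughly 28 weeks
team_years = team_weeks / WEEKS_PER_YEAR           # a little over half a year

print(f"total: {total_hours:.0f} hours; team: {team_weeks:.0f} weeks ({team_years:.2f} years)")
```

On these assumptions the team finishes in about 28 weeks, which is where the "half a year" figure comes from.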


Then I thought: wouldn't this be wonderful to do in the field of law?  Imagine case law and treatises deeply collated into a single extremely detailed outline.  This analytical index of legal topics would sort not just individual cases (something that legal references already do), but the many claims and arguments contained in each case, as well as in treatises and much else--making it dead simple to find the exact precedent and concepts the legal researcher is looking for, quickly, and with the exact text references instantly visible.  It might well become the perfect legal research tool.


Perhaps the notion of text collation could be applied to nearly every field that works with texts.  So I asked myself: why would there have to be multiple outlines?  Couldn't it all be part of one single outline?  It would be The Book of the World, an enormous library of texts analyzed and deeply collated.  There are, to be sure, some difficulties: how would history, which deals with particular events, fit alongside philosophy or law or science, which deal with abstract or timeless concepts, principles, and laws?  How would all of this fit alongside (if at all) fiction?  And surely the most useful outline would include works that are not in the public domain.  How could they be included?  Would a project collating only public domain works be worthwhile?  (Very much so, I think.)  Plainly, there was (still is) a great deal of thinking-through to be done; but on first glance, it seemed to me that if a quorum of text collators could be found, a system (a "game," again) could be found that would produce something fantastic.


I had already looked online for outlining software for my personal use, but I had not found any that seemed suitable.  If the project were collaborative, though, clearly the specific requirements of the system needed to produce this outline would be very specialized indeed; it would be necessary to radically redesign and enhance any existing software, or else start from scratch.  I imagine a Web interface with three columns.  In the leftmost column is the outline; in the middle column is the text one is working through (such as the Leviathan); and in the right column is metadata, i.e., information about particular text chunks.  I see it working like this (though this is of course only one possibility): one selects a part of the text in the middle column; then, in the right column, up pop fields to fill in, "Function" and "Summary."  The text chunk itself is automatically copied out of the text, and the software could automatically supply a text reference, if the text were marked up properly.  Finally, the whole entry (or an icon representing it) could be dropped into the outline simply by dragging it from the rightmost column to the leftmost.  This software would nicely speed up the summarizing work that I have been doing with Word's outlining feature.  More importantly, living on the Web, it would be social software, i.e., many people could use it and work on it at the same time.
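
The select, describe, and drop steps just described might look like this behind the interface. It is only a sketch of one possibility: the names `MarkedParagraph`, `Entry`, `make_entry`, and `drop_into_outline` are hypothetical, the outline is simplified to a dictionary keyed by heading path, and the "argument" label is an assumed value for the "Function" field.

```python
from dataclasses import dataclass


@dataclass
class MarkedParagraph:
    reference: str  # supplied by the markup, e.g. "Hobbes, Lev XVII 3"
    text: str


@dataclass
class Entry:
    function: str   # the "Function" field
    summary: str    # the "Summary" field
    text: str       # copied automatically from the selection
    reference: str  # supplied automatically from the markup


def make_entry(paragraph, start, end, function, summary):
    """Copy the selected span and attach the paragraph's text reference."""
    selected = paragraph.text[start:end]
    if not selected.strip():
        raise ValueError("selection is empty")
    return Entry(function, summary, selected, paragraph.reference)


def drop_into_outline(outline, heading_path, entry):
    """File an entry under a heading path such as ("Minimal States",)."""
    outline.setdefault(tuple(heading_path), []).append(entry)


# Middle column: a marked-up paragraph (abbreviated here).
paragraph = MarkedParagraph(
    reference="Hobbes, Lev XVII 3",
    text="Nor is it the joining together of a small number of men that gives them this security",
)

# Right column: the collator fills in Function and Summary for the selection.
entry = make_entry(
    paragraph, 0, len(paragraph.text),
    function="argument",  # assumed label
    summary="A very small and relatively powerless state is ineffective to remove the state of nature.",
)

# Left column: the entry is dropped under a heading.
outline = {}
drop_into_outline(outline, ("Minimal States",), entry)
```

A real multi-user system would also need to record who made each entry and to handle simultaneous edits, which this sketch leaves out.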


The benefits of the Collation Project

Some people see the benefits of this project instantly.  After I had this idea, I arrived at Purdue and told nearly everyone I met there about it; everyone thought it was a great idea, and many of them were quite excited about it.  But at least one skeptic I've talked to since then has wondered why something so complicated is worth the effort.  There are bound to be more such skeptics.  So let me lay it out plainly.


Legal research would be made more efficient because those preparing briefs look for relevant precedents on very specific points of law.  So a researcher need only drill down through the outline to the supporting evidence for a particular argument, and laid out without further work are direct links to the full text of case law and (public domain, at least) treatises.  That would be incredibly useful.


Many other examples are possible, but I'll give just two in the interest of brevity.


The applications for history, and especially the history of ideas, are striking.  Imagine that part of the outline takes the form of a chronology, with the top-level nodes being periods such as the Renaissance and the lowest-level nodes being days, or shorter periods.  Letters, first-hand accounts, and other original sources could be integrated deeply into the chronology, alongside whole texts.  (For example, Hobbes' Leviathan would naturally take its place as part of the chronology of 1651.)  The chronology could then be sorted or filtered in all sorts of interesting ways.  One could view just the political history of France; the original sources; the history of ideas; everything from the 17th century; and so forth.  I am sure that the comparison of relatively recent historiographies side-by-side with original source texts would be particularly interesting.  Comparative studies in the history of ideas become a matter of reading the contents of the outline; if one is interested, say, in the history of theories of perception in the 17th and 18th centuries, one need only consult certain parts of the outline, and one could compare side-by-side what Descartes, Locke, Hume, and Reid all had to say about the same concepts and arguments.  The writing of history would be at once much easier, and also perhaps (for some) less necessary, interesting, and relevant--it would at least have to change, because so much of what appears in historiographies would be evident from a study of the outline.
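
The kinds of filtering mentioned above can be sketched as follows. The field names (`kind`, `topics`, `region`) and the function `filter_items` are assumptions made for illustration, and the second sample item is a hypothetical placeholder rather than a real document.

```python
from dataclasses import dataclass


@dataclass
class ChronologyItem:
    year: int
    title: str
    kind: str          # e.g. "whole text", "original source", "historiography"
    topics: frozenset  # e.g. {"history of ideas"}
    region: str


def century_of(year: int) -> int:
    """1601-1700 is the 17th century, and so on."""
    return (year - 1) // 100 + 1


def filter_items(items, kind=None, topic=None, region=None, century=None):
    """Keep the items that match every criterion that was supplied."""
    result = []
    for item in items:
        if kind is not None and item.kind != kind:
            continue
        if topic is not None and topic not in item.topics:
            continue
        if region is not None and item.region != region:
            continue
        if century is not None and century_of(item.year) != century:
            continue
        result.append(item)
    return result


chronology = [
    ChronologyItem(1651, "Hobbes, Leviathan", "whole text",
                   frozenset({"history of ideas"}), "England"),
    # Hypothetical placeholder, included only to exercise the filters.
    ChronologyItem(1789, "Placeholder letter", "original source",
                   frozenset({"political history"}), "France"),
]

seventeenth_century = filter_items(chronology, century=17)
french_politics = filter_items(chronology, topic="political history", region="France")
original_sources = filter_items(chronology, kind="original source")
```

Because the criteria combine freely, a view such as "original sources on the political history of France" is just one more call with three arguments.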


Collation of texts in the social sciences offers many interesting possibilities, although it faces a problem mentioned above, namely that most of the texts one might want to put into the outline are not in the public domain.  But suppose that problem were solved somehow.  The difficulty with explanation of social phenomena is their complexity.  Theories in psychology, sociology, etc., tend to simplify the complexity, emphasizing one factor over others, so it would no doubt be particularly interesting to have all the various proposed explanations of a given phenomenon, e.g., murder rates, schizophrenia, or the collapse of civilizations, laid out in one place.  Textbooks already do such comparisons, of course, but what they do not do is place the arguments from the original sources in all their glory side-by-side.  And, as scholars are fond of saying, the devil is in the details.  An exhaustive outline would speak the truth and shame the devil--expose the devilish details of complex phenomena and explanations.


In general, the Collation Project would deeply integrate large numbers of texts, that is, place together related pieces of text.  This would have far-reaching effects precisely because the close comparison of related texts is inherently enlightening.  Doing this on a mass scale has revolutionary potential in virtually every field.


Next I would like to point out four broader advantages, and considering these advantages, it will become clear, I hope, that the project proposed really is revolutionary.


The outline would mean the end of academic provincialism: the comments of various theorists, from different disciplines, all appearing in the same place would make it abundantly clear what interdisciplinary reading needs to be done.  The artificial lines between disciplines come crashing down when one sees exactly how others, in other fields, are commenting on one's pet topic.  If contemporary papers are properly collated into the outline, conference organizers can see better who to invite for a specific purpose--not just the people who happen to work in the organizer's own field, and who publish in the same journals in which the organizer publishes.  And so forth.


Even more significantly, the outline would create a worldwide scholarly conversation, and this for a perhaps unexpected reason.  Assume that a canonical text, in its original language, is marked up in a precise way, so that translations can be marked up in exactly the same way.  Then scholars from around the world can look at the same outline in their own language.  (See the internationalization discussion for more.)  Now consider that one obvious filter option would be this: if a text is not translated to your language, then display it in its original, or in another one that you know.  Suppose as a scholar you "live" in a certain area of the outline, and some untranslated (say) Polish texts appear in that area; then you have an excellent reason to get that text translated, and when you do, suddenly you and others like you are joined to a global conversation that simply did not exist previously.  By having a single outline for all the world's works, in all languages, collating small chunks of all texts into a single very fine-grained outline, we essentially bring researchers of all languages together--and the wisdom of humanity is explosively enriched.
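
The display filter described above amounts to a simple fallback rule. In this sketch the function name `choose_version` and the language codes are assumptions, the sample texts are placeholders, and the ordering (the reader's languages first, then the original) is one reading of the rule rather than a settled design.

```python
def choose_version(versions, original_language, reader_languages):
    """Pick which version of a text to display to a given reader.

    versions: dict mapping a language code to the text in that language
    original_language: code of the language the text was written in
    reader_languages: the reader's languages, most preferred first
    Returns (language_code, text, needs_translation).
    """
    for code in reader_languages:
        if code in versions:
            # A translation is still wanted unless this is the reader's first choice.
            return code, versions[code], code != reader_languages[0]
    # Nothing the reader knows is available: show the original.
    return original_language, versions[original_language], True


# Placeholder texts standing in for a Polish work with one translation.
versions = {"pl": "(Polish original)", "de": "(German translation)"}

shown_to_german_reader = choose_version(versions, "pl", ["en", "de"])
shown_to_english_only = choose_version(versions, "pl", ["en"])
shown_to_polish_reader = choose_version(versions, "pl", ["pl"])
```

The third element of the result marks exactly those passages that give a scholar "an excellent reason to get that text translated."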


Learning and education, too, would be revolutionized.  A student who is assigned to write a paper on a specific topic, using one or two main source texts, can instantly find where a specific passage is located in the outline, and then consult other texts from a wide assortment of thinkers on that precise point.  And if the outline had indexed all public domain texts the student might possibly be expected to cite--as in a history, literature, philosophy, or classics class--the student is saved trips to the library stacks.  Of course, the precise same point applies for scholars as well.  Scholarly research to find credible sources for specific facts, theories, quotes, etc., previously requiring long trips to the library, becomes nearly trivial for everyone.


The fourth and final advantage is one I especially look forward to.  It is that, with this detailed outline making precedents and theories so easily researchable and available, there is much less reason to reinvent the wheel endlessly, as academics infamously do.  Philosophers have been advancing the same theories in slightly different forms for millennia; and in the last 100 years, the recycle rate has increased exponentially.  Illuminating the precedents will, I can only hope, shame academics out of their bizarre habit of recycling old, long-ago-analyzed theories, and the committees that essentially have required this perverse behavior will find better ways to evaluate academic progress.  I look forward to the day that the outline makes it perfectly obvious that the dialectical landscape in so many fields has been thoroughly explored, exposing so many academic games as mere games that it will be hard to take them seriously anymore.


If the project here described will have such wonderfully beneficial and revolutionary effects, you might well ask, then why hasn't it been done before?


The answer is interesting and instructive.  Before the advent of personal computers, shuffling large amounts of text was usually prohibitively expensive and labor-intensive.  Mortimer J. Adler's Syntopicon and the Scriptorium of the Oxford English Dictionary project might, perhaps, be examples; but Adler's work was a pilot project at best, and a dictionary's needed entries are easily discoverable and arrangeable.  It is hard to imagine anyone exerting the effort to maintain a truly enormous outline (containing much more than Adler's Great Books), which is constantly expanding and changing, as the one envisioned here would have to be, without a computer.


But why did it not happen in the 1980s, when computers became commonplace?  One reason perhaps is that a quorum of texts had not yet been created.  The task of typing in or scanning texts was, you might recall, very far from trivial at the time.  Project Gutenberg was well under way, however.  So why not collate the texts that were already available?  The reason here is surely that few people would have understood at the time some basic principles of collaborative content production that open source software and wikis have made obvious, and the understanding of how to construct the tools to manage such a project was still relatively primitive.  How to organize the project surely would have been a major sticking point.


In fact, now is exactly the right time to start this project.  We should expect to see it starting up just now.  In recent years, Wikipedia has demonstrated to the world what can be accomplished by people editing each other's text in a highly asynchronous, distributed collaboration.  The Open Content Alliance, along with the Amazon and Google book-scanning projects, has many book-lovers salivating over the impending digitization of entire libraries.  Finally, the unreliability or irresponsibility of some collaborative content communities, like Wikipedia, the Blogosphere, and MySpace, has increasing numbers of academics and professionals interested in "taking back the Web" for more serious, scholarly purposes--which is one reason that the Digital Universe, an expert-managed general information project, is growing in popularity.  All of these developments together will help persuade some plugged-in graduate students, professors, and others that the Collation Project is feasible.  I hope a quorum will be thus persuaded.


But the challenge described above--of joining populist methods with expert oversight--is very relevant here.  Ordinary uncredentialed collaborators must resign themselves to working under the guidance of experts; and experts must resign themselves to doing work without getting credit for very specific pieces of authored text.  (They can get credit for the roles they play in the project, though.)


Bear in mind, too, that the Collation Project demands expert leadership.  Scholars from every field have, as it were, a conceptual web that represents conceptual and logical relationships in their field, and this is an understanding that requires a great deal of study and is not found among dilettantes.  Only people who have worked and studied a great deal in a particular field, and who are very familiar with a text, have the understanding necessary to make difficult decisions about what conceptual structures are appropriate for the text, and how points it makes are similar to and different from related points to be found in other texts.


But a number of people have said, independently, that this project has "graduate students" written all over it.  The majority of the work done, I think, will probably be done by graduate students.  They are, after all, the ones who are very carefully studying texts that are now familiar to old hands.  The knowledge one can get from closely summarizing chunks of text and putting the chunks into an outline is exactly the sort of knowledge that graduate students are after.


Three more collaborative text projects

As interesting as it is, I would have you regard the Collation Project as just an example of what is possible through expert-managed strong collaboration.  Next I want to provide, more briefly, three more ideas, and conclude by asking you to join me in building these fantastic new resources.


(1) An analytical dictionary.  The notion of a collaborative dictionary project does not strike me as very compelling, because we have been working on dictionaries for centuries, and the old-fashioned way of building them seems to have been fully successful.  I note, however, that the best dictionary in English, The Oxford English Dictionary, was built collaboratively, led by experts, and with input from everyone who wanted to offer input--it just was not strongly collaborative.


What is more interesting, and this is a point I am very anxious to convey because it is so important, is that strong collaboration makes significant new variations possible that were never before possible.  What I have in mind is an enormous analytical dictionary, the entries of which are not individual words but instead concepts.  Think of those parts of dictionaries that distinguish the senses of near-synonyms: hate, abhor, detest, contemn, etc.  Imagine organizing a dictionary by grouping and distinguishing related words under general conceptual headings; and imagine arranging the headings hierarchically.  In fact, one might begin with the structure and words of Roget's Thesaurus.  So you can think of the core idea of an analytical dictionary, of this sort, as being an expanded commentary on Roget's Thesaurus.


So far, this is hardly an original idea and hardly one that absolutely requires strong collaboration.  But imagine what could be achieved if a community of lexicographers, and their students and other word-lovers, were to get together and work on, essentially, an encyclopedia of language.  They could add idioms; describe connotations, including cultural references, in great detail; provide illustrative texts, as the OED does; get deeply into the specific jargon and usages of specific fields; provide a usage guide.  In fact, this might be the ultimate way to organize a usage guide.  Traditional dictionaries describe words and distinguish senses of those words.  An analytical dictionary would, by contrast, describe concepts and distinguish how words are used to convey those concepts.  Moreover, there is no reason to think that, if analytical dictionaries in other languages were to use the same outline, translators and language scholars would not then quickly be able to discover the precise word needed to express a precise concept in another language.


But why think that this could only be done using a strong collaboration?  Anyone who has read about the history of the OED knows just how logistically difficult producing a new edition of the OED is.  What I am proposing here combines the sort of information you find in the OED with idioms, information about connotations, cultural references, and a massive usage guide.  Furthermore, I propose to organize the work in an outline form, an outline that is shared across many languages.  Using a top-down assignment system to produce this work is possible, but it would, I think, be so labor-intensive that a publisher probably would not want to risk the money to support its construction.


A free project using strong collaboration, however, overseen but not directed top-down by lexicographers, would make relatively short work of an analytical dictionary.  The power of strong collaboration is such that complex information like this can appear to "organize itself," when in fact it is any number of people, working together and mutually trusting each other, making a long series of intelligent decisions about entries that they care about.  It works, as has been observed before, for the same reason that free markets efficiently produce cheap products as if by "an invisible hand": self-directing individuals are able to go where they are needed, and do what is needed, more efficiently than if the same individuals merely follow orders "from the top."  There is no reason to think that strong collaboration would not be able to produce this sort of analytical dictionary.


(2) A debate guide.  Next let's shift gears.  Imagine, if you will, a dynamic summary of a debate--a guide to the competing positions.  The closest existing and familiar type of document would be the Voter's Guide often handed out before elections, in which each candidate is given the same amount of space to express his or her position on current issues.  But this is only the roughest of approximations.  Suppose we go further and say that:



In short, imagine that there were a free resource where, for any issue of controversy, you could expect to have the leading arguments on all sides clearly and carefully explained, with links to supporting evidence.  This could be an enormously influential and useful resource for both decisionmakers and the public to have--to say nothing of its educational value.


Again, this could not happen without strong collaboration.  What corporate editorial staff, following traditional top-down methods, could produce something as enormous as what is envisioned here, with constant back-and-forth adjustment among positions, and the necessary participation of scores of thousands, or even millions of disputants?


Imagine thousands of debate guides, burnished to a fine luster, masterfully laying out the competing positions in most of the important political and policy controversies; suppose this work were guided by some of the most influential people in those controversies.  Such a resource might well become the central clearing-house for information about those societally important controversies generally--and if so, it could not fail to have deep repercussions on the larger debates themselves. 


(3) Event summaries.  One of the more obvious tasks for a group of online collaborators is the reporting of news (especially local news), such as has been organized by the South Korean news project, OhMyNews (although, unlike Wikinews, OhMyNews is not strongly collaborative).  The difficulty with citizen journalism, however, is that many of the most interesting stories require contacts with highly-placed news sources, contacts to which "citizen journalists" have no access.  When it comes to getting the information out of those sources that can be gotten, professional journalists are probably doing as well as anyone can.


Just as in the case of ordinary dictionaries, I personally doubt that strong collaboration can do what traditional journalism does better than traditional journalists do it.  But, again, one would do well to think creatively of other games to play, of what other resources could be created.  Just as strong collaboration might be better at creating analytical dictionaries than ordinary dictionaries, I think that strong collaboration might be better at creating a different sort of news resource than ordinary journalism.  I mean something I call "event summaries."


In particular, I think that people can most usefully collaborate on ongoing, updated news narratives--news "events" broadly construed, or ongoing "stories."  A very large percentage of articles in national and international news concerns ongoing events, such as the ongoing conflict in Iraq.  Articles in Wikipedia about news events are frequently narratives in this sense, and it is in this function that Wikipedia's editing process really shines.  The public has shown that it can usefully aggregate news from many sources, and the result is something that even journalists have found impressive.  Nevertheless, Wikipedia's system can be improved or, at least, supplemented.  Here are some ideas:



Of course, in the community at work on this resource, credentialed journalists who are involved should enjoy their usual professional stature in the production of high-quality reporting, which is the natural result of the education and job experience of journalists.  The result would be a rational augmentation of professional news reporting that, because developed collaboratively as part of large, coherent narratives, represents a more complete, fair, and integrated account of ongoing events.  It would combine the immediacy and reach of blogging with the expertise of journalism and science. It would also be a thoroughly modern, and revolutionary, approach to news reporting and consumption: it would help properly contextualize news stories (lack of context is surely one of the worst challenges facing modern journalism); it would invite news consumers to become contributors, by adding as-yet unreported facts and removing bias; it would remove the need for repeating the same background with each new article; and it would provide a potentially powerful new channel for subject-area experts to more directly advance the quality of journalistic output, and thus the news consumed by world citizens.



All of these projects will have some benchmark requirements for participation.  Not only will real names and identities be required, but benchmarks in terms of degrees, credentials, experience, training, etc., will be expected for participation in various aspects of the project.  For example, the director of a project to summarize a given text should be someone who can be recognized by people in his or her field as an expert on that text, and of course, the more expert, the better.  But the rank-and-file summarizers and outliners might be graduate students.  As to marking texts up (i.e., to generate text references automatically), while competence in the language of the original will be expected, no further qualifications will be necessary.  Similar appropriate qualifications, to be discussed by the community, will be required for other roles.  But we will not multiply roles unnecessarily in any case.


Let us briefly revisit the issue why the above sorts of projects were not feasible just a few years ago, but are now: there just is no way to organize in a top-down fashion very complex work that needs to be constantly revised.  For the Collation Project, and the other projects envisioned here, it simply will not work to say (as the OED did) "we need exactly this data now" or (for traditional encyclopedias and dictionaries) "we need exactly these entries from you."  The fully interdependent nature of the outline and the entries that go in the outline will change from day to day, rendering assignments outdated.  It is the complexity of the project that requires strong collaboration, i.e., a large body of people producing the same work together, each working independently, but all working together as an "organic whole."  Strong collaboration is not a method that professional content producers have been in a position to understand or appreciate until recently.  In 2006, I hope, they are, and college professors, editors, and others will want to join me and the Digital Universe Foundation in leading this unprecedented and literally revolutionary set of text projects, and I hope a large number of graduate students and others capable of similar levels of work will be ready to get to work actually putting these resources together.


A single outline of human knowledge?

I next want to make a suggestion that is distinguishable from the foregoing proposals, but which has intrinsic interest and merit of its own.


I've already suggested that the Collation Project and the analytical dictionary both make use of large outlines.  But could these be the same outline?  And isn't it possible that the debate guides and event summaries could also be placed within this outline?


First of all, at first glance, it seems that the Collation Project's outline will be much larger and more in-depth than that needed by an analytical dictionary.  The Collation Project's purpose is not to organize mere concepts and words, but instead complex propositions and arguments, and for many of these there are no corresponding concepts.  So it seems there will not be a mapping from the Collation Project's outline onto the analytical dictionary's outline.


But I have not yet thought of any reason to suppose that there cannot be a useful mapping in the opposite direction, from the dictionary to the text outline.  That is, it seems quite possible that each entry in the analytical dictionary will have a place within the text outline, and that the conceptual structure of the latter will be suitable for the former.  A glance at the outline I have made of Hobbes' Leviathan only confirms this impression.  Similarly, I do not see why there could not be a mapping of debate guide topics to the text outline.  Of course, I realize that this is hardly an argument on my part; it could be just a lack of imagination.  But I would be very interested to learn if there were some good reason to think that there can, or cannot, be one convenient central outline that could encompass information of these kinds for ready comparison.


The case of event summaries is different.  But recall that I said that part of the text outline would take the form of a chronology.  Event summaries would very nicely fall onto the end of this chronology and become continuous with the rest of history, even as history is being made.  Moreover, that is the natural place for event summaries: the difference between the news and history is just a matter of time.  Journalists are historians of very recent events.


We may find a similar insight to support the notion of a unified outline for all kinds of text.  We find that, since concepts bear relations to each other, it is possible to put the contents of texts, which deal with concepts at more or less abstract levels, not only into an outline but into the same outline.  But since language represents those concepts, it is not surprising that an encyclopedia or dictionary of the language, divided by concepts rather than individual words, could be placed within the same outline.  Again, disputes are conducted about the same topics about which people write texts; so one should expect to be able to find appropriate places where coherent representations of large swathes of a debate can be placed.


You might wonder why I stop there, and why I do not go on to explain how encyclopedia articles, books, and various other kinds of information could also be placed within the same outline.  And you would be right to wonder.  I would fully expect to be able to use the outline to organize all sorts of information.


It is worth pointing out that in one of the most important texts of the French Enlightenment, the Preliminary Discourse to the Encyclopedia of Diderot, d'Alembert proposes to organize knowledge into a "tree," as I propose to do with these projects, and as the Digital Universe is already doing:


After reviewing the different parts of our knowledge and the characteristics that distinguish them, it remains for us only to make a genealogical or encyclopedic tree that will gather the various branches of knowledge together under a single point of view and will serve to indicate their origin and their relationships to one another. …


[T]he encyclopedic arrangement of our knowledge … consists of collecting knowledge into the smallest area possible and of placing the philosopher at a vantage point, so to speak, high above this vast labyrinth, whence he can perceive the principal sciences and the arts simultaneously.  From there he can see at a glance the objects of their speculations and the operations which can be made on these objects; he can discern the general branches of human knowledge, the points that separate or unite them; and sometimes he can even glimpse the secrets that relate them to one another.  It is a kind of world map which is to show the principal countries, their position and their mutual dependence, the road that leads directly from one to the other.  (pp. 45-6, p. 47; trans. Richard N. Schwab and Walter E. Rex, University of Chicago Press, 1995).


To postmodern Enlightenment-bashers, this is apt to seem the height of naïveté.  But d'Alembert was on to something; he was solving a practical problem (the arrangement of encyclopedia articles), similar to the one I propose to take on, and he had a practical, if imperfect, solution.  If we regard the sort of outline he provided for the Encyclopédie, similar in concept to outlines of knowledge provided by Bacon and Hobbes and, much later, by Mortimer J. Adler for Encyclopaedia Britannica, not as an objective representation of knowledge but instead as a summary of the dialectical landscape, i.e., of the contents of actual texts whether accurate or in error, then we might see the project of writing an outline into which very many texts are to be placed as a practical project.  In constructing the outline, difficult and virtually arbitrary decisions are probably inevitable--d'Alembert admitted as much--but that does not mean the result will not be very useful.  I can easily imagine the integration and improvement of my outline of the Leviathan with text from a few dozen other great works of philosophy being of considerable interest to historians of philosophy, even if the outline is not quite perfect.  Besides, the wonderful thing about expert collaboration is that its results can be continuously improved (and that, I want coders to know, is a software requirement: the outline must never even tend to be ossified by the code).


What I am suggesting, then, is that all sorts of textual information--whole texts and parts thereof, detailed accounts of how concepts are expressed in a language, summaries of debates, historical documents, summaries of recent events, and so forth--could in theory be placed into the same outline.  The question is whether we ought to try to create this outline-based information resource, or whether it would be too much trouble.  The Collation Project alone is apt to seem amazingly daunting; adding the rest is apt to seem laughable.
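
To make the idea of a single shared outline a little more concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions: the class names (OutlineNode, Item), the four item kinds, and the sample headings and labels are all invented for this example and do not come from any project software.  It shows text chunks, dictionary entries, and debate guides hanging from the same node, and event summaries falling in date order onto a chronology node.

```python
from dataclasses import dataclass, field

# Hypothetical item kinds, one per project described in this manifesto.
KINDS = {"text_chunk", "dictionary_entry", "debate_guide", "event_summary"}


@dataclass
class Item:
    kind: str       # one of KINDS
    label: str      # short description of the item
    date: str = ""  # ISO date (YYYY-MM-DD); left empty for undated items


@dataclass
class OutlineNode:
    heading: str
    children: list = field(default_factory=list)
    items: list = field(default_factory=list)

    def child(self, heading):
        """Return the child with this heading, creating it if needed."""
        for node in self.children:
            if node.heading == heading:
                return node
        node = OutlineNode(heading)
        self.children.append(node)
        return node

    def find(self, path):
        """Return the node at `path`, or None if it does not exist."""
        node = self
        for heading in path:
            node = next((c for c in node.children if c.heading == heading), None)
            if node is None:
                return None
        return node

    def place(self, path, item):
        """Attach an item of any kind at the node reached by `path`."""
        if item.kind not in KINDS:
            raise ValueError(f"unknown kind: {item.kind}")
        node = self
        for heading in path:
            node = node.child(heading)
        node.items.append(item)
        # Stable sort: dated items fall into chronological order,
        # undated items keep the order in which they were placed.
        node.items.sort(key=lambda i: i.date)
        return node


# Invented sample data, for illustration only.
root = OutlineNode("Outline")
topic = ["Politics", "Sovereignty"]
root.place(topic, Item("text_chunk", "a paragraph from a public domain text"))
root.place(topic, Item("dictionary_entry", "the concept of sovereignty"))
root.place(topic, Item("debate_guide", "a debate about sovereign authority"))
root.place(["Chronology"], Item("event_summary", "summary B", date="2006-04-02"))
root.place(["Chronology"], Item("event_summary", "summary A", date="2006-04-01"))

kinds_at_topic = sorted({i.kind for i in root.find(topic).items})
chronology_labels = [i.label for i in root.find(["Chronology"]).items]
print(kinds_at_topic)
print(chronology_labels)
```

The design choice worth noticing is that the outline node knows nothing about what kind of item it holds.  That is one way of keeping the structure open to encyclopedia articles, books, and other kinds of information later on, though real software would need far more than this (editing history, translation, and so forth).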


Actually, extrapolating from my own rate of work so far, I think I could collate 50 major works of philosophy in about five years, working alone (see the Collation Project summary for details).  Still, I'm very aware of the ambition of the project, and so I propose that we take "baby steps" and get started with something relatively modest: five public domain works to be put into an outline.  (This was actually a suggestion from someone at Purdue.)  By the time the software is written for this project and the collation of these works is complete, we will have greatly expanded our understanding of what is possible and what is not, and we'll be able to revise our plans profitably.


This brings me finally to the role of the Digital Universe (DU) in this project.  I am employed as Director of Collaborative Projects at the Digital Universe Foundation (DUF), and I am pursuing these projects in association with and under the sponsorship of, but not under the direct supervision of, the DUF.  (More on proposed project governance in a bit.)  The DU is already developing and placing vast amounts of information into a taxonomy.  The rules the DU is now following call for a one-to-one correspondence between encyclopedia article topics and taxonomy nodes.  So it is possible that the DU's taxonomy could be used for the Collation Project outline, and vice-versa.  This matching, however, would have to be intentional: eventually, the DU taxonomy would have to adopt the outline developed for the Collation Project, or vice-versa.
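
What a "one-to-one correspondence" and a "matching" would amount to can be sketched in a few lines of Python.  This is a hedged illustration only: the function names, node identifiers, and sample headings are hypothetical, and nothing here describes how the DU's software actually works.  The first function checks that every taxonomy node has exactly one article topic and no two topics share a node; the second lists the headings found in one structure but not the other, which is roughly the work any deliberate matching of outline and taxonomy would have to start from.

```python
def is_one_to_one(article_to_node, nodes):
    """True if every node has exactly one article and no node is shared."""
    targets = list(article_to_node.values())
    no_shared_nodes = len(set(targets)) == len(targets)
    every_node_covered = set(targets) == set(nodes)
    return no_shared_nodes and every_node_covered


def unmatched(outline_headings, taxonomy_nodes):
    """Return (only in outline, only in taxonomy), each sorted."""
    outline, taxonomy = set(outline_headings), set(taxonomy_nodes)
    return sorted(outline - taxonomy), sorted(taxonomy - outline)


# Invented sample data, for illustration only.
good = is_one_to_one(
    {"Sovereignty": "node-sovereignty", "Liberty": "node-liberty"},
    ["node-sovereignty", "node-liberty"],
)
bad = is_one_to_one(
    {"Sovereignty": "node-1", "Liberty": "node-1"},  # two topics share a node
    ["node-1", "node-2"],
)
gaps = unmatched(
    ["Sovereignty", "Liberty", "Covenant"],
    ["Sovereignty", "Liberty", "Property"],
)
print(good, bad, gaps)
```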


One obvious suggestion is that from the beginning we make the outlining software and taxonomy-building software one and the same, and the project for outline- and taxonomy-building one and the same.  One problem with this suggestion is that, until we are very far along with the Collation Project's outline, we cannot know that its proper shape will be the same as the DU taxonomy's.  But, really, the decisive problem with the suggestion is that we should not tie the hands of the Collation Project by forcing the people working on the outline to work closely with an already-established group of people, especially when the two groups of people might be working at cross-purposes.


The way forward

Here is what I propose instead.  We will organize a new community from the bottom up, through grassroots online organizing via mailing lists, blogs, and other methods.  This community will decide on and spec out the software needed for the Collation Project, bearing in mind how text projects such as the others described above might make use of the same software.  The community will pursue a pilot project with just a few texts, and, when all is ready, open development up to a very broad group of participants.  I want these strongly collaborative text projects to be very much driven by participants, but particularly by the best-qualified participants, i.e., college professors, editors, and other professionals.  But it will, at least in the foreseeable future, be a free and volunteer project.  The rapidity of the progress we will make will depend primarily on how quickly the participants can get organized, how long a discussion we want to have about project policy, how quickly the software is adapted or, if necessary, written from scratch, and how quickly the pilot project can be completed and any necessary changes made.


Once the Collation Project and any associated projects are well along, we will approach the Digital Universe Foundation's Board of Directors to incorporate the work as part of the free offerings of the DU.  At that point, the community behind the Collation Project and associated projects will have to negotiate with the DU community to decide how and whether the outline and the taxonomy can be matched.  Possibly they will never be wholly merged, but only mutually interlinked and searchable from a single point.  Worries on this issue should not stop us from forging ahead with the Collation Project--the opportunities are too great not to use the best possible tools and methods for the job.


The opportunities both fulfilled and missed with two previous projects I helped to develop, Nupedia and Wikipedia, make me very aware of the crucial importance of getting the plan exactly right, and then creating a flexible but solid organizational framework in which work can proceed decisively.  The first step in actually building the project, therefore, is to set up mailing lists and immediately kick off an open and wide-ranging discussion of the project.  For a strongly collaborative project like this, it is necessary for collaborators to be involved in project design.


As a way to get a project started, this is apt to seem bizarre to many academics and "traditional information producers" (as I called them above).  I do not propose to have a steering committee; I do not (at first) propose to make strategic partnerships with one or two organizations; I do not propose to limit who may and may not show up to the party.  Won't this mean that I will end up with a bunch of amateur volunteers?  Not necessarily, first of all; I recruited over 100 Ph.D.-level people to work on Nupedia in nine months, and I think I and others who are interested in this project might be able to recruit many more than that in a much shorter time.  The point is that we will be able to use people at nearly all levels.


But I also propose that all project mailing lists, accessible via textop.org, be moderated and led by well-qualified people.  The discussion will be open, but it will also be serious and mature.  Trolls, as soon as they are identified as such, will be out on their ear.  Mediocre thinkers who attempt to dominate discussions will be instructed to tone it down or leave.  There might even be an upper limit to the number of messages postable per day on project mailing lists, to make sure that everyone can properly digest important material that is coming over the transom; this will also give moderators an opportunity to select the most important posts and thereby keep the level of discussion high.  We may even, if there is sufficient energy and discipline to do this, schedule different topics for discussion, rather than try to talk about everything at once.  In this way, what I hope to build is something completely unfamiliar to most people: an open, high-quality meritocracy.


Different mailing lists will, later, serve as workgroups that are empowered to do what they need to do for the project.  For example, there might be a recruitment group that has its own document management system for purposes of hosting recruitment materials; there might be a design group that develops the website look and feel; and so forth.  An open discussion will always be the first step, however.


At the same time, I will solicit membership in a Board of Advisors, which will advise but not direct the project.  This will consist of distinguished scholars, editors, business people, and others who have an interest in providing high-level advice on the direction of the project.  The Board of Advisors will have its own mailing list, and we might have a conference call from time to time (with minutes published publicly).  The Board's brief will be to act as a distinguished, publicly visible, independent check on the direction of the project.


What will not exist is a secret coterie of influential players who hand down project policy from on high.  Rather, all project work will be expected to be done out in the open either via mailing lists, other public communication systems such as blogs, or using the software itself.  I personally do not want to get distracted by back-room deals and in-depth discussions with individuals and individual organizations.  It is more important by far that I spend my time actually working out plans and policies with the people who are responsible for ensuring that they're in the best possible shape.  So, if you want to get involved, while I would always be happy to have a private message from you, and I would be happy to chat with your organization when you're first getting involved, the way for you actually to get involved is very simple: join the mailing lists about things you're knowledgeable about and interested in.  And, while we will most strongly encourage the most qualified people, who will be given authority according to their ability and time, anyone will be able to join the lists (as long as they haven't been kicked off for being disruptive).


My own role will be as much facilitator and organizer as leader.  I will start off discussions as necessary, but I will be highly interested in reading and responding to the best remarks that appear.  Participants should not wait for me to articulate every policy.  When organizing Nupedia and Wikipedia, as well as other projects I've been associated with, I led by integrating, and that is what I hope to do here.  Here's the procedure: I present a well-developed plan, and then invite people to comment on it for as long as necessary, to present alternative plans, to debate and elaborate all details, and so forth.  Next, I edit or rewrite the plan, trying my best to take into consideration all reasonable remarks; then, when it seems there is as close to a consensus as the most credible discussants can arrive at, I make an executive decision that such-and-such is the plan we will pursue.  Then software requirements will be written and coders get to work.


The names of moderators and of Board of Advisors members will be posted on textop.org and identified as the project's leadership.  At some point--once the project is well under way, and we are familiar with each other--I will appoint a Board of Directors, based strictly on merit and time for the project.  After that, the Board will select its own members.  At the same time--again, once the project is well under way--my own position will become either elected or selected, depending on what the Board of Directors decides.


Whether or not we do this, or adopt other elements of a governance framework, the governance framework will be (1) clearly articulated, (2) enforceable, i.e., those who do not wish to play by project rules will be ejected, and (3) meritocratic first, and then democratic, but generally open.  In particular, we will debate and adopt a community charter that does our considered best to solve the familiar problems endemic in other online communities, Wikipedia being only one example, but an excellent one.  One thing I will insist on in any case is that all participants use their own real names, identities, and credentials.  Other Internet projects and features do and perhaps should permit anonymity; this one will not.


According to the vision I have for the project, it will be possible, in the end, to read the entire outline and its contents in dozens of languages.  That said, I strongly suspect that English will have to be designated the central language of the outline--something I am sorry to have to say, because it is bound to be unpopular--but there must be a common language if there is to be a single outline for all languages.  This means that, while works in languages other than English might be marked up and summarized in the language of the original, the master copy of the outline, and discussion of the shape of the outline, must be done in English.  Of course, the outline and changes to it would be continuously translated into various languages, and there would certainly have to be a robust subproject devoted to that task.  But in saying that there must be a common language of outline work and that it must be English, I know that I am virtually guaranteeing that, if the outline project is successful at all, there will be competing projects in other languages.  I hope this will not be necessary, though, because this project will make a concerted effort to incorporate works in all languages, and mailing lists devoted to major online languages will all be among the first mailing lists to launch.


Although this will be obvious to the open source crowd, I should add that the software we will use will from the start be open source software (OSS), and we will launch an OSS development mailing list with the others.  Moreover, of course the outline, the summaries, and other data that the project produces will be open content (all contributions will be considered donations to the DUF and released en masse under a Creative Commons Attribution-ShareAlike license).


I suppose the conclusion of this manifesto would be the place to predict the impact of the success of the Collation Project, and the Analytical Dictionary, Debate Guide, and Event Summary projects.  But that would, I think, be a little tedious, because so uncertain.  What seems clear is that the future of publishing, education, and society as a whole would never be quite the same.  In what ways they might change are matters I will reserve for another time, or leave to others.  Besides, I think that the free availability of these resources themselves would prove to be revolutionary.


More interesting to consider, because a little more certain, is the potential impact of the success of these projects on individual minds.  Quite possibly the greatest works of humanity will be created by people educated by and immersed in this great outline.  Endlessly repetitive research all having been done once for all, it will be up to well-educated, creative minds to use the Olympian perspective afforded by the outline to articulate new syntheses that, we can only hope, will enlighten humanity as the encyclopédistes dreamt of doing.  Or perhaps what we will find is that the perspective the outline affords will itself do the job of enlightenment--that there are few new syntheses that will appear that will be more consequential than scholars and students simply seeing related texts deeply analyzed and set side-by-side.  In any event, it will certainly be much harder for the truth to hide from view.
