Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free under an open format layout, available on almost any computer. As of 3 October 2015, Project Gutenberg had reached 50,000 items in its collection of free eBooks.
The releases are available in plain text as well as, wherever possible, in other formats such as Plucker. Most releases are in the English language, but many non-English works are also available. Several affiliated projects provide additional content, including region- and language-specific works. Project Gutenberg is closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.
Project Gutenberg is named after the inventor Johannes Gutenberg, whose work in developing printing technology greatly increased the availability of books and other texts.
Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence. Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at the time has since been variously estimated at $100,000 or $100,000,000. Hart explained that he wanted to "give back" this gift by doing something that could be considered of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge by the end of the 20th century.
On July 4, 1971, inspired by a free printed copy of the U.S. Declaration of Independence, he decided to type the text into a computer and transmit it to other users on the computer network.
This particular computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed that one day the general public would be able to access computers, and he decided to make works of literature available in electronic form for free. He used the copy of the United States Declaration of Independence that he had in his backpack, and this became the first Project Gutenberg e-text. He named the project for Johannes Gutenberg, the fifteenth-century German printer who propelled the movable-type printing press revolution.
Italian volunteer Pietro Di Miceli developed and administered the first Project Gutenberg website and began development of the project's online catalog. During his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.
Hart died on 6 September 2011 at his home in Urbana, Illinois, at the age of 64.
CD and DVD project
In August 2003, Project Gutenberg created a CD containing approximately 600 of the "best" e-books from the collection. The CD is available for download as an ISO image. Users who were unable to download the CD could request that a copy be sent to them free of charge.
In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail.
In July 2007, a new edition of the DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items.
The majority of the DVDs, and all of the CDs mailed by the project, were recorded on recordable media by volunteers. However, the newer dual-layer DVDs were manufactured, as this proved more economical than having volunteers burn them. As of October 2010, the project had mailed approximately 40,000 discs. As of 2017, free CDs are no longer mailed, though the ISO image is still available for download.
Scope of collection
As of August 2015, Project Gutenberg claimed over 70,000 items in its collection, with an average of over 50 new e-books being added each week. These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has reference works and issues of periodicals. The Project Gutenberg collection also has a few non-text items such as audio files and music-notation files.
Most releases are in English, but there are also significant numbers in many other languages. As of April 2016, the non-English languages most represented are: French, German, Finnish, Dutch, Italian, and Portuguese.
Whenever possible, Gutenberg releases are available in plain text, mainly using the US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent accented characters in French and the Scharfes s (ß) in German, for example). Besides the requirement that a work be copyright-free, a plain-text version of each release in a basic character set had been one of Michael Hart's criteria since the founding of Project Gutenberg, as he believed it was the format most likely to remain readable far into the future. Out of necessity, this criterion has had to be extended further for the sizable collection of texts in East Asian languages such as Chinese and Japanese now in the collection, where UTF-8 is used instead.
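The progression from US-ASCII to ISO-8859-1 to UTF-8 can be illustrated with a minimal Python sketch that checks which of the three encodings can represent a few short sample strings. The sample phrases and the helper names `encodable` and `support_table` are illustrative choices, not part of any Project Gutenberg tooling; the check relies only on Python's built-in `str.encode`, which raises `UnicodeEncodeError` when a character has no representation in the target encoding.

```python
from __future__ import annotations

# Minimal sketch: which encodings can represent a few sample strings?
# The samples are illustrative; they are not taken from the Project Gutenberg catalog.

SAMPLES = {
    "English": "Call me Ishmael.",
    "German": "Straße",            # contains the Scharfes s (ß)
    "French": "déjà vu",           # contains accented letters
    "Japanese": "吾輩は猫である",  # East Asian script
}

# Ordered from most restrictive to most general.
ENCODINGS = ["ascii", "iso-8859-1", "utf-8"]


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` has a representation in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def support_table() -> dict[str, list[str]]:
    """Map each sample language to the encodings that can represent its sample."""
    return {
        lang: [enc for enc in ENCODINGS if encodable(text, enc)]
        for lang, text in SAMPLES.items()
    }


if __name__ == "__main__":
    for lang, encs in support_table().items():
        print(f"{lang:<9} {', '.join(encs)}")
    # Pure ASCII text produces identical bytes in all three encodings...
    print({enc: SAMPLES["English"].encode(enc) for enc in ENCODINGS})
    # ...but ß is one byte in ISO-8859-1 and two bytes in UTF-8.
    print("ß".encode("iso-8859-1"), "ß".encode("utf-8"))
```

Running it shows the English sample passing all three encodings, the German and French samples requiring ISO-8859-1 or UTF-8, and the Japanese sample representable only in UTF-8. Because pure ASCII text is byte-for-byte identical in all three encodings, each extension kept older ASCII releases valid.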
Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be easier to read. However, formats that are not easily editable, such as PDF, are generally not considered to fit the goals of Project Gutenberg. Project Gutenberg also accepts two master formats for submissions, from which all other files are generated: customized versions of the Text Encoding Initiative standard (since 2005) and reStructuredText (since 2011).
Beginning in 2009, the Project Gutenberg catalog began offering auto-generated alternate file formats, including HTML (when not already provided).
Michael Hart said in 2004, "The mission of Project Gutenberg is simple: 'To encourage the creation and distribution of ebooks'". His goal was "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible". Likewise, a project slogan is to "break down the bars of ignorance and illiteracy", as its volunteers aim to continue spreading public literacy and appreciation of literary heritage, just as public libraries began to do in the late 19th century.
Project Gutenberg is intentionally decentralized; there is no selection policy dictating which texts to add. Instead, individual volunteers work on what they are interested in or have available. The Project Gutenberg collection is intended to preserve items for the long term, so that they cannot be lost to any single localized accident. To this end, the entire collection is backed up regularly and mirrored on servers in many different locations.
Project Gutenberg is careful to verify the copyright status of its ebooks under United States copyright law. Material is added to the Project Gutenberg archive only after it has received copyright clearance, and records of these clearances are saved for future reference. Project Gutenberg does not claim new copyright on the titles it publishes; instead, it encourages their free reproduction and distribution.
Most books in the Project Gutenberg collection are distributed as public domain under United States copyright law. There are also a few copyrighted texts, such as those of science fiction author Cory Doctorow, that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder, although they generally tend to be licensed under
"Project Gutenberg" is a trademark of the organization, and the mark cannot be used in commercial or modified redistributions of public domain texts from the project. There is no legal impediment to reselling works in the public domain if all references to Project Gutenberg are removed, but Gutenberg contributors have questioned the appropriateness of directly and commercially reusing content that has been formatted by volunteers. There have been instances of books being stripped of attribution to the project and sold for profit in the Kindle Store and by other booksellers, one being the 1906 book Fox Trapping.
The website was not accessible within Germany as a result of a court order obtained by S. Fischer Verlag regarding the works of Thomas Mann and Alfred Döblin. Although these works were in the public domain in the United States, the German court (Frankfurt am Main Regional Court) recognized the infringement of copyrights still active in Germany and asserted that the Project Gutenberg website was under German jurisdiction because it hosts content in the German language and is accessible in Germany. This judgment was confirmed by the Frankfurt Court of Appeal on 30 April 2019 (11 U 27/18). The Frankfurt Court of Appeal did not grant permission for a further appeal to the Federal Court of Justice (Bundesgerichtshof); however, an application for permission to appeal was filed with the Federal Court of Justice. As of 4 October 2020, that application was still pending (Federal Court of Justice I ZR 97/19). According to the Project Gutenberg Literary Archive Foundation, "In October 2021, the parties reached a settlement agreement. Under the terms of the agreement, Project Gutenberg eBooks by the three authors will be blocked from Germany until their German copyright expires. Under the terms of the settlement, the all-Germany block is no longer in place. Other terms of the settlement are confidential."
The website has been blocked in Italy since May 2020.
The text files use the format of plain text encoded in UTF-8 and are typically wrapped at 65–70 characters, with paragraphs separated by a double line break. In recent decades, the resulting plain appearance and the lack of any markup have often been perceived as drawbacks of this format. Project Gutenberg attempts to address this by making many texts available in HTML, ePub, and PDF versions as well. The HTML versions of older texts are generated automatically. Another not-for-profit project, Standard Ebooks, aims to address these issues with its collection of formatted and styled public domain titles, correcting issues related to design and typography.
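Because line breaks inside a paragraph are only wrapping, a reader or converter can recover paragraphs by treating blank lines as separators and every other newline as a space. The following Python sketch does this with the standard `re` and `textwrap` modules. The function names are hypothetical, and the sketch assumes simple prose: verse, tables, and indented block quotes, where line breaks are meaningful, would need different handling, and the Project Gutenberg license header and footer are not stripped.

```python
from __future__ import annotations

import re
import textwrap

# Hypothetical helpers for hard-wrapped plain text in which paragraphs are
# separated by a blank line (a "double line break").


def split_paragraphs(text: str) -> list[str]:
    """Return the paragraphs of `text`, each unwrapped onto a single line.

    Blank lines (possibly containing spaces) separate paragraphs; every other
    line break is treated as wrapping and replaced by a single space.
    """
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()  # skip empty input
    ]


def rewrap(text: str, width: int = 70) -> str:
    """Re-wrap every paragraph to at most `width` columns, with a blank line between them."""
    return "\n\n".join(textwrap.fill(p, width=width) for p in split_paragraphs(text))


if __name__ == "__main__":
    sample = (
        "It was the best of times,\n"
        "it was the worst of times.\n"
        "\n"
        "It was the age of wisdom."
    )
    print(split_paragraphs(sample))
    print(rewrap(sample, width=20))
```

For this sample, `split_paragraphs` returns two paragraphs, the first joined into the single line "It was the best of times, it was the worst of times." The `width` parameter of `rewrap` defaults to 70 only to match the upper end of the typical wrapping range; any width works.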
In December 1994, Project Gutenberg was criticized by the Text Encoding Initiative for failing to document or discuss the editorial decisions that are unavoidable in preparing a text and, in some cases, for not recording which of several conflicting versions of a text had been digitized.
The selection of works (and editions) available has been determined by popularity, ease of scanning, being out of copyright, and other factors; this would be difficult to avoid in any crowd-sourced project.
In March 2004, an initiative was begun by Michael Hart and John S. Guagliardo to provide low-cost intellectual properties. The initial name for this project was Project Gutenberg 2 (PG II), which created controversy among PG volunteers because of the re-use of the project's trademarked name for a commercial venture.
Project Gutenberg Consortia Center specializes in collections of collections. These do not have the editorial oversight or consistent formatting of the main Project Gutenberg. Thematic collections, as well as numerous languages, are featured. This is sponsored by worldlibrary.net, which hosts self.gutenberg.org, a self-publishing portal.
Distributed Proofreaders: In 2000, Charles Franks founded Distributed Proofreaders (DP), which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. This effort increased the number and variety of texts being added to Project Gutenberg and made it easier for new volunteers to start contributing. DP became officially affiliated with Project Gutenberg in 2002. As of 2018, the 36,000+ DP-contributed books comprised almost two-thirds of the nearly 70,000 books in Project Gutenberg.
All sister projects are independent organizations that share the same ideals and have been given permission to use the Project Gutenberg trademark. They often have a particular national or linguistic focus.
Project Gutenberg of the Philippines aims to "make as many books available to as many people as possible, with a special focus on the Philippines and Philippine languages".
Project Gutenberg Russia (Rutenberg) aims to collect public domain books in Slavic languages, particularly in Russian. Discussion of the project and its legal aspects began in April 2012. The name Rutenberg is a combination of the words "Russia" and "Gutenberg".
Project Gutenberg Self-Publishing Portal, also known as Project Gutenberg Self-Publishing Press, is run by the Project Gutenberg Consortia Center. Unlike Project Gutenberg itself, it allows submission of texts never published before, including self-published ebooks. Launched in 2012, it also owns the "gutenberg.us" domain.
Project Gutenberg of Taiwan seeks to archive copyright-free books with a special focus on Taiwan in English, Mandarin and Taiwan-based languages. It is a special project of Forumosa.com.