Project Runeberg's front page section for December 2012:
Twenty years: December 1992-December 2012
Project Runeberg started on the evening of December 13, 1992, which
means we are now entering our 21st year. During 2012 we reached
several milestones:
Our 3,000th scanned volume was uploaded on November 27.
Our 1,000,000th scanned page was uploaded on November 22.
Our 250,000th page was proofread on November 20.
Our 500,000th page was indexed on May 31.
During December we converted all books scanned before 2005 from
ISO 8859-1 (Latin-1) to the UTF-8 (Unicode) character set. Books
scanned after 2005 already use the new standard, which allows a mix of
Greek, Cyrillic, Chinese and other characters.
During the year, we recruited new volunteers for scanning and
bought a faster scanner, which allowed our collection to grow much
faster. One third of our 1,000,000 pages were uploaded during 2012.
Our Facebook page remained popular and active, and attracted new fans,
currently 1,800 in all.
Our website remained popular, with between 1 and 2 million page
views per month. As a comparison, this is roughly 1 percent of the
web traffic to the Swedish-language Wikipedia.
There are also many things we didn't accomplish:
The speed of proofreading didn't increase, but remained at about
30,000 pages per year. That's a good speed, but it doesn't keep pace
with our increased scanning speed.
We failed to secure funding for expanding the project. It is
still hard to explain why digitizing books is beneficial to society,
or why digitized books or newspapers should also be made easily and
openly available.
Unicode conversion
An 8-minute video explains in Swedish how and why we're converting to Unicode.
If you have trouble viewing the video, perhaps the version on Facebook
works better.
One fifth of our collection, or some 200,000 scanned pages, was
uploaded before 2005, and its text was encoded in ISO 8859-1 (Latin-1),
an international 8-bit standard defined in the 1980s for letters of
the western European languages. It can represent Danish æøå, French àçë,
German äöüß, Swedish åäö, and Icelandic þð, but not Greek, Cyrillic,
Hebrew, Arabic, or Chinese letters or other special characters.
More recently, all texts have been uploaded in UTF-8 (Unicode, ISO
10646), an international standard capable of representing many
thousands of characters from virtually all languages of the earth.
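The difference between the two standards can be checked with a few lines of Python. This is only an illustrative sketch, not part of our conversion tooling: the helper name `fits_latin1` and the sample strings are made up for the example, while the codec names `iso-8859-1` and `utf-8` are the ones Python's standard library provides.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside the 256 Latin-1 code points.
        return False


# Hypothetical sample strings, chosen only for illustration.
samples = {
    "Swedish": "åäö",
    "German": "äöüß",
    "Icelandic": "þð",
    "Greek": "αβγ",
    "Cyrillic": "абв",
}

results = {language: fits_latin1(text) for language, text in samples.items()}

# Latin-1 always uses one byte per character; UTF-8 uses one byte for
# ASCII and two or more bytes for everything else.
latin1_size = len("å".encode("iso-8859-1"))
utf8_size = len("å".encode("utf-8"))

for language, ok in results.items():
    print(language, "fits in Latin-1" if ok else "needs Unicode")
```

The Swedish, German, and Icelandic samples fit in Latin-1, while the Greek and Cyrillic ones can only be stored in a Unicode encoding such as UTF-8.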
During December 2012, we have been converting texts from the old
standard to the new one. The video (above) illustrates this. Another
example is seen in our preface to Fänrik Ståls sägner, one of our
first e-texts from 1993.
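For readers who want to try such a conversion themselves, the core step might look like the minimal Python sketch below. It is not the script we actually ran. The function names `latin1_to_utf8` and `convert_once` are hypothetical, and the guard against converting a text twice is a simple heuristic assumed for this example.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes stored as ISO 8859-1 (Latin-1) into UTF-8."""
    # Every byte value 0-255 is a valid Latin-1 character, so decoding
    # never fails; it is up to the caller to know the input is Latin-1.
    return raw.decode("iso-8859-1").encode("utf-8")


def convert_once(raw: bytes) -> bytes:
    """Convert to UTF-8 unless the bytes already are valid UTF-8.

    Heuristic (an assumption of this sketch): Latin-1 text containing
    accented letters is almost never valid UTF-8, so a successful UTF-8
    decode is taken to mean the text was converted earlier.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return latin1_to_utf8(raw)
    return raw


# The title "Fänrik Ståls sägner" as Latin-1 bytes: one byte per letter.
old = b"F\xe4nrik St\xe5ls s\xe4gner"
new = convert_once(old)

print(len(old), len(new))   # the three accented letters grow to two bytes each
print(new.decode("utf-8"))
```

Running `convert_once` a second time on the result leaves it unchanged, which avoids the double encoding that would otherwise turn "ä" into "Ã¤".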