After having existed as a student radio club since 1923, we’ve accumulated quite some history over the years. Most of it exists as written records in our diligently kept archive.
We’ve been wanting to get some overview over the archive for quite some time now. Since these documents are a bit old, and our premises can be a bit corrosive at times (especially after a reorganization of the room where the archives were kept), it was a ticking time bomb we had to deal with as soon as possible. It further represents a rich historic archive which would be a fun read for many of our members, so we thought the best way would be to just digitize it in its entirety before delivering the physical copy to the Student Society’s safer archive (and/or the national archive), and get an overview over the digital copy rather than the physical copy in our own time.
The shutdown of other activities due to covid-19 meant that this was a perfect time for working our way through the archive in its entirety, non-stop – plus that LA3WUA was getting rather tired of having to store the archive in his bedroom closet. LA3WUA and I therefore agreed not to associate ourselves with any other people, defined a rigid infection cohort between the two of us and started scanning all approximately 12 000+ documents.
A lot of these documents are old, thin, tender, have various non-standardized paper formats, and are not very resilient against rough handling. We therefore had to do it manually rather than using a scanner with a document feeder system.
The easiest way turned out to be to set up non-contact system using a DSLR camera on a tripod. This would retain constant focus distance between document and camera, and we could scan each document rather fast with minimal equipment or software intervention.
With one player being the camera handler and document feeder, and one player turning pages, taking away the previous document and putting it back on the pile, we were able to get quite a good scan rate without the process becoming too tiresome.
Even the oldest documents were surprisingly well-kept, thanks to many generations of diligent record-keepers. The archive’s existence has also been an under-broadcasted secret, keeping it safe from the inquisitive but sweaty palms of unhygienic students (these can soon look through it in the digital version). The documents from 1923-1938, 1950-1990 were all rather nice. The late 90s were worse off, and the 40s bore marks of curious students wanting to look into the pre- and post-war records.
Our scan setup yielded reasonable image quality:
The high quality means that we can easily enhance contrast using some simple batch image processing techniques. It is easily readable without such treatment, however, as seen above. Our main mantra throughout this project is that it is a best effort-type of scanning – it might not be perfect, but it is good enough.
The next step after we’ve finished this endeavor will be to go over the digital copies, add some simple content tags and try to date them. We’re planning to crowdsource this task among ARK’s members once we’ve had a small break and time to plan the process properly. It is soon exam time, which is a perfect time to trap students into procrastination tasks like these.
We’ll also do some OCR in order to further increase the searchability of the documents. Lots of nice things to get out of this. It is something we will probably get back to it in future blog posts in order to even out the work/blog post ratio, which is currently approximately 100 hours per archive-related blog post (this one).
Nightmares about endless streams of old documents will hopefully stop haunting us by the end of April if we have any luck. Happy Easter!