Large Scale Digital Conversion Case Study

A very large and prestigious Western Canadian university had been considering the digital conversion of over thirty thousand thesis documents that had been contributed as hardcover books. The collection covered seven decades and numbered approximately five million pages. There were many reasons to consider such a conversion. The collection consumed a considerable amount of physical space that could easily be repurposed. The paper was aging well, but it would not last forever in its native state. Most importantly, thesis documents contributed more recently in digital format were proving to be a very valuable research tool. Could access to the historical thesis documents provide the same scholarly value? The answer was yes, but many factors had to be taken into account.

With thirty thousand bound books to process, a streamlined conversion protocol would need to be established. Could the books be scanned without removing the bindings? If so, what type of scan device would be required? If the bindings were to be removed, how could it be done without cutting into the text on the pages? What about scanning five million pages? That is a significant amount of work. Should it be done in-house, or could an outside service bureau be used? After scanning, which OCR product would provide the best data extraction results? Again, would it be better to perform this work in-house, or farm it out? How long would the project take?

In early 2008, our firm was contacted to consult on this project, as we had a very good long-term relationship with the university. After familiarizing ourselves with the scope and goals of the task, we examined each portion of the workflow to determine what would make the most sense from the standpoints of productivity and cost effectiveness.

We quickly identified the binding issue as a critical part of this assignment. While we have equipment that can capture extremely high quality, high resolution images from bound books, the time required to process individual pages in this manner would inflate the costs of image capture to an unrealistic extreme.

We had a great deal of experience in accurately slicing the bindings from bound books so that the valuable pages inside remained intact, with healthy margins. We had acquired an ancient guillotine from a commercial printing operation in the early 1980s just for this purpose.


With this decision made, we turned our attention to the fastest way to scan approximately five million pages. Micro Com Systems (MCS) has always been very partial to Fujitsu scanners, and this was an excellent opportunity to test their newest model at the time, the fi-5900. It was fairly fast at 120 pages per minute, featured advanced double feed detection, and was mated to the latest version of Kofax VRS (Virtual ReScan), a technology that would assist us in obtaining the best possible image the first time, significantly reducing rescans. We ordered a fi-5900 and aggressively evaluated it for a week using actual client documents. The results were very encouraging.


Our workflow model was as follows: binding removal, document preparation (very little was required), scanning, quality control (with second-pass scans as required), indexing (author name and thesis title) and OCR.

The scanner evaluation indicated that a scan resolution of 300 DPI (dots per inch) gave us the best data extraction results through the OCR (optical character recognition) process. We have used a variety of OCR engines over the years, but we settled on ABBYY Recognition Server, as its speed and accuracy eclipsed all the other products we tested.

There were a few ancillary details to iron out. The university needed to ensure that the scanned images would be cleansed of any personal author details (other than name) to comply with Freedom of Information and Personal Privacy legislation. We have developed a fairly comprehensive set of image-related tools over the years for a variety of clients. One of these was a redaction utility that could be used to completely obscure any details (such as an address or signature) that were deemed to be sensitive.
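
To illustrate the idea of redaction (not our actual utility, which is not described here), the following is a minimal Python sketch. It assumes a grayscale page held as a plain list of pixel rows, and the function name `redact` and the box coordinates are hypothetical. The point it shows is that the sensitive region is overwritten in the image data itself, so the original pixels cannot be recovered from the delivered file.

```python
def redact(pixels, box, fill=0):
    """Return a copy of a grayscale image with one rectangle overwritten.

    pixels: list of rows, each row a list of 0-255 integers
    box:    (left, top, right, bottom); right and bottom are exclusive
    fill:   value written over the region (0 = black)
    """
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0

    # Clamp the box to the image so an oversized box cannot raise an error.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)

    # Work on a copy so the source scan is left untouched.
    out = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill  # overwrite, do not overlay
    return out


# A blank white 8 x 6 "page" with one dark pixel standing in for a signature.
page = [[255] * 8 for _ in range(6)]
page[4][5] = 40

redacted = redact(page, box=(4, 3, 8, 6))
```

In practice the same overwrite would be applied to the scanned image file before delivery, with the box chosen by an operator during quality control.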

We needed to arrive at a very competitive, all-inclusive price for the tasks required for this conversion: binding removal, scanning, data extraction and redaction. The component parts of the pricing formula were the equipment and technology investments, staff labour, software licensing and project duration.

Finally, we had to find a method whereby the images and corresponding full text data for each thesis could be transferred to the university. In this case, external USB hard drives were swapped back and forth.

We came to terms with the university in late 2008 and began a small pilot project as a proof of concept. Approximately 500 volumes were converted end to end and uploaded to the university research website. The feedback was very strong and grant monies were obtained to begin the task of processing the balance of the material.

The university had limited resources to monitor and upload the material we were creating, so it was determined that a project duration of roughly two-and-a-half years would allow for the correct utilization of their staff.
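
As a rough check on what that schedule implies, the arithmetic can be sketched in Python. The volume, page and duration figures come from this case study; the 250 working days per year is an assumption made only for illustration, and the function name `required_pace` is hypothetical.

```python
# Figures stated in the case study.
TOTAL_VOLUMES = 30_000
TOTAL_PAGES = 5_000_000
PROJECT_YEARS = 2.5

# Assumption (not from the case study): about 250 working days per year.
WORKING_DAYS_PER_YEAR = 250


def required_pace(total_volumes, total_pages, years, working_days_per_year):
    """Return the average daily output needed to finish on schedule."""
    working_days = years * working_days_per_year
    return {
        "pages_per_volume": total_pages / total_volumes,
        "volumes_per_day": total_volumes / working_days,
        "pages_per_day": total_pages / working_days,
    }


pace = required_pace(
    TOTAL_VOLUMES, TOTAL_PAGES, PROJECT_YEARS, WORKING_DAYS_PER_YEAR
)
for name, value in pace.items():
    print(f"{name}: {value:,.1f}")
```

Under that assumption the schedule works out to roughly 48 volumes, or about 8,000 pages, per working day, averaged over the life of the project.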

The project proceeded as planned and was a complete success. Today, students around the world can access these theses over the internet as an invaluable research tool.

Craig Hollingum has been in the document imaging business for well over half of his life. He has been involved with Micro Com Systems Ltd. on an evolutionary path as an employee, partner and sole owner since 1982.




Published by valentine belonwu

