In an evolving digital-first era, Qatar National Library is transforming the way Arab and Islamic history can be studied. By diligently preserving and digitizing rare Arabic-language books and other materials, as well as treasured photos and artefacts, the Library is enabling future generations to access and explore the rich cultural knowledge embedded in these sources.
Best-in-class expertise and technology
At the heart of the Library’s massive endeavor lies the state-of-the-art Digitization Center, where a team of highly trained experts uses cutting-edge hardware and software to overcome the unique challenges of digitizing Arabic resources.
Since its establishment in 2015, the Digitization Center has drawn international attention for its work in manuscript scanning and digitization. It has made more than 15 million digitized images available to read, share, annotate, download, and print. These cover books, research datasets, manuscripts, newspapers, maps, archival documents, photos, slides, and posters drawn from the Library’s Heritage Collection and from partner institutions.
These digitized materials are hosted on the QNL Digital Repository (https://ediscovery.qnl.qa/). The content spans more than 11 languages, with Arabic accounting for nearly 75 percent, making the repository a resource of immense cultural value for researchers.
The complexities of digitizing Arabic materials
“Arabic is a language of immense beauty and complexity, and the unique attributes of the Arabic script present a formidable challenge when it comes to digitization,” says Hany Abdellatif, Head of Digitization Services at Qatar National Library. “Digitizing Arabic content demands a deep understanding of the language’s intricacies and a commitment to preserving its essence and the integrity of the content.”
Features of the script that digitization tools and platforms struggle to handle include letters whose shape and size change with their position in a word, the extensive use of dots to distinguish otherwise similar letters, and cursive, connected writing. The Library’s substantial investment in advanced technology and software capable of capturing these complexities digitally is yielding excellent results when measured against the highest international standards.
The center’s laboratory and studio feature sophisticated digital and photography equipment, with state-of-the-art scanners capable of capturing images with exceptional clarity and detail.
The team uses advanced optical character recognition (OCR) technologies to extract and index Arabic text from scanned images. By converting the digitized images into searchable text, OCR enriches them and makes the QNL Digital Repository an instrumental tool for research.
An innovative OCR-based solution
Although the Arabic script is the second most widely used alphabetic writing system in the world, Arabic text recognition has lagged behind because of the challenges it poses for OCR developers, which in turn limits the availability of digital Arabic content. This makes the Library’s achievements in Arabic OCR all the more noteworthy.
OCR draws on several fields, including image processing, machine learning, information retrieval, and artificial intelligence. To maximize the accuracy of the extracted text, the team has mastered a range of techniques and algorithms.
Hany Abdellatif explains: “We have built an accurate yet scalable system using appropriate tools and algorithms that analyze the layout and structure of the content. By identifying distinct sections, paragraphs, images, and other elements, we developed comprehensive machine learning libraries that achieve the highest accuracy in recognizing printed Arabic text. Our Arabic OCR process stands out in its ability to recognize intricate characters, fonts, and layouts, ensuring a smooth and uninterrupted conversion. We also devised a streamlined workflow to orchestrate the entire digitization process and improve image quality.”
The results of this image enhancement process can be seen in the remarkable clarity of historical images on the repository, such as a photo of a scene in Bethlehem, Palestine, taken more than 130 years ago.
Combined with the trained machine learning libraries, the center’s OCR technologies achieve 99 percent accuracy on Arabic text. The content-search feature then lets users explore the QNL Digital Repository efficiently, drawing on the descriptive, technical, and full-text information held for each item.
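The article does not say how the 99 percent figure is calculated. One common convention in OCR evaluation, assumed here purely for illustration, is character accuracy: one minus the character error rate (CER), where CER is the number of single-character insertions, deletions, and substitutions needed to turn the OCR output into a verified transcription, divided by the length of that transcription. The minimal Python sketch below computes it; the function names are hypothetical, and this is not presented as the Library’s own evaluation method.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `hypothesis` into `reference`."""
    # Row for the empty reference prefix: turning "" into hypothesis[:j] costs j.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from OCR output
                current[j - 1] + 1,      # extra character in OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical metric: 1 - CER, clipped at 0, where
    CER = edit distance / number of characters in the reference."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    # One wrong letter out of four in the Arabic word for "book".
    print(character_accuracy("كتاب", "كتات"))  # 0.75
```

Under this convention, 99 percent accuracy would mean roughly one character in every hundred needs correcting. A real evaluation of Arabic OCR would also have to decide whether diacritics and letter variants are normalized before comparison, since that choice changes the score.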
Beyond the technical endeavor: transforming research work
The Library’s commitment to the digital preservation of unique Arabic-language materials benefits almost every scholarly field. It is also part of a global shift in how historical research is carried out in the digital age. The COVID-19 pandemic showed how digitization allows research to continue even when physical access to heritage collections is limited. Digital archives cut down the laborious and costly aspects of research and create new ways for researchers to engage with libraries and their immense resources.
In tandem, digitization enhances public access to historical materials, safeguards fragile documents, and may reduce physical storage requirements.
The wider impact
Through its Digitization Center, Qatar National Library is taking a leading role in preserving the culture and rich history of Qatar, the Gulf, and the wider Arab and Islamic worlds.
“Our dedication to preserving Arab heritage extends beyond Qatar National Library’s own collections,” states Nasser Al Ansari, Director of IT Operations and Infrastructure. “We actively collaborate with local and international institutions to digitize and safeguard Arabic texts that have been unearthed in the region. In doing so, we ensure that these invaluable resources are accessible to scholars, researchers, and students around the world.”
One such international collaboration is a joint project with New York University (NYU) that applies the Library’s OCR system to more than 11,000 Arabic books in NYU’s library collections.
The Library also provides OCR support to the Doha Historical Dictionary of Arabic project, whose outcomes will help researchers studying the origin and meaning of Arabic words. An agreement with the Museum of Islamic Art includes a project to digitize 232 of the rarest books and manuscripts held in the museum and its library collections.
“Preserving and digitizing Arabic and Islamic content is not merely a technical endeavor; it is an act of safeguarding our collective memory and identity,” concludes Nasser Al Ansari. “Arabic texts offer profound insights into our social fabric, political views, educational systems, and religious traditions. By digitizing these texts and making them more discoverable online, we contribute to enriching global scholarship, helping to meet the research and educational needs of people wherever they are in the world, and fostering a deeper appreciation for our cultural heritage.”