When digitizing journals and books, we use different scanners based on the quality and size of the volumes.

Robotic scanner for the digitization of particularly valuable individual copies.

Particularly valuable individual copies are digitized with the so-called robotic scanner. To avoid damaging the volumes, the robotic scanner opens the book to only 90 degrees. Two high-resolution cameras produce high-quality images at up to 400 DPI in TIFF or JPG format. The clamping prism is very gentle, and pages are typically turned in semiautomatic mode.

Robotic scanner for valuable individual copies

Book scanner for large-sized documents

We digitize large-sized documents (larger than A3) with a book scanner, which handles pages up to A2. The scanner's illumination and scanning technology produce very good image quality. The volumes do not need to be opened to a full 180 degrees, which ensures gentle handling.


Document scanners for fast and high-quality results

Document scanners provide the fastest and most efficient digitization at the highest quality. These models scan both sides of an A3+ document (up to 30.7 cm wide) simultaneously, at high speed and in high quality. The roller system and scanning technology are extremely gentle, so we can safely handle poor-quality, fragmented, torn or even strongly acidic pages. We often encounter very long documents, up to 1 meter in length, and our professional devices handle them without any problem. Output can be configured flexibly, from 200 DPI black-and-white scans to 600 DPI uncompressed TIFF.
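
To give a sense of what these resolution settings mean in practice, the following minimal sketch estimates the pixel dimensions and raw data volume of a single scanned page. The function names, the ISO 216 paper sizes in the table, and the bit depths (1 bit per pixel for black and white, 24 bits per pixel for colour) are assumptions made for illustration; they are not a description of the scanners' actual output settings.

```python
# Illustrative estimate only: names and bit depths are assumptions.
MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres (width, height).
PAPER_MM = {"A2": (420, 594), "A3": (297, 420), "A4": (210, 297)}


def scan_pixels(paper: str, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a full-page scan at the given DPI."""
    width_mm, height_mm = PAPER_MM[paper]
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_bytes(paper: str, dpi: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes, ignoring file headers and compression."""
    width, height = scan_pixels(paper, dpi)
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    for paper, dpi, bpp in [("A3", 200, 1), ("A3", 600, 24)]:
        size_mb = uncompressed_bytes(paper, dpi, bpp) / 1_000_000
        print(paper, dpi, "DPI,", bpp, "bit:", scan_pixels(paper, dpi), f"{size_mb:.1f} MB")
```

Under these assumptions, one A3 page takes about 1 MB as a 200 DPI black-and-white scan and around 209 MB as a 600 DPI uncompressed colour image, which is why the choice of output format matters for storage planning.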

Document scanners for speed and quality

We often encounter particularly large documents as attachments (maps, sheets, art reproductions). We process them with large-format scanners, which have a gentle roller system and non-destructive illumination.

Text recognition (OCR)

The next step in processing printed documents is the so-called text recognition (OCR), in which the text shown in the scanned image is converted into searchable, machine-readable text. Today's software is highly efficient and accurate: recognition accuracy for a 19th-century print is 98-99%, and for high-quality prints it can reach 99.5%. The result of automatic text recognition is the so-called double-layer PDF, where the top layer is the scanned image and the lower layer is the text itself. This way the user sees the authentic image while the search runs on the text.
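
The accuracy figures above can be translated into an expected number of recognition errors per page. The sketch below assumes that accuracy is measured per character and that a page holds about 2,000 characters; both are assumptions made for illustration, and the function name is hypothetical.

```python
def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misrecognized characters on one page.

    Assumes `accuracy` is a per-character rate between 0 and 1.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


if __name__ == "__main__":
    # 2000 characters per page is an illustrative assumption.
    for accuracy in (0.98, 0.99, 0.995):
        errors = expected_errors(2000, accuracy)
        print(f"{accuracy:.1%} accuracy: about {errors:.0f} errors per page")
```

Under these assumptions, moving from 98% to 99.5% accuracy reduces the expected errors on such a page from about 40 to about 10.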

Example of text recognition (OCR).

Double-layered PDFs for fast and detailed searching

We insert bookmarks into the double-layered PDFs; these can be the title, author, date, year or the title of a book chapter. The result is a standard double-layer PDF that is suitable for publishing on the Internet.

To publish double-layered PDFs we use software developed in-house that enables sophisticated, high-speed full-text search, navigation between search hits, and display and highlighting of results. Users can search with logical operators (AND, OR, NOT) and proximity operators (two or more words that must be next to each other); search words can also be truncated on the right, on the left, or inside the word. To present the PDF pages we have developed our own viewer, which can highlight results, scale pages and download them.
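
The search features described above (logical operators, proximity, and truncation) can be illustrated with a small sketch. This is not Arcanum's software: the function names are hypothetical, `*` is assumed as the truncation character, and the matching relies on Python's standard `re` and `fnmatch` modules on a single page of text.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def positions(tokens: list[str], pattern: str) -> list[int]:
    """Indexes of tokens matching a pattern; '*' stands for any run of characters."""
    pattern = pattern.lower()
    return [i for i, token in enumerate(tokens) if fnmatchcase(token, pattern)]


def contains(tokens: list[str], pattern: str) -> bool:
    """True if at least one token matches the (possibly truncated) pattern."""
    return bool(positions(tokens, pattern))


def near(tokens: list[str], first: str, second: str, distance: int = 1) -> bool:
    """True if a match for `second` follows a match for `first` within `distance` words."""
    second_hits = positions(tokens, second)
    return any(
        0 < j - i <= distance
        for i in positions(tokens, first)
        for j in second_hits
    )


if __name__ == "__main__":
    page = tokenize("The digitization of daily newspapers and weekly journals")
    # AND / OR / NOT map onto Python's own boolean operators.
    print(contains(page, "newspaper*") and not contains(page, "map*"))
    # Truncation on the left and inside the word.
    print(contains(page, "*ization"), contains(page, "jo*ls"))
    # Proximity: the two words must be adjacent, in this order.
    print(near(page, "daily", "newspapers"))
```

A production search engine would normally answer such queries from a positional index built in advance rather than scanning tokens page by page; the sketch only shows what each operator means.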

Arcanum’s production technology and equipment are suited to digitizing and recognizing the text of documents of any type, size and quality, and to publishing the resulting double-layer PDFs online through a fast and sophisticated search and display system.

Try it here

Millions of pages of scientific journals, encyclopaedias, weekly and daily newspapers.

Let's see


Arcanum is an online publisher that creates massive structured databases of digitized cultural contents.
