Digital Humanities (DH) researchers at the Indian Institute of Technology Jodhpur are developing a software framework, the Comic-to-Video Network (C2VNet), that converts born-digital or digitized comic books into video. Growing digitization has increased the need to digitize our country’s cultural heritage, and the framework is built around creating an audio-video storybook.
DH at IIT Jodhpur focuses on, and contributes to, a composite of approaches (ideas and methods), rather than separate approaches, that emphasizes preserving, reconstructing, transmitting, and interpreting the human record, both historically and contemporaneously. In major ways, this attends to epistemological questions about how knowledge is produced when digital data is generated from material objects, and to the rethinking of existing processes of knowledge production. Advances in technology have expanded the methods and methodologies available for creating the desired multimedia content. One such instance is automatic image synthesis, which has gained a lot of attention among researchers. In contrast, audio-video scene synthesis, such as that based on document images, remains challenging and under-researched. DH still lacks sustained analysis of multimodality in automatic content synthesis and of its growing impact on digital scholarship in the humanities. C2VNet is a step towards bridging this gap.
C2VNet works through a comic strip panel by panel and eventually produces a full-length video (with audio) of a digitized or born-digital storybook. The goal is to design and develop software that takes a born-digital or digitized comic book as input and produces an audiovisual animated movie from it. Along with the software, IIT Jodhpur researchers have proposed a dataset titled “IMCDB: Indian Mythological Comic Dataset of Digitized Indian Comic Storybook” in the English language. The dataset has complete annotations for panels, binary masks of the text balloons, and text files for each speech balloon and narration box within a panel. The team plans to make the dataset publicly available.
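To make the panel-by-panel idea concrete, the following is a minimal Python sketch of how panel and text-region annotations might be represented and walked in reading order to build a narration script. It is an illustration only: the `Panel` and `TextRegion` classes, their fields, and the simple top-to-bottom, left-to-right ordering are assumptions made for this example, not the actual IMCDB format or the C2VNet implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextRegion:
    """One speech balloon or narration box (hypothetical schema)."""
    kind: str                        # "speech" or "narration"
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in page pixels
    text: str                        # transcription of the region


@dataclass
class Panel:
    """One comic panel and the text regions inside it (hypothetical schema)."""
    page: int
    bbox: Tuple[int, int, int, int]
    regions: List[TextRegion] = field(default_factory=list)


def reading_order(panels: List[Panel]) -> List[Panel]:
    """Sort panels by page, then top-to-bottom, then left-to-right.

    This ordering is a simplifying assumption; real page layouts may need
    panels grouped into rows before sorting.
    """
    return sorted(panels, key=lambda p: (p.page, p.bbox[1], p.bbox[0]))


def narration_script(panels: List[Panel]) -> List[str]:
    """Walk the comic panel by panel and collect the text to be voiced."""
    script = []
    for panel in reading_order(panels):
        # Within a panel, read text regions top-to-bottom, then left-to-right.
        for region in sorted(panel.regions, key=lambda r: (r.bbox[1], r.bbox[0])):
            script.append(region.text)
    return script
```

In a full pipeline, each entry of such a script would be paired with its panel image and passed to a text-to-speech step before the frames and audio are assembled into a video; those stages are omitted here.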
Dr. Chiranjoy Chattopadhyay, Assistant Professor, Department of Computer Science and Engineering, IIT Jodhpur, said that C2VNet has two internal networks to support video creation: the panel extraction model CPENet and the speech balloon segmentation model SBSNet. “CPENet developed by the team gives over 97% accuracy, and the speech balloon segmentation model SBSNet gives 98% accuracy with fewer parameters. Both have outperformed state-of-art models. C2VNet is the first step towards the big future of automatic multimedia creation of comic books to bring new comic reading experiences.”
This is a one-of-a-kind study on automating the creation of audio-visual content from scanned document images. The team is now working towards improving the software so that these multimedia books become more immersive and engaging for the target audience. Producing this kind of content usually takes considerable time and effort, but with this software it can be done quickly and in a more interactive way.