Thesis - Interactive Labeling of Brands / Python
Updated: 29 Jan 2021
Bachelor’s or Master’s thesis with the goal of designing and developing an interactive labeling system for identifying brands in advertisements from scanned newspaper archives.
WHO CAN APPLY? Only students enrolled at KIT (Karlsruher Institut für Technologie) in the degree programs Wirtschaftsinformatik, Wirtschaftsingenieurwesen, Informationswirtschaft, or Technische Volkswirtschaftslehre.
As the digitization of the world's libraries and print archives continues steadily, the demand for automated processing of such documents grows. Researchers and practitioners would like to process these documents with tools from computer vision (CV) and optical character recognition (OCR). Furthermore, they would like to search and filter by document metadata. However, all of this presumes that the extracted features and metadata are available. As state-of-the-art machine learning (ML) classifiers still do not reach the desired accuracy levels, especially on old documents or those from fringe contexts, manual labeling effort is required.
For the scope of this thesis, we limit the context to identifying the brands in advertisements from scanned pages of newspapers and magazines. This is an interesting use case for advertising researchers, for instance. Associated colleagues at the University of Mannheim (UniMA) have already roughly extracted the brands of advertisements in the US magazine "The Economist", ranging from the 1840s to today. They used OCR to arrive at a simple representation of the advertising brand. We expect the thesis student to develop an interactive labeling system that supports extending this brand identification toward a cleaner representation. Interactive labeling strives to combine automatic steps (e.g. a trained model) with incremental user input. The work packages entail:
- analyzing the state-of-the-art of such instance identification tools (potentially by conducting a structured literature review)
- exchanging with the researchers at UniMA regarding their needs and requirements
- developing an interactive labeling system as part of a design science research process
- writing a thesis document according to research group requirements & participation in our thesis colloquium
We expect the student to be familiar with web development. The system should be developed with a modern web application frontend framework or be forked from an existing open source labeling system. Furthermore, we expect the backend to be based on standard Python frameworks. Experience in this regard is also required.
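To give applicants a rough idea of what "automatic steps combined with incremental user input" could mean on the Python side, here is a minimal sketch. It is only an illustration under our own assumptions, not a specification of the system to be built: the function names (`suggest_brand`, `label_batch`), the `ask_user` callback, the 0.8 threshold, and the use of simple string similarity in place of a trained model are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for a trained model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_brand(ocr_text, known_brands, threshold=0.8):
    """Return (best_brand, score, needs_review) for one raw OCR string."""
    best, best_score = None, 0.0
    for brand in known_brands:
        score = similarity(ocr_text, brand)
        if score > best_score:
            best, best_score = brand, score
    # Low-confidence suggestions are handed to a human instead of being accepted.
    return best, best_score, best_score < threshold


def label_batch(ocr_texts, known_brands, ask_user, threshold=0.8):
    """Label OCR strings automatically where possible, otherwise ask the user.

    ask_user(ocr_text, suggestion) is a hypothetical callback that returns the
    brand chosen by the human labeler. Each answer extends the brand list, so
    later items can be resolved without further input.
    """
    brands = list(known_brands)  # work on a copy, leave the caller's list untouched
    labels = {}
    for text in ocr_texts:
        brand, _score, needs_review = suggest_brand(text, brands, threshold)
        if needs_review:
            brand = ask_user(text, brand)
            if brand not in brands:
                brands.append(brand)
        labels[text] = brand
    return labels, brands
```

In a real system the `ask_user` callback would be replaced by a request to the web frontend, and the similarity function by whatever model the thesis arrives at.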
If you are interested in this topic and want to apply for this thesis, please apply via Campusjäger.
30 - 40 hours per week