Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML). A new GLO Discussion Paper by Stephen Meisenbacher & GLO Fellow Peter Norlander.

A new GLO Discussion Paper demonstrates that CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text.

GLO Discussion Paper No. 1214, 2022

Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML) – Download PDF
by Meisenbacher, Stephen & Norlander, Peter


Author Abstract: Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, and expert classification of any documents with any scheme. To demonstrate this process for building data from text with Machine Learning, we publish open-source resources: the software, a new public document corpus, and a replicable analysis to build an interpretable classifier of suspected “no poach” clauses in franchise documents.
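The abstract describes building labeled, document-level data from text with the help of context rules. As a rough illustration of that general idea only, the sketch below extracts the text around keyword matches, labels each extract with simple rules, and aggregates the labels to one value per document. It is not the CRAML software or its interface: the keyword pattern, the rules, the function names, and the sample documents are all hypothetical, and the machine learning step that the method's name refers to is omitted.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword pattern and context rules, invented for this sketch.
KEYWORD = re.compile(r"\b(?:solicit|recruit|hire)\w*", re.IGNORECASE)
RULES: List[Tuple[re.Pattern, int]] = [
    (re.compile(r"\b(?:shall|will|may) not\b", re.IGNORECASE), 1),  # restriction
    (re.compile(r"\bfree to\b", re.IGNORECASE), 0),                 # no restriction
]


def context_windows(text: str, width: int = 60) -> List[str]:
    """Return the text around each keyword match (width characters per side)."""
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in KEYWORD.finditer(text)
    ]


def label_window(window: str) -> int:
    """Apply the first matching rule; windows that match no rule get 0."""
    for pattern, label in RULES:
        if pattern.search(window):
            return label
    return 0


def build_dataset(corpus: Dict[str, str]) -> Dict[str, int]:
    """Label a document 1 if any of its keyword contexts is labeled 1."""
    return {
        doc_id: int(any(label_window(w) == 1 for w in context_windows(text)))
        for doc_id, text in corpus.items()
    }


# Made-up sample documents, not taken from the paper's corpus.
corpus = {
    "doc_a": "Franchisee shall not solicit or hire any employee of another franchisee.",
    "doc_b": "Franchisee is free to recruit staff from any source.",
    "doc_c": "This agreement covers royalty payments only.",
}
dataset = build_dataset(corpus)
print(dataset)
```

In a fuller pipeline, the rule-labeled contexts could serve as training data for a classifier rather than as the final labels; that step is left out here to keep the example self-contained.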

Featured image: Mika Baumeister on Unsplash


GLO Discussion Papers are research and policy papers of the GLO Network which are widely circulated to encourage discussion. Provided in cooperation with EconStor, a service of the ZBW – Leibniz Information Centre for Economics, GLO Discussion Papers are listed in RePEc (see IDEAS, EconPapers), among other indexes. A complete list of all GLO DPs is downloadable for free.

The Global Labor Organization (GLO) is an independent, non-partisan and non-governmental organization that functions as an international network and virtual platform to stimulate global research, debate and collaboration.
