Improving text classification with Boolean retrieval for rare categories: A case study identifying firearm violence conversations in the Crisis Text Line database

RTI Press, ebook, 18 pages

About this ebook

Advancements in machine learning and natural language processing have made text classification increasingly attractive for information retrieval. However, developing text classifiers is challenging when no prior labeled data are available for a rare category of interest. Finding instances of the rare class using a uniform random sample can be inefficient and costly due to the rare category’s low base rate. This work presents an approach that combines the strengths of text classification and Boolean retrieval to help learn rare concepts of interest. As a motivating example, we use the task of finding conversations that reference firearm injury or violence in the Crisis Text Line database. Identifying rare categories, like firearm injury or violence, can improve crisis lines' abilities to support people with firearm-related crises or provide appropriate resources. Our approach outperforms a set of iteratively refined Boolean queries and results in a recall of 0.91 on a test set generated from a process independent of our study. Our results suggest that text classification with Boolean retrieval initialization can be effective for finding rare categories of interest and improve on the precision of using Boolean retrieval alone.
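The Boolean retrieval step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the messages are invented (not drawn from the Crisis Text Line database), the query terms are hypothetical rather than the authors' actual queries, and the helper names `boolean_match` and `precision_recall` are ours. It shows how a simple Boolean query surfaces candidate conversations for a rare category, and how precision and recall expose both its false positives and its misses, which is the gap a classifier trained on the labeled hits would aim to close.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def boolean_match(text, any_of, none_of=()):
    """True if text has at least one term in any_of and no term in none_of."""
    tokens = tokenize(text)
    return bool(tokens & set(any_of)) and not (tokens & set(none_of))

def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented toy messages with hypothetical labels (True = firearm-related).
corpus = [
    ("my brother keeps a gun in the house and i am scared", True),
    ("he threatened me with a rifle last night", True),
    ("i played a video game with a nerf gun today", False),
    ("i feel anxious about my exams", False),
    ("someone was shot near my school", True),
    ("i can't sleep and feel alone", False),
]
texts = [text for text, _ in corpus]
labels = [label for _, label in corpus]

# Hypothetical seed query: any of these terms counts as a hit.
seed_terms = {"gun", "rifle", "firearm"}
seed_hits = [boolean_match(text, any_of=seed_terms) for text in texts]

# The hits would form the pool sent for labeling and classifier training.
precision, recall = precision_recall(seed_hits, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the query retrieves the "nerf gun" message (a false positive) and misses the "shot" message (a false negative), so both precision and recall fall short of 1. Refining the term list can trade one error for the other, whereas a classifier initialized from the labeled hits can learn from context beyond the query terms.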

About the authors

MS, is a senior research data scientist and program manager in RTI International's Center for Data Science and AI.

PhD, is the director of RTI International’s Mental Health, Risk & Resilience Research Program.

MPH, is a research public health analyst in RTI International’s Transformative Research Unit for Equity.

PhD, is a research clinical psychologist in RTI International's Mental Health, Risk & Resilience Research Program.

BA, is a public health analyst in RTI International's Substance Use, Prevention, Evaluation & Research Program.

BA, is a public health analyst in RTI International's Substance Use, Prevention, Evaluation & Research Program.

BS, is a public health analyst in RTI International's Community Safety and Wellness Program.

PhD, is a research public health analyst in RTI International's Mental Health, Risk & Resilience Research Program.

MS, is a senior research data scientist in RTI International's Center for Data Science and AI.

MS, is a research data scientist in RTI International's Center for Data Science and AI.
