|
Can we use automated processes like web scraping and LLMs to speed up the collection of biographical data on sitting politicians? In this project, student researchers will evaluate the outputs of three different semi- or fully automated processes for extracting OCCUPATION descriptions from unstructured web text. If two or three processes agree, the student will treat the agreeing outputs as correct and rank the processes by precision (e.g., model 1 produced correct and more precise results, model 3 produced correct but less precise results, model 2 produced incorrect results). If there is no clear consensus, the student will independently conduct online searches for the politician in order to determine which models/processes produced the best and worst outputs. There are about 7,000 legislators; I am hoping each student will process about 800 legislators' worth of results in December and January (approximately 25-35 total hours of work). This evaluation must be conducted by human evaluators and cannot be automated in any way. Students must work a minimum of five hours per week and attend weekly check-in meetings. |