Senior Project Advisor
Brian Hutchinson
Document Type
Project
Publication Date
Spring 2023
Keywords
Large Language Models, LLM, LLMs, Economics, ML Evaluation, ML, AI
Abstract
This paper describes a novel dataset, EconQA, constructed to assess the performance of large language models on multiple-choice economics questions. I present results from 10 experiments that vary prompts and model choices. The results challenge previous findings that prompt choice has a large impact on response quality. Using the GPT 3.5 Turbo model, observed performance ranged from 70% to 77% across all prompt choices, with the no-prompt baseline scoring 73%. When prompted to use Chain-of-Thought reasoning with examples, performance was highest at 76%. Contrary to previous research, performance on mathematical questions was high when the model was prompted with Chain-of-Thought. The paper closes with an analysis of the types of questions on which the models performed best and of common errors.
Department
Computer Science
Recommended Citation
Van Patten, Tate, "Evaluating Domain Specific LLM Performance Within Economics Using the Novel EconQA Dataset" (2023). WWU Honors College Senior Projects. 657.
https://cedar.wwu.edu/wwu_honors/657
Subjects - Topical (LCSH)
Natural language processing (Computer science); Artificial intelligence; Text data mining; Economics
Type
Text
Rights
Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this document for commercial purposes, or for financial gain, shall not be allowed without the author’s written permission.
Language
English
Format
application/pdf