THE UNIVERSITY of EDINBURGH

DEGREE REGULATIONS & PROGRAMMES OF STUDY 2024/2025

Timetable information in the Course Catalogue may be subject to change.


Postgraduate Course: Machine Learning, Big Data and Text Analysis for Economists (ECNM11094)

Course Outline
School: School of Economics
College: College of Arts, Humanities and Social Sciences
Credit level (Normal year taken): SCQF Level 11 (Postgraduate)
Availability: Available to all students
SCQF Credits: 10
ECTS Credits: 5
Summary: This course provides an introduction to Machine Learning, Big Data and Text Analysis tools, with an emphasis on their applications in Economics. In the first part of the course, students will learn about the most popular methods for classification and prediction, as well as tools for working with large datasets. The second part of the course focuses on practical tools for working with text and string variables, including statistical text analysis methods. The course is delivered through a series of lectures that include in-class Python examples.
Course description: This course introduces the main methods in Machine Learning, Big Data and Text Analysis and discusses some empirical applications in the economics literature.

This course combines lectures, in-class examples, and exercises to teach students how to implement machine learning methods with real data. Examples and studies from the economics literature illustrate some of the topics. No previous knowledge of machine learning or experience with Python is required, but experience with data and programming in statistical packages such as Stata is highly recommended. The course aims to provide the essential elements required to implement theoretical algorithms using real data, but it does not provide in-depth training in Python.

The first part of the course gives an overview of the most popular methods for classification and prediction and discusses some tools for working with large datasets. Prediction and classification are the two most important tasks that Machine Learning (ML) can perform. The empirical literature in Economics increasingly relies on these tasks for two purposes: first, to understand features of the data; second, to analyse and organise large datasets in order to generate new information in a semi-automated way, a process also known as Data Mining. This new, ready-to-use information extracted from large and/or unstructured datasets (such as images or text corpora) is later used to answer economic research questions.
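
As an illustration of what a classification task looks like in practice, the following is a minimal sketch of a k-nearest-neighbours classifier written with the Python standard library only. It is not taken from the course material: the function name `knn_predict`, the toy loan data, and the choice of k = 3 are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training observation, paired with its label
    dists = [(math.dist(row, x), label) for row, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote among the k closest observations
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (income, debt ratio) -> loan outcome
train_X = [(1.0, 0.9), (1.2, 0.8), (0.9, 1.0), (3.0, 0.2), (3.2, 0.1), (2.8, 0.3)]
train_y = ["default", "default", "default", "repaid", "repaid", "repaid"]

# Held-out observations used to check out-of-sample performance
test_X = [(1.1, 0.85), (3.1, 0.15)]
test_y = ["default", "repaid"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The point of the sketch is the workflow rather than the particular algorithm: the model is fitted on one set of observations and judged on its predictions for observations it has not seen.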

The second part of the course covers a number of practical tools and methods for working with text data and string variables. Text has rapidly become a major source of information for economic studies, but traditional statistical methods are ill-suited to analysing this type of data. The typical size of these databases and the complex dependencies across the elements of the text (e.g. across words) require that the analysis be performed using Machine Learning methods.
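
A common first step when treating text as data is to turn documents into a matrix of word counts (a "bag of words"). The sketch below shows one way this might be done using only the Python standard library. The helper names `tokenize` and `document_term_matrix` and the two example sentences are hypothetical and are not drawn from the course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Sorted vocabulary gives a stable column order
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical example documents
docs = [
    "Policy uncertainty rises as tax policy changes.",
    "Markets react to uncertainty.",
]
vocab, dtm = document_term_matrix(docs)
for doc_counts in dtm:
    print(dict(zip(vocab, doc_counts)))
```

Even with two short sentences the matrix is mostly zeros, and the number of columns grows with the vocabulary, which hints at why high-dimensional methods are needed for realistic corpora.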
Entry Requirements (not applicable to Visiting Students)
Pre-requisites:
Co-requisites:
Prohibited Combinations:
Other requirements: Students should be registered for MSc Mathematical Economics and Econometrics. All other students must email sgpe@ed.ac.uk in advance to request permission.
Information for Visiting Students
Pre-requisites: None
High Demand Course? Yes
Course Delivery Information
Academic year 2024/25, Available to all students (SV1). Quota: None
Course Start: Block 4 (Sem 2)
Learning and Teaching activities: Total Hours: 100 (Lecture Hours 18, Programme Level Learning and Teaching Hours 2, Directed Learning and Independent Learning Hours 80)
Assessment: Written Exam 80%, Coursework 20%, Practical Exam 0%
Additional Information (Assessment): There is an in-person exam worth 80% of the course marks. The exam is based on the material covered during the lectures and required readings.

The remaining 20% of the marks is awarded for a coursework activity (a computer exercise) that students must submit.
Feedback: Feedback on all coursework assessments must be provided within 16 calendar days.
No Exam Information
Learning Outcomes
On completion of this course, the student will be able to:
  1. Implement the main algorithms for classification and prediction tasks
  2. Use text data and perform statistical analysis on it
  3. Organise, process and create large datasets based on semi-structured information
  4. Understand how Machine Learning methods have been applied in the economics literature
Reading List
The main references for the course are the lecture slides and the material used for the in-class activities. Students can gain a deeper understanding of some of the models and techniques covered in the course from the references below.

Textbooks on Statistical Learning:

Hastie, T., Tibshirani, R. & Friedman, J. (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", 2nd Edition, Springer, New York. [Essential Reading]

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013). "An Introduction to Statistical Learning with Applications in R", Springer, New York. Freely available from the publisher's site. [Essential Reading]

Chollet, F. (2017). "Deep Learning with Python". Manning. ISBN: 9781617294433. [Recommended Reference]

Other references and articles that may be used during the course (this list can be updated during the course, depending on the material covered):

Bandiera, O., Hansen, S., Prat, A., and Sadun, R. (2017) "CEO Behavior and Firm Performance." NBER Working Paper 23248.

Baker, S., Bloom, N., and Davis, S. (2016), "Measuring Economic Policy Uncertainty." Quarterly Journal of Economics 131(4): 1593-1636.

Bird, S., Klein, E., and Loper, E. (2009) "Natural Language Processing with Python." O'Reilly, ISBN: 978-0-596-51649-9.

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.

Chernozhukov, Victor, et al. "Double machine learning for treatment and causal parameters." (working paper)

Duchi, J., Hazan, E., and Singer, Y. (2011), "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization", Journal of Machine Learning Research 12, 2121-2159.

Fawcett, T. (2006), "An introduction to ROC analysis", Pattern Recognition Letters 27(8), 861-874. ISSN 0167-8655.

Gentzkow, M., Kelly, B., and Taddy, M. (2019) "Text as Data", Journal of Economic Literature 57(3), 535-574.

Gentzkow, M. and Shapiro, J.M. (2010), "What Drives Media Slant? Evidence From U.S. Daily Newspapers", Econometrica 78: 35-71. doi:10.3982/ECTA7195

Gilchrist, D.S. and Sands, E.G. (2016). "Something to Talk About: Social Spillovers in Movie Consumption", Journal of Political Economy 124(5), 1339-1382.

Freund, Y. (1995) "Boosting a weak learning algorithm by majority", Information and Computation 121(2), 256-285.

Friedman, J. (2002), "Stochastic gradient boosting", Computational Statistics & Data Analysis 38(4).

Freund, Y. and Schapire, R. (1997), "A decision-theoretic generalization of on-line learning and an application to boosting." Journal of Computer and System Sciences 55(1), 119-139.

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., and Brilliant, L. (2009). "Detecting influenza epidemics using search engine query data". Nature 457(7232): 1012-1014.

Goldberg, Y. (2016). "A Primer on Neural Network Models for Natural Language Processing." Journal of Artificial Intelligence Research 57(1), 345-420.

Hansen, S., McMahon, M. & Prat, A. (2018) "Transparency and Deliberation within the FOMC: A Computational Linguistics Approach." Quarterly Journal of Economics 133(2): 801-870.

Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014), "The Parable of Google Flu: Traps in Big Data Analysis." Science 343(6176), 1203-1205.

Le, Q. and Mikolov, T. (2014), "Distributed Representations of Sentences and Documents." Proceedings of Machine Learning Research 32, 1188-1196.

Manning, C., Raghavan, P., and Schütze, H. (2008), Introduction to Information Retrieval, Cambridge: Cambridge University Press.

Mayr, A., Binder, H., Gefeller, O., and Schmid, M. (2014) "The Evolution of Boosting Algorithms. From Machine Learning to Statistical Modelling." Methods Inf Med 53(6), 419-427.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013), "Distributed Representations of Words and Phrases and Their Compositionality." In Proceedings of the 26th International Conference on Neural Information Processing Systems, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3111-3119. Red Hook: Curran Associates. Morin, Frederic, and Yoshua Bengio. 2005.

Mullainathan, S., Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.

Porter, M. (1980), "An Algorithm for Suffix Stripping." Program 14(3): 130-137.

Stephens-Davidowitz, S. (2014), "The Cost of Racial Animus on a Black Candidate: Evidence Using Google Search Data." Journal of Public Economics 118(C): 26-40.

Stock, J. and Trebbi, F. (2003), "Who Invented Instrumental Variable Regression?", Journal of Economic Perspectives 17(3), 177-194.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014) "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15, 1929-1958.

Taddy, M. (2013). "Measuring Political Sentiment on Twitter: Factor Optimal Design for Multinomial Inverse Regression." Technometrics 55(4), 415-425.

Taddy, M. (2013). "Multinomial Inverse Regression for Text Analysis." Journal of the American Statistical Association 108(503), 755-770.

Taddy, M. (2012). "On Estimation and Selection for Topic Models." In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 1184-1193. New York: Association for Computing Machinery.

Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B (Methodological) 58(1): 267-288.

Varian, H. R. (2014). Big data: New tricks for econometrics. The Journal of Economic Perspectives, 28(2), 3-27.
Additional Information
Course URL: www.sgpe.ac.uk
Graduate Attributes and Skills: Not entered
Keywords: Not entered
Contacts
Course organiser: Dr Diego Battiston
Tel:
Email: Diego.Battiston@ed.ac.uk
Course secretary