
## Introduction

As a student in the Eastern University MS in Data Science program, I’ve been taking courses in Python, R, statistics, and databases for the last 9 months. I’m halfway through the program now, and I thought it would help potential students to know what each course I’ve taken so far actually covers.

This is a continuation of my REVIEW: Eastern University Master of Science in Data Science 2021. I expect to make another post once I’ve gotten further into the program.

### Courses I’ve Completed:

- DTSC-520 Fundamentals of Data Science
- DTSC-550 Introduction to Statistical Modeling
- DTSC-650 Data Analytics in R
- DTSC-575 Principles of Python Programming
- DTSC-660 Data and Database Management with SQL

### Upcoming courses:

- DTSC-670 Foundations of Machine Learning Models (starts August 30)
- DTSC-680 Applied Machine Learning
- DTSC-600 Information Visualization
- DTSC-690 Data Science Capstone: Ethical and Philosophical Issues in Data Science
- DTSC-691 Data Science Capstone: Applied Data Science

## DTSC-520 Fundamentals of Data Science

This course is the first one in the program, so it covers a lot of material. It’s an introduction to Python and the Anaconda distribution, NumPy, pandas, Matplotlib, and seaborn, followed by the general principles of data science.

This class is a whirlwind. There are optional coding assignments you can complete, but the course is mostly theory-based. On exams you’ll be asked things like what a specific slice of a string returns or how many times a loop will repeat, but you won’t have to submit detailed coding assignments.
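
To give a flavor of those exam questions, here is a small Python sketch of my own (not actual exam material): a few string slices and a loop whose iteration count you have to reason out.

```python
# A slice takes characters from the start index up to, but not including,
# the stop index; a negative step walks backwards through the string.
word = "data science"
first_four = word[0:4]      # "data"
last_seven = word[-7:]      # "science"
reversed_word = word[::-1]  # "ecneics atad"
every_other = word[::2]     # characters at indices 0, 2, 4, ...

# range(start, stop, step) stops before `stop`, so the body runs
# for the values 2, 5, 8: three times.
iterations = 0
for _ in range(2, 10, 3):
    iterations += 1

print(first_four, last_seven, reversed_word, every_other, iterations)
```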

This course currently has 4 exams, each worth 25% of your grade.

## DTSC-550 Introduction to Statistical Modeling

This course is, again, a theory course, but this one focuses on statistics and R. It goes right from the basics of measures of central tendency into variance, covariance, standard deviation, and hypothesis testing (t-tests, z-tests, and ANOVA).
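
The course teaches these in R, but the formulas are the same in any language. Here is a minimal Python sketch on made-up numbers that computes the measures of central tendency and then the sample variance, standard deviation, and covariance by hand (dividing by n - 1), so you can see what the built-in functions are doing.

```python
import math
from statistics import mean, median, mode

# Made-up sample data, purely for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [1, 3, 2, 5, 4, 6, 8, 9]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(a, b):
    """How two variables move together, also using the n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


x_var = sample_variance(x)
x_sd = math.sqrt(x_var)  # standard deviation = square root of the variance
xy_cov = sample_covariance(x, y)

print(mean(x), median(x), mode(x))
print(x_var, x_sd, xy_cov)
```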

It also discusses parametric and nonparametric statistical testing and includes optional labs in R. I actually skipped the labs, which was a mistake (!) because it made 650, the course that came after, quite a bit harder than it needed to be.

This course includes 5 exams, each worth 20% of your grade.

## DTSC-650 Data Analytics in R

This course is an introduction to R. It continues the statistical education but focuses on applying the concepts you learned in 550 with the R programming language. While 550 briefly touches on linear and logistic regression, 650 goes into depth on how to perform these in R and how to interpret the results.

650 also adds other components, such as using the Akaike information criterion (AIC) to assess model fit, interpreting R-squared, and using the Bonferroni correction to adjust p-values for multiple comparisons.
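
Both ideas boil down to short formulas. The sketch below is in Python rather than R and uses made-up p-values and log-likelihoods. It applies the Bonferroni correction by multiplying each p-value by the number of tests and capping at 1 (the same adjustment R’s `p.adjust(p, method = "bonferroni")` makes), and computes AIC as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests: multiply each p-value by m (capped
    at 1) and compare with alpha. Same decision as testing each raw p-value
    against alpha / m."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


def aic(log_likelihood, num_params):
    """Akaike information criterion, 2k - 2 ln(L). When comparing models
    fit to the same data, lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical p-values from four separate tests.
adjusted, reject = bonferroni([0.01, 0.02, 0.04, 0.30])
print(adjusted)  # roughly [0.04, 0.08, 0.16, 1.0]
print(reject)    # [True, False, False, False]

# Hypothetical fitted models: (log-likelihood, number of parameters).
print(aic(-120.0, 3), aic(-118.5, 5))  # 246.0 247.0
```

In this toy comparison, the second model fits slightly better (higher log-likelihood) but pays for its two extra parameters, so the first model has the lower AIC.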

When I took it, this course was graded 60% on exams and 40% on coding work (CodeGrade assignments plus a final project).

CodeGrade assignments are repeatable coding assignments, where you’re given a dataset and then asked to answer questions on it.

For example, one of the datasets included a series of orders from a pizza place, and you might be asked, “Find the average number of orders delivered by Lisa on Fridays” or “Write a regression predicting whether a customer got wine based on their total order price and day of the week.” These assignments were really enjoyable and helped solidify my knowledge of R, but they took me a long time because I had never used R before.
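
Here’s roughly what the first question looks like as code. This is a Python sketch on a tiny made-up table; the real course dataset had its own columns, so the row layout (date, day of week, driver) is an assumption. One subtlety: to average over Fridays, a Friday where Lisa delivered nothing has to count as zero.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows: (date, day_of_week, driver). Not the course's schema.
orders = [
    ("2021-05-07", "Friday", "Lisa"),
    ("2021-05-07", "Friday", "Lisa"),
    ("2021-05-07", "Friday", "Sam"),
    ("2021-05-08", "Saturday", "Lisa"),
    ("2021-05-14", "Friday", "Lisa"),
    ("2021-05-14", "Friday", "Lisa"),
    ("2021-05-14", "Friday", "Lisa"),
    ("2021-05-21", "Friday", "Sam"),
]

# Every distinct Friday in the data, so Fridays with no Lisa deliveries
# still contribute a zero to the average.
fridays = sorted({date for date, day, _ in orders if day == "Friday"})

# Number of orders Lisa delivered on each Friday.
lisa_counts = Counter(
    date for date, day, driver in orders if day == "Friday" and driver == "Lisa"
)

avg_lisa_friday = mean(lisa_counts[d] for d in fridays)
print(avg_lisa_friday)
```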

There were 8 CodeGrade assignments making up 30% of the grade, and 4 exams making up 60% of the grade. My average CodeGrade assignment was 40 lines of code.

The last 10% of the grade was a large final project. This was also done in R, and it used a real dataset: https://www.kaggle.com/cdc/behavioral-risk-factor-surveillance-system.

We had to answer a variety of questions and also do our own analysis (exploratory data analysis, regressions, etc.). It was a lot of work but also a lot of fun. My project was about 400 lines of R, and I cut it down somewhat by writing a function to calculate the specific summary statistics I wanted for the variables I had selected.
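
I wrote my helper in R, but the idea carries over to any language. Here is a hypothetical Python version: a function that returns a handful of summary statistics for one variable while skipping missing values (stored as None here). The variable names and values are made up, and the sketch assumes each variable has at least one non-missing value.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Summary statistics for one variable, skipping missing values (None).
    Assumes at least one value is present."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        # stdev needs at least two values
        "sd": stdev(present) if len(present) > 1 else float("nan"),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical variables, e.g. survey answers with some non-responses.
data = {
    "sleep_hours": [7, 6, None, 8, 5, 7],
    "bmi": [22.5, 27.1, 31.0, None, None, 24.3],
}

for name, values in data.items():
    print(name, summarize(values))
```

Writing it once and looping over the selected variables is what saves the repetition.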

## DTSC-575 Principles of Python Programming

This course is an introduction to Python. It reviews what you did in 520 and adds object-oriented programming. The first module is basically a review of the Python from 520, but it adds some material on list comprehensions.
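
For anyone who hasn’t seen one, a list comprehension compresses a build-a-list loop into one line. A small sketch with made-up numbers:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# Same result as a list comprehension:
# [expression for item in iterable if condition]
squares_comp = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_comp)  # [4, 16, 36]
```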

Module 2 goes over strings, string formatting (also from 520), conditionals, the walrus operator, and loops (with the addition of the break and continue statements).
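
A short sketch of those pieces together, using made-up readings (the walrus operator requires Python 3.8 or later): `continue` skips to the next loop iteration, `break` exits the loop entirely, and `:=` assigns a value inside an expression.

```python
readings = [3, -1, 7, 0, 12, 5]

total = 0
for r in readings:
    if r < 0:
        continue  # skip a bad reading and move on to the next one
    if r == 0:
        break     # treat 0 as a "stop" sentinel and leave the loop
    total += r

# The walrus operator assigns `count` and tests it in the same expression.
if (count := len(readings)) > 5:
    message = f"{count} readings, total before sentinel = {total}"
    print(message)
```

Because the loop stops at the 0, the 12 and 5 after it are never added, so `total` ends up as 10.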

Module 3 covers how to create functions, including defining parameters and passing arguments, plus decorators, exceptions, and lambdas, which are small anonymous functions written as a single expression.
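
Here’s a compact sketch touching each of those ideas; the function and decorator names are my own examples, not from the course. The decorator counts how many times the function is called, the function handles a ZeroDivisionError instead of crashing, and a lambda serves as a throwaway sort key.

```python
import functools


def count_calls(func):
    """A decorator: wraps func so every call is counted before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def safe_divide(numerator, denominator=1):
    """numerator and denominator are parameters; the values passed in a call
    are arguments. denominator has a default value."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None  # handle the exception instead of crashing


print(safe_divide(10, 4))  # 2.5
print(safe_divide(10, 0))  # None
print(safe_divide.calls)   # 2

# A lambda is a small anonymous function limited to one expression.
tools = ["pandas", "R", "numpy", "SQL"]
by_length = sorted(tools, key=lambda name: len(name))
print(by_length)  # ['R', 'SQL', 'numpy', 'pandas']
```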

Module 4 goes over object-oriented programming: how to define classes, create objects from them, and set up parent and child classes, which is called inheritance.
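
A minimal sketch of inheritance; the Model and LinearModel classes are hypothetical examples. The child class reuses the parent’s constructor through `super()` and overrides one method.

```python
class Model:
    """Parent (base) class with behavior every model shares."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} ({type(self).__name__})"

    def predict(self, x):
        raise NotImplementedError("subclasses must implement predict()")


class LinearModel(Model):
    """Child class: inherits __init__ and describe() from Model."""

    def __init__(self, name, slope, intercept):
        super().__init__(name)  # let the parent set up the shared attribute
        self.slope = slope
        self.intercept = intercept

    def predict(self, x):  # overrides the parent's version
        return self.slope * x + self.intercept


m = LinearModel("price model", slope=2.0, intercept=1.0)
print(m.describe())          # price model (LinearModel)
print(m.predict(3))          # 7.0
print(isinstance(m, Model))  # True
```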

Module 5 is called “odds and ends,” and it covers how to do statistics in Python, including how to use the SciPy package to run tests from 550 and 650, such as ANOVA, t-tests, and linear regression.
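
SciPy has ready-made functions for these tests (for example, `scipy.stats.f_oneway` for a one-way ANOVA, which returns the F statistic and a p-value). To show what’s underneath, here is a plain-Python sketch that computes just the F statistic by hand on three made-up groups: the between-group mean square divided by the within-group mean square.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA:
    (SS_between / (k - 1)) / (SS_within / (N - k))."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of each value around its own group's mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


f_stat = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f_stat)  # 12.0
```

Here the group means (5, 7, 9) differ a lot relative to the spread inside each group, which is why F is large.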

This course was surprisingly short but packed in a lot of material, spread across 24 small CodeGrade assignments. In contrast to 650, which had 8 assignments averaging 40 lines of code each (about 320 lines in total), each assignment here was under 10 lines of code (fewer than 240 lines in total).

I did need to look up the quadratic formula to answer one of these questions, but otherwise it was pretty straightforward.
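
For reference, the quadratic formula gives the roots of ax² + bx + c = 0 as x = (-b ± √(b² - 4ac)) / 2a. A minimal Python version (my own, not the assignment’s solution) uses `cmath` so that a negative discriminant produces complex roots instead of an error:

```python
import cmath


def solve_quadratic(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via the quadratic formula.
    cmath.sqrt returns a complex result, so negative discriminants work."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic")
    root = cmath.sqrt(b ** 2 - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


# x**2 - 3x + 2 = (x - 2)(x - 1): roots 2 and 1, returned as complex numbers.
print(solve_quadratic(1, -3, 2))
```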

## DTSC-660 Data and Database Management with SQL

I took 660 and 575 at the same time. In retrospect, that was a bad idea: 660 turned out to have 20 hours of video, 5 exams, and 4 assignments!

The first two modules focus on the basics of database design. This is a lot of theory and mostly involves just drilling the definitions and trying to understand how they all fit together.

The first assignment involves designing an entity-relationship (ER) diagram for a fictional business. Assignment 2 involves designing a relational schema for a fictional business and answering questions about primary and foreign keys, among other topics.
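
To make primary and foreign keys concrete, here is a small schema for a hypothetical business (the tables and data are made up). The course uses PostgreSQL; this sketch uses SQLite through Python’s built-in `sqlite3` module so it runs without a database server, and these CREATE TABLE statements are plain enough that PostgreSQL accepts them too. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each customer
    name        TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders (order_id, customer_id, total) VALUES (10, 1, 18.50)")

# An order for a customer that doesn't exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders (order_id, customer_id, total) VALUES (11, 99, 5.00)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)  # True
```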

Modules 3-6 were 1000% better than Modules 1 and 2. Starting in Module 3, the professor walks through PostgreSQL syntax and shows you how to accomplish different tasks. This is a comprehensive course (as the 20 hours of video suggest), and you will be well-versed in SQL when you finish.

Assignment 3 is to write a short SQL query, worth 3%.
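
For a sense of scale, a short query might look like this: a GROUP BY aggregate over a made-up orders table (not the actual assignment). The sketch runs against SQLite from Python’s `sqlite3` module, but the query itself is standard SQL that PostgreSQL would also accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL);
INSERT INTO orders VALUES (1, 'Ana', 12.0), (2, 'Ben', 30.0), (3, 'Ana', 8.0);
""")

# Total spent per customer, biggest spenders first.
query = """
SELECT customer, SUM(total) AS spent
FROM orders
GROUP BY customer
ORDER BY spent DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ben', 30.0), ('Ana', 20.0)]
```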

Assignment 4 is pretty big. It involves writing a number of SQL queries, plus some procedures, functions, and triggers, all in PostgreSQL. Assignment 4 is worth 20% of the grade.
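
Triggers are probably the least familiar of these, so here is a sketch of an audit trigger that logs price changes, with made-up table and trigger names. One caveat: in PostgreSQL a trigger calls a separate trigger function (typically written in PL/pgSQL and declared RETURNS trigger), whereas SQLite, used here so the example runs with only Python’s standard library, puts the trigger body inline. Treat this as an illustration of the concept, not PostgreSQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    price      REAL NOT NULL
);

CREATE TABLE price_log (
    product_id INTEGER,
    old_price  REAL,
    new_price  REAL
);

-- Fires after an UPDATE that actually changes a product's price.
CREATE TRIGGER log_price_change
AFTER UPDATE OF price ON product
WHEN OLD.price <> NEW.price
BEGIN
    INSERT INTO price_log (product_id, old_price, new_price)
    VALUES (OLD.product_id, OLD.price, NEW.price);
END;

INSERT INTO product VALUES (1, 9.99);
UPDATE product SET price = 11.49 WHERE product_id = 1;
""")

logged = conn.execute("SELECT * FROM price_log").fetchall()
print(logged)  # [(1, 9.99, 11.49)]
```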

Finally, the last module, Module 6, is on Git and GitHub. I was really happy to see this module because I wanted to create a GitHub account and start posting my work. I still need to do more project work (and get it looking nice; right now it’s just a repository for work-in-progress code).

For DTSC-660, the assignments total 44% of the grade and the 5 exams are worth 56%. There are 6 modules but only 5 exams, because the module containing Assignment 4 has no exam.

## Conclusion

I have really enjoyed this program. I am super excited for DTSC-670, Foundations of Machine Learning Models. I’ve already made it to Chapter 4 in the textbook and hope to reach Chapter 5 by the time the course starts (the course itself goes up to Chapter 7).