Dustin K MacDonald


Review of UpLevel Data Science 2020

Posted on November 27, 2020 by Dustin


Introduction

Data science and data analysis have interested me for a while. An applied field that combines statistics, computer programming and domain knowledge ticks a lot of my boxes. I learn best in a structured environment, though, and I'd been looking for something to help me build my skills.

I previously earned my Data Analyst with DataCamp certificate, but I found that DataCamp provided too much scaffolding. When I returned to Python a few months after finishing it, I had forgotten a lot of the syntax and even basic methods – because DataCamp had provided 80% of the sample code and had me fill in only small bits, even during the advanced parts of the course.

I recently stumbled upon UpLevel, which promised a different approach.

What is UpLevel?

UpLevel is a data science course provider similar to DataCamp, DataQuest, and numerous boot camps: you complete projects based on verified datasets, and in doing so build your skills.

But UpLevel has one major difference: it doesn't give you scaffolding. Instead, each course provides pseudocode descriptions of the tasks you need to accomplish and has you consult reference documents to work out the syntax. If you get stuck, there are references along the way.

Getting Started

After reviewing the material on UpLevel's website, I was pretty excited. (Note: UpLevel appears to be very new, and the data science projects part of the company was so hard to find when I returned to it to write this article that I had to get the link from my email. Make sure you bookmark it.) The company's other main components are a recruitment arm for companies and a data science blog.

This was an opportunity for me to continue building my data science skills in a way that I felt would fix the major gap in my skills so far.

UpLevel has a sale going on right now. Each course is normally $30, and a monthly subscription is slightly cheaper at $25, but I got a major deal: $14.99 for a monthly subscription. Each month I get a code to access a new course, of which there are currently 17, including one that became available for pre-order between the time I subscribed and the time I wrote this article.

UpLevel Projects

Because I already have a working knowledge of Python and some basic data science principles, I opted for an Intermediate project. The one that really caught my eye was Identifying Mental Health Factors and Predicting Depression. Having spent years volunteering and then working for a crisis line, I've always been intrigued by the possibility of using analytics to identify people at risk.

About 30 minutes after I subscribed and my payment went through, I was emailed a discount code. To get my lesson, I went through the checkout for the lesson I wanted and entered that code, which deducted 100% of the price. Simple and elegant. After checkout, the materials were immediately available to download.

Getting Into the Lessons

I already had Anaconda3 installed on my computer (it includes Jupyter, the Python IDE Spyder, and several other tools I'm not familiar with). This is important, because UpLevel courses come in the form of Jupyter notebooks.

If you don’t have this software already, you’ll need it.

The course materials after I unzipped them

Notebook files like these are used by professional data scientists and hobbyists alike. A single notebook can hold an entire data project, code, output and notes included, so it can easily be emailed, uploaded to GitHub or otherwise moved around.

Part I. Data Cleaning

The first Jupyter notebook (Part I) looks like this:

It expands on the brief project scenario presented on the website with a more applied, workplace-style framing:

Project Scenario

The notebook's first steps recommend opening the questionnaires and the actual study the data come from, both of which are provided. Nice touch! Opening the questionnaire, I immediately recognized the questions: the depression items come from the PHQ-9, a standardized depression and suicide screener that I used at Morneau Shepell. Even though the study was done with students in Japan, it was a good choice because the tools it uses are standard.
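For readers unfamiliar with the PHQ-9: each of its nine items is scored from 0 ("not at all") to 3 ("nearly every day"), so the total runs from 0 to 27, and totals are commonly banded into severity levels with cutoffs at 5, 10, 15 and 20. Here is a minimal sketch of that standard scoring. The function names and sample answers are hypothetical, and the course dataset may already include a precomputed total, so treat this as background rather than one of the course's steps.

```python
def phq9_total(item_scores: list[int]) -> int:
    """Sum the nine PHQ-9 item responses, each scored 0-3."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored from 0 to 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a 0-27 total onto the commonly used severity bands."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"


# Hypothetical respondent, not taken from the study data
answers = [1, 1, 0, 2, 1, 0, 1, 1, 1]
total = phq9_total(answers)
severity = phq9_severity(total)
```

In practice a screener also flags any nonzero answer on item 9 (thoughts of self-harm) for follow-up regardless of the total, which this sketch does not attempt.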

Further into the notebook are the pseudocode steps I mentioned earlier:

Steps 2, 3, and 4 in Part I

I worked through them fairly quickly because I was familiar with these pieces, mostly stopping just to refresh myself on specific syntax or steps (how do I slice data with iloc again?).
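As a quick refresher on that iloc question, here is a minimal pandas sketch. The dataframe and its column names are hypothetical, not the course's data.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df = pd.DataFrame({
    "student_type": ["Inter", "Dom", "Inter", "Dom"],
    "age": [24, 19, 28, 21],
    "depression_total": [8, 10, 3, 12],
})

# iloc selects by integer position: rows first, then columns
first_two_rows = df.iloc[0:2]   # positions 0 and 1; the slice end is excluded
age_column = df.iloc[:, 1]      # every row, second column ("age")
subset = df.iloc[1:3, [0, 2]]   # rows 1-2, first and third columns

# loc selects by label instead, and its slice end IS included
by_label = df.loc[0:2, ["age", "depression_total"]]  # labels 0, 1 and 2
```

The off-by-one difference between iloc and loc slices is the detail that most often trips people up.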

Part II. EDA and Hypothesis Testing

I finished Part I in about 90 minutes, so I decided to start Part II. By the end of Part I, I had imported the data into a dataframe, dropped rows that had been included in the original dataset in error, imputed missing values in one of the remaining columns with the median, and exported the dataframe back to CSV to confirm that everything looked right.
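The Part I steps can be sketched in pandas roughly as follows. This is not the course's solution: the CSV is an invented in-memory stand-in, the column names are hypothetical, and "rows included in error" is modelled here, as an assumption, as rows with no student type.

```python
import io

import pandas as pd

# Hypothetical raw data; the course uses a CSV from the actual study
RAW_CSV = """student_type,age,stay_years,depression_total
Inter,24,1,8
Dom,19,,10
Inter,28,3,3
,,,
Dom,21,2,12
"""


def clean(raw: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(raw))
    # Drop rows that don't belong (assumed rule: no student type recorded)
    df = df.dropna(subset=["student_type"]).copy()
    # Impute missing values in one column with that column's median
    df["stay_years"] = df["stay_years"].fillna(df["stay_years"].median())
    return df


cleaned = clean(RAW_CSV)
# Export back to CSV text; pass a file path instead to write a file to disk
csv_text = cleaned.to_csv(index=False)
```

The median is a common choice for imputation because, unlike the mean, it isn't pulled around by a few extreme values.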

Part II involved splitting the dataframe in two, one for the numerical values and one for the categorical (string) data, then plotting them as histograms and countplots and looking at the correlations between them.

Example of a histogram
Example of a countplot
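The split-and-summarize part of Part II might look like the sketch below, using hypothetical columns. The plots themselves would come from a plotting library such as seaborn (for example its histplot and countplot functions). They are left out here, and the sketch only computes what those plots would display.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned survey data
df = pd.DataFrame({
    "student_type": ["Inter", "Dom", "Inter", "Dom", "Inter"],
    "gender": ["Male", "Female", "Female", "Male", "Female"],
    "age": [24, 19, 28, 21, 23],
    "depression_total": [8, 10, 3, 12, 7],
})

# Split into numerical and categorical (string) columns
numeric_df = df.select_dtypes(include="number")
categorical_df = df.select_dtypes(exclude="number")

# Pairwise Pearson correlations between the numerical columns
correlations = numeric_df.corr()

# The counts a countplot would draw, one table per categorical column
counts = {col: categorical_df[col].value_counts() for col in categorical_df.columns}
```

Selecting the categorical side with exclude="number" avoids depending on whether pandas stores strings as the object dtype or a dedicated string dtype.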

Part II is more challenging because you're asked to make a boxplot (for example) but not shown the "successful" plot. I only realized by accident that I was selecting the wrong dataframe. If I had been provided with screenshots of the output (instead of a black box with a question mark on it indicating where the output will be), that would have helped significantly.

I had the same issue with the t-test. It wasn't obvious whether I had calculated the value correctly because I wasn't given the "answer", the correct p-value. Providing the end results, even without the code used to generate them, would really help.

One more note: when instructing you to run the t-test, the notebook says "If you do this right, you'll see that the pvalue is larger than 0.05, which means the means of the two groups are the same", which is definitely not how p-values work!

If our hypothesis is that the level of depression differs between domestic and international students, then the null hypothesis is that there is no difference.

The mean of the total depression score for international students is 8.04. The mean of the total depression score for domestic students is 8.61. If we want to know if there is a statistically significant difference, we can run a t-test.

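The t-test gives us a p-value: the probability of seeing a difference at least as large as the one we observed if the null hypothesis were true. Before testing, we usually set a significance level (alpha) of 0.05 or 0.01, meaning we accept a 5% or 1% risk of rejecting a null hypothesis that is actually true. If our p-value is below 0.05, we reject the null hypothesis. If it's above 0.05, we fail to reject the null hypothesis (the results are not statistically significant), which is not the same as showing that the two means are equal.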

It sounds like a nitpick, but it's very important that we get it right. In this exercise, the p-value is 0.41, so we fail to reject the null hypothesis.
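To make the decision rule concrete, here is a hedged sketch using SciPy's ttest_ind on invented scores (not the course data, so it won't reproduce the 0.41 above). It assumes a standard two-sided, two-sample t-test with equal variances, which is SciPy's default. The notebook may expect a different variant, such as Welch's test (equal_var=False).

```python
from scipy import stats

# Hypothetical depression totals for two groups (not the course data)
international = [8, 5, 12, 9, 3, 10, 7, 11]
domestic = [9, 6, 14, 8, 10, 7, 12, 5]

ALPHA = 0.05  # significance level, chosen before looking at the data

# Two-sided, two-sample t-test assuming equal variances (SciPy's default)
result = stats.ttest_ind(international, domestic)
t_stat, p_value = result.statistic, result.pvalue

if p_value < ALPHA:
    decision = "reject the null hypothesis"
else:
    # Not evidence that the means are equal, only a lack of evidence of a difference
    decision = "fail to reject the null hypothesis"
```

The else branch is the situation the notebook describes; note that the conclusion is phrased as a failure to reject, not as "the means are the same".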

Later, we make more boxplots, run more t-tests, and calculate a chi-square test of the relationship between suicidality and religiosity. In this step, it was helpful to have some of the output to compare against, so I could confirm that I had done the calculation correctly.
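A chi-square test of independence between two categorical variables can be sketched with pandas and SciPy as below. The responses and column names are invented for illustration and are not the study's variables.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical survey responses (invented for illustration)
df = pd.DataFrame({
    "religious": ["Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
    "suicidal_ideation": ["No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# Cross-tabulate the two categorical variables into a contingency table
table = pd.crosstab(df["religious"], df["suicidal_ideation"])

# Returns the test statistic, the p-value, the degrees of freedom, and the
# frequencies we would expect if the two variables were independent
chi2, p_value, dof, expected = chi2_contingency(table)
```

For a 2x2 table SciPy applies Yates' continuity correction by default, and with counts this small the chi-square approximation is unreliable, so the real analysis depends on having the full dataset.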

Part III. Data Coding

Part III is shorter than the previous two parts and focuses primarily on preparing the data for machine learning. Because its steps repeated steps from the earlier lessons (and I still had my dataframes from them available), it went very quickly.

The end result of Part III is a "coded" dataframe, in which the different factors (for example, Gender) are replaced with numerical values, such as male = 0 and female = 1.

Coded data
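Coding a dataframe like this can be sketched in pandas as follows. The columns are hypothetical, and the course may use either approach (or another one entirely): an explicit mapping for a two-level factor, or one 0/1 indicator column per category.

```python
import pandas as pd

# Hypothetical stand-in for the Part II dataframe
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "student_type": ["Inter", "Dom", "Inter", "Dom"],
    "depression_total": [8, 10, 3, 12],
})

# Two-level factor: map each label to a number explicitly (male = 0, female = 1)
coded = df.copy()
coded["gender"] = coded["gender"].map({"Male": 0, "Female": 1})

# General alternative: one 0/1 indicator column per category
dummies = pd.get_dummies(df, columns=["student_type"], dtype=int)
```

One caveat with map: any label missing from the dictionary (a typo such as "male", for instance) becomes NaN, so it's worth checking for missing values afterwards.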

Part IV. Machine Learning

Part IV focuses on machine learning. While I have an okay grasp of other statistical and data science concepts, machine learning (which some people consider the only true data science, not that I agree) is something I haven't had much exposure to.

Part IV is by far the longest part of the workbook. Part I has 9 steps, Part II has 16, and Part III has 9, but Part IV has 27!

Part IV uses scikit-learn, a library I knew by name but, unlike the others, had never actually used. Luckily, they included a few resources.

Continuing with the theme of providing almost enough help, I found Step 6 to contain far too little detail:

Perhaps the solution would be to rate lesson difficulty on two axes: difficulty in data science and difficulty in machine learning. Hypothesis testing, EDA and programming can be easy for someone while machine learning is still difficult.

Essentially, the lesson has you split your data into training and test sets and then set up 6 different machine learning algorithms, including:

  • DummyRegressor (which does nothing more than predict one constant value, by default the mean of the training targets, for all of your X inputs, and is used to establish a baseline)
  • Linear Regression
  • Decision Tree
  • Random Forest

The lesson then has you do a bit of manipulation of your variables. Unfortunately, the lack of detail means this part of the workbook isn't very useful, despite being the longest. With no sample code, specific steps, or even example outputs, it's impossible to know whether you're on the right track.
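For anyone in the same position, this is roughly what comparing the listed models against a baseline looks like in scikit-learn. It is a sketch under stated assumptions, not the workbook's solution: the data are synthetic, the 75/25 split and mean absolute error as the comparison metric are my own choices, and the two algorithms not named above are left out.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the coded survey data (not the real dataset):
# three numeric features and a target that depends on two of them, plus noise
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "dummy": DummyRegressor(strategy="mean"),  # always predicts the training mean
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
```

A model whose error is no better than the dummy's isn't learning anything useful from the features, which is why the baseline belongs in the comparison.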

This part of the course was the most disappointing of the bunch, and finding out that the promised Telegram group didn't exist, and that the Facebook page is just a regular business page with no chat feature, didn't help.

Conclusion

I'm fairly satisfied with UpLevel. First, a little more detail to confirm that my calculations are correct (or perhaps an option to unlock the solution) would be a great addition. Second, it's important to have opportunities for students to connect with each other.

Despite these minor issues, I’m looking forward to next month, and might even buy a course early to keep my momentum going.

If I had to describe it, I'd say it's almost like an asynchronous internship. You have an expert "looking over your shoulder" and suggesting ways to accomplish each task towards an agreed-upon goal, while also giving you the freedom to tackle it your own way.

I hope UpLevel continues to improve their subscriptions.

Happy learning!

© 2023 Dustin K MacDonald | Powered by Minimalist Blog WordPress Theme