About Me & Resources I've Found Useful


Where I'm From:

I grew up in Saint Louis. Go Cardinals and go Blues!!

My Background:

I have a Ph.D. in Industrial and Organizational Psychology, and within the past 3 years I have become increasingly interested in Python, Machine Learning, and Deep Learning. I've taught myself how to program, use advanced machine learning algorithms and techniques, manipulate data, and even leverage TensorFlow and PyTorch for deep learning. I'm constantly learning new things, so I always have something I'm working on.

The Purpose of This Blog:

A lot of the language in programming and Data Science is very foreign compared to the language we were taught as social scientists. It took me a lot of time and research to start to tie everything together. I hope this site can help other social scientists, because I will try to tie what I am doing back to its purpose for the kind of work we are likely to do.

I think programming is extremely important for everyone. I see a lot of people ask if there are packages or pre-built resources to do certain things with data. The beauty of being able to program is that you can build them yourself.

For now, I will likely post about the following topics, focusing on what I've learned, how I learned it, and what I've found useful or important:

  • Python/Programming
  • Machine Learning
  • Data Wrangling/Manipulation
  • Deep Learning
  • Version Control (Git)
  • Math
  • I/O-relevant topics I think are important, as they come up
  • The integration of AI and I/O

From my experience reading other resources, many assume a basic familiarity with computers that, honestly, many of us in the social sciences don't have. They assume you are coming from a developer background. For this reason, I will likely focus on some of the very basics, like how to install things step by step. I will also focus on certain packages, and certain functions within them, that I believe are valuable for social science research and data analysis. However, because there are a plethora of resources (free or extremely affordable), I won't teach you how to write Python or how to use every single feature of a particular package. Instead, I will do my best to point you to resources to learn more, and the resources I provide below should be adequate.


Resources

I don't intend for this blog to teach you how to use Python and/or R from scratch. I will show features and packages I think are important, but as many people have mentioned, most intro material has already been done 100+ times, so instead I'll link you to resources I've found useful. Hopefully that helps you build your mental model of data science (DS). If we come from similar backgrounds, hopefully the resources that resonated with me will resonate with you as well.

Introduction to Python:

  • Intro to CS with Python
    • This course is provided by MIT and is fairly difficult for a MOOC, but it gives you a great introductory background in Python and basic Computer Science concepts. It's free to audit.
  • Complete Python Bootcamp
    • A great intro to Python course from Udemy taught by Jose Portilla, who IMO is a great instructor when it comes to teaching the basics. I included a coupon code that gets the price down to 11 dollars.

Python Podcasts

I like to listen to podcasts when I'm lifting, running, driving, etc. I think it's a really good way to keep up to date and pick up on the language used. Here are a few Python-focused ones I've found useful:

  • Talk Python to Me
  • Python Bytes
  • Podcast.__init__

Data Science/ML Focused Courses:

I recommend working through these next resources in the order listed below, since the first courses hold your hand a bit more than the later ones. First, here are some of my basic thoughts on learning:

  • I recommend using Jupyter Notebooks. They let you run code in small chunks, iterate quickly, and fix broken code. When you are learning, this is great.

  • One thing I struggled with early on was getting frustrated when my code wouldn't run. Maybe this is just me, but when you are coding you will almost certainly screw up, lol. You can't always get it right the first time.

  • Google and Stack Overflow are your friends. The great thing about Python (and R) compared to SPSS and/or SAS is that there are tons of resources out there. If you have a question about something, it's almost guaranteed that someone else has had the same question and it has been answered on Stack Overflow. Take advantage of that. As you get more familiar with Pandas, NumPy, etc., you will start to understand the lingo better and your searches will turn up the results you're looking for much more quickly. At first, I had to search for something 4-5 different ways to find an answer; now I can usually find what I'm looking for on the first search.

  • Python for Data Science and Machine Learning Bootcamp

    • Again, I included a coupon so you can get the course for 11 dollars. I highly recommend this course, which is also taught by Jose Portilla. It goes over the basics of NumPy, Pandas, Matplotlib, Bokeh, and scikit-learn. He uses actual datasets and gives you challenges at the end of each section to help the topics sink in.
  • Machine Learning A-Z

    • This course isn't as engaging as Jose's, but it does a really good job of showing you what a data pipeline looks like. It really helps you understand how similar most of the commands are across algorithms: initialize the model as a Python object, fit the model, then predict.
  • Python Data Science Handbook

    • Jake VanderPlas made his entire book available for free on his GitHub page, and I think it's a great resource. I would also encourage you to purchase the book to support experts like him who help teach. After you get the basics down, Jake's book does a great job of walking you through more complex topics and code.
  • Python Machine Learning

    • Sebastian Raschka's book is great and starts to get into more advanced topics and code. This is one I personally own and reference very often. I highly recommend it.
  • Machine Learning Specialization

    • This is a Machine Learning specialization on Coursera taught by professors from the University of Washington. IMO it does a great job of going deeper into the math behind linear and logistic regression. If your grad program, like mine, didn't go into much detail about how the algorithms work, you may find this specialization as interesting as I did.
    • My only issue with the course was that it used an ML API other than scikit-learn that actually costs money (although they provide it to students for free for one year). Early on, I decided to do the assignments twice: once with their API and again with scikit-learn. I started speeding up toward the end, though, and didn't do this for the last 2 courses.
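To make the "initialize the model, fit the model, then predict" pattern from the Machine Learning A-Z entry concrete, here is a minimal sketch using scikit-learn. The tiny dataset (hours studied, exam score, pass/fail) is made up purely for illustration, and the particular estimators are just examples I picked; the point is how little the code changes when you swap algorithms.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data for illustration only: hours studied is the single feature.
# scikit-learn expects X to be 2D: one row per person, one column per feature.
X = [[1], [2], [3], [4], [5], [6]]
exam_score = [52, 58, 65, 71, 78, 84]  # continuous outcome
passed = [0, 0, 0, 1, 1, 1]            # binary outcome

# Regression: 1) initialize the model object, 2) fit it, 3) predict
reg = LinearRegression()
reg.fit(X, exam_score)
predicted_score = reg.predict([[7]])   # predicted score for 7 hours studied

# Classification: two different algorithms, exactly the same three calls
classifier_preds = {}
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    model.fit(X, passed)
    # Predict pass/fail for someone who studied 1 hour and someone who studied 6
    classifier_preds[type(model).__name__] = model.predict([[1], [6]]).tolist()

print(predicted_score)
print(classifier_preds)
```

Swapping `LogisticRegression()` for `DecisionTreeClassifier()` only changes the line that creates the model, which is what makes it easy to try several algorithms on the same data.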

Data Science Focused Podcasts:

I probably listen to DS podcasts more than Python podcasts. Here are the ones I've found useful:

  • Machine Learning Guide
  • SuperDataScience Podcast
  • DataFramed
  • This Week in Machine Learning & AI
  • Linear Digressions
  • Data Journeys
  • Practical AI
  • O'Reilly Data Show
  • Not So Standard Deviations
  • Data Stories

Deep Learning

One thing I want to mention about all of the Udemy courses: for many of them, I do not have a coupon code. Udemy's prices fluctuate like crazy, so if a course I listed (or another one on the site you find interesting) is something like 99 dollars (like the PyTorch one when I just looked), wait a few weeks and you can probably get it on sale for 10-12 dollars. For example, I paid 10 dollars for that same PyTorch course a few weeks ago.

Sidenote: I had to use the word dollars instead of the dollar sign because in markdown the dollar sign starts mathematical notation :)

My Advice for Python

If I had to start over learning how to be a data science type person, I would take the following two courses:

  1. Complete Python Bootcamp
  2. Python for Data Science and Machine Learning Bootcamp

In that order.