Using Sentiment to Predict Personality (SIOP ML Competition) Part I

This article uses a pre-built Python sentiment package to create features from open-ended text, which are then used to predict self-reported personality traits. The data come from the 2019 SIOP Machine Learning Competition, and this simple method produces a 20th-place score.

Category: data science Tags:

Using Parallel Processing and Vectorization to Speed Up Your Code

As data scientists, we often look for ways to speed up our data pipelines. In this article I discuss and briefly test a new dataframe library that lets you leverage multiple CPU cores. I also explore three different ways to process dataframe columns and compare their speeds across both the traditional dataframe (pandas) and the parallel dataframe (modin). This is meant to be a brief article highlighting some research I have been doing lately.

Category: programming Tags:

Topic Modeling Company Reviews with K-means

This is the third of three articles in my series on using unsupervised machine learning algorithms in Python to understand open-ended survey responses. I'll revisit the "Cons" responses once more, this time using the K-means clustering algorithm. After the responses are clustered, I will examine the responses within each cluster to identify themes and then look at the breakdown of themes across companies.

Category: data science Tags:

Topic Modeling Company Reviews with LDA

This is the second of three articles in my series on using topic modeling in Python to understand open-ended survey responses. I'll revisit the "Cons" responses, but this time I will use Latent Dirichlet Allocation (LDA). I'll also walk through how to use an interactive visualization library to view the results of the LDA model.

Category: data science Tags:

Topic Modeling Company Reviews with SVD

Unsupervised learning is an important part of machine learning, and as I/Os we often find ourselves asked to make sense of data that has no target to optimize for. When it comes to NLP on surveys, employee feedback forms, and customer reviews, a common request is to break down all the responses into general categories. This is where a method like topic modeling may be useful. In this article we'll walk through how to use Singular Value Decomposition (SVD) to do topic modeling on company reviews.

Category: data science Tags:

© N. Koenig 2016
