Natural Language Processing Using the 2019 SIOP Machine Learning Competition Data

In recent years academics and practitioners alike have realized that while structured surveys and assessments are valuable, employees and candidates can provide much richer data via the written or spoken word. In this article I'll leverage the 2019 SIOP Machine Learning Competition dataset to explore using NLP to create features that predict outcomes of interest. In future articles I'll create additional features, use bag-of-words representations, and even leverage deep learning methods. I hope this is engaging for readers because the competition leaderboard gives us a benchmark: we can rank each method against the other teams' results.

The 2019 SIOP Machine Learning Competition

The 2019 SIOP Machine Learning Competition was an NLP task where the goal was to predict someone's Big 5 trait scores from a series of 5 open-ended situational judgment items, each designed to elicit one of the Big 5 traits. The metric of interest was the mean correlation across all traits. The training set had 1,088 respondents who each answered 5 questions, for a total of 5,440 open-ended responses; the public and private leaderboard datasets each had 300 respondents, for 1,500 open-ended responses apiece. Below are some screenshots from the competition website.

[Screenshots from the competition website]

Personality Traits

For the purpose of this article I'm going to assume most readers are familiar with the Big 5 personality traits. For those who are not, Wikipedia is a good starting point, and for simplicity you can remember the traits by the acronym OCEAN.

Items

As mentioned each item was designed to elicit a specific trait. Let's quickly look at the items under the trait they were designed to elicit.

Openness To Experience

  • "The company closed a deal with a client from Norway and asks who would like to volunteer to be involved on the project. That person would have to learn some things about the country and culture but doesn't necessarily need to travel. Would you find this experience enjoyable or boring? Why?"

Conscientiousness

  • "You have a project due in two weeks. Your workload is light leading up to the due date. You have confidence in your ability to handle the project, but are aware sometimes your boss gives you last tasks that can take significant amounts of time and attention. How would you handle this project and why?"

Extraversion

  • "You and a colleague have had a long day at work and you just find out you have been invited to a networking meeting with one of your largest clients. Your colleague is leaning towards not going and if they don't go you won’t know anyone there. What would you do and why?"

Agreeableness

  • "The company closed a deal with a client from Norway and asks who would like to volunteer to be involved on the project. That person would have to learn some things about the country and culture but doesn't necessarily need to travel. Would you find this experience enjoyable or boring? Why?"

Neuroticism

  • "Your manager just gave you some negative feedback at work. You don’t agree with the feedback and don’t believe that it is true. Yet the feedback could carry real consequences (e.g., losing your annual bonus). How do you feel about this situation? What would you do?"

Let's start the way we would any project: by importing some of our packages and doing a little bit of Exploratory Data Analysis (EDA).

In [1]:
# import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
In [2]:
# load the data
df = pd.read_csv('data/siop_ml_train_full.csv')
In [3]:
list(df.columns)
Out[3]:
['Respondent_ID',
 'open_ended_1',
 'open_ended_2',
 'open_ended_3',
 'open_ended_4',
 'open_ended_5',
 'E_Scale_score',
 'A_Scale_score',
 'O_Scale_score',
 'C_Scale_score',
 'N_Scale_score',
 'label',
 'all_open']

From looking at the columns included in the dataset we can see it has the 5 open-ended responses and the trait scores for each respondent; I also added a label column so the data could be split identically to the way it was split for the competition. The first thing we want to do is ensure that each of the open-ended responses is a string. It's a small formatting step that could cause errors later if we don't do it up front, so I'll create a list of column names and then write a quick for loop to cast them all to strings.

In [4]:
oes = ['open_ended_1',
 'open_ended_2',
 'open_ended_3',
 'open_ended_4',
 'open_ended_5']

for i in oes:
    df[i] = df[i].astype(str)

Examining the Criterion

Let's look at the trait level score distributions.

In [5]:
df[['E_Scale_score','A_Scale_score','O_Scale_score','C_Scale_score','N_Scale_score']].describe()
Out[5]:
E_Scale_score A_Scale_score O_Scale_score C_Scale_score N_Scale_score
count 1688.000000 1688.000000 1688.000000 1688.000000 1688.000000
mean 3.488398 4.120458 3.856783 4.403288 2.070843
std 0.786168 0.612833 0.706988 0.590336 0.762244
min 1.000000 1.333333 1.166667 1.000000 1.000000
25% 3.000000 3.750000 3.416667 4.083333 1.500000
50% 3.500000 4.166667 3.916667 4.583333 2.000000
75% 4.083333 4.583333 4.416667 4.916667 2.583333
max 5.000000 5.000000 5.000000 5.000000 4.833333

What do we immediately notice?

  1. Everyone is pretty positive. Hardly any of the trait means sit near the scale midpoint of 3.
  2. The conscientiousness scale is extremely skewed, with a mean score of over 4.4 out of 5. This could cause problems: with such a large mean and small standard deviation there is little variance left to predict. A quick set of histograms (sketched below) makes this easy to see.
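
A minimal sketch for those histograms, using pandas' built-in plotting (this just reuses the matplotlib setup from the first cell):

# Sketch: histograms of the five trait scale scores to visualize the skew
trait_cols = ['E_Scale_score','A_Scale_score','O_Scale_score','C_Scale_score','N_Scale_score']
df[trait_cols].hist(bins=20, figsize=(15, 6))
plt.tight_layout()
plt.show()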

Examining the Predictors

Let's take a look at the responses as well.

Let's look at length, common words, etc.

For simplicity's sake let's just focus on the Neuroticism item for this. Another option would be to concatenate all five responses together and ignore the fact that they are answers to different prompts, but for the sake of this article we'll keep them separate.
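
(For reference, that concatenation would be a one-liner. The all_open column in the dataset presumably already holds something like this; the sketch below builds a hypothetical all_responses column just to show the idea.)

# Sketch: concatenate all five responses per respondent into one text column
df['all_responses'] = df[oes].apply(lambda row: " ".join(row), axis=1)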

Length

Let's write a quick for loop that leverages the oes list above and create new columns for the length of each open-ended response.

In [6]:
for i in oes:
    col = str(i)+"_len"
    df[col] = df[i].apply(lambda x: len(x.split(" ")))
In [7]:
df['open_ended_4_len'].describe()
Out[7]:
count    1688.000000
mean       53.186611
std        22.474700
min        15.000000
25%        38.000000
50%        49.000000
75%        63.000000
max       235.000000
Name: open_ended_4_len, dtype: float64

So the longest response to the neuroticism prompt is 235 words and the shortest is 15, with a median response length of 49 words.
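
Since we'll end up using these length columns as model features later on, one quick (optional) sanity check is to see how response length relates to the corresponding trait score; a sketch for the neuroticism item:

# Sketch: correlation between neuroticism response length and the neuroticism score
print(df[['open_ended_4_len', 'N_Scale_score']].corr())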

Most Common Words

To get the word counts we'll want to do a small bit of pre-processing that removes punctuation and stopwords; otherwise a few stopwords like "I" and "the" will almost certainly top the list.

  • The first cell creates a list of all responses to open_ended_4 and then joins them together into one corpus.
  • Then the function text_process removes punctuation and stopwords.
  • Then we can use the FreqDist and word_tokenize functions from nltk to produce our list of the most common words.
In [8]:
oe_4 = df['open_ended_4'].tolist() # neuroticism
oe_4_corpus = " ".join(oe_4)
In [9]:
import nltk
from nltk.corpus import stopwords
# nltk.download('stopwords')  # uncomment if the stopword list isn't installed yet
import string
#Pre-processing the data
def text_process(mess):
    """
    Takes in a string of text, then performs the following:
    1. Remove all punctuation
    2. Remove all stopwords
    3. Returns the cleaned text as a single string
    """
    # Check characters to see if they are in punctuation
    nopunc = [char for char in mess if char not in string.punctuation]
    
    # Join the characters again to form the string.
    nopunc = ''.join(nopunc)
    nopunc = nopunc.lower()
    # Now just remove any stopwords
    return " ".join([word for word in nopunc.split() if word.lower() not in stopwords.words('english')])
In [10]:
oe_4_corpus_clean = text_process(oe_4_corpus)
In [11]:
from nltk.probability import FreqDist
from nltk import word_tokenize
# nltk.download('punkt')  # uncomment if the tokenizer data isn't installed yet

words = word_tokenize(oe_4_corpus_clean)
fdist = FreqDist(words)
fdist
Out[11]:
FreqDist({'would': 4675, 'feedback': 1471, 'manager': 1310, 'ask': 730, 'try': 676, 'feel': 640, 'negative': 520, 'situation': 463, 'work': 461, 'could': 448, ...})

So we can see that after removing the stopwords the most used words are "would" (which is part of the prompt and, in my opinion, should be added as an additional stopword), "feedback", "manager", and "ask". Nothing readily jumps off the page thus far, but it's nevertheless interesting data to have.
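
As a quick illustration of that point, here's a sketch that treats "would" (plus a couple of other prompt-style words, chosen arbitrarily) as extra stopwords and recomputes the frequency distribution:

# Sketch: add prompt words like "would" to the stopword list and recount
custom_stops = set(stopwords.words('english')) | {'would', 'could', 'get'}
words_filtered = [w for w in words if w not in custom_stops]
FreqDist(words_filtered).most_common(10)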

Creating Features

The first thing you have to do in machine learning is create features you think will provide meaningful signal for predicting your criterion of interest. One way to do that is to leverage others' work, and for the purposes of this article we will use a pre-trained sentiment analysis package.

Sentiment Analysis


Sentiment analysis is a fairly simple concept: a trained sentiment model can analyze a set of written (or spoken) text and identify whether the corpus is positive, negative, or neutral. A good overview of sentiment analysis in action is available here, but for the focus of this article I will assume you are generally familiar with sentiment and its uses.

One potential use is for the sentiment proportions to be used as predictors for each of the trait scores.

We could go through the effort of collecting data from several sources (Yelp, IMDb, etc.) and building our own sentiment analysis model, or we can leverage what is already available to make things much easier. For the purposes of this article we will use VADER, which stands for Valence Aware Dictionary and sEntiment Reasoner.

Let's first install VADER, which we can do following the documentation from the GitHub repo. Just for quick reference, you can pip or conda install packages in a Jupyter Notebook by putting an exclamation point at the beginning of the line; this runs the line as a terminal command in the notebook, and you can read more here. For the purposes of this article I am going to comment mine out as I have already installed the package.

In [12]:
#!pip install vaderSentiment

After we install the package we can follow the documentation from the repo and load the Sentiment Analyzer by running the following command.

In [13]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Now let's test it out on a response. First, let's pick a few written responses from our dataframe. We can do that by indexing into a column: [:2] will get us the first two responses.

In [14]:
sentences = df['open_ended_1'][:2]; sentences
Out[14]:
0    I would change my vacation week, because I am ...
1    I would talk to my colleague and see if they w...
Name: open_ended_1, dtype: object

Now let's take the code from the repo and feed it the sentences.

In [15]:
analyzer = SentimentIntensityAnalyzer()
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    print("{:-<65} {}".format(sentence, str(vs)))
I would change my vacation week, because I am a better employee than he or she. We can't both be off work at the same time because then the other people at work would suffer because of our selfishness. Also, the other employee/ or even my boss now kind of owes me... {'neg': 0.113, 'neu': 0.835, 'pos': 0.053, 'compound': -0.5106}
I would talk to my colleague and see if they were willing to change the vacation date because I really need to have this week off. If the colleague was not willing to change it I would go talk to my boss. I would explain to my boss that I would still like to have the week off I requested but if it is not possible than I will take the week after. I would do this like this because I think it would make it easier and I would not want to start a workplace conflict over a vacation week {'neg': 0.044, 'neu': 0.866, 'pos': 0.09, 'compound': 0.6616}

Output

You can see here we get 4 outputs from the sentiment analyzer.

  1. Negative
  2. Neutral
  3. Positive
  4. Compound

Negative, neutral, and positive give the proportions of the text that fall into each category, and compound is a value between -1 and 1 that captures the overall intensity. This is great, but we are going to have to write our own function to save all of these, since the loop above just prints each result and then overwrites it. So how can we write a function we can reuse?

We are going to be feeding it a list of texts, so let's make that the input to the function. We want to loop through that list and append each output to another list (a list of dictionaries). That looks like this:

In [16]:
def get_sentiment(text_list):
    json_list = []
    for i in text_list:
        sentiment = analyzer.polarity_scores(i)
        json_list.append(sentiment)
    return json_list
In [17]:
get_sentiment(sentences)
Out[17]:
[{'neg': 0.113, 'neu': 0.835, 'pos': 0.053, 'compound': -0.5106},
 {'neg': 0.044, 'neu': 0.866, 'pos': 0.09, 'compound': 0.6616}]

Now, what do we get as output when we feed our sentences object to it? We get a list of dictionaries, which we clearly need to parse. Luckily pandas makes that easy with json_normalize.

In [18]:
from pandas.io.json import json_normalize  # in newer pandas versions, use pd.json_normalize instead

Let's first turn all of the text rows into a list.

In [19]:
oe_1 = df['open_ended_1'].tolist() # agreeableness
oe_2 = df['open_ended_2'].tolist() # conscientiousness
oe_3 = df['open_ended_3'].tolist() # extraversion
oe_4 = df['open_ended_4'].tolist() # neuroticism
oe_5 = df['open_ended_5'].tolist() # openness to experience

Then let's run the function on each set of text, use json_normalize to turn the dictionary into a pandas dataframe, and rename each column to be specific to the text column it came from.

Open_Ended_1

This open ended question is designed to elicit agreeableness, refer to the beginning of the article for the exact question.

In [20]:
oe_1_sentiment = get_sentiment(oe_1)
oe_1_sent_df = json_normalize(oe_1_sentiment)
oe_1_sent_df.columns = ['oe_1_neg','oe_1_neu','oe_1_pos','oe_1_compound']  # match VADER's neg, neu, pos, compound column order
In [21]:
oe_1_sent_df.head()
Out[21]:
oe_1_neg oe_1_neu oe_1_pos oe_1_compound
0 0.113 0.835 0.053 -0.5106
1 0.044 0.866 0.090 0.6616
2 0.038 0.962 0.000 -0.3818
3 0.000 1.000 0.000 0.0000
4 0.089 0.911 0.000 -0.6120

Open_Ended_2

This open ended question is designed to elicit conscientiousness, refer to the beginning of the article for the exact question.

In [22]:
oe_2_sentiment = get_sentiment(oe_2)
oe_2_sent_df = json_normalize(oe_2_sentiment)
oe_2_sent_df.columns = ['oe_2_neg','oe_2_neu','oe_2_pos','oe_2_compound']

Open_Ended_3

This open ended question is designed to elicit extraversion, refer to the beginning of the article for the exact question.

In [23]:
oe_3_sentiment = get_sentiment(oe_3)
oe_3_sent_df = json_normalize(oe_3_sentiment)
oe_3_sent_df.columns = ['oe_3_neg','oe_3_neu','oe_3_pos','oe_3_compound']

Open_Ended_4

This open ended question is designed to elicit neuroticism, refer to the beginning of the article for the exact question.

In [24]:
oe_4_sentiment = get_sentiment(oe_4)
oe_4_sent_df = json_normalize(oe_4_sentiment)
oe_4_sent_df.columns = ['oe_4_neg','oe_4_neu','oe_4_pos','oe_4_compound']

Open_Ended_5

This open ended question is designed to elicit openness to experience, refer to the beginning of the article for the exact question.

In [25]:
oe_5_sentiment = get_sentiment(oe_5)
oe_5_sent_df = json_normalize(oe_5_sentiment)
oe_5_sent_df.columns = ['oe_5_neg','oe_5_neu','oe_5_pos','oe_5_compound']
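
The five cells above are intentionally explicit, but the same result could be produced with a short loop; a sketch using the helpers we've already defined might look like this:

# Sketch: build all five sentiment dataframes in one loop instead of five separate cells
sent_dfs = []
for n in range(1, 6):
    col = 'open_ended_{}'.format(n)
    sent_df = json_normalize(get_sentiment(df[col].tolist()))
    sent_df = sent_df.add_prefix('oe_{}_'.format(n))  # yields oe_n_neg, oe_n_neu, oe_n_pos, oe_n_compound
    sent_dfs.append(sent_df)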

Then let's concatenate all 6 of the dataframes back together.

In [26]:
new_df = pd.concat([df,oe_1_sent_df,oe_2_sent_df,oe_3_sent_df,oe_4_sent_df,oe_5_sent_df],axis=1)
In [27]:
new_df.to_csv('data/vader_features.csv',index=False)

Features/Variables to Include

As we mentioned earlier, for the purposes of this article we will only be using sentiment as features, so the input features for our model will be the proportion of each response that is negative, neutral, and positive, plus the compound score.

Later we'll consider some other features and see which ones appear the most important.

In [28]:
X = new_df[['oe_5_compound','oe_5_neg','oe_5_neu','oe_5_pos','oe_4_compound','oe_4_neg','oe_4_neu','oe_4_pos','oe_3_compound','oe_3_neg','oe_3_neu','oe_3_pos',
           'oe_2_compound','oe_2_neg','oe_2_neu','oe_2_pos','oe_1_compound','oe_1_neg','oe_1_neu','oe_1_pos','label']]
ys = new_df[['E_Scale_score','A_Scale_score','O_Scale_score','C_Scale_score','N_Scale_score','label']]
In [29]:
new_df.label.unique()
Out[29]:
array(['train', 'dev', 'test'], dtype=object)

Remember we have 3 different datasets here, so we'll need to separate them into train, dev, and test and then drop the label column from each dataframe.

In [30]:
X_train = X[X['label']=='train']
y_train = ys[ys['label']=='train']

X_dev = X[X['label']=='dev']
y_dev = ys[ys['label']=='dev']

X_test = X[X['label']=='test']
y_test = ys[ys['label']=='test']
In [31]:
y_train.drop(columns='label',inplace=True)
y_dev.drop(columns='label',inplace=True)
y_test.drop(columns='label',inplace=True)
X_train.drop(columns='label',inplace=True)
X_dev.drop(columns='label',inplace=True)
X_test.drop(columns='label',inplace=True)

Ridge Regression

Ridge regression is essentially an Ordinary Least Squares technique that adds L2 regularization. In simple terms, this helps us avoid overfitting our models to the data. L2 regularization penalizes the size of each coefficient, but unlike L1 regularization it doesn't tend to force coefficients all the way to zero (which is what lets L1 act as a feature selection technique). The following article goes into much more depth on ridge regression. We will use the sklearn ridge regression implementation, and for simplicity's sake we'll use a penalty term (alpha) of 1.0, which is the default.
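
Concretely, sklearn's Ridge estimates the coefficients by minimizing the ordinary least squares loss plus an L2 penalty on the coefficients, where alpha is the penalization term mentioned above:

$$\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2$$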


Let's build a simple function that takes in the training and testing data as well as the label column and returns a correlation coefficient.

In [32]:
from sklearn.linear_model import Ridge

def run_ridge(X_train, X_test, y_train, y_test, y_label):
    ridge = Ridge()
    ridge.fit(X_train,y_train[y_label])
    test_preds = ridge.predict(X_test)
    test = pd.DataFrame(y_test[y_label])
    test['Pred_score'] = test_preds
    return test.corr().values[0][1]

Development Set

i.e. the Public Leaderboard

In [33]:
# Extraversion
E_pred = run_ridge(X_train, X_dev, y_train, y_dev, ['E_Scale_score'])
# Agreeableness
A_pred = run_ridge(X_train, X_dev, y_train, y_dev, ['A_Scale_score'])
# Conscientiousness
C_pred = run_ridge(X_train, X_dev, y_train, y_dev, ['C_Scale_score'])
# Openness
O_pred = run_ridge(X_train, X_dev, y_train, y_dev, ['O_Scale_score'])
# Neuroticism
N_pred = run_ridge(X_train, X_dev, y_train, y_dev, ['N_Scale_score'])
In [34]:
print("mean correlation: ", round(np.array([E_pred,A_pred,C_pred,O_pred,N_pred]).mean(),3))
mean correlation:  0.232

So this would have put us right around 20th place on the public leaderboard. Not bad for spending just a few minutes cleaning the data and setting up an out-of-the-box ridge regression using only some simple sentiment features.


Let's look at which traits were better and worse.

In [35]:
np.array([E_pred,A_pred,C_pred,O_pred,N_pred])
Out[35]:
array([0.25960968, 0.35637792, 0.09777237, 0.23296561, 0.21310606])

By looking at the individual scores we can see that ridge regression is best at predicting agreeableness and by far the worst at predicting conscientiousness.

What about some other machine learning algorithms?

Random Forest


Next, let's look at one of the more popular "machine learning" algorithms today: random forests. In fairly simple terms, the random forests algorithm uses bagging, taking random subsets of the rows (and, at each split, random subsets of the features) to grow many tree-based models that predict the label. All of these "trees" are then aggregated into an ensemble model, relying on the wisdom-of-the-crowds idea. For more specifics I refer you to this article.

We will use the sklearn implementation of the algorithm.

One quick note: with ridge regression I originally included the length features and found that they slightly lowered the correlations for each of the traits, so I excluded them. With random forests I found the opposite, so we'll include them here.

In [36]:
X = new_df[['oe_5_compound','oe_5_neg','oe_5_neu','oe_5_pos','oe_4_compound','oe_4_neg','oe_4_neu','oe_4_pos','oe_3_compound','oe_3_neg','oe_3_neu','oe_3_pos',
           'oe_2_compound','oe_2_neg','oe_2_neu','oe_2_pos','oe_1_compound','oe_1_neg','oe_1_neu','oe_1_pos','label','open_ended_1_len','open_ended_2_len','open_ended_3_len',
            'open_ended_4_len','open_ended_5_len']]
ys = new_df[['E_Scale_score','A_Scale_score','O_Scale_score','C_Scale_score','N_Scale_score','label']]

X_train = X[X['label']=='train']
y_train = ys[ys['label']=='train']
X_dev = X[X['label']=='dev']
y_dev = ys[ys['label']=='dev']
X_test = X[X['label']=='test']
y_test = ys[ys['label']=='test']

y_train.drop(columns='label',inplace=True)
y_dev.drop(columns='label',inplace=True)
y_test.drop(columns='label',inplace=True)
X_train.drop(columns='label',inplace=True)
X_dev.drop(columns='label',inplace=True)
X_test.drop(columns='label',inplace=True)
In [37]:
from sklearn.ensemble import RandomForestRegressor
/home/nick/miniconda3/envs/tensorflow_cpu/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
In [38]:
def run_rf(X_train, X_test, y_train, y_test, y_label):
    rf = RandomForestRegressor(n_estimators = 100,n_jobs=-1)
    rf.fit(X_train,y_train[y_label])
    test_preds = rf.predict(X_test)
    test = pd.DataFrame(y_test[y_label])
    test['Pred_score'] = test_preds
    return test.corr().values[0][1]
In [39]:
# Extraversion
E_pred = run_rf(X_train, X_dev, y_train, y_dev, ['E_Scale_score'])
# Agreeableness
A_pred = run_rf(X_train, X_dev, y_train, y_dev, ['A_Scale_score'])
# Conscientiousness
C_pred = run_rf(X_train, X_dev, y_train, y_dev, ['C_Scale_score'])
# Openness
O_pred = run_rf(X_train, X_dev, y_train, y_dev, ['O_Scale_score'])
# Neuroticism
N_pred = run_rf(X_train, X_dev, y_train, y_dev, ['N_Scale_score'])
In [40]:
print(np.array([E_pred,A_pred,C_pred,O_pred,N_pred]))
print("mean correlation: ", round(np.array([E_pred,A_pred,C_pred,O_pred,N_pred]).mean(),3))
[0.14970309 0.24288334 0.04091096 0.23121258 0.11565877]
mean correlation:  0.156
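
As a quick aside, random forests also expose a feature_importances_ attribute that we could use to see which sentiment and length features carry the most signal. A sketch for a single trait (this refits the model, so the exact numbers will vary from run to run):

# Sketch: inspect random forest feature importances for the neuroticism score
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
rf.fit(X_train, y_train['N_Scale_score'])
importances = pd.Series(rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
print(importances.head(10))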

Using these specific features, random forests did not perform anywhere near as well as ridge regression overall, though we did get a slight lift on openness to experience (keep in mind random forests are stochastic, so without a fixed random_state the exact correlations will shift a bit from run to run). What about gradient boosted trees?

Gradient Boosted Trees

The final algorithm we will try is typically considered the most powerful. Unlike random forests, which grow their trees independently via bagging, gradient boosting takes a purposeful, sequential approach: each new tree focuses on correcting the examples the current ensemble of weak learners predicts poorly, boosting the overall performance of the combined trees. For more specifics I refer you to this article.

We will not be using sklearn for the gradient boosting algorithm (although I have heard the most recent version of sklearn has a gradient boosting implementation on par with XGBoost and LightGBM). Instead we will use XGBoost. Luckily for us, the developers built its API to mirror sklearn's, which makes using it almost exactly the same.

In [41]:
from xgboost import XGBRegressor

def run_xgb(X_train, X_test, y_train, y_test, y_label):
    xgb = XGBRegressor(n_jobs=-1)
    xgb.fit(X_train,y_train[y_label])
    test_preds = xgb.predict(X_test)
    test = pd.DataFrame(y_test[y_label])
    test['Pred_score'] = test_preds
    return test.corr().values[0][1]
In [42]:
# Extraversion
E_pred = run_xgb(X_train, X_dev, y_train, y_dev, ['E_Scale_score'])
# Agreeableness
A_pred = run_xgb(X_train, X_dev, y_train, y_dev, ['A_Scale_score'])
# Conscientiousness
C_pred = run_xgb(X_train, X_dev, y_train, y_dev, ['C_Scale_score'])
# Openness
O_pred = run_xgb(X_train, X_dev, y_train, y_dev, ['O_Scale_score'])
# Neuroticism
N_pred = run_xgb(X_train, X_dev, y_train, y_dev, ['N_Scale_score'])

print(np.array([E_pred,A_pred,C_pred,O_pred,N_pred]))
print("mean correlation: ", round(np.array([E_pred,A_pred,C_pred,O_pred,N_pred]).mean(),3))
[ 0.18850013  0.25210048 -0.05598434  0.17794888  0.07289148]
mean correlation:  0.127

So, if we combine our best trait scores from each model it would be ridge regression for the following:

  • extraversion
  • agreeableness
  • conscientiousness
  • neuroticism

and random forests for:

  • openness
In [43]:
np.array([0.25960968, 0.35637792, 0.09777237, 0.24055125, 0.21310606]).mean()
Out[43]:
0.233483456

This gets us an average correlation of 0.2335, a slight improvement over what ridge regression was able to provide on its own.

Recap:


  • We examined the data from the 2019 SIOP Machine Learning Competition
  • We implemented an off-the-shelf sentiment package (VADER) to produce features we could feed into our machine learning algorithms
  • We tried 3 different algorithms: ridge regression, random forests, and XGBoost
  • Ridge regression performed the best and provided an average correlation that would have put us in roughly 20th place out of 39 teams on the public leaderboard
  • Conscientiousness's high mean and small standard deviation made it the most difficult trait to predict, with a top correlation of 0.098

In the next article we'll explore another off-the-shelf feature creation package and compare the two. We can then also combine all the features and see if we can beat our current best of 20th place.
