Natural Language Processing on Earnings Call Transcripts

An investment strategy that applies natural language processing to earnings call transcripts, based on the S&P Global papers "Natural Language Processing – Part II: Stock Selection" and "Natural Language Processing – Part III: Feature Engineering".

Background

S&P Global released two papers that apply natural language processing to stocks' earnings call transcripts. Both papers present backtests in which the resulting investment strategies purportedly outperform.

The Part II paper suggested that sentiment scores could be created using the Loughran and McDonald Sentiment Word Lists. Each transcript receives a net positive score: the number of positive words minus the number of negative words, with that difference divided by the total number of words in the transcript. The investment strategy takes the top quintile (top 20%) of transcript scores over a four-month lookback period. The stocks chosen are equal-weighted and rebalanced at month-end. The paper suggested that a long-only strategy yielded a 2.35% monthly average return, while a long-short strategy yielded a 4.14% monthly average return.
The Part III paper suggested that scores could be created using descriptor tags (i.e. revenue, earnings, profitability) along with positive or negative keywords within each transcript sentence. Each transcript receives a net positive score: the number of positive descriptor tag sentences minus the number of negative descriptor tag sentences, with that difference divided by the total number of sentences in the transcript. The investment strategy takes the top quintile (top 20%) of transcript scores over a four-month lookback period. The stocks chosen are equal-weighted and rebalanced at month-end. The paper suggested that a long-only strategy yielded a 4.24% monthly average return, while a long-short strategy yielded a 9.16% monthly average return.
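The two net positive scores described above can be sketched roughly as follows. This is only an illustration, not the code in this repository: the tiny `POSITIVE` and `NEGATIVE` sets are hypothetical stand-ins for the Loughran and McDonald lists, the tokenizer and sentence splitter are simplistic, and treating a sentence that contains both positive and negative keywords as neutral is an assumption.

```python
import re

# Hypothetical stand-ins for the Loughran and McDonald word lists;
# the real lists are far longer.
POSITIVE = {"strong", "growth", "improved", "record"}
NEGATIVE = {"decline", "weak", "loss", "impairment"}
# Descriptor tags named in the Part III summary above.
TOPICS = {"revenue", "earnings", "profitability"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def net_positive_word_score(transcript):
    """Part II style: (positive words - negative words) / total words."""
    words = tokenize(transcript)
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def net_positive_sentence_score(transcript):
    """Part III style: (positive topic sentences - negative topic
    sentences) / total sentences."""
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    if not sentences:
        return 0.0
    net = 0
    for sentence in sentences:
        words = set(tokenize(sentence))
        if not words & TOPICS:
            continue  # sentence does not mention a descriptor tag
        has_pos = bool(words & POSITIVE)
        has_neg = bool(words & NEGATIVE)
        # Assumption: mixed sentences count as neutral.
        if has_pos and not has_neg:
            net += 1
        elif has_neg and not has_pos:
            net -= 1
    return net / len(sentences)


if __name__ == "__main__":
    sample = "Revenue growth was strong. We opened an office."
    print(net_positive_word_score(sample))
    print(net_positive_sentence_score(sample))
```

Dividing by the total number of words or sentences keeps long and short transcripts comparable, which matters because the strategy ranks transcripts against each other.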

My tests focus on S&P500 stocks from Apr 2012 to Aug 2022. I did not account for stocks that moved in or out of the S&P500 during the investment strategy period; I only looked at stocks that were in the index as of Sep 2022. If the indicators were truly indicative of market outperformance, I did not believe that small differences in index membership over time would yield a significantly different result.
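The selection rule from the papers (top quintile of scores over a four-month lookback, equal-weighted, rebalanced at month-end) might look something like the sketch below. The function names, the tuple layout, the calendar-month definition of the lookback, and rounding the quintile size down are all assumptions for illustration; the backtest scripts in this repository may handle these details differently.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def select_top_quintile(scores, rebalance_date, lookback_months=4):
    """Return {ticker: weight} for the top 20% of recent transcript scores.

    `scores` is a list of (ticker, call_date, score) tuples. Only each
    ticker's most recent call inside the lookback window is used.
    """
    latest = {}
    for ticker, call_date, score in scores:
        age = months_between(call_date, rebalance_date)
        # Skip calls after the rebalance date or outside the lookback window.
        if call_date > rebalance_date or age >= lookback_months:
            continue
        if ticker not in latest or call_date > latest[ticker][0]:
            latest[ticker] = (call_date, score)

    # Rank tickers from highest to lowest score.
    ranked = sorted(latest, key=lambda t: latest[t][1], reverse=True)
    n_long = len(ranked) // 5  # top quintile; rounding down is an assumption
    longs = ranked[:n_long]
    # Equal weighting: each selected stock gets the same share.
    return {ticker: 1.0 / n_long for ticker in longs} if longs else {}


if __name__ == "__main__":
    demo = [(f"T{i}", date(2022, 7, 20), float(i)) for i in range(10)]
    print(select_top_quintile(demo, date(2022, 8, 31)))
```

A long-short variant would apply the same ranking and short the bottom quintile as well.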

Results

Over the investment strategy period, a buy-and-hold strategy would have generated a 4.07x return (again, based on stocks that were in the S&P500 as of Sep 2022, without accounting for stocks that moved in or out of the index over time).

My results are much worse than those suggested in the papers. The long-only strategy slightly outperforms the buy-and-hold strategy, but not significantly. The long-short strategy fails to generate a profit, as the losses on the short trades eat away all of the long trades' profits.
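To illustrate how a short leg can cancel out a profitable long leg, here is a minimal sketch of the bookkeeping. The numbers are made up for illustration and are not results from this repository. The definitions are also assumptions: the monthly long-short return is taken as the long basket's return minus the shorted basket's return, and a "hit" is a trade with a positive return.

```python
def cumulative_multiple(monthly_returns):
    """Compound a sequence of monthly returns into a growth multiple."""
    multiple = 1.0
    for r in monthly_returns:
        multiple *= 1.0 + r
    return multiple


def hit_rate(trade_returns):
    """Fraction of trades with a positive return (assumed definition)."""
    if not trade_returns:
        return None
    return sum(r > 0 for r in trade_returns) / len(trade_returns)


def long_short_returns(long_basket, short_basket):
    """Monthly long-short return: the long basket's return minus the
    return of the basket that was sold short."""
    return [l - s for l, s in zip(long_basket, short_basket)]


if __name__ == "__main__":
    # Made-up monthly returns for the two baskets.
    long_basket = [0.02, 0.01, -0.01]
    short_basket = [0.02, 0.015, -0.02]

    combined = long_short_returns(long_basket, short_basket)
    print(cumulative_multiple(long_basket))  # long-only growth
    print(cumulative_multiple(combined))     # long-short growth
    # A short trade profits when the shorted basket falls.
    print(hit_rate([-s for s in short_basket]))
```

In a rising market the shorted basket tends to rise too, so the short leg subtracts most of what the long leg earns unless the scores separate winners from losers well.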

I am surprised that the short trade results were significantly different from the papers' results. Perhaps I programmed the back test incorrectly, although I reviewed the back test logic thoroughly.

All of the earnings calls and stock data are in this repo, so feel free to run the tests yourself.

NLP II long-only back test

Return	Long Hit Rate	Short Hit Rate
4.36x	59.1%	N/A

NLP II long-short back test

Return	Long Hit Rate	Short Hit Rate
0.98x	59.1%	42.0%

NLP III long-only back test (revenue topic + directionally positive)

Return	Long Hit Rate	Short Hit Rate
4.10x	59.1%	N/A

NLP III long-short back test (revenue topic + directionally positive)

Return	Long Hit Rate	Short Hit Rate
0.99x	59.1%	42.2%
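The return multiples above cover the whole period, while the papers quote monthly average returns. A rough way to put the two on a similar footing is to convert a multiple into a compound monthly rate. The sketch below assumes 125 monthly periods (Apr 2012 through Aug 2022 inclusive). A compound rate is not the same thing as an arithmetic average of monthly returns, and the papers may measure returns over a different universe and period, so this is only an order-of-magnitude comparison.

```python
# Assumed number of monthly periods: Apr 2012 through Aug 2022 inclusive.
MONTHS = 125


def compound_monthly_rate(multiple, months=MONTHS):
    """Constant monthly rate that compounds to `multiple` over `months`."""
    return multiple ** (1.0 / months) - 1.0


def multiple_from_monthly_rate(rate, months=MONTHS):
    """Growth multiple reached by compounding `rate` for `months`."""
    return (1.0 + rate) ** months


if __name__ == "__main__":
    # Multiples taken from the Results section of this README.
    for label, multiple in [
        ("buy-and-hold", 4.07),
        ("NLP II long-only", 4.36),
        ("NLP II long-short", 0.98),
        ("NLP III long-only", 4.10),
        ("NLP III long-short", 0.99),
    ]:
        rate = compound_monthly_rate(multiple)
        print(f"{label}: {rate:.4%} per month")

    # For scale: what a constant 2.35% per month would compound to.
    print(multiple_from_monthly_rate(0.0235))
```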

Programs

This repository contains the following files and folders:

Two PDFs - the two S&P Global papers
article folder - contains all of the earnings call transcripts (24k earnings call transcripts in total)
stock_price folder - contains Yahoo Finance stock prices for each ticker
SP500 Download Prices.py - downloads stock price data from Yahoo Finance
Download Transcript Links.py - downloads SeekingAlpha earnings call transcript links
Download Transcript Text.py - downloads SeekingAlpha earnings call transcript text
NLP II - Sentiment Score.py - creates net positive sentiment scores based on the Loughran and McDonald Sentiment Word Lists from each earnings call transcript text
NLP II - Backtest.py - creates a stock back test based on the net positive sentiment scores
NLP III - Topic Positive Direction Score.py - creates scores based on sentences that contain a topic (revenue, earnings, profitability) plus net positive words from each earnings call transcript text
NLP III - Guidance and Topic Score.py - creates scores based on sentences that contain a topic (revenue, earnings, profitability) plus a guidance word from each earnings call transcript text
NLP III - Backtest.py - creates a stock back test based on the score created by NLP III - Topic Positive Direction Score.py or NLP III - Guidance and Topic Score.py
