The Influence of Media on Tesla’s Stock Price
Introduction
Motivation: Anyone who has followed the financial news over the past year has seen a lot of discussion about what happened to Tesla, what its CEO Elon Musk said and did, and how the stock price changed in response. So we wonder: does what happens in the media (Elon Musk’s Twitter account, the attitude of the press and of investors) have a significant influence on Tesla’s stock price? This question is the motivation for our project.
Data: Here is a brief summary of the datasets we used. We downloaded the daily historical price of Tesla for the past two years. We also obtained minutely prices from Bloomberg in case we needed to examine the changes in more detail. We scraped the news and tweet data from Twitter. The two datasets “CNBC_TSLA” and “Elon_Musk” contain the tweets about Tesla from CNBC’s Twitter account and the past tweets of Elon Musk himself. All data are stored in the “data” folder.
Methodology: We extract data from popular news sources and social media, such as Twitter, Facebook, CNN, and WSJ, from 11/10/2016 to 11/10/2018. After conducting sentiment analysis (through R libraries), we use the sentiment, the number of related news items, and the speed of transmission for the quantitative analysis. We analyze Tesla’s daily stock prices and volumes from 11/10/2016 to 11/10/2018 and minutely stock prices and volumes from 04/11/2018 to 11/10/2018.
Hypothesis: We hypothesize that news from major media outlets such as Twitter and the Wall Street Journal may have a strong influence on (or at least a correlation with) the stock price of Tesla, Inc. One might imagine that stock prices are particularly susceptible to breaking news on social media, since such news reflects new market information. For instance, when Elon Musk (Tesla’s CEO) himself releases unplanned news, the market seems to experience extreme momentary fluctuations.
The purpose of the project is to explore the correlation between activity on social media and news platforms and the stock price. If our research finds a correlation between stock movements and news, we might be able to create trading strategies. Otherwise, the absence of correlation should teach us not to trade stocks purely based on breaking news.
Other: Our main tool is the R language in RStudio. We might also use Python if needed for specific libraries.
Exploratory Data Analysis
Stock Data
We get the historical data for Tesla stock from Yahoo! Finance. We take the open and close prices and the volume of Tesla stock from 2016-11-10 to 2018-11-10.
Daily Price Data
We import the daily price and plot the log of the close price to see how Tesla’s stock price has changed over time. We also compute basic summary statistics (mean, range, variance, etc.) of the stock price.
library(tidyverse)
daily_price<-read.csv('data/Stock_Data/TSLA.csv')%>%
transmute(Date=as.character(Date),Open,Close,Volume)
problems(daily_price)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
# Summary statistics of the daily close price (tibble() replaces the deprecated as.tibble())
tibble(mean=mean(daily_price$Close),min=min(daily_price$Close),
max=max(daily_price$Close),variance=var(daily_price$Close), standard_deviation=sd(daily_price$Close))
# A tibble: 1 x 5
mean min max variance standard_deviation
<dbl> <dbl> <dbl> <dbl> <dbl>
1 306. 181. 385 2084. 45.7
As we can see from the graph, Tesla’s stock price experienced relatively steady growth at first but became more volatile during the past year or so. This is the period (07/10/2017 to 11/10/2018) in which we are mainly interested. The mean price is about $306. This relatively high price raises the barrier to investment and should theoretically decrease volatility. The highest price is $385, while the lowest is only $181. The standard deviation is about $45.7.
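One way to back up the statement about volatility with a number is to compare the standard deviation of daily log returns in two sub-periods. The sketch below uses simulated prices in place of TSLA.csv; the cutoff date, the return parameters, and all variable names are assumptions made for illustration only.

```r
# Synthetic prices stand in for data/Stock_Data/TSLA.csv (all numbers are made up)
set.seed(1)
dates <- seq(as.Date("2016-11-10"), by = "day", length.out = 400)
# 199 calm daily log returns followed by 200 more volatile ones
ret <- c(rnorm(199, 0, 0.01), rnorm(200, 0, 0.03))
prices <- data.frame(Date = dates, Close = 200 * exp(cumsum(c(0, ret))))

# Daily log return: log(Close_t) - log(Close_{t-1}); the first day has none
prices$log_return <- c(NA, diff(log(prices$Close)))

# Split at a hypothetical cutoff and compare the standard deviation of returns
cutoff <- dates[201]
prices$period <- ifelse(prices$Date < cutoff, "early", "late")
vol_by_period <- tapply(prices$log_return, prices$period, sd, na.rm = TRUE)
print(round(vol_by_period, 4))
```

With the real data, the same steps could be applied to daily_price after converting Date with as.Date().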
We also made a Shiny app so that you can explore the changes in stock price in more detail. [Shiny App: https://zacklight.com/shiny/news_stocks_sentiment_analysis/ ]
Minutely data
Considering that we may need to look into the change in stock price in more detail to see how it responded to news on social media, we also obtained the minutely price and volume data for the past 7 months from Bloomberg and imported it.
library(readxl)
minutely_price<- read_excel("data/Stock_Data/bloomberg_tsla_minutely_price_04252018_11072018.xlsx",
sheet = "Sheet1")%>%transmute(Date=Dates,Open,Close,Volume)
problems(minutely_price)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
# Summary statistics of the minutely close price (tibble() replaces the deprecated as.tibble())
tibble(mean=mean(minutely_price$Close),min=min(minutely_price$Close),
max=max(minutely_price$Close),variance=var(minutely_price$Close), standard_deviation=sd(minutely_price$Close))
# A tibble: 1 x 5
mean min max variance standard_deviation
<dbl> <dbl> <dbl> <dbl> <dbl>
1 308. 248. 386. 806. 28.4
The minutely data (04/11/2018 to 11/10/2018) have a similar mean (308 versus 306 for the daily data) but a lower standard deviation (28.4 versus 45.7).
Comparing to S&P500
To see how changes in Tesla’s stock price relate to changes in the overall stock market, we compared it to the S&P 500 index.
SP500<-read.csv('data/GSPC.csv')%>%
transmute(Date=as.character(Date),Open_SP500=Open,Close_SP500=Close,Volume_SP500=Volume)
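# Plot both series on a log scale; the S&P 500 is shifted down by 2 to share the left axis
daily_price%>%left_join(SP500,by="Date")%>%
ggplot(aes(as.Date(Date)))+geom_point(aes(y=log(Close)))+
geom_point(aes(y=log(Close_SP500)-2),color="red")+ # a fixed color belongs outside aes()
scale_y_continuous(sec.axis = sec_axis(~.+2,name="log(Close_SP500)"))+xlab("Date")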
As shown in the graph, the correlation between the S&P 500 and Tesla is not very strong. While the S&P 500 generally grows over time, Tesla’s stock price fluctuates more. So other factors are likely driving the changes, and we believe that news and posts on social media are a possible explanation.
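One way to put a number on this comparison is the correlation of daily log returns rather than of price levels, since two trending series can look related in levels even when their day-to-day moves are not. This is a minimal sketch with simulated series standing in for TSLA.csv and GSPC.csv; the parameters and the helper log_ret() are assumptions, not part of the project code.

```r
# Simulated daily log returns (made-up parameters, not real market data)
set.seed(42)
n <- 250
market_ret <- rnorm(n, 0.0004, 0.008)                 # stand-in for the S&P 500
tsla_ret   <- 0.5 * market_ret + rnorm(n, 0, 0.03)    # stand-in for Tesla

# Turn returns into price levels, as they would appear in the CSV files
sp500 <- 2200 * exp(cumsum(market_ret))
tsla  <- 190 * exp(cumsum(tsla_ret))

# Hypothetical helper: daily log return of a price series
log_ret <- function(p) diff(log(p))

# Correlation of day-to-day moves
r <- cor(log_ret(tsla), log_ret(sp500))
print(round(r, 3))
```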
Twitter Data
Next, we scraped Elon Musk’s own Twitter account and related reports about Tesla from the Twitter account of a news source (CNBC). The source code is in scrape_twitter_data.py.
- elon_musk contains all the tweets from Elon Musk’s Twitter account. (Note that there is a high chance that he deleted undesired tweets.)
- CNBC_TSLA_News contains all tweets related to Tesla from the CNBC Twitter account.
- tesla_elon contains all the public tweets with hashtags related to Tesla.
After performing the analysis on all three datasets, we found that, while all three have similar characteristics, the CNBC dataset correlates best with the stock movements. Thus, we use it for most of the analysis below.
First, we cleaned the tweets of CNBC’s twitter account and did some analysis on it.
cnbc_tsla <- read_csv("data/Twitter_Data/CNBC_TSLA_News.csv")
# Drop repeated tweets with identical text
cnbc_tsla <- cnbc_tsla %>% filter(!duplicated(text))
problems(cnbc_tsla)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
# Split the date into year, month and day, mirroring the separate() step used for the other tweet datasets
(cnbc_by_date <- cnbc_tsla %>% separate(time, into = c("year", "month", "day"), sep = "-"))
# A tibble: 2,244 x 7
year month day text replies retweets likes
<chr> <chr> <chr> <chr> <int> <int> <int>
1 2016 11 10 Tesla shares downshift into u~ 4 10 8
2 2016 11 10 Cramer explains why investors~ 3 8 12
3 2016 11 12 Elon Musk: Robots will take y~ 30 55 56
4 2016 11 16 Tesla's ludicrously fast car ~ 1 9 26
5 2016 11 16 How billionaire tech mogul El~ 1 16 17
6 2016 11 17 BREAKING: Tesla's acquisition~ 1 58 45
7 2016 11 17 Tesla and SolarCity sharehold~ 0 10 7
8 2016 11 19 How billionaire tech mogul El~ 3 12 19
9 2016 11 21 JUST IN: Tesla's acquisition ~ 4 33 29
10 2016 11 21 Musk got what he wanted in Te~ 0 10 6
# ... with 2,234 more rows
(cnbc_daily_res_table <- cnbc_by_date %>% group_by(year, month, day) %>% summarise(
c_dailyLikes = sum(likes),
c_dailyRep = sum(replies),
c_dailyRet = sum(retweets)
) )
# A tibble: 541 x 6
# Groups: year, month [?]
year month day c_dailyLikes c_dailyRep c_dailyRet
<chr> <chr> <chr> <int> <int> <int>
1 2016 11 10 20 7 18
2 2016 11 12 56 30 55
3 2016 11 16 43 2 25
4 2016 11 17 52 1 68
5 2016 11 19 19 3 12
6 2016 11 21 35 4 43
7 2016 11 22 82 8 62
8 2016 11 28 13 1 6
9 2016 11 29 15 2 15
10 2016 12 01 17 5 17
# ... with 531 more rows
# Count CNBC tweets per day (duplicates were already removed above) and plot the distribution of the daily counts
daily_number_cnbc<-cnbc_tsla%>%mutate(Date=as.character(time))%>%group_by(Date)%>%count()
ggplot(daily_number_cnbc)+geom_histogram(aes(n),binwidth=1)
elon_musk <- read_csv("data/Twitter_Data/Elon_Musk_(@elonmusk)_Twitter.csv") %>% distinct()
problems(elon_musk)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
by_date <- elon_musk %>% separate(time, into = c("year", "month", "day"), sep = "-" )
daily_res <- by_date %>% group_by(year, month, day) %>% summarise(
dailyLikes = sum(likes),
dailyReplies = sum(replies),
dailyRetweets = sum(retweets)
)
tesla_elon <- read_csv("data/Twitter_Data/(_) @elonmusk @Tesla - Twitter Search_with_scores.csv")
problems(tesla_elon)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
by_date <- tesla_elon %>% separate(time, into = c("year", "month", "day"), sep = "-" )
daily_res <- by_date %>% group_by(year, month, day) %>% summarise(
dailyLikes = sum(likes),
dailyReplies = sum(replies),
dailyRetweets = sum(retweets)
)
As we expected, the log values of the number of likes, retweets, and replies are correlated. The number of tweets per day is right-skewed, so we apply a log transformation in much of the later analysis.
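To illustrate why the log transformation helps with right-skewed counts, here is a small sketch on simulated daily tweet counts. The data and the hand-written skewness() helper are assumptions for illustration and do not come from the project.

```r
# Made-up right-skewed daily counts (at least 1 tweet per day so that log() is defined)
set.seed(11)
n_tweets <- rpois(500, lambda = exp(rnorm(500, 1, 0.8))) + 1

# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness before and after taking logs
sk <- c(raw = skewness(n_tweets), logged = skewness(log(n_tweets)))
print(round(sk, 2))
```

A value much closer to zero after taking logs indicates a more symmetric distribution, which is generally easier to handle in a linear regression.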
We also found that there are some outliers in the number of tweets with #Tesla. So we filtered for these outliers to see what happened on these dates. They turn out to be the dates with major news about Tesla, for example, when Tesla held its Q1 2018 earnings call or when Elon Musk said that Tesla would go private. This suggests that tweets do carry critical business information.
# A tibble: 4 x 2
# Groups: Date [4]
Date n
<chr> <int>
1 2018-05-03 31
2 2018-08-08 29
3 2018-09-28 27
4 2018-08-07 22
# A tibble: 27 x 5
time text replies retweets likes
<date> <chr> <int> <int> <int>
1 2018-05-03 Sacconaghi on Musk earnings call: 'T~ 6 26 55
2 2018-05-03 Tesla bull sounds off after conferen~ 1 2 5
3 2018-05-03 Tesla opens 7 percent down the day a~ 5 10 8
4 2018-05-03 Musk's bizarre earnings call was 'th~ 6 18 20
5 2018-05-03 Elon Musk is acting like he 'plans t~ 1 8 17
6 2018-05-03 Short-sellers have been looking for ~ 1 9 15
7 2018-05-03 Tesla saw its worst day in more than~ 3 3 2
8 2018-05-03 .@JimCramer thanks Tesla CEO Elon Mu~ 6 11 21
9 2018-08-07 A Tesla leveraged buyout would be 'b~ 9 24 34
10 2018-08-07 Securities lawyers shocked by Elon M~ 2 13 19
# ... with 17 more rows
So we selected the days that have more than 20 tweets. The four dates are May 3, August 7 and 8, and September 28, 2018.
May 3 is the day after the earnings call on which Elon Musk cut off Wall Street analysts and called their questions boring. In August, he said on his personal Twitter account that he would take Tesla private. Then, in September, he was sued by the SEC for fraud. All of these events were followed by a significant drop in Tesla’s stock price. You can find the related tweets in our Shiny app.
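The selection of high-activity days can be sketched as follows. The toy data frame and the threshold of 1 are assumptions so that the example stays small; the report uses a threshold of 20 on the real daily counts.

```r
# Toy tweets standing in for the scraped data (dates and texts are made up)
tweets <- data.frame(
  time = as.Date(c(rep("2018-05-03", 3), "2018-05-04", rep("2018-08-07", 2))),
  text = paste("tweet", 1:6)
)

# Number of tweets per day
daily_n <- as.data.frame(table(Date = as.character(tweets$time)),
                         responseName = "n", stringsAsFactors = FALSE)

# Keep only the days above the threshold, busiest day first
threshold <- 1  # the report uses 20
busy_days <- daily_n[daily_n$n > threshold, ]
busy_days <- busy_days[order(-busy_days$n), ]

# Pull the tweets posted on those days to read what happened
busy_tweets <- tweets[as.character(tweets$time) %in% busy_days$Date, ]
print(busy_days)
```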
Regression
Daily Number Regression
So a natural question is: can the change in Tesla’s stock price be explained by the number of tweets on that day? Does more discussion on social media lead to larger changes in the stock price? We ran the same set of analyses on all three tweet datasets, and they all yielded similar results, so we present only the @elonmusk and CNBC results here.
daily_number<-elon_musk%>%mutate(Date=as.character(time))%>%group_by(Date)%>%count()
daily_number_price<-daily_number%>%left_join(daily_price,by=c("Date"))
mod_num1<-lm(log(Close)~n,data=daily_number_price)
coef1<-coef(mod_num1)
summary(mod_num1)
Call:
lm(formula = log(Close) ~ n, data = daily_number_price)
Residuals:
Min 1Q Median 3Q Max
-0.22290 -0.06323 0.00715 0.07811 0.19183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7514908 0.0101514 566.571 <2e-16 ***
n -0.0006115 0.0026716 -0.229 0.819
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09093 on 199 degrees of freedom
(79 observations deleted due to missingness)
Multiple R-squared: 0.0002632, Adjusted R-squared: -0.004761
F-statistic: 0.05238 on 1 and 199 DF, p-value: 0.8192
# Plot against n (not log(n)) so that the fitted line matches the model above
ggplot(daily_number_price,aes(x=n,y=log(Close)))+geom_point()+
geom_abline(intercept = coef1[1],slope = coef1[2],color="red")
daily_number_price_cnbc<-daily_number_cnbc%>%left_join(daily_price,by=c("Date"))
mod_num2<-lm(log(Close)~n,data=daily_number_price_cnbc)
coef2<-coef(mod_num2)
summary(mod_num2)
Call:
lm(formula = log(Close) ~ n, data = daily_number_price_cnbc)
Residuals:
Min 1Q Median 3Q Max
-0.51254 -0.06600 0.02346 0.10664 0.23487
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.713405 0.010263 556.712 <2e-16 ***
n 0.002485 0.001664 1.493 0.136
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1448 on 436 degrees of freedom
(103 observations deleted due to missingness)
Multiple R-squared: 0.00509, Adjusted R-squared: 0.002808
F-statistic: 2.23 on 1 and 436 DF, p-value: 0.136
# Plot against n (not log(n)) so that the fitted line matches the model above
ggplot(daily_number_price_cnbc,aes(x=n,y=log(Close)))+geom_point()+
geom_abline(intercept = coef2[1],slope = coef2[2],color="red")
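# Daily log return, then regress its absolute value on the log number of CNBC tweets
price_lag<-daily_price%>%mutate(change=(log(Close)-lag(log(Close),1)))
lag<-daily_number_price_cnbc%>%left_join(price_lag,by=c("Date"))
mod_num3<-lm(abs(change)~log(n),data=lag)
coef3<-coef(mod_num3)
# Plot abs(change) (not its log) so that the fitted line matches the model
ggplot(lag,aes(x=log(n),y=abs(change)))+geom_point()+
geom_abline(intercept = coef3[1],slope = coef3[2],color="red")
summary(mod_num3)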
Call:
lm(formula = abs(change) ~ log(n), data = lag)
Residuals:
Min 1Q Median 3Q Max
-0.032147 -0.013294 -0.004232 0.007267 0.131593
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.011123 0.001789 6.216 1.19e-09 ***
log(n) 0.008865 0.001246 7.116 4.61e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02062 on 435 degrees of freedom
(104 observations deleted due to missingness)
Multiple R-squared: 0.1043, Adjusted R-squared: 0.1022
F-statistic: 50.64 on 1 and 435 DF, p-value: 4.607e-12
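Written out, the last model regresses the absolute daily log return on the log of the daily tweet count. The notation below ($P_t$ for the close price and $n_t$ for the number of CNBC tweets on day $t$) is ours, and the arithmetic simply plugs in the estimates from the summary above.

```latex
\begin{align*}
\left|\log P_t - \log P_{t-1}\right| &= \beta_0 + \beta_1 \log n_t + \varepsilon_t,
  \qquad \hat\beta_0 = 0.011123,\quad \hat\beta_1 = 0.008865 \\
\text{effect of doubling } n_t &: \quad \hat\beta_1 \log 2 \approx 0.008865 \times 0.693 \approx 0.0061
\end{align*}
```

So a day with twice as many tweets is associated with an absolute log return that is about 0.6 percentage points larger, compared with a fitted value of about 1.1% on a day with a single tweet. This describes an association and says nothing about the direction of the price move.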
We also regressed the log close price on the daily number of likes on Elon Musk’s tweets.
elon_musk<-read_csv("data/Twitter_Data/Elon_Musk_(@elonmusk)_Twitter.csv") %>% distinct()%>%
mutate(Date=as.character(time))
problems(elon_musk)
# tibble [0 x 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
daily_likes <- aggregate(elon_musk$likes, by=list(elon_musk$Date), sum) %>%
rename( Dates = Group.1, likes = x) %>% mutate( Date = as.character(Dates)) %>%
select(Date, likes)
daily_price<-read_csv('data/Stock_Data/TSLA.csv')%>%
transmute(Date=as.character(Date),Open,Close,Volume)
t_elon_S_price <- inner_join( daily_price, daily_likes, by = "Date" )
beta <- coef(lm( log(Close) ~ likes , data = t_elon_S_price))
# Plot against likes (not log(likes)) so that the fitted line matches the model
ggplot( t_elon_S_price ) + geom_point( aes(likes,log(Close) )) +
geom_abline(intercept = beta[1], slope = beta[2], color = 'red',
alpha = 0.3, size = 1 )
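We can see that there is no clear correlation between the number of tweets or likes and the level of the stock price. The one exception is the absolute daily price change, which increases with the log number of CNBC tweets (R-squared of about 0.10).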
Multivariate Regression
Since the change in stock price cannot be explained simply by the number of tweets, we thought it might be better explained by the actual content of each tweet.
We assigned sentiment scores to the text of each tweet and regressed the stock price on them.
Sentiment Scores
So we decided to do some natural language processing: we stripped out hashtags and links and used an NLP library to assign each tweet 8 emotion scores plus a negative and a positive sentiment score.
library(tm)       # removeWords(), stopwords()
library(syuzhet)  # get_nrc_sentiment()
# Choose one dataset to score
# name <- "Elon_Musk_(@elonmusk)_Twitter"
name <- "CNBC_TSLA_News"
# name <- "@elonmusk_@Tesla"
import_path <- paste("data/Twitter_Data/", name, ".csv", sep='')
tweets <- read_csv(import_path) %>%
mutate(text = str_to_lower(text)) %>%
mutate(text = str_replace_all(text, "https?\\S*","")) %>% # Remove links only, not the rest of the tweet
mutate(text = str_replace_all(text, "pic\\.twitter\\S*","")) %>% # Remove picture links
mutate(text = str_replace_all(text, "[#@]\\S*","")) %>% # Remove hashtags and mentions
mutate(text = str_replace_all(text, "\\brt\\b","")) %>% # Remove the retweet marker, not "rt" inside words
mutate(text = str_replace_all(text, "[[:punct:]]","")) %>%
mutate(text = removeWords(text,stopwords())) %>%
mutate(text = str_squish(text)) # Collapse repeated whitespace and trim both ends
# Score every tweet on the NRC emotions and attach the scores to the tweets
sentiment_scores <- get_nrc_sentiment(tweets$text)
tweets_with_scores <- merge(tweets, sentiment_scores, by="row.names", all.x=TRUE) %>%
select(-1)
export_path <- paste("data/Twitter_Data/", name, "_with_scores.csv", sep='')
write.table(tweets_with_scores, file = export_path, sep=",", row.names=FALSE)
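To see what this kind of cleaning does to a single tweet, here is a minimal base R sketch. The function clean_tweet(), its patterns, and the sample tweet are illustrative assumptions and may differ from what the project pipeline does in detail.

```r
# Hypothetical helper: strip links, hashtags, mentions, the retweet marker and punctuation
clean_tweet <- function(x) {
  x <- tolower(x)
  x <- gsub("https?://\\S+", "", x, perl = TRUE)          # links
  x <- gsub("pic\\.twitter\\.com/\\S+", "", x, perl = TRUE) # picture links
  x <- gsub("[#@]\\S+", "", x, perl = TRUE)               # hashtags and mentions
  x <- gsub("\\brt\\b", "", x, perl = TRUE)               # "rt" as a whole word only
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)            # remaining punctuation
  x <- gsub("\\s+", " ", x, perl = TRUE)                  # collapse whitespace
  trimws(x)
}

# A made-up tweet in the style of the scraped data
sample_tweet <- "RT @CNBC: Tesla shares drop 7%!  https://cnb.cx/abc #Tesla"
print(clean_tweet(sample_tweet))
# → "tesla shares drop 7"
```

Matching "rt" only as a whole word keeps words such as "short" intact, which matters for tweets about short-sellers.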
cnbc_score<-read_csv("data/Twitter_Data/CNBC_TSLA_News_with_scores.csv")
sentiment_cnbc<-list(x=c("anger","anticipation","disgust","fear","joy","sadness","surprise","trust","negative","positive"),y=cnbc_score%>%
select(anger,anticipation,disgust,fear,joy,sadness,surprise,trust,negative,positive)%>%
colSums())
sentiment_cnbc<-as_tibble(sentiment_cnbc)
colnames(sentiment_cnbc)<-c("sentiment","value")
ggplot(sentiment_cnbc)+geom_col(aes(sentiment,value,fill=sentiment))
As you can see, all eight emotions increase in intensity over time. This is explained by the fact that the number of CNBC tweets related to Tesla gradually increases.
Multivariate Regression
Then we can plot the emotions against the stock price fluctuations.
price_lag <- daily_price %>%
mutate(log_close = log(Close)) %>%
mutate(change=(Close/lag(Close,1)-1))
tweets_with_scores <- tweets_with_scores %>%
mutate(Date=as.Date(time)) %>% # time is already in YYYY-MM-DD form
select(Date, everything()) %>%
select(-time)
emotions <- colnames(tweets_with_scores)[6:15] %>%
paste(shQuote(., type="sh"), collapse=", ")
summarized_tweets_with_scores <- tweets_with_scores %>%
group_by(Date) %>%
summarise(
anger = sum(anger),
anticipation = sum(anticipation),
disgust = sum(disgust),
fear = sum(fear),
joy = sum(joy),
sadness = sum(sadness),
surprise = sum(surprise),
trust = sum(trust),
negative = sum(negative),
positive = sum(positive)
)
daily_price_with_scores <- price_lag %>%
mutate(Date=as.Date(Date)) %>%
left_join(summarized_tweets_with_scores, by="Date") %>%
.[complete.cases(.), ]
daily_price_with_scores %>% gather("id", "value", 7:16) %>%
ggplot(., aes(Date, value))+
geom_point(position = "jitter")+
geom_smooth(method = "lm", se=FALSE, color="blue")+
facet_wrap(~id)
Note that the emotion scores do not add up to a fixed number like 100%; they are discrete counts obtained by summing the emotion scores of all tweets on a given day. This also makes our research harder, since the emotion scores, our independent variables, might be highly correlated with each other.
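One common way to check how strongly the predictors overlap is the variance inflation factor (VIF), computed by regressing each predictor on all the others. This sketch uses simulated emotion counts that all scale with the number of tweets per day; the data, the column subset, and the thresholds people apply to VIF values are assumptions, not results from the project.

```r
# Made-up daily emotion counts that all grow with the number of tweets that day
set.seed(7)
n <- 300
tweets_per_day <- rpois(n, 4) + 1
scores <- data.frame(
  anger   = rpois(n, 0.5 * tweets_per_day),
  fear    = rpois(n, 0.6 * tweets_per_day),
  sadness = rpois(n, 0.5 * tweets_per_day),
  joy     = rpois(n, 0.3 * tweets_per_day)
)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others
vif <- sapply(names(scores), function(v) {
  others <- setdiff(names(scores), v)
  fit <- lm(reformulate(others, response = v), data = scores)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 2))
```

A VIF of 1 means a predictor is unrelated to the others; larger values mean its coefficient is estimated less precisely.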
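# Negative sentiment (left axis) against the daily return scaled by 200 (right axis shows the unscaled return)
(daily_price_with_scores %>%
filter(negative>2.5) %>%
filter(Date>as.Date("2018-03-20")) %>%
ggplot(aes(x=Date)) +
geom_line(aes(y=negative),color="blue")+ # fixed colors belong outside aes()
geom_line(aes(y=change*200),color="red")+
scale_y_continuous(sec.axis = sec_axis(~./200,name="daily return")))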
After filtering out the low-level changes in emotions, we get a roughly symmetric chart that suggests how negativity relates to stock price movements.
#We cleaned the data and ran a multivariate regression on multiple sentiment scores.
mod_multi<-lm(log_close~anger+anticipation+disgust+fear+joy+sadness+surprise+trust+negative+positive,data=daily_price_with_scores)
summary(mod_multi)
Call:
lm(formula = log_close ~ anger + anticipation + disgust + fear +
joy + sadness + surprise + trust + negative + positive, data = daily_price_with_scores)
Residuals:
Min 1Q Median 3Q Max
-0.51553 -0.06532 0.02118 0.10198 0.23274
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7188756 0.0091740 623.379 <2e-16 ***
anger -0.0099632 0.0069658 -1.430 0.153
anticipation 0.0035029 0.0057313 0.611 0.541
disgust 0.0140225 0.0117404 1.194 0.233
fear -0.0016951 0.0077855 -0.218 0.828
joy 0.0085384 0.0081186 1.052 0.294
sadness -0.0062242 0.0084448 -0.737 0.462
surprise -0.0083060 0.0075390 -1.102 0.271
trust 0.0048249 0.0051459 0.938 0.349
negative 0.0006514 0.0056095 0.116 0.908
positive 0.0003931 0.0033989 0.116 0.908
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1428 on 426 degrees of freedom
Multiple R-squared: 0.0282, Adjusted R-squared: 0.005383
F-statistic: 1.236 on 10 and 426 DF, p-value: 0.2656
None of the sentiment coefficients is statistically significant, and the adjusted R-squared is only about 0.005, so we believe the sentiment scores are not good predictors of the price level.
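As a check on how little the ten predictors explain, the adjusted R-squared can be recomputed from the summary above. Here $n = 437$ observations and $p = 10$ predictors are inferred from the 426 residual degrees of freedom; the symbols are our notation.

```latex
\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
          = 1 - (1 - 0.0282)\,\frac{436}{426} \approx 0.0054
\]
```

This agrees with the reported value up to rounding of $R^2$: almost all of the already small $R^2$ disappears once the number of predictors is taken into account.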
(Binary) Logistic Regression
Finally, we also simplified the stock movement to a binary variable (up or down) and ran a logistic regression. We took the negative and positive scores to see how these attitudes influence the probability that the stock price goes up.
# Logistic regression of an up day (change > 0) on the positive and negative scores
binary<-daily_price_with_scores%>%mutate(binary=as.integer(change>0))
mod_binomial<-glm(binary~positive+negative,data=binary,family=binomial)
coef(mod_binomial)
(Intercept) positive negative
0.24266726 0.01417427 -0.14016648
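The signs of the coefficients are consistent with our hypothesis: positive sentiment raises, and negative sentiment lowers, the estimated probability that the stock price goes up. The positive coefficient is small (about 0.014), and since we report only the point estimates, the significance of either effect remains untested.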
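To make the coefficients easier to read, they can be converted into probabilities with the logistic function. This sketch plugs in the point estimates printed above; the helper p_up() and the example score values are assumptions for illustration.

```r
# Point estimates copied from coef(mod_binomial) above
b <- c(intercept = 0.24266726, positive = 0.01417427, negative = -0.14016648)

# Hypothetical helper: estimated probability of an up day for given daily scores
p_up <- function(positive, negative) {
  plogis(b[["intercept"]] + b[["positive"]] * positive + b[["negative"]] * negative)
}

p_up(0, 0)   # no sentiment at all
p_up(0, 3)   # three units of negative sentiment
p_up(3, 0)   # three units of positive sentiment

# Odds ratios: multiplicative change in the odds of an up day per unit of score
odds_ratio <- exp(b[c("positive", "negative")])
print(round(odds_ratio, 3))
```

With these estimates the baseline probability of an up day is about 0.56, and a negative score of 3 brings it down to about 0.46, while the same amount of positive sentiment barely moves it.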
Correlation Matrix
Lastly, we use a correlation matrix to examine the relationship between every pair of variables. The plot visualizes the correlations by showing positive ones in blue and negative ones in red, with the color depth and size of the bubbles indicating the magnitude.
library(Hmisc)
library(corrplot)
# Keep the numeric columns (everything except Date)
mydata_daily_price_with_scores <- daily_price_with_scores[, 2:16]
(res <- cor(mydata_daily_price_with_scores) %>% round( 2))
Open Close Volume log_close change anger anticipation
Open 1.00 0.98 0.07 0.98 -0.07 -0.04 0.06
Close 0.98 1.00 0.08 0.99 0.08 -0.06 0.05
Volume 0.07 0.08 1.00 0.09 0.05 0.37 0.42
log_close 0.98 0.99 0.09 1.00 0.08 -0.04 0.06
change -0.07 0.08 0.05 0.08 1.00 -0.24 -0.13
disgust fear joy sadness surprise trust negative positive
Open 0.00 -0.04 0.10 -0.03 0.00 0.07 -0.03 0.07
Close 0.00 -0.06 0.08 -0.06 -0.01 0.06 -0.05 0.05
Volume 0.38 0.40 0.19 0.39 0.27 0.40 0.46 0.41
log_close 0.01 -0.04 0.10 -0.04 0.00 0.07 -0.03 0.06
change -0.07 -0.20 -0.09 -0.21 -0.11 -0.13 -0.23 -0.12
[ reached getOption("max.print") -- omitted 10 rows ]
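# Compute the p-values from the underlying data, not from the correlation matrix itself
res2 <- rcorr(as.matrix(mydata_daily_price_with_scores))
corrplot(res, type="upper", order="hclust", p.mat = res2$P, sig.level = 0.01, insig = "blank")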
We found that the daily price change is weakly negatively correlated with the emotion intensities (between -0.07 and -0.24), while the price level is almost uncorrelated with them (at most 0.10 in absolute value). We hypothesize that people tend to get more emotional about falling stock prices than about rising ones.
However, we also find that emotion intensities have a mostly moderate positive correlation with the trading volume, which is the number of shares people buy or sell in a day. This seems intuitive, since when people are emotional, they are more likely to trade. Among the optimistic emotions, anticipation (0.42) and trust (0.40) are slightly more positively correlated with the trading volume than most pessimistic emotions (0.37 to 0.40), although joy (0.19) is the weakest of all.
Conclusion
Now we conclude with the following points:
The number of tweets or likes is not a good predictor of the stock price level, although days with more CNBC tweets tend to see larger absolute price changes.
The sentiment of tweets appears to influence the probability that the stock price rises. Positive tweets raise the estimated probability of a rise in the stock price, while negative tweets lower it.
Using minutely (instead of daily) stock prices does not change the results significantly.
The numbers of optimistic and pessimistic tweets tend to experience similar changes, reflecting people’s split views on Tesla.
Optimistic emotions such as anticipation and trust in tweets are moderately positively correlated with the trading volume of the Tesla stock, slightly more so than most pessimistic emotions.
The explanatory power of our model may be subject to the accuracy of the sentiment scores we assigned and the way we measure the changes. The correlation among the sentiment scores can also influence the outcome of our regression. So there are still improvements that can be made to our model.
Future Analysis
Of course, different time-series analysis methods might bring new findings on the relationship between tweet emotions and stock prices.
Since TSLA stock surged in popularity both on Wall Street and on social media in recent years, our analysis window is likely limited and biased.
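As one possible starting point for such a time-series analysis, the sketch below regresses today’s return on the previous day’s sentiment. The data are simulated with a built-in lagged effect, so the example shows only the mechanics of building a lagged predictor; the variable names and parameters are assumptions, and nothing here is a result about Tesla.

```r
# Simulated daily sentiment and returns (made up; returns react to yesterday's sentiment)
set.seed(3)
n <- 200
sentiment <- rnorm(n)
ret <- c(0, 0.5 * sentiment[-n]) + rnorm(n, 0, 0.5)

# Align each day's return with the previous day's sentiment
d <- data.frame(ret = ret[-1], sentiment_lag1 = sentiment[-n])

# Lagged regression: does yesterday's sentiment help explain today's return?
fit <- lm(ret ~ sentiment_lag1, data = d)
print(round(coef(fit), 3))
```

On real data the same alignment could be done within the joined daily table, keeping in mind that tweets posted after the market closes belong to the next trading day.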
Shiny App
We also made a Shiny app [Shiny App: zackLight.com ]. It displays the news as you select a date range in the stock price chart. Please check it out.