Blog Analysis Part One: Sentiment

Published in Blogging - 23 mins to read

After playing around with TextBlob a little recently, I thought it'd be fun to run my blog posts through its sentiment analysis feature and see if I could gain any insights into how cynical my blog really might be. The only data cleaning I did was removing the posts that were in Spanish; other than that, every post went through the number cruncher, and you can see the results below. It's worth noting that the x-axis values are potentially slightly misleading, given the various gaps in writing over the years.
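For the curious, the core of the number cruncher is only a few lines of TextBlob. Here's a rough sketch of the kind of loop involved - the posts/ directory of markdown files is an illustrative assumption about the layout, not necessarily how the blog is actually stored:

```python
from pathlib import Path

from textblob import TextBlob

results = []
for path in sorted(Path("posts").glob("*.md")):
    text = path.read_text(encoding="utf-8")
    # .sentiment is a (polarity, subjectivity) pair: polarity runs from -1
    # (negative) to 1 (positive), subjectivity from 0 (objective) to 1 (subjective)
    sentiment = TextBlob(text).sentiment
    results.append({
        "post": path.stem,
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
    })
```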

The first graph is simply post sentiment polarity by date, but as you can see, it is not exactly the most readable thing in the world. It's difficult to really draw any conclusions from it, but it is cool to note that the majority of my posts are positive in sentiment, and we can easily spot some outliers.

The next graph is the ten-day rolling average, which offers up slightly more in terms of trends over time, and notably only has one big dip into negative territory around the end of March 2018 (which makes some amount of sense, given I was very depressed and just playing Runescape 15 hours a day at that point). There are still a lot of peaks and valleys though, making it hard to assign meaning to them.

The last one is the 30-day rolling average, and finally some more distinct shapes have appeared. The question I'd somewhat hoped to answer with this analysis is "is the state of my mental health reflected in my writing?" and the answer is... no. Not really, not without massively clutching at straws, at least. The only thing the graph shows in support of sentiment analysis being an indicator of my mental health is that I am broadly more positive in summer, when I tend to be happier as well, but even that is a stretch.
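For anyone wondering how the rolling averages work: each point is just the mean polarity of the posts in the preceding window. Here's a hedged sketch using pandas, carrying on from the snippet above - I actually plotted everything with Chart.js, and the date parsing assumes post names start with a YYYY-MM-DD date, so treat the details as illustrative:

```python
import pandas as pd

# Build a date-indexed series of polarities from the earlier results list.
df = pd.DataFrame(results)
df["date"] = pd.to_datetime(df["post"].str[:10], errors="coerce")
series = df.dropna(subset=["date"]).set_index("date").sort_index()["polarity"]

# Time-based windows: each point becomes the mean of the posts from the
# previous 10 or 30 days.
rolling_10 = series.rolling("10D").mean()
rolling_30 = series.rolling("30D").mean()
```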

There are an absolute tonne of gotchas when trying to do analysis like this. Firstly, I didn't use the subjectivity score for the sentiment values at all, partially because I'm lazy and just wanted to play around with Chart.js, and partially because it became quite apparent that the data isn't particularly useful anyway. Secondly, TextBlob itself is probably not best suited to this kind of NLP, as I believe it simply assigns every word in a string a positive/neutral/negative value and then takes the average - this is evidenced by my most positive and most negative posts (see below) both being very short, and definitely not being the posts I would consider the most positive or negative on the site. Thirdly, while I do try to write honestly (and honestly, a lot of the time I feel pretty negative), I do try to "fake it 'til I make it" somewhat when writing here. I know depressed people tend to show it in their writing, even unintentionally, so I always try to at the very least finish my posts on a positive note, as much for my own sake as anything, to try to force myself to see the good side of whatever negative thing I might be discussing. Obviously it's tough to say if that has any bearing on the overall positivity of my results.
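To illustrate that second gotcha, here's a toy example of how much a couple of strong words can swing the score for a short piece of text. The exact numbers depend on TextBlob's lexicon, so treat the comments as indicative rather than precise:

```python
from textblob import TextBlob

# A short post built around one glowing word scores near the top of the scale...
short = TextBlob("Today was wonderful.")
print(short.sentiment.polarity)

# ...while a longer post mixing positive and negative language gets averaged
# across all of its scored words, pulling it well away from the extremes.
longer = TextBlob(
    "Today was wonderful, but the train was awful, the weather was miserable, "
    "and I was exhausted by the time I got home."
)
print(longer.sentiment.polarity)
```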

My most positive post was supposedly What're You Feeling II, which is rather ironic. I originally wrote a different post that day, and then ended up removing it after it was pointed out to me how spectacularly I'd missed the point, leaving me with a whole lot of not-so-positive feelings. Its polarity was 0.8.

My most negative post was supposedly To Do List, which again I thought was kinda funny, as it appears to my non-machine brain to be a somewhat positive post. Its polarity was -0.6.

Of the 650 posts analysed, 573(!) were positive, 6 were neutral and the remaining 72 were negative. The average polarity was 0.1177829444.
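Those headline numbers fall straight out of the polarity list. Here's roughly how they're counted, treating anything above zero as positive, exactly zero as neutral, and below zero as negative (continuing from the pandas DataFrame above, whose column names are assumptions):

```python
polarities = df["polarity"]

positive = (polarities > 0).sum()
neutral = (polarities == 0).sum()
negative = (polarities < 0).sum()

print(f"{len(polarities)} posts analysed")
print(f"{positive} positive, {neutral} neutral, {negative} negative")
print(f"average polarity: {polarities.mean():.10f}")
```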

I could probably arbitrarily pull out some more stats, but I'm not a data scientist, and given that we've already established that it's a fairly low-quality dataset anyway, I have better things to do with my time. See you tomorrow for part two, lexical analysis!