How Do People Feel About Drugs?

Sentiment Analysis of Erowid's Experience Vault

Erowid is an organization that aims to educate the public about illicit drugs. Their website contains lots of information about pretty much any drug imaginable. Perhaps the most interesting portion of the website is the 'experience vault' where users can share their experiences on substances. These stories are incredible - full of joy, sadness, violence, sex, and more. A social scientist's dream. It's a fun read.

I first became aware of this website in 2017. I thought the dataset was really interesting, so naturally, I wrote a scraper using Scrapy and gathered all the data in a gigantic .csv. In 2020, I revisited this project with the goal of collecting new data and refining the data structure. This time, I decided against using a scraper and instead reached out to the Erowid team, who provided me with an API key and the relevant documentation. The Erowid API is easy to use and made importing the data into my database a breeze. In total, I gathered ~35,000 stories from the experience vault. I used TextBlob to perform sentiment analysis. Sentiment is analyzed sentence by sentence, with each sentence in a story assigned a value between 1 (very positive) and -1 (very negative). I then averaged these sentence scores to give each story an overall sentiment value.
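The sentence-by-sentence averaging described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the project: the function names are made up for this example, treating a story with no sentences as neutral (0.0) is an assumption, and TextBlob's sentence splitting needs its NLTK corpora to be downloaded first.

```python
from statistics import mean
from typing import Callable, Iterable, List


def average_polarity(polarities: Iterable[float]) -> float:
    """Average per-sentence polarity scores into one story-level score."""
    scores = list(polarities)
    # Assumption: a story with no sentences is treated as neutral.
    return mean(scores) if scores else 0.0


def sentence_polarities(text: str) -> List[float]:
    """Score each sentence of `text` with TextBlob's default analyzer."""
    # Imported lazily so the averaging helper works without TextBlob installed.
    from textblob import TextBlob

    # Each sentence's polarity falls between -1 (negative) and 1 (positive).
    return [s.sentiment.polarity for s in TextBlob(text).sentences]


def story_sentiment(
    text: str,
    scorer: Callable[[str], List[float]] = sentence_polarities,
) -> float:
    """Overall sentiment of a story: the mean of its sentence polarities."""
    return average_polarity(scorer(text))
```

Calling story_sentiment(story_text) on each story would give one value per story. The scorer argument is only there so the averaging step can be tried with a different sentence scorer.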

Every story also includes demographic data, relevant dates, a user-assigned story "category" summarizing the setting of the experience, and more. The result is a very interesting dataset that I think has been underexplored.

The first thing I'll do is share some sentences that my analysis scored as very positive and a few that it interpreted as negative.


  • I had a tingling sensation, and a very pleasant sensation it was.
  • I had a wonderful night, perhaps one of the greatest moments I have ever had.
  • I was so relaxed and content and everything was just SO awesome.
  • I would have done anything to make that wonderful feeling go away.
  • I turn on VH1 and watch the 100 greatest Rock and Roll Artists.
  • Like I was always fighting the snakes, because they seemed evil.
  • According to the attending anaesthesiologist, I was 'insane' for five days.
  • Their skin would look disgusting, their voices would give me chills.
  • Then it's as if a pit or void opened up in my stomach a horrible feeling.
  • I watched some television, and it became boring so I turned it off.

I think the spread of sentences is interesting. The majority are correctly identified, but some (namely the sentences referring to VH1's 'Greatest Rock and Roll Artists' and television being 'boring') are just a little off. That said, I can also see why the algorithm identified them the way it did. After examining the data quite closely, I think the algorithm has done a fairly good job and the results are reasonable, particularly considering that I am averaging over many thousands of sentences.

The following is a chart of the average sentiment of stories about the seven most popular substances on the message board. The gender of the author is also available, so I included that dimension in the analysis.


Average Sentiment by Substance, Gender

The results were not as drastic as I was anticipating. There are a few things to point out, though. Notice how closely the 'aggregate' data tracks the 'male' data. Because these are average values, this implies that the bulk of stories are shared by men. Women generally share stories with a lower sentiment value than men. This is not the case for two out of seven substances - Mushrooms and LSD. MDMA, sometimes referred to as ecstasy, had the highest overall sentiment value for both men and women.
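To see why an aggregate average that tracks the male average points to mostly male authors, here is a small sketch of the group-by-and-average step. The rows are hypothetical values invented for illustration, not real Erowid data, and the helper name is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (substance, gender, story sentiment). Not real data.
stories = [
    ("MDMA", "male", 0.14),
    ("MDMA", "male", 0.12),
    ("MDMA", "male", 0.13),
    ("MDMA", "female", 0.09),
]


def average_by(rows, key):
    """Group rows by key(row) and average the sentiment in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}


# One average per (substance, gender) pair, and one per substance overall.
by_gender = average_by(stories, key=lambda r: (r[0], r[1]))
aggregate = average_by(stories, key=lambda r: r[0])
```

With three male stories and one female story, the aggregate for the substance sits much closer to the male average than to the female one, because every story carries equal weight in the overall mean.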


Each experience is given a user-assigned category. There are fifteen categories available to choose from. In the following chart, I graph the average sentiment by category and substance. For example, among the stories about cocaine use, those categorized as 'Glowing Experiences' were the most positive, while those categorized as 'Bad Trips' were the most negative.



The results are about what I expected. The categories 'Glowing Experiences', 'Mystical Experiences' and 'Health Benefits' are routinely in the top three, while 'Bad Trips', 'Health Problems', and 'Trip Disasters' are almost always near the bottom. This visualization quite clearly demonstrates the generally positive nature of stories pertaining to MDMA and Mushrooms. LSD stories categorized as 'Glowing Experiences' had the highest sentiment value of any substance/category combination.

Just from an aesthetic standpoint, I'm really pleased with the way this visualization turned out. It looks nice and portrays the data well.

I think we can agree that some of these category definitions are strongly emotional. A 'Glowing' or 'Mystical' experience generally connotes a very positive experience, while terms like 'Trainwreck' or 'Bad Trip' conjure negative images, particularly in the context of drug use. I thought it would be interesting to explore how common these sorts of experiences are relative to the entire body of experiences. To do so, I defined 'Glowing Experiences' and 'Mystical Experiences' as positive categories and 'Bad Trips' and 'Trainwrecks / Trip Disasters' as negative categories, and then calculated the percentage of each substance's total experiences that these categories made up. I also included a third group - bodily harm. The bodily harm group includes 'Health Problems' and 'Addiction / Habituation'. These two categories make up the stories that detail physical risk to the user.
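The percentage calculation for these three groups can be sketched like this. The category names come from the text above, but the function name, the sample list, and the 'Some Other Category' label are hypothetical and only there for illustration.

```python
from collections import Counter

# Category-to-group mapping as described in the text above.
CATEGORY_GROUPS = {
    "Glowing Experiences": "positive",
    "Mystical Experiences": "positive",
    "Bad Trips": "negative",
    "Trainwrecks / Trip Disasters": "negative",
    "Health Problems": "bodily harm",
    "Addiction / Habituation": "bodily harm",
}


def group_shares(categories):
    """Percentage of one substance's stories that fall in each group."""
    categories = list(categories)
    total = len(categories)
    if total == 0:
        return {}
    # Stories in ungrouped categories still count toward the total.
    counts = Counter(
        CATEGORY_GROUPS[c] for c in categories if c in CATEGORY_GROUPS
    )
    return {group: 100 * n / total for group, n in counts.items()}


# Hypothetical category labels for five stories about one substance.
sample = [
    "Glowing Experiences",
    "Bad Trips",
    "Health Problems",
    "Addiction / Habituation",
    "Some Other Category",
]
shares = group_shares(sample)
```

The denominator is every story for the substance, not just the stories in the three groups, so the shares do not need to add up to 100%.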


Ratio of Strongly Emotional Categories

This chart is pretty interesting. The most obvious thing to note is the extremely high percentage of stories in the bodily harm group for the stimulants cocaine and amphetamines. These are highly addictive substances with overt health effects, so the prevalence of these stories makes sense. But the fact that over 25% of the 707 cocaine experiences I captured are categorized this way is alarming. It's also interesting to note that neither of these substances produces very many intensely positive or negative stories.

The two hallucinogens on this chart, mushrooms and LSD, are also worth discussing. They produce high rates of both negative and positive experiences, but very few stories detailing personal risk. This is likely because these substances are not addictive and tend not to put the user in harm's way, but the experience is still very intense and can be psychologically difficult.