Monday, April 06, 2009

Download Sentiment140 dataset with 1.6 million tweets


 


Sentiment140 dataset with 1.6 million tweets - Sentiment analysis with tweets

What is Sentiment140?

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

How does this work?

You can read about our approach in our technical report: Twitter Sentiment Classification using Distant Supervision. There are also additional features that are not described in this paper.

How is this different?

Our approach is different from other sentiment analysis sites because:
  • We use classifiers built from machine learning algorithms. Other products use a simpler keyword-based approach which may have higher precision but lower recall.
  • We provide transparency for the classification results of individual tweets. Other sites only surface aggregated metrics, which makes it difficult to assess the accuracy of their classifiers.


Who created this?

Sentiment140 was created by Alec Go, Richa Bhayani, and Lei Huang, who were Computer Science graduate students at Stanford University.

What are the use cases?

  • Brand management (e.g. windows 10)
  • Polling (e.g. obama)
  • Planning a purchase (e.g. kindle)


Kaggle Site

This is the Sentiment140 dataset. It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment.

Content

It contains the following 6 fields:

  1. target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)

  2. ids: the id of the tweet (2087)

  3. date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)

  4. flag: the query (lyx). If there is no query, then this value is NO_QUERY.

  5. user: the user that tweeted (robotickilldozr)

  6. text: the text of the tweet (Lyx is cool)
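The six fields above can be read with Python's standard csv module. This is a minimal sketch, not the authors' code: it assumes the file has no header row and that the columns appear in the order listed. The names Tweet, POLARITY, read_tweets and parse_date are made up for illustration, and the sample row is assembled from the example values in the list.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterator, NamedTuple, TextIO


class Tweet(NamedTuple):
    """One row of the dataset, using the field names from the list above."""
    target: int
    ids: str
    date: str
    flag: str
    user: str
    text: str


# Polarity codes as described in the field list.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}


def read_tweets(handle: TextIO) -> Iterator[Tweet]:
    """Yield a Tweet for every row that has exactly six columns."""
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip malformed rows instead of failing
        target, ids, date, flag, user, text = row
        yield Tweet(int(target), ids, date, flag, user, text)


def parse_date(value: str) -> datetime:
    """Parse a date shaped like the example; assumes the zone is the literal 'UTC'."""
    parsed = datetime.strptime(value, "%a %b %d %H:%M:%S UTC %Y")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical sample row built from the example values in the field list.
sample = io.StringIO(
    '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'
)
tweets = list(read_tweets(sample))
first = tweets[0]
print(POLARITY[first.target], first.user, first.text)
```

To read the downloaded file instead of the sample, pass an open file handle to read_tweets. The file name and its encoding depend on the copy you download, so check them before opening.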

Acknowledgements

The official page for the dataset, with resources about how it was generated, is the sentiment140.com link at the end of this post.
The official paper detailing the approach is the DOCLINK at the end of this post.

Citation: Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.

Inspiration

To detect severity from tweets. You may have a look at this.


DOCLINK:

https://www-cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf


WEBLINK:

https://www.kaggle.com/kazanova/sentiment140


WEBLINK:

http://help.sentiment140.com/home
