How can PMI be used to expand a sentiment lexicon?
The following example shows how PMI can be used to expand a sentiment lexicon:
Let's say you have an existing sentiment lexicon consisting of words that are already labeled with sentiment polarity (e.g., positive or negative). However, you want to expand this lexicon by identifying new sentiment-bearing bigrams using PMI.
Corpus Preparation: Prepare a corpus of text data that is relevant to your sentiment analysis task. This corpus could consist of product reviews, social media posts, or any other text data that is representative of the sentiment you want to analyze.
Bigram Extraction: Extract all the bigrams from the corpus. A bigram is a sequence of two consecutive words. For example, from the sentence "The movie was amazing," the extracted bigrams would be "The movie," "movie was," and "was amazing."
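As a minimal sketch of the extraction step, the snippet below pairs each token with its successor. The function name extract_bigrams and the whitespace tokenization are assumptions made for illustration; a real pipeline would likely lowercase the text and handle punctuation.

```python
def extract_bigrams(tokens):
    """Return the consecutive word pairs in a list of tokens."""
    return list(zip(tokens, tokens[1:]))

sentence = "The movie was amazing"
tokens = sentence.split()  # naive whitespace tokenization (an assumption)
bigrams = extract_bigrams(tokens)
print(bigrams)
# [('The', 'movie'), ('movie', 'was'), ('was', 'amazing')]
```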
Calculate PMI: Calculate the Pointwise Mutual Information (PMI) for each bigram in the corpus. PMI compares the observed frequency of a bigram with the expected frequency under independence. You can use the following formula to calculate PMI:
PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2)))
Here, P(w1, w2) represents the joint probability of the bigram (i.e., how often the two words occur together), while P(w1) and P(w2) represent the individual probabilities of each word in the bigram.
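One way to turn the formula into code is sketched below. It estimates the probabilities as relative frequencies: bigram count over total bigrams for P(w1, w2), and word count over total words for P(w1) and P(w2). The function name pmi_scores and the toy corpus are made up for illustration, and other estimation choices (smoothing, for instance) are equally possible.

```python
import math
from collections import Counter

def pmi_scores(sentences):
    """Compute base-2 PMI for every bigram in a list of tokenized sentences."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        unigram_counts.update(tokens)
        # Bigrams are formed within a sentence, never across sentence boundaries.
        bigram_counts.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigram_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        p_joint = count / total_bigrams            # P(w1, w2)
        p_w1 = unigram_counts[w1] / total_unigrams  # P(w1)
        p_w2 = unigram_counts[w2] / total_unigrams  # P(w2)
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return scores

# Toy corpus, invented for this example.
toy_corpus = [
    "the movie was amazing".split(),
    "the plot was boring".split(),
    "the movie was boring".split(),
]
scores = pmi_scores(toy_corpus)
for bigram, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(bigram, round(score, 3))
```

A positive score means the two words occur together more often than independence would predict; a score near zero means the pairing is roughly what chance would give.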
Rank and Select Top PMI Scores: Rank the bigrams by PMI score in descending order and select the top-ranked ones as candidate sentiment-bearing bigrams. You can either set a PMI threshold or take a fixed number of top-ranked bigrams, depending on your requirements. Because PMI tends to overrate pairs of rare words, it usually helps to discard bigrams that occur only a few times before ranking.
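The ranking step could look roughly like the sketch below. The minimum-count filter is an added assumption rather than part of the steps above; it is included because PMI tends to give very high scores to pairs of rare words. The function name select_candidates and all scores and counts are hypothetical.

```python
def select_candidates(pmi, counts, min_count=2, top_k=3):
    """Keep bigrams seen at least min_count times, then return the top_k by PMI."""
    frequent = [bigram for bigram in pmi if counts.get(bigram, 0) >= min_count]
    ranked = sorted(frequent, key=lambda bigram: pmi[bigram], reverse=True)
    return ranked[:top_k]

# Hypothetical PMI scores and corpus counts, invented for illustration only.
pmi = {
    ("highly", "recommend"): 6.1,
    ("waste", "of"): 5.4,
    ("of", "the"): 0.3,
    ("xqz", "blorp"): 9.8,  # rare pair: highest PMI, but seen only once
}
counts = {
    ("highly", "recommend"): 40,
    ("waste", "of"): 25,
    ("of", "the"): 900,
    ("xqz", "blorp"): 1,
}
candidates = select_candidates(pmi, counts, min_count=5, top_k=2)
print(candidates)
# [('highly', 'recommend'), ('waste', 'of')]
```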
Manual Validation: Manually review and validate the selected bigrams to determine their sentiment polarity. This can be done by domain experts or through crowdsourcing. Assign sentiment labels (positive or negative) to the selected bigrams based on their contextual meaning and association.
Lexicon Expansion: Incorporate the newly labeled bigrams into your sentiment lexicon as sentiment-bearing entries. You can include them as additional words or phrases mapped to their respective sentiment polarity.
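Assuming the lexicon is stored as a simple mapping from a word or phrase to a polarity label, the expansion step might be sketched as follows. The function name expand_lexicon, the storage format, and the example entries are all assumptions for illustration; here an existing entry is never overwritten by a new one.

```python
def expand_lexicon(lexicon, labeled_bigrams):
    """Return a copy of the lexicon with validated bigrams added as phrases."""
    expanded = dict(lexicon)  # leave the original lexicon untouched
    for (w1, w2), polarity in labeled_bigrams.items():
        phrase = f"{w1} {w2}"
        # setdefault keeps the existing label if the phrase is already present.
        expanded.setdefault(phrase, polarity)
    return expanded

# Hypothetical starting lexicon and manually validated bigrams.
lexicon = {"amazing": "positive", "boring": "negative"}
labeled_bigrams = {
    ("highly", "recommend"): "positive",
    ("waste", "of"): "negative",
}
expanded = expand_lexicon(lexicon, labeled_bigrams)
print(expanded)
```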
By using PMI to identify significant and informative word associations, you can expand your sentiment lexicon with new sentiment-bearing bigrams that might not have been included in the original lexicon. This process allows you to capture more nuanced sentiment expressions and improve the coverage and accuracy of your sentiment analysis system.