In [3]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [4]:
data = pd.read_csv("C:/Users/Rahul/Desktop/Capstone papers/Reviews.csv")
In [5]:
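# Select the helpfulness, score, and review-text columns by name rather than by position
df1 = data[['HelpfulnessNumerator', 'HelpfulnessDenominator', 'Score', 'Text']]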
In [6]:
df1.head()
Out[6]:
HelpfulnessNumerator HelpfulnessDenominator Score Text
0 1 1 5 I have bought several of the Vitality canned d...
1 0 0 1 Product arrived labeled as Jumbo Salted Peanut...
2 1 1 4 This is a confection that has been around a fe...
3 3 3 2 If you are looking for the secret ingredient i...
4 0 0 5 Great taffy at a great price. There was a wid...
In [7]:
data.shape
Out[7]:
(568454, 10)
In [8]:
data.head()
Out[8]:
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Good Quality Dog Food I have bought several of the Vitality canned d...
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000 Not as Advertised Product arrived labeled as Jumbo Salted Peanut...
2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres "Natalia Corres" 1 1 4 1219017600 "Delight" says it all This is a confection that has been around a fe...
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200 Cough Medicine If you are looking for the secret ingredient i...
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham "M. Wassir" 0 0 5 1350777600 Great taffy Great taffy at a great price. There was a wid...
In [9]:
data.isnull().sum()
Out[9]:
Id                         0
ProductId                  0
UserId                     0
ProfileName               16
HelpfulnessNumerator       0
HelpfulnessDenominator     0
Score                      0
Time                       0
Summary                   26
Text                       0
dtype: int64
In [10]:
data = data.dropna()
In [11]:
data.isnull().sum()
Out[11]:
Id                        0
ProductId                 0
UserId                    0
ProfileName               0
HelpfulnessNumerator      0
HelpfulnessDenominator    0
Score                     0
Time                      0
Summary                   0
Text                      0
dtype: int64
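Note that `dropna()` with no arguments removes a row when any column is missing, so the 16 rows that lack only a `ProfileName` are discarded as well. If only the review fields matter, the drop could be limited with `subset`. The following is a minimal sketch on a made-up three-row frame; the toy data and the choice of columns are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data) with one gap in ProfileName and one in Summary
toy = pd.DataFrame({
    "ProfileName": ["a", np.nan, "c"],
    "Summary": ["good", "bad", np.nan],
    "Text": ["t1", "t2", "t3"],
})

# Drop a row only when Summary or Text is missing
kept = toy.dropna(subset=["Summary", "Text"])

# For comparison: drop a row when any column is missing
dropped_all = toy.dropna()

print(len(kept), len(dropped_all))
```

The row that lacks only a `ProfileName` survives the first call but not the second.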
In [14]:
for i in range(5):
    print("Review #", i+1)
    # Use positional access: index labels can have gaps after dropna()
    print(data.Summary.iloc[i])
    print(data.Text.iloc[i])
    print()
('Review #', 1)
Good Quality Dog Food
I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.
()
('Review #', 2)
Not as Advertised
Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".
()
('Review #', 3)
"Delight" says it all
This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.
()
('Review #', 4)
Cough Medicine
If you are looking for the secret ingredient in Robitussin I believe I have found it.  I got this in addition to the Root Beer Extract I ordered (which was good) and made some cherry soda.  The flavor is very medicinal.
()
('Review #', 5)
Great taffy
Great taffy at a great price.  There was a wide assortment of yummy taffy.  Delivery was very quick.  If your a taffy lover, this is a deal.
()
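The loop above looks rows up as `data.Summary[i]`, which is a lookup by index label. After `dropna()` the labels can have gaps, so a label such as 1 would raise a `KeyError` if that row had been dropped. It works here only because the first five rows are complete. A minimal sketch of the positional alternative with `iloc`, using a made-up frame whose label 1 is missing; the helper name `first_reviews` is hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical data); label 1 is absent, as it would be after dropping that row
toy = pd.DataFrame(
    {"Summary": ["s0", "s2", "s3"], "Text": ["t0", "t2", "t3"]},
    index=[0, 2, 3],
)

def first_reviews(frame, n):
    """Return the first n (summary, text) pairs by position, not by label."""
    head = frame.iloc[:n]
    return list(zip(head["Summary"], head["Text"]))

for number, (summary, text) in enumerate(first_reviews(toy, 2), start=1):
    print("Review #%d" % number)
    print(summary)
    print(text)
    print("")
```

An alternative with the same effect would be to call `reset_index(drop=True)` right after `dropna()`, so that labels and positions coincide again.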
In [21]:
data.loc[:, 'Text']= data['Text'].str.lower()
In [22]:
data.head(5)
Out[22]:
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Good Quality Dog Food i have bought several of the vitality canned d...
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000 Not as Advertised product arrived labeled as jumbo salted peanut...
2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres "Natalia Corres" 1 1 4 1219017600 "Delight" says it all this is a confection that has been around a fe...
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200 Cough Medicine if you are looking for the secret ingredient i...
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham "M. Wassir" 0 0 5 1350777600 Great taffy great taffy at a great price. there was a wid...
In [27]:
for i in range(5):
    print("Review #", i+1)
    # Use positional access: index labels can have gaps after dropna()
    print(data.Summary.iloc[i])
    print(data.Text.iloc[i])
    print()
('Review #', 1)
Good Quality Dog Food
i have bought several of the vitality canned dog food products and have found them all to be of good quality. the product looks more like a stew than a processed meat and it smells better. my labrador is finicky and she appreciates this product better than  most.
()
('Review #', 2)
Not as Advertised
product arrived labeled as jumbo salted peanuts...the peanuts were actually small sized unsalted. not sure if this was an error or if the vendor intended to represent the product as "jumbo".
()
('Review #', 3)
"Delight" says it all
this is a confection that has been around a few centuries.  it is a light, pillowy citrus gelatin with nuts - in this case filberts. and it is cut into tiny squares and then liberally coated with powdered sugar.  and it is a tiny mouthful of heaven.  not too chewy, and very flavorful.  i highly recommend this yummy treat.  if you are familiar with the story of c.s. lewis' "the lion, the witch, and the wardrobe" - this is the treat that seduces edmund into selling out his brother and sisters to the witch.
()
('Review #', 4)
Cough Medicine
if you are looking for the secret ingredient in robitussin i believe i have found it.  i got this in addition to the root beer extract i ordered (which was good) and made some cherry soda.  the flavor is very medicinal.
()
('Review #', 5)
Great taffy
great taffy at a great price.  there was a wide assortment of yummy taffy.  delivery was very quick.  if your a taffy lover, this is a deal.
()
In [49]:
import re

def clean_text(text):
    '''Remove URLs, HTML remnants, and punctuation from text that is already lower-cased.'''

    # Lower-casing is done earlier on data['Text'].
    # Contractions are not expanded and stopwords are not removed here.

    # Remove a URL together with the rest of its line
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
    # Strip HTML remnants before the punctuation pass, which would otherwise break up '<br />'
    text = re.sub(r'\<a href', ' ', text)
    text = re.sub(r'<br />', ' ', text)
    text = re.sub(r'&amp;', '', text)
    # Replace punctuation and apostrophes with spaces
    text = re.sub(r'[_"\-;%()|+&=*.,!?:#$@\[\]/]', ' ', text)
    text = re.sub(r'\'', ' ', text)

    return text
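`clean_text` does not expand contractions, and its last substitution turns every apostrophe into a space, so "don't" ends up as "don t". One way that step could look is sketched below. The mapping `CONTRACTIONS` and the function `expand_contractions` are hypothetical names with a deliberately tiny word list; they are not taken from the original notebook.

```python
import re

# Hypothetical, deliberately small mapping; a usable list would be much longer
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}

def expand_contractions(text, mapping=CONTRACTIONS):
    """Replace each whole-word contraction found in mapping with its long form."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in mapping) + r")\b"
    )
    return pattern.sub(lambda match: mapping[match.group(0)], text)

print(expand_contractions("i'm sure it's fine, don't worry."))
```

The sketch assumes lower-cased input and would have to run before the apostrophes are stripped, since the keys are matched with their apostrophes intact.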
In [50]:
# Clean the review texts (the summaries are left unchanged)
clean_texts = [clean_text(text) for text in data.Text]
print("Texts are complete.")
Texts are complete.
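`clean_texts` is a plain Python list, so its entries are addressed by position while the rows of `data` are addressed by index label. If the cleaned text should stay aligned with its row, `Series.apply` returns a Series that keeps the original index. A minimal sketch on made-up data follows; `strip_punctuation` is a simplified, hypothetical stand-in for `clean_text`.

```python
import re
import pandas as pd

def strip_punctuation(text):
    """Simplified stand-in for clean_text: replace a few punctuation marks with spaces."""
    return re.sub(r"[.,!?]", " ", text)

# Toy frame (hypothetical data) with a gap in the index labels
toy = pd.DataFrame({"Text": ["great taffy.", "not as advertised!"]}, index=[0, 2])

# apply() keeps the index of the input Series
cleaned = toy["Text"].apply(strip_punctuation)
print(cleaned)
```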
In [52]:
for i in range(5):
    print("Clean Review #", i+1)
    print(clean_texts[i])
    print()
('Clean Review #', 1)
i have bought several of the vitality canned dog food products and have found them all to be of good quality  the product looks more like a stew than a processed meat and it smells better  my labrador is finicky and she appreciates this product better than  most 
()
('Clean Review #', 2)
product arrived labeled as jumbo salted peanuts   the peanuts were actually small sized unsalted  not sure if this was an error or if the vendor intended to represent the product as  jumbo  
()
('Clean Review #', 3)
this is a confection that has been around a few centuries   it is a light  pillowy citrus gelatin with nuts   in this case filberts  and it is cut into tiny squares and then liberally coated with powdered sugar   and it is a tiny mouthful of heaven   not too chewy  and very flavorful   i highly recommend this yummy treat   if you are familiar with the story of c s  lewis   the lion  the witch  and the wardrobe    this is the treat that seduces edmund into selling out his brother and sisters to the witch 
()
('Clean Review #', 4)
if you are looking for the secret ingredient in robitussin i believe i have found it   i got this in addition to the root beer extract i ordered  which was good  and made some cherry soda   the flavor is very medicinal 
()
('Clean Review #', 5)
great taffy at a great price   there was a wide assortment of yummy taffy   delivery was very quick   if your a taffy lover  this is a deal 
()
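The cleaned reviews above contain runs of spaces and trailing blanks wherever punctuation was replaced. If that matters for later tokenization, the whitespace could be collapsed in one more pass. A minimal sketch; the function name `normalize_whitespace` is an assumption and not part of the original notebook.

```python
import re

def normalize_whitespace(text):
    """Collapse any run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

sample = "great taffy at a great price   there was a wide assortment of yummy taffy   "
print(normalize_whitespace(sample))
```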
In [55]:
for i in range(5):
    print("Review #", i+1)
    # Use positional access: index labels can have gaps after dropna()
    print(data.Summary.iloc[i])
    print(data.Text.iloc[i])
    print()
('Review #', 1)
Good Quality Dog Food
i have bought several of the vitality canned dog food products and have found them all to be of good quality. the product looks more like a stew than a processed meat and it smells better. my labrador is finicky and she appreciates this product better than  most.
()
('Review #', 2)
Not as Advertised
product arrived labeled as jumbo salted peanuts...the peanuts were actually small sized unsalted. not sure if this was an error or if the vendor intended to represent the product as "jumbo".
()
('Review #', 3)
"Delight" says it all
this is a confection that has been around a few centuries.  it is a light, pillowy citrus gelatin with nuts - in this case filberts. and it is cut into tiny squares and then liberally coated with powdered sugar.  and it is a tiny mouthful of heaven.  not too chewy, and very flavorful.  i highly recommend this yummy treat.  if you are familiar with the story of c.s. lewis' "the lion, the witch, and the wardrobe" - this is the treat that seduces edmund into selling out his brother and sisters to the witch.
()
('Review #', 4)
Cough Medicine
if you are looking for the secret ingredient in robitussin i believe i have found it.  i got this in addition to the root beer extract i ordered (which was good) and made some cherry soda.  the flavor is very medicinal.
()
('Review #', 5)
Great taffy
great taffy at a great price.  there was a wide assortment of yummy taffy.  delivery was very quick.  if your a taffy lover, this is a deal.
()
In [56]:
data.head()
Out[56]:
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Good Quality Dog Food i have bought several of the vitality canned d...
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000 Not as Advertised product arrived labeled as jumbo salted peanut...
2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres "Natalia Corres" 1 1 4 1219017600 "Delight" says it all this is a confection that has been around a fe...
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200 Cough Medicine if you are looking for the secret ingredient i...
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham "M. Wassir" 0 0 5 1350777600 Great taffy great taffy at a great price. there was a wid...