WordCloud Visualization for Books that I Read

01 Feb 2019

My Visualizations on Shakespeare's Sonnets

Besides writing book reflections, I wondered what else I could do with the books I read. Word cloud visualization came to mind immediately. I tried a few free online tools but wasn't happy with the results, mainly because I couldn't control which words were filtered out. So I did some research to see whether I could write my own script to generate the visualization. It turned out not to be difficult at all: there is already an open source Python library for generating word clouds, and all I needed to do was learn its API well enough to do my own customization.

When it comes to word cloud visualization, five main areas lend themselves to customization:

Color Theme of the Texts
Shape of Visualization (Mask)
Fonts
Words Filtering
Background Color

Color Theme of the Texts

To change the color theme, we can install the palettable library and import some of its ready-made color palettes:

from palettable.wesanderson import Zissou_5,GrandBudapest1_4,Moonrise5_6,Margot1_5

Once the library is imported, we can define a color function, which will later be passed to WordCloud's recolor function as an argument:

import random
def color_func3(word, font_size, position, orientation, random_state=None, **kwargs):
    # Pick one of the Zissou_5 colors at indices 1 to 4 (index 0 is never chosen)
    return tuple(Zissou_5.colors[random.randint(1,4)])
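The function above draws from Python's global random module, so the colors can differ from run to run. Here is a minimal sketch of a reproducible variant. It assumes the random_state argument handed to the color function is either None or a random.Random instance, which is my understanding of what WordCloud passes. The make_color_func helper and the RGB values in PALETTE are hypothetical stand-ins for a palettable palette, not part of either library:

```python
import random

# Hypothetical RGB palette used for illustration; in practice this could be
# a palettable palette such as Zissou_5.colors.
PALETTE = [(60, 154, 178), (120, 183, 197), (235, 204, 42), (225, 175, 0), (242, 26, 0)]

def make_color_func(palette):
    """Build a WordCloud-style color function that picks from `palette`."""
    colors = [tuple(c) for c in palette]

    def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
        # Use the generator handed in by the caller when there is one,
        # otherwise fall back to the global random module.
        rng = random_state if random_state is not None else random
        return rng.choice(colors)

    return color_func

color_func = make_color_func(PALETTE)
```

With this version, something like mycloud.recolor(color_func=color_func, random_state=42) should give the same colors on every run.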

Shape of Visualization

A visualization that is always rectangular or square gets boring, so we would like to fit it into any shape we like. Again, achieving that is simple:

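import numpy as np
from PIL import Image

def create_mask(imagename):
    icon = Image.open(imagename)
    # Start from an all-white canvas the same size as the icon
    mask = Image.new("RGB", icon.size, (255,255,255))
    # The icon doubles as the paste mask, so it needs transparency (e.g. an RGBA PNG)
    mask.paste(icon,icon)
    mask = np.array(mask)
    return mask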

image_path = "images/wine.png"

mask = create_mask(image_path)

This mask object is passed to the WordCloud API through the mask argument to specify the shape of the visualization.
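If you don't have a suitable image at hand, a mask can also be built directly with NumPy. The sketch below is only an illustration: circle_mask is a hypothetical helper, and it relies on my understanding that the library leaves pure white (255) pixels empty and draws words everywhere else:

```python
import numpy as np

def circle_mask(size=400):
    """Return a size x size uint8 array: 0 inside a circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2
    # True for every pixel that lies outside the circle
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255  # white pixels mark the area to leave empty
    return mask

mask = circle_mask(400)
```

Passing this array as the mask argument should then produce a round word cloud instead of a rectangular one.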

Fonts

Download the fonts that you prefer (many open source fonts are available at https://fonts.google.com/). Then store the path to the font file in a variable, which is also passed to the WordCloud API, through the font_path argument, to specify the font of the texts:

font3 = "Fonts/LuckiestGuy-Regular.ttf"

Words Filtering

Although the WordCloud library comes with its own list of filtered words, you will often still want to filter additional words when some less meaningful ones end up big in your visualization. You can simply add the words that you want to filter on top of the built-in list:

from wordcloud import STOPWORDS

stopwords = set(STOPWORDS)
stopwords.update(["one","will","page","et","al","instead","still"])
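One way to spot candidates for filtering before rendering anything is a quick word frequency count. The sketch below uses only the standard library. DEFAULT_STOPWORDS is a tiny hypothetical stand-in for the library's built-in list, top_words is a made-up helper, and the simple regex tokenizer is an assumption that may split words differently from WordCloud itself:

```python
from collections import Counter
import re

# Hypothetical stand-in for the library's built-in list, kept tiny on purpose.
DEFAULT_STOPWORDS = {"the", "and", "of", "to", "a", "in"}

def top_words(text, extra_stopwords=(), n=5):
    """Count words in `text`, skipping default and extra stopwords."""
    stopwords = DEFAULT_STOPWORDS | {w.lower() for w in extra_stopwords}
    # Lowercase the text and keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = "One page of the book will still be one page, and love will be love."
print(top_words(sample))
print(top_words(sample, extra_stopwords=["one", "will", "page", "still"]))
```

Words that rank high in the first list but carry little meaning are good candidates for stopwords.update.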

Background Color

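The background color is just a color string, here a hex code, that is passed to the WordCloud API through the background_color argument:

backgroundlight = "#FAEBD7"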

Generating WordCloud Visualization

Once we have all the customization ready, we can generate the word cloud visualization and save it as an image:

from wordcloud import WordCloud

# text is a string holding the full text of the book
mycloud = WordCloud(font_path=font3, max_words=1000, max_font_size=100, scale=2,
                    stopwords=stopwords, mask=mask, random_state=42, background_color=backgroundlight)

mycloud.generate(text)
mycloud.recolor(color_func=color_func3)
mycloud.to_file('Visualizations/shakespeares_sonnets1.png')
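The snippet above assumes that text already holds the book's contents as a single string. A minimal sketch of loading it from a plain text file follows. The load_text helper and the file path are hypothetical, so point it at wherever your own text lives:

```python
from pathlib import Path

def load_text(path, encoding="utf-8"):
    """Read a whole text file into one string."""
    return Path(path).read_text(encoding=encoding)

# Hypothetical location of the book's text; adjust to your own file.
book_path = Path("texts/shakespeares_sonnets.txt")
if book_path.exists():
    text = load_text(book_path)
```

Reading with an explicit encoding avoids surprises when the file contains curly quotes or other non-ASCII characters.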

You can also clone my repo and run the Jupyter notebook, which comes with some existing color themes, images and fonts. If you find it useful, please give the repo a star. Hope you enjoy it!