How to build a clean word cloud using pytagcloud without a crowded image - Python

Question 1

You are sorting the tags in ascending order, whereas pytagcloud probably expects descending order. Change the sorting line to:

sorted_wordscount = sorted(wordscount.iteritems(), key=operator.itemgetter(1), reverse=True)
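To see the difference between the two orderings, here is a minimal sketch with a made-up wordscount dictionary (the word pairs and counts are invented for illustration). It uses items() rather than iteritems() so that it runs on both Python 2 and Python 3.

```python
import operator

# Hypothetical counts, invented for illustration only.
wordscount = {"united states": 12, "the president": 30, "shall be": 21}

# Default order: smallest count first.
ascending = sorted(wordscount.items(), key=operator.itemgetter(1))
# With reverse=True: largest count first.
descending = sorted(wordscount.items(), key=operator.itemgetter(1), reverse=True)

print(ascending[0])   # least frequent pair comes first
print(descending[0])  # most frequent pair comes first
```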

Once that is fixed, the key parameter is maxsize in make_tags:

create_tag_image(make_tags(sorted_wordscount[:],maxsize=200), 'filename.png', size=(1300,1150), background=(0, 0, 0, 255), layout=LAYOUT_MIX, fontname='Molengo', rectangular=True)

If I understand correctly, this sets the maximum font size (that of the tag with the highest frequency), and all the other sizes are calculated relative to it. The other parameter that influences how the tags are distributed is the size of the window (the size argument).

You will have to play with these parameters.
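The idea of sizing every tag relative to the most frequent one can be sketched as follows. This is a simplified linear interpolation written for illustration, not pytagcloud's actual scaling formula; scale_sizes and its minsize default are hypothetical names chosen for the example.

```python
def scale_sizes(counts, minsize=3, maxsize=200):
    """Map each (tag, count) pair to a font size between minsize and maxsize.

    Illustrative linear scaling only; pytagcloud may use a different curve.
    """
    if not counts:
        return []
    values = [count for _, count in counts]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All tags are equally frequent: give them the same mid-range size.
        return [(tag, (minsize + maxsize) // 2) for tag, _ in counts]
    span = float(hi - lo)
    return [
        (tag, int(round(minsize + (maxsize - minsize) * (count - lo) / span)))
        for tag, count in counts
    ]

# The most frequent tag gets maxsize, the least frequent gets minsize.
sizes = scale_sizes([("a", 30), ("b", 21), ("c", 12)], maxsize=200)
print(sizes)
```

Under this assumption, raising maxsize stretches the whole range of font sizes, which is why it has to be balanced against the image size.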

Take into account that the library function get_tag_counts does more than just return the frequencies: it also filters out common words, converts the text to lowercase, and should in general give you a better distribution of tags than the simple sorting you are doing.
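As a rough picture of what that kind of preprocessing involves, the sketch below lowercases the text, drops a few common words, and counts what is left. It is an illustration only, not the implementation of get_tag_counts; the STOPWORDS set, the regex tokenizer, and the simple_tag_counts name are assumptions made up for the example.

```python
import collections
import re

# Tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "to", "in", "a", "be", "shall"}

def simple_tag_counts(text):
    """Return (word, count) pairs, most frequent first, ignoring stopwords."""
    # Lowercase first so "The" and "the" are counted as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    counter = collections.Counter(w for w in words if w not in STOPWORDS)
    return counter.most_common()

sample = "The Congress shall have Power. The power of the Congress shall be limited."
counts = simple_tag_counts(sample)
print(counts)
```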

With these changes you should get something like this (obtained with get_tag_counts over the file you linked in your post, in a 1000x1000 window, maxsize=260 and capping to the first 50 tags):

[image: the resulting tag cloud]

Edit: as requested, the code for creating the image above:

import collections
import operator

from pytagcloud import create_tag_image, make_tags, LAYOUT_MIX
from pytagcloud.lang.counter import get_tag_counts

# Read the text file linked in the question and convert it to lowercase.
with open("./const11.txt") as infile:
    Data1 = infile.read().lower()
Data = Data1.split()

# Two-word counts as in the question, sorted by descending frequency.
# (Not used for the image below, which relies on get_tag_counts instead.)
two_words = [' '.join(ws) for ws in zip(Data, Data[1:])]
wordscount = {w: f for w, f in collections.Counter(two_words).most_common() if f > 5}
sorted_wordscount = sorted(wordscount.iteritems(), key=operator.itemgetter(1), reverse=True)

# Cap to the first 50 tags; maxsize=260 sets the largest font size.
tags = make_tags(get_tag_counts(Data1)[:50], maxsize=260)
create_tag_image(tags, 'filename.png', size=(1000, 1000), background=(0, 0, 0, 255), layout=LAYOUT_MIX, fontname='Lobster', rectangular=True)

Tested with Python 2.7.5 on Ubuntu 13.04, with pygame installed via apt-get and the rest of the packages via pip. "const11.txt" is the text file linked in the question.

Question 2

EDIT: While the TAG_PADDING parameter referenced below in my answer might be of interest for some cases, vinaut's answer is clearly the better one to start with.

Judging from https://github.com/atizo/PyTagCloud/blob/master/pytagcloud/__init__.py, TAG_PADDING might be the parameter that controls the spacing between words.

Because it's set to a literal value in the source code and referenced in several places, you will either have to change that value in the source code to one that suits you better (and repackage/reinstall), or else copy the source into your own project and alter it there.