Although I am a computer programmer, I also studied Mathematics in college. Naturally, I was excited when a computer program I was working on involved some “complex” math (don’t worry, no integrals or infinite series ahead).

## The Problem

I was trying to display a tag cloud on my company’s website. However, there was a problem: the tag “Ozone” appeared about 100 times more often than almost any other tag. When I tried to display them using a linear scale the results looked like this:

## What I tried

My first thought was to adjust the linear scale. To make the other tags visible, I could just increase the slope of the equation, so instead of using something like y = 1x + 2 I would use y = 5x + 2. Here is what I got:

## The Solutions

Once I realized the linear transformation wouldn’t work, my next thought was logarithms. When I graphed the data, it fit a power curve quite nicely (R^{2} of 0.917). Since the data fit a power curve, I was pretty sure I could make logarithms work. However, I had a few other constraints. In order to keep the tag cloud a consistent size, I needed the maximum of the equation to be 1 and the minimum to be 1/3.

Starting Equation: y = log(x)

### Adjusting the Maximum

To bring the maximum value down to 1, I just divided by the log of the biggest value in the data set:

y = log(x)/log(x_{max})
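As a minimal sketch of this step in Python: the function name `log_scale` and the tag counts are made up for illustration and are not the data from the spreadsheet. It assumes every tag has a count of at least 1 and that the biggest count is greater than 1, so the logarithm in the denominator is never zero.

```python
import math


def log_scale(count, max_count):
    """Return log(count) / log(max_count); the most-used tag maps to 1.0."""
    if count < 1 or max_count <= 1:
        raise ValueError("need count >= 1 and max_count > 1")
    return math.log(count) / math.log(max_count)


# Hypothetical counts, chosen only to illustrate the scaling.
counts = {"Ozone": 1000, "Smog": 10, "Rain": 1}
max_count = max(counts.values())

# Ozone lands at the top of the scale, a tag used once lands at 0.
scaled = {tag: log_scale(n, max_count) for tag, n in counts.items()}
```

The base of the logarithm does not matter here, because it cancels out in the division.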

### Adjusting the Minimum

To adjust the minimum, I just performed a simple linear transformation on my previous result, f(x) = log(x)/log(x_{max}):

y = f(x)/1.5 + 1/3
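A quick check of the two endpoints shows why 1.5 and 1/3 work. This assumes the least-used tag appears exactly once, so that log(1) = 0 and the previous result is 0 at the bottom of the range:

```latex
\begin{align*}
x = x_{max}: &\quad y = \frac{1}{1.5} + \frac{1}{3} = \frac{2}{3} + \frac{1}{3} = 1 \\
x = 1:       &\quad y = \frac{0}{1.5} + \frac{1}{3} = \frac{1}{3}
\end{align*}
```

If the smallest count in the data were larger than 1, the minimum of the curve would sit somewhat above 1/3.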

This gave me a final equation of

y = (log(x)/log(x_{max}))/1.5 + 1/3

The results were exactly what I was looking for.
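For anyone who wants to try the final equation in code, here is a rough Python sketch. The names `tag_weight` and `font_size`, the `min_weight` parameter, and the 36 pixel maximum are all assumptions made for illustration. Dividing by 1.5 is the same as multiplying by 1 - 1/3, which is how the sketch lets the minimum be changed.

```python
import math


def tag_weight(count, max_count, min_weight=1 / 3):
    """Log-scaled weight between min_weight and 1 for counts in [1, max_count]."""
    if count < 1 or max_count <= 1:
        raise ValueError("need count >= 1 and max_count > 1")
    f = math.log(count) / math.log(max_count)
    # With min_weight = 1/3 this equals f / 1.5 + 1/3, the final equation.
    return f * (1 - min_weight) + min_weight


def font_size(count, max_count, max_px=36):
    """Hypothetical mapping from weight to a font size in pixels."""
    return round(max_px * tag_weight(count, max_count))


# Hypothetical counts: the biggest tag gets max_px, a tag used once gets a third of it.
biggest = font_size(1000, 1000)
smallest = font_size(1, 1000)
```

Changing `min_weight` moves the bottom of the range while the most-used tag always stays at 1.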

If you would like to see the data for yourself, here is a spreadsheet in ODF format that contains the data, the equations, and the graph: Log-Transformations.ods (you can download a viewer here).

Awesome post. I found this extremely helpful while pondering how to generate a tag cloud of my own. I suspected that other mathematicians like me had already tackled the problem and turned to Google before spending too much time fitting my tags to a distribution.

I saw some excruciatingly tortured examples of how to get a weighting. Your solution is both easy to understand and elegant, and very easy to customize given different constraints.

Thanks for sharing it!

I am glad it was helpful.