This weekend I started, or rather resurrected, a project idea that I have been kicking around for about a year. Last summer at work, I was busy fixing bugs and wondered where the majority of the issues were coming from. Were they mostly related to the feature I was working on, or were they spread out across the code? And who was handling most of these issues? I could dig through the svn commit history, but it's a big project, and where would I start?
Since we use cruisecontrol, which tracks all the changes to a project, I started there. Even then, it just gives me lists and lists of changed files. So I decided to do some processing on the cruisecontrol information and then visualize the data as a tag cloud, where the more a file has been changed, the higher its weight. My first attempt worked, but it wasn't pretty: the scaling was off, I processed and stored the data in a weird way, and the resulting visualization was, well, pretty ugly.
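To make the tag cloud idea concrete, here is a minimal sketch of the weighting step. It is not the code from my first attempt. It assumes the change data has already been reduced to hypothetical (author, path) pairs, which is not cruisecontrol's actual format, and the names `change_counts` and `tag_weights` and the font size range are made up for illustration.

```python
from collections import Counter


def change_counts(modifications):
    """Count how many times each file appears in a list of (author, path) pairs."""
    return Counter(path for _author, path in modifications)


def tag_weights(counts, min_size=10, max_size=40):
    """Map each file's change count linearly onto a font size range."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    weights = {}
    for path, n in counts.items():
        if span == 0:
            # Every file changed equally often: use the middle of the range.
            weights[path] = (min_size + max_size) / 2
        else:
            weights[path] = min_size + (n - lo) * (max_size - min_size) / span
    return weights


# Hypothetical sample data, standing in for processed change information.
modifications = [
    ("alice", "src/Foo.java"),
    ("bob", "src/Foo.java"),
    ("alice", "src/Bar.java"),
    ("alice", "src/Foo.java"),
    ("carol", "src/Baz.java"),
    ("carol", "src/Baz.java"),
]
counts = change_counts(modifications)
weights = tag_weights(counts)
```

With this sample data, the most changed file gets the largest size and the least changed file gets the smallest, with everything else spread linearly in between.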
Fast forward a year, and the same questions were in my head: what's changing, who's changing it, and can we see any patterns? I was also interested in a more abstract representation, instead of straight text. One of my inspirations for the project was code swarm, which visualizes which developer is working on which pieces of code. By chance, I came across this library, and I was inspired to get back to work. I use that library to render the data, and I wrote a little grails app to do the heavy lifting and process all the commit data. The result of my work is this: Commit Visualizer.
NOTE: The app linked above uses a static snapshot of generated data, with the names of the files, packages, and developers randomly generated. Branches 1, 2, and 3 are three fictional branches I created to demonstrate the tool. The real app that I have locally generates the output dynamically every time a request is made (initial loading, branch change, etc.).
After looking at the generated visualizations, I found a few things interesting. Branch 1 has relatively few changes, and most of its files are modified about equally. Branch 2 is a bit more active, and Branch 3 represents a branch under heavy feature development, which can be seen in the greater number of large circles. Something I never saw before the visualizations is that even though there are a lot of changes in Branches 2 and 3, they are mostly related (as seen by the small number of packages that are changed) and done mainly by one developer. I found this pretty cool, and one possible reason is that these changes line up with a new feature being developed or refined.
There is a lot of room for improvement, and there are several things I might tackle in the future. The UI of the app is very basic and could be made a lot better. Right now, when you click a node, it generates a basic alert, which is more or less just a debugging tool. I would like a click to do something more useful than that. One thing that might be interesting would be to animate the visualization, showing the change over time. Another would be to link the visualization with a bug database, and then use that to see how changes are related to the bugs. Or maybe even link it with key dates in the project lifecycle, to see how they relate to the changes. I also want to be able to hook up to other projects, ones that don't use cruisecontrol, so I need to find another way to get the data I need. Finally, I think I'll need to modify the rendering of the data. As you probably noticed, when one node has a dramatically higher frequency than the rest, it really scales the other nodes down.
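One option I might try for the scaling problem is sizing nodes by the logarithm of their change count instead of the raw count. The sketch below is only an illustration of that idea, not what the app does today. The function names, the `max_radius` parameter, and the sample counts are all hypothetical, and it assumes every count is at least 1.

```python
import math


def linear_radii(counts, max_radius=50.0):
    """Scale each radius in direct proportion to the largest count."""
    if not counts:
        return {}
    top = max(counts.values())
    return {name: max_radius * n / top for name, n in counts.items()}


def log_radii(counts, max_radius=50.0):
    """Scale each radius by log(1 + count), which compresses large outliers."""
    if not counts:
        return {}
    top = math.log1p(max(counts.values()))
    return {name: max_radius * math.log1p(n) / top for name, n in counts.items()}


# Hypothetical change counts with one dramatic outlier.
node_counts = {"Hot.java": 500, "Warm.java": 20, "Cold.java": 2}
linear = linear_radii(node_counts)
logged = log_radii(node_counts)
```

With linear scaling, the file with 20 changes ends up with a radius of 2 next to the outlier's 50, so it nearly disappears. With log scaling the same file gets roughly half the outlier's radius, so smaller nodes stay visible. The trade-off is that differences between very active files are less pronounced.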
Anyway, check it out, and let me know if you have any ideas or suggestions!