Thinking Craftsman Toolkit (TCToolkit)

Hosted on Bitbucket at:

Download the latest version from:

Since I consult on software development, I felt the need to analyze code quickly to detect code hotspots, i.e. the areas that need to be addressed first. The techniques I use are different from traditional 'static code analysis' (e.g. using tools like lint, PMD, FindBugs). I use a mix of code metrics and visualizations to detect 'anomalies'.

The Thinking Craftsman Toolkit is a set of programs for quickly analyzing source code. These programs are written in Python. The source code for the Thinking Craftsman Toolkit is hosted on Bitbucket and is published under the New BSD license.

Given below are a few of the tools in TCToolkit:

Code Duplication Detector (CDD):

CDD can analyze the files in a directory tree and print the duplicates it finds (ignoring comments). It is implemented in Python using the Rabin-Karp string matching algorithm and Pygments lexers.
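A minimal sketch of the Rabin-Karp idea applied to duplicate detection, assuming that a "duplicate" is any run of 10 consecutive tokens that occurs more than once. This is not the CDD source: to stay self-contained it swaps the Pygments lexer for a crude regex tokenizer that only knows C-style comments (`tokenize`, `COMMENT_RE`, `TOKEN_RE` and `find_duplicates` are hypothetical names), so it will, for example, mangle string literals that contain `//`.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a Pygments lexer: strips C/Java-style comments,
# then splits the rest into identifier, number and single-symbol tokens.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

BASE = 1_000_003          # polynomial base for the rolling hash
MOD = (1 << 61) - 1       # large prime modulus (2**61 - 1)


def tokenize(source):
    """Return the tokens of `source` with comments removed."""
    return TOKEN_RE.findall(COMMENT_RE.sub(" ", source))


def find_duplicates(files, window=10):
    """Map each token run of length `window` that occurs more than once
    to the sorted list of (file_name, token_index) places where it occurs.

    `files` is a dict {file_name: source_text}.
    """
    vocab = {}                         # token text -> integer code
    token_lists = {}
    buckets = defaultdict(list)        # rolling hash -> [(file_name, start)]
    high = pow(BASE, window - 1, MOD)  # weight of the token leaving the window
    for name, text in files.items():
        tokens = tokenize(text)
        token_lists[name] = tokens
        if len(tokens) < window:
            continue
        codes = [vocab.setdefault(t, len(vocab) + 1) for t in tokens]
        h = 0
        for c in codes[:window]:       # hash of the first window
            h = (h * BASE + c) % MOD
        buckets[h].append((name, 0))
        for i in range(1, len(codes) - window + 1):
            # Roll the hash: drop codes[i - 1], append codes[i + window - 1].
            h = ((h - codes[i - 1] * high) * BASE + codes[i + window - 1]) % MOD
            buckets[h].append((name, i))
    # Equal hashes are only candidates: compare the actual tokens so that
    # hash collisions never produce false matches.
    matches = defaultdict(set)
    for locations in buckets.values():
        if len(locations) < 2:
            continue
        for name, i in locations:
            matches[tuple(token_lists[name][i:i + window])].add((name, i))
    return {run: sorted(locs) for run, locs in matches.items() if len(locs) > 1}


files = {
    "a.c": "int add(int x, int y) { /* sum */ return x + y; }",
    "b.c": "int add(int x, int y) { // sum\n return x + y; }",
}
for run, locations in find_duplicates(files).items():
    print(" ".join(run), locations)
```

Because every overlapping window is reported, a long duplicated block shows up as many consecutive matches; a practical tool would merge adjacent matches into a single duplicate before printing it.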

Duplicates found in the Apache httpd server source using CDD (Complete List Httpd)
  • Number of files analyzed: 865
  • Number of duplicates found: 161
  • Time required for detection: 3 min 30 seconds
Duplicates found in the Apache Tomcat source code using CDD (Complete List Tomcat)
  • Number of files analyzed: 1774
  • Number of duplicates found: 436
  • Time required for detection: 4 min 33 seconds

You can visualize how duplication is spread across the various files as a 'Treemap visualization'. Check the screenshot to see what the visualization looks like (thumbnail below).

Treemap View of Code Duplication
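A treemap needs hierarchical data, so per-file duplication counts first have to be rolled up into the directory tree. The sketch below assumes the input is a list of hypothetical (path, total_lines, duplicated_lines) tuples; `build_tree`, `new_node` and `dup_ratio` are illustrative names, not part of CDD. Total lines can drive rectangle size and the duplicated fraction can drive color.

```python
from pathlib import PurePosixPath


def new_node():
    """An empty tree node: line totals plus child nodes keyed by name."""
    return {"lines": 0, "dup_lines": 0, "children": {}}


def build_tree(file_stats):
    """Build a directory hierarchy from (path, total_lines, duplicated_lines)
    tuples. Each directory node holds the totals of all files beneath it."""
    root = new_node()
    for path, lines, dup in file_stats:
        node = root
        node["lines"] += lines
        node["dup_lines"] += dup
        for part in PurePosixPath(path).parts:
            node = node["children"].setdefault(part, new_node())
            node["lines"] += lines
            node["dup_lines"] += dup
    return root


def dup_ratio(node):
    """Share of a node's lines that are duplicated (a natural treemap color)."""
    return node["dup_lines"] / node["lines"] if node["lines"] else 0.0


stats = [  # hypothetical per-file numbers, not actual CDD output
    ("server/core.c", 1000, 120),
    ("server/util.c", 500, 0),
    ("modules/ssl.c", 800, 200),
]
tree = build_tree(stats)
print(dup_ratio(tree["children"]["server"]))  # 0.08
```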

Token Tag Cloud (TTC):

Some time back I read the blog article 'See How Noisy Your Code Is'. Following that idea, I developed a Python module for creating various tag clouds based on token types (e.g. keywords, names, class names).
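The core of a token tag cloud is counting tokens per type and mapping each count to a font size. The sketch below is an illustration only: it swaps in Python's standard `tokenize` module (so it handles Python source only, whereas the toolkit uses Pygments lexers), and `token_counts`, `font_sizes` and the logarithmic size scale are assumptions, not the TTC implementation.

```python
import io
import keyword
import math
import tokenize
from collections import Counter


def token_counts(source):
    """Count tokens of Python `source` per category."""
    counts = {kind: Counter() for kind in
              ("keyword", "name", "operator", "string", "number")}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            kind = "keyword" if keyword.iskeyword(tok.string) else "name"
        elif tok.type == tokenize.OP:
            kind = "operator"
        elif tok.type == tokenize.STRING:
            kind = "string"
        elif tok.type == tokenize.NUMBER:
            kind = "number"
        else:
            continue  # comments, newlines, indentation etc. are ignored
        counts[kind][tok.string] += 1
    return counts


def font_sizes(counter, min_pt=10, max_pt=40):
    """Map token frequencies to font sizes on a logarithmic scale."""
    if not counter:
        return {}
    lo = math.log(min(counter.values()))
    hi = math.log(max(counter.values()))
    span = (hi - lo) or 1.0  # all counts equal: everything gets min_pt
    return {tok: round(min_pt + (max_pt - min_pt) * (math.log(n) - lo) / span)
            for tok, n in counter.items()}


src = "def area(r):\n    return 3.14 * r * r\n"
counts = token_counts(src)
print(font_sizes(counts["name"]))  # {'area': 10, 'r': 40}
```

A log scale keeps a few very frequent tokens (such as `self` or `;`) from shrinking everything else to the minimum size; rendering the sizes as HTML or a canvas is left out.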

Treemap Visualization for Source Monitor Metrics data (SMTreemap)

SourceMonitor is an excellent tool for generating various metrics from source code (e.g. maximum complexity, average complexity, line count, block depth). However, it is difficult to analyze this data quickly for large code bases. Treemaps are excellent for visualizing hierarchical data along two dimensions (size and color). This tool uses Tkinter (or JavaScript) to display the SourceMonitor data as a treemap.
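To show how two metrics map onto one treemap, here is a sketch of the simple slice-and-dice layout: each node's rectangle is split among its children in proportion to their size, alternating horizontal and vertical cuts by depth. This is an assumed layout for illustration (the tool may well use another, such as squarified treemaps), and `slice_and_dice` and `heat_color` are hypothetical names. Here size would be a metric like line count and color a metric like maximum complexity.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out `node` inside the rectangle (x, y, w, h).

    `node` is {'name': str, 'size': float, 'children': [...]}; leaves omit
    'children'. Returns a list of (name, x, y, w, h) tuples for the leaves.
    """
    if out is None:
        out = []
    children = node.get("children") or []
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = sum(child["size"] for child in children)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = child["size"] / total if total else 0.0
        if depth % 2 == 0:  # even depth: cut along the x axis
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:               # odd depth: cut along the y axis
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


def heat_color(value, vmax):
    """Map a metric (e.g. maximum complexity) to a green-to-red hex color."""
    t = max(0.0, min(1.0, value / vmax)) if vmax else 0.0
    return "#{:02x}{:02x}00".format(round(255 * t), round(255 * (1 - t)))


tree = {"name": "src", "size": 400, "children": [
    {"name": "a.c", "size": 300},
    {"name": "lib", "size": 100, "children": [
        {"name": "b.c", "size": 50},
        {"name": "c.c", "size": 50},
    ]},
]}
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(name, x, y, w, h)
```

With Tkinter, each leaf could then be drawn with `Canvas.create_rectangle(x, y, x + w, y + h, fill=heat_color(...))`.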

You have to export the SourceMonitor data as CSV or XML. You can then use this CSV or XML file as input to display the treemap.
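A minimal sketch of reading the CSV variant with Python's standard `csv` module. The column names below are assumptions, not SourceMonitor's documented export format: check the header row of your own export and adjust the three constants. `read_metrics` is a hypothetical helper; for a real file, open it with `open(path, newline="")` as the `csv` documentation recommends.

```python
import csv
import io

# Hypothetical column names: check the header row of your own export
# and adjust these three constants to match it.
FILE_COL = "File Name"
SIZE_COL = "Lines"
COLOR_COL = "Maximum Complexity"


def read_metrics(csv_file):
    """Yield (path, size, color_metric) for each row of an exported CSV file."""
    for row in csv.DictReader(csv_file):
        path = row[FILE_COL].replace("\\", "/")  # normalize Windows separators
        yield path, float(row[SIZE_COL]), float(row[COLOR_COL])


sample = io.StringIO(
    "File Name,Lines,Maximum Complexity\n"
    "src\\a.c,300,12\n"
    "src\\lib\\b.c,50,3\n"
)
print(list(read_metrics(sample)))
# [('src/a.c', 300.0, 12.0), ('src/lib/b.c', 50.0, 3.0)]
```

The normalized paths can then be split on `/` to group the rows into the directory hierarchy that the treemap layout needs.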