Text and network mining software remains largely out of reach for the typical analyst end-user. Generally available text mining algorithms require extensive programming to implement and have a steep learning curve, often demanding a long-term commitment of professional software development resources. As a result, such solutions are usually impractical for the typical analyst or small business.
The University of Michigan has developed an Excel-based tool and algorithm for text mining that ‘reads’ each block of unstructured text for every word in a user-supplied lexicon and assembles the words found into a common network analysis data structure called an “edge list.” The analysis also reports descriptive data on the weight of each lexicon word found, which supports analysis of the terms themselves. The network output supports analysis of term “adjacency” (that is, terms appearing together in the same block of unstructured text), the computation of network analysis measures, and the production of network visualizations. Outputs also include user-specified data dimensions, carried over from the text input, so that results are more descriptive and easy to cross-reference.
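The general idea of turning lexicon matches into an edge list can be illustrated with a minimal Python sketch. This is not the Excel-based tool itself or its actual algorithm. The function name `mine_blocks`, the sample data, the simple word tokenization, and the definition of "weight" as a raw occurrence count are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def mine_blocks(blocks, lexicon):
    """Return (weights, edges) for lexicon terms found in each text block.

    blocks  : list of (block_id, text) pairs; block_id stands in for a
              user-specified data dimension carried over from the input.
    lexicon : iterable of single-word terms supplied by the user.
    """
    terms = {t.lower() for t in lexicon}
    weights = Counter()   # assumption: weight = total occurrences of a term
    edges = []            # rows of (block_id, term_a, term_b)

    for block_id, text in blocks:
        # Hypothetical tokenization: lowercase runs of letters, digits, apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        found = Counter(tok for tok in tokens if tok in terms)
        weights.update(found)
        # Each pair of distinct terms found in the same block becomes one edge.
        for a, b in combinations(sorted(found), 2):
            edges.append((block_id, a, b))

    return weights, edges


# Made-up sample input for illustration only.
blocks = [
    ("doc1", "Network analysis of text uses graph measures."),
    ("doc2", "Text mining builds a network; the network is a graph."),
]
lexicon = ["network", "text", "graph", "mining"]

weights, edges = mine_blocks(blocks, lexicon)
```

Each row of `edges` keeps the block identifier alongside the term pair, so the rows can be cross-referenced back to the input and pasted into a network analysis package that accepts edge lists.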
Applications and Advantages
- Analysis of unstructured text for a large number of known lexical terms
- Analysis of occurrence and adjacency (co-occurrence) of terms in papers, abstracts, etc.
- Approachability and ease of use (single-click processing of input text)
- Easy copy/paste of input/output data