Office of Technology Transfer – University of Michigan

Simple Text Mining

Technology #4730



License this Technology
Simple Text Mining Opensource License (4730)
$0.00
Researchers
Jeff Horon
Managed By
Jessica Soulliere
Digital Technologies Licensing Specialist
734.647.9926

Background

Text and network mining software suited to the typical analyst end user is currently scarce. Generally available text mining algorithms require extensive programming to implement, and they typically have a steep learning curve that demands a long-term commitment of professional software developer resources. As a result, such solutions usually cannot be implemented by a typical analyst or a small business.

Technology

The University of Michigan has developed an Excel-based tool and algorithm for text mining. The tool ‘reads’ each block of unstructured text, searches it for every word in a user-supplied lexicon, and assembles the words it finds into a common network analysis data structure called an “edge list.” The output also includes descriptive data on the weight of each lexicon word found, which supports analysis of the terms themselves. The network output supports analysis of term “adjacency” (i.e., terms appearing together in the same block of unstructured text), the computation of network analysis measures, and the production of network visualizations. Outputs also carry over user-specified data dimensions from the text input, making the results easier to cross-reference and more descriptive.
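The matching and edge-list steps described above can be sketched in Python. This is not the Excel tool's implementation. It assumes that a term's “weight” is its number of occurrences in a block, that matching is case-insensitive and whole-word, and that every pair of distinct lexicon terms found in the same block forms one undirected edge. The function name `mine_blocks` and the input layout (one dict per block with a `text` key plus dimension columns) are hypothetical.

```python
import re
from itertools import combinations

def mine_blocks(blocks, lexicon, dims=()):
    """Hypothetical sketch: scan each text block for lexicon terms.

    blocks  -- list of dicts, each with a "text" key plus any dimension columns
    lexicon -- user-supplied terms to search for
    dims    -- names of dimension columns to carry over into every output row
    Returns (weights, edges):
      weights -- rows of (*dimension values, term, occurrence count)
      edges   -- rows of (*dimension values, source term, target term)
    """
    # Assumption: case-insensitive matching; duplicates removed, sorted for stable output
    terms = sorted({term.lower() for term in lexicon})
    weights, edges = [], []
    for block in blocks:
        text = block["text"].lower()
        carried = tuple(block[d] for d in dims)
        found = []
        for term in terms:
            # Whole-word match: no word character directly before or after the term
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            count = len(re.findall(pattern, text))
            if count:
                found.append(term)
                weights.append(carried + (term, count))
        # Assumption: each pair of distinct terms found in one block is one edge
        for source, target in combinations(found, 2):
            edges.append(carried + (source, target))
    return weights, edges

if __name__ == "__main__":
    blocks = [
        {"id": "P1", "year": 2012,
         "text": "Text mining and network analysis: text mining at scale."},
        {"id": "P2", "year": 2013, "text": "Text mining without programming."},
    ]
    lexicon = ["text mining", "network analysis", "programming"]
    weights, edges = mine_blocks(blocks, lexicon, dims=("id", "year"))
    for row in weights:
        print(row)
    for row in edges:
        print(row)
```

For the sample input, block `P1` yields a weight of 2 for “text mining” and a single edge between “network analysis” and “text mining”, with `P1` and `2012` carried over as leading columns so each row can be traced back to its source text.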

Applications and Advantages

Applications

  • Analysis of unstructured text for a large number of known lexical terms
  • Analysis of occurrence and adjacency (co-occurrence) of terms in papers, abstracts, etc.
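Co-occurrence analysis of this kind usually begins by aggregating adjacency rows into a weighted edge list, from which simple network measures follow. The sketch below is a hypothetical illustration, not the tool's method: it assumes an edge's weight is the number of blocks in which the pair co-occurs, and it computes each term's weighted degree (the sum of the weights of its edges). The names `edge_weights` and `weighted_degree` are illustrative only.

```python
from collections import Counter, defaultdict

def edge_weights(edges):
    """Collapse undirected (term, term) rows into {(term_a, term_b): count}.

    Each pair is sorted so ("b", "a") and ("a", "b") count as the same edge.
    """
    return Counter(tuple(sorted(pair)) for pair in edges)

def weighted_degree(weighted_edges):
    """Sum the weights of all edges touching each term."""
    degree = defaultdict(int)
    for (a, b), weight in weighted_edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)

if __name__ == "__main__":
    rows = [
        ("text mining", "network analysis"),
        ("network analysis", "text mining"),
        ("text mining", "programming"),
    ]
    weighted = edge_weights(rows)
    print(weighted_degree(weighted))
```

Here “text mining” co-occurs with “network analysis” twice and with “programming” once, so it receives a weighted degree of 3 and stands out as the most connected term.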

Advantages

  • Approachability and ease of use (single-click processing of input text)
  • Easy copy/paste of input and output data