BI and the “Unstructured Data” Challenge
A survey of this type, like an e-mail message, is “semi-structured.”
Exploit what is structured in interpreting and using the free text.
Textual source information generally arrives in some form of envelope: metadata that describes the information and its provenance.
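As an illustration of separating the envelope from the free text, here is a minimal sketch using Python's standard `email` package. The sample message, the `split_envelope` helper, and the choice of fields are assumptions made for the example; they do not come from the survey discussed here.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample message, invented for illustration only.
RAW = """From: Jane Doe <jane@example.com>
To: support@example.com
Subject: Survey response
Date: Mon, 02 Jun 2008 09:30:00 -0400

The checkout process was slow, but support was helpful.
"""


def split_envelope(raw):
    """Return (envelope, body): structured metadata and the free text."""
    msg = message_from_string(raw)
    envelope = {
        # parseaddr returns (display name, address); keep the address.
        "sender": parseaddr(msg["From"])[1],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),
    }
    # For a single-part message, get_payload() returns the body as a string.
    body = msg.get_payload().strip()
    return envelope, body


envelope, body = split_envelope(RAW)
print(envelope["sender"], envelope["sent"].date())
print(body)
```

The structured fields (sender, subject, date) can then be used to group or filter messages before any analysis of the body text.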
It’s still hard to automate interpretation of the free text, that is, to do more than count words and note co-occurrence. This is where sentiment extraction comes into play.
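The baseline mentioned above, counting words and noting co-occurrence, can be sketched in a few lines of Python. The tokenizer, the function names, and the three sample responses are assumptions made for illustration. Co-occurrence here means "both terms appear in the same response," which is only one of several possible definitions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical free-text responses, invented for illustration only.
DOCS = ["support was helpful", "support was slow", "checkout was slow"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Count how often each word appears across all responses."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrence(docs):
    """Count, for each pair of terms, the responses containing both."""
    pairs = Counter()
    for doc in docs:
        # Sorting the unique terms gives each pair one canonical order.
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs


print(word_counts(DOCS).most_common(3))
print(cooccurrence(DOCS)[("slow", "was")])
```

Counts like these show which terms appear together, but not whether a response is favorable or unfavorable, which is the gap sentiment extraction is meant to address.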
©Alta Plana Corporation, 2008
The Data Warehousing Institute