Friday, July 20, 2012 at 10:01am.
Hadoop and Data Science Meetup
Access Notes
Front door on 11th; the venue may also be accessible through the large garage door on Flanders.
Description
This month's meetup will start with a discussion, led by William Taylor, of news items related to data science and big data.
Presentation this month by Temese Szalai.
Title: Asking Questions About Big Data: A Basic How-To For Framing Problems When Working With (Unstructured Text) Data At Scale
Summary: Data is only as valuable as the questions we ask about it. The questions worth asking are those that yield valuable insights and quantifiable results, and whose answers lead to actionable information, i.e., information that helps make a decision or meets the requirements of the people and systems consuming the analysis or output. Identifying good questions, and deciding how to proceed with very large data sets, is at the very heart of being a data scientist.
When working with large data sets, especially unstructured or semi-structured text data, asking questions and getting started is not always easy. In fact, it is sometimes the hardest part. Drawing on her experience working with text data at scale, Temese will talk about strategies and methodologies for approaching this kind of data during initial discovery and analysis. She will also cover some of the basic tools and techniques available, along with basic best practices. Although the focus is on unstructured text data, the talk should be general enough to apply to analyzing other kinds of data as well.
Speaker Bio: Temese Szalai has worked as an industrial computational linguist/taxonomist for 13 years. Presently, she is the founder of Madarka, which leverages semantic analysis of large unstructured corpora for psychographic consumer segmentation.