Eduserv Big Data Symposium 2012

In London today for a varied agenda on Big Data, hosted by Eduserv. The start of the day seemed too focused on data exhaust: the storage and analysis of actor interactions with corporate systems and services, also known as activity data. The speakers focused on understanding the customer and how to build big data samples. I’d argue that in HE we understand our customers pretty well and already have big data sets. Perhaps we’d do well to address skills, culture and opening up access to move our sector forward. Sadly I had to miss a couple of sessions for which feedback was very positive. The afternoon had some really strong talks from Demos and Berkeley. Demos was about the Public Sector and was mostly non-technical, highlighting those cultural / human issues as well as what might be possible to enhance public services. Berkeley covered a good mix of technical, human and national-level material. An exceptional keynote. All in all, a good day. I hope the notes are of wider interest.

Kings Cross

Talks are numbered.

1. Storage Issues EMC2
a. Unstructured data pervades, along with Volume, Variety, Velocity
b. Case studies;
Deliver better healthcare with Big Data
Classifying and segmenting big data
Rich content stores, generated from workflow, develop new IP based on Big Data, Mining data for business advantage, consumer data
c. Why store big data?
Google stores data, runs Plus, identifies trends
FB and Twitter store every message you send and graph trends
Amazon store your every purchase
Carriers store location based data
Graph Theory = Data Visualisations
d. Getting started
Big data leads to optimized organisation(???)
Big data takes a long time to build (from scratch);
warehouse, analytics tools, connect to business
CIOs should consider big data to stay ahead
e. Big Data Based Decisions
Real-time analytics support real-time and better decision making, in a more transparent manner and with greater accuracy, allowing the service offer to be personalised for customers
f. What holds Big Data Back
Organisational change and talent / skills acquisition are key (not technical issues)

St Pancras

2. Weathering the data deluge, Sanger
I had to miss this one
3. Big Data and Knowledge Engineering, Leicester Uni
I had to miss this one too. Shame, as I gather this Professor from Leicester put Big Data into the HE context. Peter Tinson and I discussed this. We also aired the idea of me sending appropriate updates to UCISA to drip feed their weekly bulletin to members. I’d include SICT, profiling SICT for Cloud, Analytics Recon, BI update, RM outputs, Staff and Change Pathway (and the other pathways), Digital Literacies. Also the Analytics Recon might be a contender for CISG in October.

Euston

4. Lightning Talks
a. Digital Curation Centre
Hadron, Grid, Data / Information Management, GRIDPP
There’s an LHC at home project much like SETI at home
As one might expect this talk is about preservation and curation of Big Data for research initiatives such as Human Genome, Hadron etc.
b. Prescriptive Analytics Failure
Taking the example of a single tweet (about vampires) being machine-interpreted as evidence that the tweeter is an expert in the occult
Web 3.0 Analysis and Interpretation
c. Simon Hodson JISC and Big Data
Rather Research Data Management
1994 Group report: HEIs carrying up to 250PB of ‘managed’ and ‘non managed’ data
The gist is that there is tension between requirements to retain old research data, the good practice of deleting / managing stored data, costs, and the practices needed to ensure security
d. NoSQL Tools / approaches

Euston Square

5. Making data a way of life for public servants (Max Wind-Cowie, Head of progressive conservatism, DEMOS – a think tank)
a. Open data tends to be the message from Government
b. Lack of resources to analyse Big Data
c. Ethical implications
d. Link to Open Data is implicit
e. Declining levels of trust in the services serving the public
f. Google and Amazon have embedded their referral services as an expectation; public services are well behind and could be using Big Data to adapt
g. Data can drive public service innovation and entrepreneurship
h. Requires accessibility and transparency of data
i. Can enhance prioritization and planning
j. Can forge unexpected links and predictions; children taught creative writing are less at risk of public health and law and order interventions
k. Issues here are around skills, competencies and culture change

Royal College of Physicians

6. Berkeley view of Big Data
a. Analysing user behaviour rather than user input; Twitter Based Earthquake Detector (monitors frequency and composition of earthquake related tweets). Also known as ‘nowcasting’, e.g. google.org/flutrends
b. US trade-in scheme (old cars for new). It took the Government months to work out whether the scheme would be successful, as it had to wait for rebate forms to be returned from dealers. Google knew instantly, as it monitored increases in searches related to car selling
c. Stored data is growing faster than available storage capacity; how do we determine what data to keep, what to delete?
d. The NSF has a data retention requirement; a data management plan is required for research with an agreement to hold data for a minimum of 3 years. Costs are considerable.
e. It’s hard to extract value from data. The challenge is to address this and make analysis accessible to all, not just data scientists
f. Three issues to address; algorithms (analytics), machines (storage and processing) and people (skills and culture)
g. US has a prototype to address this. Things HEIs need to do include: a curriculum to develop data scientists, recruiting staff to address skills gaps, and insisting on data storage pools / warehouses
h. This very much aligns with the Educause studies by Donald Norris et al but shows a prototype system. Very impressive.
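
As an aside on 6a, here is a minimal sketch of how frequency-based ‘nowcasting’ might work. It is not the detector shown in the talk; the function name, window size, threshold and sample counts are all my own assumptions for illustration. It flags any interval where the count of earthquake related tweets jumps well above the recent baseline.

```python
from statistics import mean, pstdev


def detect_spikes(counts, window=5, threshold=3.0):
    """Return indices where a count is far above its trailing-window baseline.

    counts    -- tweet counts per time interval (hypothetical data)
    window    -- number of preceding intervals used as the baseline (assumed)
    threshold -- how many standard deviations above the mean counts as a spike (assumed)
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        # Floor sigma at 1.0 so a perfectly flat baseline does not flag tiny changes
        limit = mu + threshold * max(sigma, 1.0)
        if counts[i] > limit:
            spikes.append(i)
    return spikes


# Made-up per-minute counts of tweets mentioning "earthquake"
counts = [3, 4, 2, 5, 3, 4, 3, 48, 95, 60]
print(detect_spikes(counts))
```

A real detector would also look at the composition of the tweets, as the talk noted, and not just how many there are.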

HADOOP the Mornington Crescent of Big Data!
Mornington Crescent
