We broke new ground last year, attending the ‘horizontal,’ cross-industry data management conference organized by IRM-UK [1]. The event co-located the Data Warehousing/Business Intelligence and Information Quality conferences as well as the EU meets of DAMA and IAIDQ. Our first surprise was that there was a decent sprinkling of oil and gas folks in attendance, notably from BP, Statoil and Shell.
DAMA president Peter Aiken (Data Blueprint) gave a keynote on ‘monetizing data management.’ In other words, how to sell DM projects to the CEO. One issue is that schools teach students how to build new (Oracle) databases while industry needs people who can harmonize old databases! Today, many IT projects are done without data managers’ involvement. Data migration project costs are often underestimated by a factor of 10. Moreover, 7 out of 10 projects and investments have ‘small or no tangible returns.’ If a project has hundreds of thousands of attributes that each take one minute to map, you have a train wreck in the making!
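The mapping arithmetic is easy to check. The sketch below is ours, not the speaker's: the 300,000 attribute count, the eight-hour day and the 220-day working year are illustrative assumptions, and `mapping_effort` is a hypothetical helper name.

```python
def mapping_effort(attributes, minutes_per_attribute=1.0,
                   hours_per_day=8.0, days_per_year=220.0):
    """Rough effort estimate for mapping attributes one at a time.

    All defaults are assumptions for illustration, not figures from the talk.
    """
    hours = attributes * minutes_per_attribute / 60.0
    person_days = hours / hours_per_day
    person_years = person_days / days_per_year
    return {"hours": hours, "person_days": person_days, "person_years": person_years}


# Hypothetical project with 300,000 attributes at one minute each.
effort = mapping_effort(300_000)
print(f"{effort['hours']:.0f} hours, {effort['person_days']:.0f} person-days, "
      f"{effort['person_years']:.1f} person-years")
```

Under these assumptions the mapping alone comes to several person-years, before any testing or rework.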
Frances Dean and Hugh Potter from BP described the company’s integrated supply and trading modeling initiative, a.k.a. the ‘systems transaction pipe.’ A previous data modeling initiative, described as a ‘big boil the data ocean project that failed,’ resulted in serious ‘data fatigue.’ A global data management business team was formed in the Trade Completion back office. Dean noted, ‘We are not IT and we are not a project.’ Potter outlined the mechanics of physical oil trading (the transaction pipe) from deal, through confirmation, to scheduled movements.
Previously, the process lacked visibility of the ‘big picture’ and was increasingly defect-prone. The new system has been running successfully for three years now. Dean’s initial attempts to leverage the metrics it was producing did not go down well with the traders, resulting in angry team meetings and people unwilling to talk about performance. Metrics need to be chosen judiciously so that they are not a threat and just provide visibility. You need to ‘hold up a mirror to let folks decide if they need a haircut.’ Now BP has established targets for metrics (working capital) and a dashboard for managers. Feedback from the business has corrected some anomalies and metrics have been re-cast. But because the metrics are used in decision support, ‘you can’t go on tweaking indefinitely, you need a stable system and metrics.’ Potter continued, noting the overlap between data quality metrics and business KPIs. These allow BP to track a trading team’s activity and a deal’s ‘criticality.’ Operations managers use the system to see how well traders are doing their job. Metrics are now used to optimize the portfolio, to supervise deals and to find out why some transactions go wrong. Teams are less defensive as a result and can investigate ‘known issues’ separately. Dean concluded, ‘It’s never about the data, the value is in the conversations. This started as a bottom up amorphous grass roots project. Don’t expose metrics to management without having the team on board. Let them tell the story their own way.’
Malcolm Chisholm’s (RefDataPortal) presentation heralded the arrival of data management in the cloud. Cloud computing is a return to the mainframe-based time sharing of the 1970s. But access is even easier now as the web browser has replaced the TTY. Cloud computing’s poster child is the ubiquitous SalesForce.com. But you can now also rent a database for a few cents/hour. The cloud provides open source and is potentially a way around the ‘3 evil axes’ of the corporation: procurement, legal and IT. But security is a big problem for the cloud. There is a greater need for data housekeeping in the public space of the cloud and data governance is the key. Personnel need to be redeployed from data centers to manage data in the cloud. Tasks include provisioning, i.e. how to push data to the cloud consistently. Users need to be prevented from going straight to the cloud sans supervision. RACI [2] matrices are recommended for data governance, introducing notions like provision, activation, use and deactivation of a server. Data removal and backup have to be considered and ‘zombie’ instances need to be killed, otherwise it’s easy to overrun costs. You may also have to educate your programmers in the use of columnar databases such as Google’s Bigtable and of frameworks such as MapReduce and Hadoop. These non-relational technologies offer a path to scalability, fast retrieval and data time stamps but they are quite a paradigm shift. There is no data modeling, ‘no Ted Codd.’ In fact, they are a step backwards in data management maturity.
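The ‘zombie’ housekeeping described above can be illustrated with a minimal sketch. This is our illustration, not Chisholm's method: the `Instance` record, the seven-day idle threshold and the hourly rates are all hypothetical, and a real check would read these fields from a cloud provider's own inventory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    """Hypothetical inventory record for one rented cloud server."""
    name: str
    owner: str
    hourly_cost: float      # assumed rate, in any currency
    last_used: datetime
    deactivated: bool = False


def find_zombies(instances, now, max_idle=timedelta(days=7)):
    """Return instances still running but unused for longer than max_idle."""
    return [i for i in instances
            if not i.deactivated and now - i.last_used > max_idle]


def idle_cost(instance, now):
    """Money spent on an instance since it was last used."""
    idle_hours = (now - instance.last_used).total_seconds() / 3600.0
    return idle_hours * instance.hourly_cost


now = datetime(2011, 1, 31)
fleet = [
    Instance("etl-test", "alice", 0.25, datetime(2011, 1, 21)),
    Instance("reporting", "bob", 0.50, datetime(2011, 1, 30)),
    Instance("old-poc", "carol", 0.25, datetime(2010, 12, 1), deactivated=True),
]
for zombie in find_zombies(fleet, now):
    print(zombie.name, zombie.owner, round(idle_cost(zombie, now), 2))
```

Recording an owner against each instance is the point where a RACI-style assignment would plug in: someone has to be accountable for the deactivation step.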
Barry Devlin (9sight Consulting) described spreadsheets as the bane of the data warehouse. ‘Spreadmarts’ are prone to logic and data errors that may take years to discover and result in having to restate the accounts. In 2008, Credit Suisse was fined £5 million for duff, spreadsheet-based closing [3]! But getting rid of Excel is not an option. Spreadsheets can be a sandbox for playing with stuff, you just need to nurture and fence off the playground. Then you can leverage users’ analyses and migrate them to mainstream BI. Devlin advocates an ‘adaptive analytic cycle’ using an enterprise data warehouse (EDW). This is augmented with a ‘playmart’ using certified EDW data plus other, possibly unsafe, information sources. There is ‘more control in the playmart than in a spreadmart.’
Karsten Muthreich described Nestlé’s decade-long ‘OneGlobe’ SAP program that set out to ‘transform independent markets into one global company.’ This has now been achieved with a standard data system supporting 160,000 users in 800 factories. Nestlé’s main issue was the shift from ‘my’ data to ‘our’ data and there are still issues with some local units. Before OneGlobe there were 2 million vendor records across multiple files. Now one master file holds 600,000 records. DM is evolving from local to global with 350 data standards defined. Data busts, like ‘15 meters’ instead of 15mm, got users thinking about data quality.
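A data bust of the ‘15 meters’ kind is the sort of error a simple plausibility check can catch at entry time. The sketch below is our illustration, not a description of Nestlé's system: the unit table, the function names and the 1 to 500 mm range are assumptions.

```python
# Conversion factors to millimeters (illustrative subset of units).
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def to_mm(value, unit):
    """Normalize a length to millimeters; raises KeyError on unknown units."""
    return value * UNIT_TO_MM[unit]


def is_plausible(value, unit, min_mm, max_mm):
    """True if the normalized length falls inside the expected range."""
    return min_mm <= to_mm(value, unit) <= max_mm


# Hypothetical rule: this dimension should lie between 1 mm and 500 mm.
print(is_plausible(15, "mm", 1, 500))   # intended entry
print(is_plausible(15, "m", 1, 500))    # wrong unit, 15,000 mm
```

The check only works if the plausible range is itself part of the data standard, which is one argument for defining such standards centrally.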
[3] More in similar vein from www.oilit.com/links/1101_18.
© Oil IT Journal - all rights reserved.