First things first, what is data quality and why should I care?
Data quality is crucial to operational and transactional processes and to the reliability of business analytics and business intelligence reporting. You can’t make good decisions with bad data. Put simply, data quality is the practice of ensuring that all of the data you bring in, and all of the data you load, meets a high standard.
Well, what is “high quality?”
High quality data is:
- Up to date
- Consistent across data sources
So what happens when you realize your data doesn’t meet these criteria? How do you get started when you want to implement a data quality initiative? The first two steps are what this blog is about: preparing your data and standardizing your data.
Preparing Your Data
Ideally, you start with a Data Prep Phase: the process of collecting, cleaning, and consolidating your data into one place for use. Gartner estimates that up to 80% of the work in data analytics happens during the prep phase, so this is not a step whose importance you can afford to downplay.
Questions to ask yourself during this phase:
- Where is my data? Where does it live? What is the data source?
- Who uses the data? There will be stakeholders in varying business and functional areas to consider and involve; be sure to seek experts who not only understand the data, but also the business processes.
- Is the data any good? Is it usable?
- What is the best way to consolidate the data?
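To make the “is the data any good?” question concrete, a lightweight profiling pass can quantify it before any tooling decisions are made. Here is a minimal sketch in Python, standard library only; the CSV file and its columns are hypothetical, and a real prep phase would profile far more than this:

```python
import csv
from collections import Counter

def profile(path):
    """Profile a CSV file: row count, null rate per column, duplicate rows."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    total = len(rows)
    nulls = Counter()
    seen = set()
    dupes = 0
    for row in rows:
        for col, val in row.items():
            if val is None or val.strip() == "":
                nulls[col] += 1          # count empty/missing values per column
        key = tuple(row.values())
        if key in seen:
            dupes += 1                   # exact duplicate of an earlier row
        seen.add(key)
    return {
        "rows": total,
        "duplicate_rows": dupes,
        "null_rate": {c: nulls[c] / total for c in rows[0]} if rows else {},
    }
```

Even a crude report like this tells you quickly whether a source is usable or needs serious cleansing before consolidation.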
Standardizing Your Data
Once you’ve completed the prep phase, you’re ready to move on to the Data Cleansing and Standardizing Phase.
Data standardization is the next step to ensuring that your data is shareable across the enterprise. You want to make sure your data is the same across the organization. If not, sales figures may not match up, your detail reports may not agree with your summary reports, and addresses may not be valid. These situations result in wasted time, additional overhead, bad decisions, and a lot of frustration.
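To make the standardization idea concrete outside of any particular tool, here is a hypothetical phone-number standardizer in Python. This illustrates the concept only; it is not how SAP Data Services implements its cleansing transforms:

```python
import re

def standardize_us_phone(raw):
    """Normalize a US phone number to the form (NNN) NNN-NNNN.

    Returns None when the input cannot be a valid 10-digit number,
    so invalid contact data is flagged rather than silently loaded.
    """
    digits = re.sub(r"\D", "", raw or "")    # strip everything but digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop a leading US country code
    if len(digits) != 10:
        return None                          # reject malformed numbers
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Once every system stores `(404) 555-1212` instead of a mix of `404.555.1212`, `+1 404-555-1212`, and so on, matching and de-duplicating customer records becomes far easier.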
SAP Data Services is a great tool for getting to “one version of the truth.” SAP Data Services:
- Cleanses and standardizes customer data such as name, addresses, emails, phone numbers, and dates; prevents incorrect data such as invalid contact information
- Manages international data for over 190 countries and reads and writes Unicode data
- Removes errors to uncover the true content of the database
- Improves integrity of data to identify matches
- Ultimately creates a single customer view
SAP Data Services can also help you apply and enforce data quality rules whenever data is created, updated, or moved. It also allows you to perform data quality checks anytime, in real-time, on data sets before analyzing, moving, or integrating data.
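As a rough illustration of what point-of-entry rules look like, here is a sketch in plain Python rather than Data Services syntax; the field names and rules are invented for the example:

```python
from datetime import date

# Hypothetical rule set. Data Services expresses rules like these as
# validation transforms, but the principle is the same: reject or flag
# bad records when data is created or updated, not after the fact.
RULES = {
    "email":      lambda v: isinstance(v, str) and "@" in v and "." in v.split("@")[-1],
    "order_date": lambda v: isinstance(v, date) and v <= date.today(),
    "amount":     lambda v: isinstance(v, (int, float)) and v >= 0,
}

def validate(record):
    """Return the list of field names in a record that violate a rule."""
    return [field for field, rule in RULES.items()
            if field in record and not rule(record[field])]
```

A record that fails validation can then be routed to a reject table for review instead of flowing into reports.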
SAP Data Services helps your organization move toward that “one version of the truth” and stave off hours of wasted time and rehashed problems. Your departments will have the same definitions and terms to work with, correct data and clean information.
Standardization is the cornerstone of business intelligence.
For more information and to see some examples of how SAP Data Services transforms data, you can listen to a pre-recorded webinar I gave called “Expert Guidelines for Building a Data Quality Assessment & Improvement Initiative.”
You can also read more about data quality and other SAP resources in my other blog series.
About Bruce Labbate
Bruce is a business intelligence consultant specializing in data warehousing, data quality, and ETL development. Bruce delivers customized SAP Data Services solutions for customers across all industries. With Decision First Technologies, A Protiviti Enterprise, Bruce utilizes Data Integrator, Data Quality, Information Design Tool, and a variety of database technologies, including SQL Server, DB2, Oracle, Netezza, and HANA.
There’s a lot of value in unstructured data, but parsing it isn’t something any old analytics engine can do. In this E-Bite, find out how to use SAP HANA, SAP Data Services, and SAP Predictive Analytics for linguistic and sentiment analysis. Get a crash course in the fundamentals of text analysis, and then learn how to perform full-text indexing, text mining, entity extraction, and more. Do you know what your customers are saying?
- Perform linguistic and sentiment analysis with SAP HANA
- Explore SAP Data Services’ ready-to-use text analysis capabilities
- Use real-life sample data to parse customer reviews and other unstructured data
Check out this new E-Bite from Hillary Bliss and Bruce Labbate! Now available for pre-order!
About Hillary Bliss
Hillary Bliss is the analytics practice lead at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling.
Getting buy-in for new IT projects takes real persuasion and an in-depth understanding of your stakeholders. It is equally difficult to get time from business users for any IT implementation, let alone a simple SAP upgrade that seemingly has little return from a business perspective. This explains why many corporations are not upgrading their SAP systems, or even leveraging the existing SAP BI tools they’ve already acquired!
I was recently in a client demo with Don Loden and Roy Wells, listening to the client ask questions about SAP Data Services and other EIM tools. We noticed the client had purchased Data Services with the BusinessObjects toolset, but they were not using the software and were having terrible performance issues with their old ETL tool. I believe there are a lot of SAP customers in the same situation, at a crossroads about whether to invest in a new tool or stick with the old, problematic one. Luckily, most organizations that use BusinessObjects for reporting either already have Data Services or can easily add it to their existing license to resolve most of their current ETL issues.
For example, another client recently implemented BW on HANA. They invested a great amount of time and resources in creating well-designed DSOs, Cubes, and MultiProviders, as well as incorporating data from non-SAP sources. Since they already had Data Services, there was no need for new development and modeling efforts. We were able to stage and transform the external data and the SAP BW DSOs and Cubes into SAP HANA, allowing us to create views directly for their reports. Most of our look-ups were done in Data Services, and we were able to leverage ECC tables and extractors, providing increased performance.
SAP Data Services is still the best-of-breed ETL tool that delivers a single-enterprise solution for data integration, data quality, data profiling, and text data processing. It allows you to integrate, transform, improve, and deliver trusted data to critical business processes. There is so much that you can accomplish with it.
Whether you have BW, BPC, HANA or just use BusinessObjects for reporting, often your data extraction can take hours and reports can be delayed. With Data Services, you can get significant data load improvement and improve the quality of your data and reports, all delivered in a timely manner.
About Eric Vanderpuije
Eric Vanderpuije is a Business Intelligence Consultant with an extensive background in BW, data analysis, and ETL development. Eric has over 10 years of experience, more than 6 of them focused on BI architecture, data modeling, and the BW toolset, including developing InfoCubes, DSOs, MultiProviders, DTPs, transformations, and DataSources; BW reporting with BEx Analyzer, InfoSet Query, and Web Reporting (WAD); BI integration; and DataSource enhancement.
Eric came to Decision First Technologies from Wipro Technologies, where he was a BW lead. He also has data warehousing experience with SAP Data Services, focusing on data integration across a variety of database systems, including HANA and Netezza.
To adapt to the challenges of the SAP real-time enterprise, organizations have to shift from addressing data quality with “after the fact” latency processes to processes that take place at the point of data entry.
Governing data and measuring and monitoring data quality have always been important to companies; as a result, they spend a great deal of time and money governing data quality and processes. Data quality in the source system is more important than ever when a new technology like SAP HANA enters the picture.
About Don Loden
Don is a principal consultant with full life-cycle data warehouse and information governance experience in multiple verticals. He is an SAP-certified application associate on SAP Data Services, and is very active in the SAP community, speaking globally at numerous conferences and events. He has more than 14 years of information technology experience in the following areas: ETL architecture, development, and tuning; logical and physical data modeling; and mentoring on data warehouse, data quality, information governance, and ETL concepts.
Since the release of SAP Data Services 4.0, the design team has been floating the idea that the Data Services Designer would be replaced as the main design interface for SAP Data Services in the near future. Perhaps this is due to the thick-client installation, which requires an unbroken connection to the repository database and which users may find difficult to work with as telecommuting becomes more prevalent. With the release of Data Services 4.1, the Data Services Workbench was introduced as an automatic install whenever the Designer client was installed. Initially, the Workbench performed only the most basic function: replicating data from a source database or datastore to HANA or Sybase IQ databases. Since then, we have seen the functionality of the Workbench expand to support most applications and databases as sources and targets, and to incorporate additional functionality for developing new content within the Workbench interface. This blog post will review the progress toward full functionality and note some new features and differences between Data Services Designer and Workbench.
The first purpose of Workbench was to provide a quick replication tool to port data from other database systems to SAP HANA. Creating a new replication job opens up an interface that simply shows the tables being replicated with no dataflow-type structures.
In the properties window below for each table, the user can change settings such as basic column mappings, filters, and the basis for delta loads, but no complex operations are supported: no joins, and nothing beyond a single simple Query transform.
To download full PDF and Continue Reading…
About Hillary Bliss
Hillary is a Senior ETL consultant at Decision First Technologies, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.
When developing code within SAP’s Data Services enterprise information management tool, requirements generally dictate extracting and loading data across multiple environments. Data Services utilizes datastores to let you connect to a variety of data sources, such as a web service, an adapter, or, most commonly, a database. Those database sources can include everything from SQL Server, Oracle, and DB2 to Netezza. This blog will describe setting up datastores against a Netezza database that use the alias functionality to simplify migrating code from development to production.
What is Netezza and how does Data Services connect to it?
Netezza is a powerful data warehousing appliance that integrates the database, server, and storage components into a single system. Netezza is purpose-built for data warehousing and is specifically designed for running complex data warehousing workloads. Because its proprietary architecture delivers fast query performance, minimal tuning and administration, and high scalability, Netezza is an ideal database system to use for your data warehousing needs. As with any relational database system, Data Services can easily connect to Netezza using a datastore connection. Data Services can then import the metadata that describes the data through that connection. If the metadata is identical across multiple environments, you can have multiple configurations within one datastore.
Having multiple configurations within one datastore eliminates the need to create a datastore for every single database you need to connect to, which speeds up development time and prevents unnecessary clutter in your local object library.
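The multiple-configuration idea translates to any ETL codebase: one logical connection name, several environment-specific configurations, switched by a single setting. A hypothetical sketch in Python (the hosts, database names, and `ETL_ENV` variable are invented for illustration):

```python
import os

# One logical datastore, multiple configurations -- mirroring how a
# Data Services datastore can hold DEV and PROD configs in one object.
DATASTORE_CONFIGS = {
    "DEV":  {"host": "nz-dev.example.com",  "database": "EDW_DEV"},
    "PROD": {"host": "nz-prod.example.com", "database": "EDW"},
}

def active_config(env=None):
    """Pick the connection config for the current environment.

    Falls back to the ETL_ENV environment variable, defaulting to DEV,
    so promoted code needs no edits -- only a changed setting.
    """
    env = env or os.environ.get("ETL_ENV", "DEV")
    return DATASTORE_CONFIGS[env]
```

The payoff is the same as in Data Services: jobs reference the logical name, and promotion from development to production changes one switch, not every connection in the code.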
To download full PDF and Continue Reading…
About Shaun Scott
Shaun Scott is a Business Intelligence consultant specializing in data warehousing and ETL development. Shaun delivers customized SAP BusinessObjects solutions for customers across all industries. With Decision First Technologies, Shaun utilizes SAP Data Services, Universe Design, Web Intelligence, and a variety of database technologies, including SQL Server, DB2, and Netezza.
Sometimes, when capturing data in a data warehouse, we need to store time-variant pieces of data about a transaction. This somewhat blurs the lines between a traditional fact table and a dimension, since in the traditional model, time-variance is mainly the domain of a dimension.
Take the example of a production backlog at a manufacturer. When an order is made, particularly in an organization that manufactures large and/or complex goods, it may take some time to fulfill. Maintaining a consistent backlog is also a key to ensuring consistent production planning that’s not beset with shutdowns, inefficiencies, or missed delivery dates.
Keeping a backlog at a granular level generally requires tracking backlog on an order-by-order basis. That way, anything about an order (that’s in your warehouse) can be analyzed to look for trends in the business. There’s just one issue: keeping a snapshot of every order in backlog for the full amount of time it’s in backlog can take up a lot of space. For instance, in a mid-size company, if the average order is in backlog for three months and the company receives 10,000 orders per year, that’s nearly a million records per year in a daily snapshot. After a while, that can really add up. It’s no wonder Bill Inmon said, “The management of these every day, ongoing changes can consume an enormous amount of resources and can be very, very complex. It is this type of load that rivets the attention of the data architect.” (Snapshots in the Data Warehouse, pg. 2, white paper at http://inmoncif.com/inmoncif-old/www/library/whiteprs/ttsnap.pdf)
This example could also apply to general A/R snapshots by account, though in many organizations, this snapshot is taken on a monthly basis, so the problem is less imperative.
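The back-of-the-envelope arithmetic above is easy to verify, assuming roughly 91 days (three months) of backlog per order:

```python
def yearly_snapshot_rows(orders_per_year, avg_days_in_backlog):
    """Rows added per year by a daily snapshot: one row per order per day in backlog."""
    return orders_per_year * avg_days_in_backlog

# 10,000 orders/year, each in backlog ~91 days:
rows = yearly_snapshot_rows(10_000, 91)
print(rows)  # 910000 -- "nearly a million" rows per year
```

Doubling either the order volume or the time in backlog doubles the snapshot growth, which is why the grain of this table deserves careful design attention.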
As the snapshot grows, a simplified version may look something like this:
| Order Key | Date Key | Item Key | Past Due Backlog Amount | Future Backlog Amount |
|-----------|----------|----------|-------------------------|-----------------------|
| 50 | 20131202 – 20131231 | … | … | … |
| 50 | 20140103 – 20140114 | … | … | … |
In this example, an order for $65,000.00 was placed on December 1, 2013. $35,000.00 worth of product is for delivery on December 31, while $30,000.00 worth of product is scheduled for later. So, all $65,000 goes into future backlog when the order is received.
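The snapshot rows above can be reproduced with a small classification function in Python. This is an illustration only; the second delivery date is invented, since the example says just “scheduled for later”:

```python
from datetime import date

def backlog_buckets(lines, as_of):
    """Split open order lines into (past-due, future) backlog as of a snapshot date.

    `lines` is a list of (promised_delivery_date, open_amount) pairs.
    Amounts promised before the snapshot date are past due; the rest are future.
    """
    past_due = sum(amt for d, amt in lines if d < as_of)
    future = sum(amt for d, amt in lines if d >= as_of)
    return past_due, future

# The order from the example: $35,000 promised for Dec 31, 2013,
# and $30,000 scheduled for later (Feb 15, 2014 is a made-up date).
order_lines = [(date(2013, 12, 31), 35_000), (date(2014, 2, 15), 30_000)]
```

On a December 2 snapshot all $65,000 is future backlog; on a January 3 snapshot the undelivered $35,000 has become past due, matching the two table rows above.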
To download PDF and Continue Reading…
About Britton Gray
Britton has been working in software development and Business Intelligence for the past fourteen years, and has been working with SAP BusinessObjects reporting, analytics, and information management tools for six years. He is certified in Data Integrator and Web Intelligence, and enjoys developing end-to-end BI solutions that enable customers to gain new insights into their data.