Dataflow is a new addition to Power BI. When Microsoft announced Common Data Service for Analytics (CDS-A) on the 21st of March 2018, Power BI adopters instantly realised that it was going to be the ‘Next BI Revolution’.
Microsoft defines Dataflow as a “new extensive capability that allows business analysts and BI professionals to ingest, cleanse, transform, integrate, enrich and schematise data from a large set of transactional and observational sources. It handles the most complex data preparation challenges for users through its revolutionary model-driven calculation engine, cutting the cost, time and expertise required for such activities to a fraction of what they otherwise would be.”
The idea is to reduce the complexity of implementing business analytics from business apps and other data sources. According to Microsoft, the goal is to make it easy to ingest your data, from any source, at any scale.
With Dataflow and CDS-A, Microsoft brings data lake technology directly into Power BI. That way, users with Power Query skills can extract data from multiple sources, then centralise and manage it within Power BI. The data lake comes with a standard schema, called the Common Data Model (CDM), which covers the most common business entities in areas such as Marketing, Sales, Customer Service and Finance. It also comes with connectors that ingest data from the most common sources into this schema.
“The new Power BI dataflows capability is designed to overcome existing limitations of data prep in Power BI Desktop and allow large-scale, complex and reusable data preparation to be done directly within the Power BI service.” (Microsoft)
CDS for Analytics vs. Apps
Common Data Service (CDS) for Apps allows you to securely store and manage data that’s used by business applications. Data within CDS for Apps is stored within a set of entities. Each entity is a set of records used to store data, similar to how a table stores data within a database. CDS for Apps includes a base set of standard entities that covers typical scenarios, but you can also create custom entities specific to your organization and populate them with data using Power Query. App makers can then use PowerApps to build rich applications using this data.
The Common Data Service for Apps allows data to be integrated from multiple sources into a single store which can then be used in PowerApps, Flow and Power BI combined with data already available from Dynamics 365 applications.
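To make the "entity as a set of records" idea concrete, here is a minimal Python sketch. It is not the CDS for Apps API: the `Entity` class, the "Account" entity, its attributes and the sample rows are all hypothetical, chosen only to show records from two sources landing in one schema-checked store.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a CDS entity: a named set of records."""
    name: str
    attributes: tuple
    records: list = field(default_factory=list)

    def add(self, record: dict) -> None:
        # Reject records whose keys do not match the entity's attributes,
        # much as a database table rejects rows that do not fit its columns.
        if set(record) != set(self.attributes):
            raise ValueError(f"record does not match the {self.name} schema")
        self.records.append(record)


# A made-up "Account" entity populated from two made-up sources.
account = Entity("Account", ("id", "name", "source"))
account.add({"id": 1, "name": "Contoso", "source": "Dynamics 365"})
account.add({"id": 2, "name": "Fabrikam", "source": "Salesforce"})
```

Once both sources share one entity, a consumer can read `account.records` without caring where each row came from, which is the point of integrating into a single store.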
Common Data Model (CDM)
Microsoft has created the Common Data Model (CDM) which mirrors the industry ecosystem and defines a mechanism for storing data in an Azure Data Lake, permitting application-level interoperability.
More specifically, the goal is to let multiple data producers and data consumers interoperate easily by using the raw storage of an Azure Data Lake Storage Gen2 (ADLSg2) account, without necessarily requiring additional cloud services for indexing content. This accelerates the value you get from your data by reducing complexity and shortening development cycles.
The CDM brings structural consistency and semantic meaning to the data stored in a CDM-compliant Azure Data Lake.
Dataflow: the missing piece of the puzzle
Azure AS, SSAS and Power BI datasets all belong to the new OLAP family and their job is to provide a semantic model or “cube” with the business logic to end users.
CDS belongs to the Data Warehouse/Data Mart/Data Lake family. Its main job is to aggregate, clean, transform, integrate and harmonize data from many sources. It is the central repository for your business data.
Dataflows provide self-service ETL for business analysts and seem to be the missing piece of the puzzle: they let you move data from the source systems to the business layer.
By moving the data to the tenant’s storage account (“BYOSA – Bring Your Own Storage Account”), the IT organization can access that data through a host of Azure data services, giving data engineers and scientists a rich toolset for processing and enriching it.
Why will you need it?
Standard and custom entities within CDS for Apps provide a secure and cloud-based storage option for your data. Entities let you create a business-focused definition of your organization’s data for use within apps.
You can use dataflows to ingest data from a large and growing set of supported on-premises and cloud-based data sources, including Dynamics 365 (using the new Common Data Service for Apps connector), Salesforce, SQL Server, Azure SQL Database and Data Warehouse, Excel, SharePoint, and more.
You can also ingest larger amounts of data with the “Incremental refresh” feature in Dataflow.
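The general idea behind an incremental refresh can be sketched in a few lines of Python. This is an illustration of the concept only, not Power BI's implementation: the function name, the row layout and the 30-day window are assumptions made for the example.

```python
from datetime import date, timedelta


def incremental_refresh(stored, source, today, refresh_days):
    """Keep stored rows older than the refresh window and reload the rest.

    `stored` and `source` are lists of (date, value) rows. All names here
    are hypothetical and not part of any Power BI API.
    """
    cutoff = today - timedelta(days=refresh_days)
    # Rows older than the cutoff are kept as-is and never re-read.
    kept = [row for row in stored if row[0] < cutoff]
    # Rows inside the window are fetched from the source again.
    reloaded = [row for row in source if row[0] >= cutoff]
    return sorted(kept + reloaded)


stored = [(date(2018, 1, 1), 10), (date(2018, 11, 1), 20)]
source = [(date(2018, 1, 1), 99), (date(2018, 11, 1), 25), (date(2018, 11, 5), 30)]
result = incremental_refresh(stored, source, date(2018, 11, 6), 30)
```

Only the two November rows are taken from the source. The January row keeps its stored value even though the source has changed, which is the trade-off that lets larger volumes be handled: historical data is not reloaded on every refresh.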
Every Premium node (P1 and above) gets 100TB of internal Power BI storage without any additional cost.
Is it the right time?
Dataflow is in Public Preview and still evolving. The same seems to be true of CDS: the new version does not appear to be backward compatible, as it is presented differently on Microsoft Docs.
- Microsoft 2018, Coming soon to Power BI: Common Data Service for Analytics, <https://powerbi.microsoft.com/en-us/blog/coming-soon-to-power-bi-common-data-service-for-analytics/>
- Microsoft 2018, Introducing: Power BI Data Prep with Dataflows, <https://powerbi.microsoft.com/en-us/blog/introducing-power-bi-data-prep-wtih-dataflows/>