Author : MD TAREQ HASSAN | Updated : 2022/08/18
The concept of dataset
- A dataset (or data set) is “discrete items of related data” that may be accessed individually (or in combination) or managed as a whole entity
- A data set is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question
- A data set is organized into some type of data structure
- Datasets identify data within the linked data stores (such as SQL tables, files, etc.)
- A logical grouping of units of data
- A dataset represents a data model
- Dataset vs. data model
  - From a BI service perspective, it is referred to as a dataset; from a development perspective, it is referred to as a model.
  - In some contexts (e.g., Power BI), the two terms mean much the same thing.
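The tabular description above (columns as variables, rows as records) can be pictured with a minimal Python sketch. The table, column names, and values are made up for illustration and do not come from any real data store.

```python
# A tiny tabular data set: each dict is one record (row),
# and each key is one variable (column). All values are illustrative.
employees = [
    {"id": 1, "name": "Ana", "department": "Sales"},
    {"id": 2, "name": "Ben", "department": "Finance"},
    {"id": 3, "name": "Caro", "department": "Sales"},
]

# Access a single item individually.
first_record = employees[0]

# Access items in combination: one column across all records.
names = [row["name"] for row in employees]

# Manage the data set as a whole: its columns and its record count.
columns = list(employees[0].keys())
record_count = len(employees)

print(columns, record_count, names)
```

The same collection can be read one record at a time, one column at a time, or as a single entity, which is the sense in which a data set is "accessed individually or managed as a whole".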
In the context of Azure Data Factory (ADF), a dataset is the information (e.g., column information for tables or CSV files) about the source data and the sink data.
What is Dataset in ADF?
- A dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs.
- A dataset represents the structure of the data, giving a selective view into the data store
- A dataset points to the specific subset of data that an activity uses as input or output
- A dataset defines “what the data looks like”
- Datasets identify data within different data stores (e.g., tables, files, folders, and documents)
- A dataset is defined in JSON format
- In a Copy activity, datasets are used as the source and the sink. In a Data Flow, datasets are used in source and sink transformations
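To make the input/output role concrete, here is a minimal sketch that builds an abbreviated Copy activity definition as a Python dict and prints it as JSON. The activity and dataset names (`CopyCsvToSql`, `SourceCsvDataset`, `SinkSqlTableDataset`) are hypothetical, and the source and sink type names are assumptions for a CSV-to-Azure-SQL copy; a real activity usually carries more settings than shown here.

```python
import json


def dataset_reference(dataset_name: str) -> dict:
    """Build the reference object an activity uses to point at a dataset."""
    return {"referenceName": dataset_name, "type": "DatasetReference"}


def copy_activity(name: str, source_dataset: str, sink_dataset: str,
                  source_type: str, sink_type: str) -> dict:
    """Build an abbreviated Copy activity: one input dataset, one output dataset."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [dataset_reference(source_dataset)],
        "outputs": [dataset_reference(sink_dataset)],
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


# All names below are hypothetical placeholders.
activity = copy_activity(
    name="CopyCsvToSql",
    source_dataset="SourceCsvDataset",
    sink_dataset="SinkSqlTableDataset",
    source_type="DelimitedTextSource",  # assumed source type for CSV data
    sink_type="AzureSqlSink",           # assumed sink type for Azure SQL
)

print(json.dumps(activity, indent=2))
```

The activity never describes the data itself; it only names the datasets, and each dataset in turn describes where the data lives and what it looks like.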
Before we can create a dataset, we must create a linked service to link the data store to ADF.
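The relationship between a dataset and its linked service can be sketched as follows. This is a minimal illustration, assuming a delimited-text (CSV) file in Azure Blob Storage; the dataset name, linked service name, container, and folder path are hypothetical placeholders, and the property set is abbreviated.

```python
import json


def delimited_text_dataset(name: str, linked_service_name: str,
                           container: str, folder_path: str) -> dict:
    """Build an abbreviated dataset definition for CSV files in Blob Storage."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            # The dataset does not hold connection details; it refers to
            # a linked service that must already exist.
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            # Where the data sits inside the linked data store.
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "folderPath": folder_path,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
            # Column information; left empty in this sketch.
            "schema": [],
        },
    }


# Hypothetical names, for illustration only.
dataset = delimited_text_dataset(
    name="SourceCsvDataset",
    linked_service_name="MyBlobStorageLinkedService",
    container="input",
    folder_path="sales/2022",
)

print(json.dumps(dataset, indent=2))
```

Because the dataset only carries a reference to the linked service, the same linked service can back many datasets that point at different folders or tables in the same store.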
Dataset format and types
Format type of dataset:
- JSON
- Excel
- XML
- Binary
- DelimitedText
- etc. (for example, Avro, ORC, and Parquet)
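In a dataset definition, the format is selected through the `type` property. The sketch below maps the format names listed above to the `type` strings as I understand them to be spelled in dataset JSON; the lookup table and the helper `dataset_type_for` are hypothetical conveniences, not part of ADF, and the spellings should be checked against the connector documentation.

```python
# Format name (as written in the list above) -> assumed dataset "type" value.
FORMAT_TYPES = {
    "json": "Json",
    "excel": "Excel",
    "xml": "Xml",
    "binary": "Binary",
    "delimitedtext": "DelimitedText",
}


def dataset_type_for(format_name: str) -> str:
    """Return the dataset type string for a format name, ignoring case and spaces."""
    key = format_name.replace(" ", "").lower()
    if key not in FORMAT_TYPES:
        raise ValueError(f"Unknown format: {format_name!r}")
    return FORMAT_TYPES[key]


print(dataset_type_for("Delimited Text"))
print(dataset_type_for("XML"))
```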
Dataset type:
ADF supports many different types of datasets, depending on the data stores we use.
Go to https://docs.microsoft.com/en-us/azure/data-factory/connector-overview and select a data store to learn how to create a linked service and a dataset for it.