Author : MD TAREQ HASSAN | Updated : 2023/01/18

Understanding ETL

See: ETL VS ELT

Capability of ADF

ADF primarily performs the following two types of data integration operations:

Data transformation (i.e. mapping data flow) can be performed on an ADF-managed Spark cluster (Databricks) or dispatched to the following using a dispatch activity:

Notes:

Key components of ADF

Azure Data Factory is composed of the following key components:

Integration Runtime

The integration runtime (IR) is the compute infrastructure that Azure Data Factory and Synapse pipelines use to provide data-integration capabilities across different network environments.

See details: ADF Integration Runtime

Linked Services

A linked service defines the connection information that ADF needs in order to connect to external resources (data sources, data sinks, compute resources, etc.). There are two types of linked service:

See details: ADF Linked Services
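To make the idea of "connection information" more concrete, here is a minimal sketch of what a linked service definition might look like, built as a plain Python dictionary and printed as JSON. It assumes an Azure Blob Storage store. The names `ExampleBlobStorage` and `ExampleIntegrationRuntime`, the placeholder connection string, and the helper `blob_linked_service` are hypothetical and exist only for illustration. The field layout follows the JSON shape commonly shown for ADF linked services and should be checked against the current ADF documentation before use.

```python
import json


def blob_linked_service(name, connection_string, integration_runtime=None):
    """Build a linked service definition as a plain dict (illustrative helper)."""
    properties = {
        # The store type tells ADF which connector to use.
        "type": "AzureBlobStorage",
        # Connector-specific connection information lives here.
        "typeProperties": {"connectionString": connection_string},
    }
    if integration_runtime is not None:
        # Optional reference to the integration runtime used to reach the store.
        properties["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


# Hypothetical names and a placeholder secret; never hard-code real credentials.
linked_service = blob_linked_service(
    name="ExampleBlobStorage",
    connection_string="<connection-string-placeholder>",
    integration_runtime="ExampleIntegrationRuntime",
)

print(json.dumps(linked_service, indent=2))
```

The point of the sketch is the separation of concerns: the linked service only says how to reach a resource, and (optionally) through which integration runtime. It says nothing about which data inside that resource is used.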

Datasets

A dataset represents the structure of data within a data store (a data source or a data sink). It does not hold the data itself; it simply points to, or references, the data we want to use as the input or output of an activity.

See details: ADF Datasets
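As a rough illustration of how a dataset "points to" data, the sketch below builds a dataset definition for a delimited text (CSV) file as a Python dictionary. The helper `delimited_text_dataset` and the names `ExampleCsvDataset`, `ExampleBlobStorage`, `input-container`, and `orders.csv` are hypothetical. The field layout is an assumption based on the JSON shape commonly shown for ADF datasets, so treat it as a sketch rather than a reference.

```python
import json


def delimited_text_dataset(name, linked_service_name, container, file_name):
    """Build a delimited-text dataset definition as a plain dict (illustrative helper)."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            # The dataset refers to a linked service by name; it carries
            # no connection information of its own.
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                # Where the data lives inside the store.
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                },
                # How the data is laid out.
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


# Hypothetical names used only for illustration.
dataset = delimited_text_dataset(
    name="ExampleCsvDataset",
    linked_service_name="ExampleBlobStorage",
    container="input-container",
    file_name="orders.csv",
)

print(json.dumps(dataset, indent=2))
```

Note that the dataset only names the linked service it depends on. The connection details stay in the linked service, which is why one linked service can back many datasets.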

Pipelines and Activities

Managed virtual networks and managed private endpoints

The Integration Runtime is a software component that runs on compute infrastructure (i.e. VMs). It therefore requires a virtual network into which the underlying VMs are deployed. That virtual network can be either an ADF-managed VNet (Azure IR) or a self-provisioned VNet (self-hosted IR).

See details: ADF Virtual Network Integration and Private Link

Data Flow

Mapping data flows

Triggers

Parameters

Control flow

Control flow is an orchestration of pipeline activities that includes: