Author : MD TAREQ HASSAN | Updated : 2022/08/19
Understanding pipeline and activity
Activity:
- In Data Factory and Synapse pipelines, an activity defines the action to be performed on the data
- Activities represent a processing step in a pipeline
- An activity can take zero or more input datasets and produce one or more output datasets: an input dataset represents the input for an activity in the pipeline, and an output dataset represents the activity's output
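As an illustration, the sketch below builds the JSON shape of a single Copy activity with one input and one output dataset, using plain Python dictionaries. The activity and dataset names (`CopyBlobToSql`, `InputBlobDataset`, `OutputSqlDataset`) and the source/sink types are hypothetical placeholders. The field layout follows the documented Copy activity JSON, but treat it as a sketch and check it against the current schema.

```python
import json


def make_copy_activity(name: str, input_ds: str, output_ds: str) -> dict:
    """Return a Copy activity definition (hypothetical names) as a dict."""
    return {
        "name": name,
        "type": "Copy",  # a data movement activity
        # Datasets are referenced by name rather than embedded in the activity
        "inputs": [{"referenceName": input_ds, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_ds, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},  # placeholder source type
            "sink": {"type": "SqlSink"},       # placeholder sink type
        },
    }


copy_activity = make_copy_activity("CopyBlobToSql", "InputBlobDataset", "OutputSqlDataset")
print(json.dumps(copy_activity, indent=2))
```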
Pipeline:
- A pipeline is a logical grouping of activities that performs a unit of work
- A pipeline lets you manage the activities as a set instead of managing each one individually
- Together, the activities in a pipeline perform a task
- The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel
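A minimal sketch of a pipeline definition, assuming the usual `properties.activities` layout: `Transform` is chained after `Extract` through `dependsOn`, while `Audit` has no dependency and can therefore run alongside `Extract`. The pipeline and activity names are hypothetical, and the `Wait` activity type is used only as a placeholder for real work.

```python
import json
from typing import List, Optional


def make_activity(name: str, depends_on: Optional[List[str]] = None) -> dict:
    # "Wait" is only a placeholder activity type for this sketch
    act = {"name": name, "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1}}
    if depends_on:
        # Chaining: run only after each listed activity has succeeded
        act["dependsOn"] = [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in depends_on
        ]
    return act


pipeline = {
    "name": "DemoPipeline",  # hypothetical pipeline name
    "properties": {
        "activities": [
            make_activity("Extract"),
            make_activity("Transform", depends_on=["Extract"]),  # sequential
            make_activity("Audit"),  # independent: may run in parallel
        ]
    },
}
print(json.dumps(pipeline, indent=2))
```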
Types of activity
Azure Data Factory and Azure Synapse Analytics have three groupings of activities:
- Data movement activities
- Data transformation activities
- Control activities
Notes:
- Data movement activities and data transformation activities are commonly called “execution activities”
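To make the "execution activity" grouping concrete, here is a small lookup over a few activity `type` values. The table is an illustrative, non-exhaustive subset chosen for this sketch, and `is_execution_activity` is a hypothetical helper, not part of any Azure SDK.

```python
# Illustrative, non-exhaustive mapping of activity "type" values to groupings
ACTIVITY_GROUPS = {
    "Copy": "data movement",
    "HDInsightHive": "data transformation",
    "DatabricksNotebook": "data transformation",
    "SqlServerStoredProcedure": "data transformation",
    "ExecutePipeline": "control",
    "ForEach": "control",
    "IfCondition": "control",
    "Wait": "control",
}


def is_execution_activity(activity_type: str) -> bool:
    """Data movement and data transformation activities count as execution activities."""
    group = ACTIVITY_GROUPS.get(activity_type)
    if group is None:
        raise KeyError(f"type not in this illustrative table: {activity_type}")
    return group in ("data movement", "data transformation")


for t in ("Copy", "DatabricksNotebook", "ForEach"):
    print(t, is_execution_activity(t))
```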
Activity policy
- Policies affect the run-time behavior of an activity, providing configuration options such as a timeout and the number of retries with an interval between them
- Activity policies are only available for execution activities (data movement and data transformation activities)
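The sketch below attaches a `policy` block (timeout, retry count, retry interval, secure input/output flags) to an activity and refuses to do so for types outside a small, hypothetical set of execution activity types. The default values are arbitrary examples, and `with_policy` is a hypothetical helper used only for illustration.

```python
import json

# Partial, illustrative set of execution activity types
EXECUTION_TYPES = {"Copy", "HDInsightHive", "DatabricksNotebook", "SqlServerStoredProcedure"}


def with_policy(activity: dict, timeout: str = "0.01:00:00", retry: int = 2,
                retry_interval_s: int = 60) -> dict:
    """Return a copy of `activity` with a policy block; reject non-execution types."""
    if activity["type"] not in EXECUTION_TYPES:
        raise ValueError(f"{activity['type']} is not an execution activity in this sketch")
    return {
        **activity,
        "policy": {
            "timeout": timeout,  # 1 hour, in D.HH:MM:SS format
            "retry": retry,      # how many times to retry after a failure
            "retryIntervalInSeconds": retry_interval_s,
            "secureInput": False,
            "secureOutput": False,
        },
    }


copy_act = with_policy({"name": "CopyBlobToSql", "type": "Copy"})
print(json.dumps(copy_act["policy"], indent=2))
```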
Activity dependency
- Activity dependency defines how subsequent activities depend on previous activities, that is, the condition under which the next activity runs
- An activity can depend on one or multiple previous activities with different dependency conditions (Succeeded, Failed, Skipped, Completed)
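The following sketch evaluates whether an activity should run given the final status of the activities it depends on. It assumes the commonly described semantics: conditions listed for the same upstream activity are combined with OR, dependencies on different activities are combined with AND, and "Completed" means the upstream activity either succeeded or failed. The helper names are hypothetical; the service evaluates this internally.

```python
from typing import Dict, List


def condition_met(upstream_status: str, conditions: List[str]) -> bool:
    """True if the upstream status satisfies any of the listed conditions."""
    for cond in conditions:
        # Assumption: "Completed" means the upstream finished, succeeded or failed
        if cond == "Completed" and upstream_status in ("Succeeded", "Failed"):
            return True
        if cond == upstream_status:
            return True
    return False


def should_run(depends_on: List[dict], statuses: Dict[str, str]) -> bool:
    """Every dependency (AND) must have at least one met condition (OR)."""
    return all(
        condition_met(statuses[dep["activity"]], dep["dependencyConditions"])
        for dep in depends_on
    )


statuses = {"Extract": "Failed", "Audit": "Succeeded"}
depends_on = [
    {"activity": "Extract", "dependencyConditions": ["Completed"]},
    {"activity": "Audit", "dependencyConditions": ["Succeeded"]},
]
print(should_run(depends_on, statuses))  # True
```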
Notes:
- If we have multiple activities in a pipeline and subsequent activities are not dependent on previous activities, the activities may run in parallel
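As a simplified model of this behavior, the sketch below groups activities into "waves": each wave contains activities whose dependencies are all finished, so the activities within a wave may run in parallel. It assumes every activity succeeds and ignores dependency conditions; the activity names and the `run_waves` helper are hypothetical, and the service does not expose waves as such.

```python
from typing import Dict, List, Set


def run_waves(dependencies: Dict[str, List[str]]) -> List[List[str]]:
    """Group activities into waves of mutually independent, ready-to-run activities."""
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Ready = every dependency has already finished
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown activity in dependencies")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


deps = {"Extract": [], "Audit": [], "Transform": ["Extract"], "Load": ["Transform", "Audit"]}
print(run_waves(deps))  # [['Audit', 'Extract'], ['Transform'], ['Load']]
```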
Scheduling pipelines with trigger
- Pipelines are scheduled by triggers
- A pipeline reference of the target pipeline must be included in the trigger definition in order to kick off a pipeline run
- There are different types of triggers:
  - Schedule trigger: triggers pipelines on a wall-clock schedule
  - Manual trigger: triggers pipelines on demand
- Details: https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
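A minimal sketch of a schedule trigger definition built as a Python dict, assuming the documented `ScheduleTrigger` layout with a `recurrence` block and a `pipelines` list of `PipelineReference` entries. The trigger name, pipeline name, and start time are hypothetical, and `schedule_trigger` is an illustrative helper, not an SDK function.

```python
import json
from typing import List


def schedule_trigger(name: str, pipeline_names: List[str], frequency: str = "Hour",
                     interval: int = 1, start_time: str = "2022-08-19T00:00:00Z") -> dict:
    """Build a ScheduleTrigger definition that references one or more pipelines."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,  # e.g. Minute, Hour, Day, Week, Month
                    "interval": interval,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            # The pipeline reference is what makes the trigger start a pipeline run
            "pipelines": [
                {"pipelineReference": {"type": "PipelineReference", "referenceName": p}}
                for p in pipeline_names
            ],
        },
    }


trigger = schedule_trigger("HourlyTrigger", ["DemoPipeline"])
print(json.dumps(trigger, indent=2))
```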
Notes:
- Triggers are defined in JSON
- Pipelines and triggers have a many-to-many (n:m) relationship:
  - Multiple triggers can kick off a single pipeline
  - The same trigger can kick off multiple pipelines
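The many-to-many relationship can be pictured as a mapping from each trigger to the pipelines it references; inverting that mapping shows which triggers can start each pipeline. The trigger and pipeline names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical trigger -> pipelines assignments
TRIGGERS: Dict[str, List[str]] = {
    "HourlyTrigger": ["IngestPipeline", "AuditPipeline"],  # one trigger, many pipelines
    "NightlyTrigger": ["IngestPipeline"],                  # second trigger, same pipeline
}


def triggers_by_pipeline(triggers: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping to list the triggers that can start each pipeline."""
    result: Dict[str, List[str]] = defaultdict(list)
    for trigger, pipelines in triggers.items():
        for p in pipelines:
            result[p].append(trigger)
    return dict(result)


print(triggers_by_pipeline(TRIGGERS))
# {'IngestPipeline': ['HourlyTrigger', 'NightlyTrigger'], 'AuditPipeline': ['HourlyTrigger']}
```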