Most ETL systems don’t fail suddenly.
More commonly, they become slower, less predictable, and more expensive as workloads increase.
In logistics, ETL (Extract, Transform, Load) processes typically involve collecting operational data such as shipment files, tracking events, partner integrations, and customs information, then normalising it and loading it into analytical or operational stores for reporting and downstream processes. These workflows are repetitive and are expected to behave consistently across runs.
At a limited scale, this is usually easy to achieve.
The starting point: ETL implemented in code
But what happens when the scale changes?
One of my recent implementations was for a client operating on large, time-bound datasets in the logistics industry. Their ETL solution used Azure Functions.
From an engineering perspective, the benefits of Azure Functions were unquestionable:
business logic was explicit and version-controlled,
testing and debugging were relatively simple,
integration with existing application components was direct,
execution times remained within acceptable limits for the initial data volumes.
For a time, this architecture met both technical and operational expectations – until data volume increased.
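To make that starting point concrete, below is a minimal sketch of what a timer-driven, code-based ETL step can look like on Azure Functions. It uses the Python v2 programming model purely for illustration; this is not the client's code, and the sample record and the normalise_event helper are hypothetical.

```python
import logging

import azure.functions as func

app = func.FunctionApp()


def normalise_event(raw: dict) -> dict:
    # Hypothetical transform: unify field names and units across partner feeds.
    return {
        "tracking_id": raw.get("trackingId") or raw.get("tracking_id"),
        "status": (raw.get("status") or "").upper(),
        "weight_kg": float(raw.get("weight", 0)),
    }


@app.schedule(schedule="0 0 2 * * *", arg_name="timer", run_on_startup=False)
def nightly_etl(timer: func.TimerRequest) -> None:
    # Extract: the real system read partner files and tracking feeds here;
    # a placeholder batch stands in for that input.
    raw_events = [{"trackingId": "PKG-001", "status": "delivered", "weight": "2.4"}]

    # Transform: normalise every record to a common shape.
    events = [normalise_event(e) for e in raw_events]

    # Load: the real implementation wrote to an analytical store;
    # logging stands in for that side effect.
    logging.info("Loaded %d normalised events", len(events))
```

As long as volumes stay modest, a function like this is easy to reason about, test, and deploy alongside the rest of the application.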
When data volume grows
As the logistics operation expanded, data volumes grew: more parcels, more tracking events, larger partner files, and greater day-to-day variability.
Processing times increased significantly. Jobs that previously completed overnight began taking about ten hours, with peak runs stretching up to forty hours. Completion time also became inconsistent: runs with similar input sizes no longer produced similar execution times.
This introduced several operational risks:
delays to reporting and billing processes,
loss of confidence in batch completion windows,
increased monitoring and manual intervention,
rising infrastructure and operational costs.
Although the ETL continued to produce correct results, it no longer behaved as a stable batch process. For a logistics organisation with fixed daily deadlines, this became a serious structural limitation.
Azure Data Factory enters the picture
At this point, the decision was made to migrate the ETL from Azure Functions to Azure Data Factory (ADF). It was not the only possible choice, and multiple factors were considered.
Azure Data Factory is a managed platform for orchestrating batch data integration at scale. It supports low-code pipeline development, CI/CD integration, and connectivity to a broad range of enterprise and SaaS systems, while abstracting execution and scaling concerns. Costs are usage-based, with no upfront infrastructure investment.
The organisation had already used ADF for another product, but simply replicating that setup did not produce satisfactory results here. I took full responsibility for infrastructure provisioning, CI/CD integration, monitoring, and ongoing operations. From there, I reimplemented the existing business logic using ADF pipelines and data flows.
The effects of the migration were clear immediately:
average processing time was reduced from approximately ten hours to roughly two,
execution time became consistent for similar data volumes,
operational intervention was significantly reduced,
overall infrastructure costs decreased.
From a business perspective, the main outcome was the restoration of predictability: batch processing completed within the required time window.
It is worth mentioning that the transition required upfront investment. Azure Data Factory uses a different execution and operational model from Azure Functions, and the learning curve was accepted as part of the trade-off.
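Although most day-to-day work in ADF happens in the authoring UI and through CI/CD, its operational model can also be driven from code. The sketch below, based on my understanding of the azure-mgmt-datafactory Python SDK, starts a pipeline run and waits for its outcome; the subscription, resource group, factory, pipeline name and parameter are placeholders, not the client's actual resources.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder identifiers; substitute real resource names.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "logistics-rg"
FACTORY_NAME = "logistics-adf"

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start a run of a hypothetical daily batch pipeline with a runtime parameter.
run = adf.pipelines.create_run(
    RESOURCE_GROUP,
    FACTORY_NAME,
    "DailyShipmentEtl",
    parameters={"runDate": "2024-01-31"},
)

# Poll until the run leaves the in-progress states.
status = "InProgress"
while status in ("Queued", "InProgress"):
    time.sleep(30)
    status = adf.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status

print(f"Pipeline finished with status: {status}")
```

In the actual migration, runs were started by ADF triggers rather than external code; the snippet is only meant to show how pipeline runs and their statuses are exposed.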
Why Azure Data Factory fits logistics ETL
In logistics, ETL is rarely about one neat stream of data:
Information comes from many places (e.g. databases, files, cloud storage, APIs, SaaS tools, and older systems) and in many formats (CSV, XML, JSON, structured tables, etc.).
Volumes fluctuate significantly, especially during busy periods.
ADF helps mainly by making these sources easier to connect. Most common systems are supported out of the box, so there is no need to build and maintain custom connectors.
For transformations, ADF offers ready-made components that cover typical ETL tasks: you can configure most of the logic without writing or maintaining code.
When processing runs (e.g. on a schedule, when new files appear, or after an upstream system finishes its work) is controlled by pipelines and their triggers. In logistics, timing and cut-off windows matter more than reacting to every single event, which makes this model a good fit.
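As an illustration of that trigger-driven model, the following hedged sketch attaches a daily schedule trigger to a pipeline using the azure-mgmt-datafactory Python SDK. In practice the same definition is usually created in the ADF authoring UI, and every name below (resource group, factory, pipeline, trigger) is a made-up placeholder.

```python
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Run the batch pipeline once a day, aligned with the nightly cut-off window.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",
    interval=1,
    start_time=datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
    time_zone="UTC",
)

trigger = TriggerResource(
    properties=ScheduleTrigger(
        description="Nightly logistics batch",
        recurrence=recurrence,
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    reference_name="DailyShipmentEtl", type="PipelineReference"
                )
            )
        ],
    )
)

adf.triggers.create_or_update("logistics-rg", "logistics-adf", "NightlyBatch", trigger)
adf.triggers.begin_start("logistics-rg", "logistics-adf", "NightlyBatch").result()
```

Storage event and tumbling window triggers follow the same pattern when the arrival of partner files, rather than the clock, should start a run.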
When Azure Data Factory is an appropriate choice
ADF works well when you’re dealing with large amounts of data, batch processing fits the business flow, and reliability matters more than fine-grained control over every technical detail.
This is often the case in logistics, where:
data volumes are large and can vary a lot,
processing happens in batches rather than in real time,
finishing within a fixed time window is more important than instant results,
predictable execution matters more than low-level control.
ADF also helps reduce the amount of custom code you need to maintain. Basic calculations, string handling, reshaping data, and other common transformations can usually be configured instead of written from scratch. Its built-in connectors cover most standard integrations and are generally easier to keep running than custom, SDK-based solutions.
When Azure Data Factory is not the right tool
Azure Data Factory isn’t built for real-time or near-real-time work. Pipeline runs carry noticeable start-up latency (mapping data flows in particular can take several minutes to spin up compute), which makes ADF a poor fit when data needs to be processed immediately upon arrival.
It’s also not efficient for very small workloads. Running a pipeline for a handful of records costs almost the same as running it for a large batch, so triggering pipelines frequently with tiny amounts of data tends to be slow and unnecessarily expensive. In those cases, API-driven or event-based solutions usually work better.
ADF also struggles when transformation logic is driven by complex, conditional business rules. While this kind of logic can be implemented, it quickly becomes difficult to read, test, and maintain compared to application code. For these scenarios, application-centric or event-driven architectures give more control and are easier to work with over time.
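To make that alternative concrete, here is a minimal sketch (again Python, Azure Functions v2 programming model) of an event-driven handler that processes each small file on arrival instead of starting a batch pipeline. The container path, connection setting name and record shape are assumptions for illustration.

```python
import json
import logging

import azure.functions as func

app = func.FunctionApp()


@app.blob_trigger(arg_name="blob",
                  path="incoming-events/{name}",
                  connection="StorageConnection")
def handle_small_file(blob: func.InputStream) -> None:
    # For a handful of records, per-event handling in application code is
    # cheaper and faster than spinning up a batch pipeline run.
    records = json.loads(blob.read())
    for record in records:
        # Complex, conditional business rules stay readable and testable here.
        logging.info("Processed record %s", record.get("tracking_id"))
```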
To sum up, ADF is not the best choice for:
real-time or near-real-time processing
very small or frequent workloads
transformation logic dominated by deeply nested or highly conditional business rules.
Wrapping up
This migration was a response to changing scale in an expanding logistics organisation.
As data volumes grew, predictability and stability became more important. Azure Data Factory provided an execution model better aligned with those constraints.
When a code-based ETL no longer scales reliably and operational risk becomes visible at the business level, moving batch workloads to a platform designed for that purpose can be a pragmatic and effective decision.
But ADF is not a universal solution and entails entry costs in tooling and expertise, so it’s worth defining your needs carefully before committing.
FAQ
What is Azure Data Factory?
Azure Data Factory is a managed service for building and running batch ETL pipelines that move and transform data between systems.
Why use Azure Data Factory in logistics?
Logistics systems often process large, variable data volumes on fixed schedules. ADF is designed to handle this type of batch workload predictably.
How does Azure Data Factory differ from Azure Functions for ETL?
Azure Functions offer flexibility and fast execution. They are excellent for small amounts of data but can also be used for large batches and long-running processes. However, such an implementation requires advanced engineering skills and extra effort.
Azure Data Factory, by contrast, is designed out of the box for long-running, large-scale batch processing.
Is Azure Data Factory a no-code tool?
It is best described as low-code. Many transformations can be configured without writing code, but expressions and logic are sometimes required.
When should Azure Data Factory be avoided?
It should be avoided for real-time processing, very small or frequent workloads, and scenarios with highly complex business logic.
Paweł Minkina
.NET engineer at Happy Team, working on complex logistics solutions and trying new ideas along the way.