Tag Archives: microsoft

Azure Data Factory Region Detection

23 Dec

If you are building an Azure Data Factory (ADF) pipeline and receive an error that contains this message when your pipeline is executed:

Failed to detect region of linked service

or

Failed to detect the region for

… then you may be running into a situation where the Data Movement Service (DMS) feature of ADF either cannot detect the region of your data store or has no deployment in that region.

The Data Movement Service of ADF is the Azure-managed cloud service (PaaS) that performs scale-out data movement on your behalf. Azure handles all of the plumbing for moving Big Data through your data pipelines. You can see the locations available for Data Movement on the Azure Regions page (https://azure.microsoft.com/en-us/regions/services/). In the screenshot of that page below you’ll see that the Data Factory service has several sub-services: the Data Factory service itself stores your factory account metadata, while Data Movement, Activity Dispatch and the SSIS IR are separate managed services with their own regional deployments. It is the Data Movement service in those regions that performs the heavy lifting of moving your data, and that is where you should focus to get past the error.

[Screenshot: Azure Regions page showing the Data Factory sub-services available per region]

In the original V1 ADF service, there is a property on the Copy Activity definition (executionLocation) that allows you to explicitly tell ADF which region should execute the data movement. This is taken directly from the online Azure documentation for ADF (https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-movement-activities#global):

For example, to copy between Azure stores in Korea, you can specify "executionLocation": "Japan East" to route through Japan region (see sample JSON as reference).
Note: If the region of the destination data store is not in preceding list or undetectable, by default Copy Activity fails instead of going through an alternative region, unless executionLocation is specified. The supported region list will be expanded over time.
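As a rough sketch of how that looks in a V1 pipeline definition (the pipeline, dataset and activity names below are placeholders I made up), the executionLocation property sits in the typeProperties of the Copy activity alongside the source and sink:

    {
      "name": "CopyBetweenKoreaBlobStores",
      "properties": {
        "activities": [
          {
            "name": "CopyKoreaToKorea",
            "type": "Copy",
            "inputs": [ { "name": "KoreaInputBlobDataset" } ],
            "outputs": [ { "name": "KoreaOutputBlobDataset" } ],
            "typeProperties": {
              "source": { "type": "BlobSource" },
              "sink": { "type": "BlobSink" },
              "executionLocation": "Japan East"
            }
          }
        ],
        "start": "2017-12-01T00:00:00Z",
        "end": "2017-12-02T00:00:00Z"
      }
    }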

In the new V2 ADF service, the Integration Runtime (IR) feature is the primary way to move data in the cloud or on-premises. So you may have to explicitly tell ADF about the location of your data store by creating an IR in your data store’s region and then referencing that IR in your Linked Service definition using the new “connectVia” property. If you do not specify an explicit IR reference, ADF will use a default IR, which may not be able to resolve the location.

First, create an Integration Runtime in the region where your data store is located:

https://docs.microsoft.com/en-us/azure/data-factory/create-azure-integration-runtime#create-azure-ir
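As a rough sketch (the IR name and region below are placeholders, and I’m assuming the JSON shape used by the V2 REST API/ARM templates rather than the portal wizard), an Azure IR pinned to a specific region sets the region under computeProperties instead of leaving it at the auto-resolve default:

    {
      "name": "KoreaCentralAzureIR",
      "properties": {
        "type": "Managed",
        "typeProperties": {
          "computeProperties": {
            "location": "Korea Central"
          }
        }
      }
    }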

Then add the connectVia property to your Linked Service using a reference to that new IR:

https://docs.microsoft.com/en-us/azure/data-factory/concepts-datasets-linked-services#linked-service-json
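Putting the two steps together, a minimal sketch of a V2 Linked Service that routes through the region-specific IR above (the storage account, key and IR names are placeholders) looks something like this; the connectVia block is what tells ADF which IR, and therefore which region, should handle the connection:

    {
      "name": "KoreaBlobStorageLinkedService",
      "properties": {
        "type": "AzureStorage",
        "typeProperties": {
          "connectionString": {
            "type": "SecureString",
            "value": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
          }
        },
        "connectVia": {
          "referenceName": "KoreaCentralAzureIR",
          "type": "IntegrationRuntimeReference"
        }
      }
    }

If connectVia is omitted, the default Azure IR tries to auto-resolve the region, which is exactly the case where the detection error above can appear.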


What Makes Your Data Warehouse a “Big Data Warehouse”?

31 May

I’ve been closely observing the evolution of the marketing of classic database and data warehouse products over the past two years with great interest. Now that Big Data is top-of-mind for most CIOs in corporations around the globe, traditional data vendors like IBM, Oracle, Teradata and Microsoft are referring to their platforms as “Big Data” or “Big Data Warehouses”.

I guess, in the final analysis, this is really an attempt by data vendors to shift perceptions and mold CIO thinking about Big Data away from Apache Hadoop, Cloudera and Hortonworks and toward their own platforms. Certainly, there are some changes taking place in those traditional data warehouse platforms (MPP, in-memory, columnstore) that are important for classic “Big Data” use cases: clickstream analysis, big data analytics, log analytics, risk modeling … And most of those vendors will even tack on a version of Hadoop with their databases!

But this is not necessarily breaking new ground or an inflection point in terms of technology. Teradata pioneered MPP decades ago; Oracle led the way with smart caching and proved (once again) that the infamous bottleneck in databases is I/O. Columnar databases like Vertica proved their worth in this space, which led Microsoft and Oracle to adopt those technologies, while Aster Data led with MapReduce-style distributed UDFs and analytics, which Teradata simply bought up outright.

In other words, the titans of the data market finally felt enough pressure from their core target audiences, as Hadoop came out of the shadows of Silicon Valley to threaten their data warehouse market share, that you will now hear these sorts of slogans from the traditional data warehouse vendors:

Oracle: http://www.oracle.com/us/technologies/big-data/index.html. Oracle lists different products for dealing with different “Big Data” problems: acquire, organize and analyze. The product page lists the Oracle Big Data Appliance, Exadata and Advanced Analytics as just a few products for those traditional data warehouse problems. Yikes.

Teradata: In the world of traditional DWs, Teradata is the Godfather and pioneered many of the concepts that we are talking about today for Big Data Analytics and Big Data DWs. But Aster Data is still a separate technology and technology group under Teradata, and sometimes they step on their own messaging by forcing their EDW database products into the same “Big Data” space as Aster Data: http://www.prnewswire.com/news-releases/latest-teradata-database-release-supports-big-data-and-the-convergence-of-advanced-analytics-105674593.html.

But the fact remains that “Hadoop” is still seen as synonymous with “Big Data”, while the traditional DW platforms had been used in many of those same scenarios for decades. Hadoop has been seen as an alternative means of providing Big Data Analytics at a lower cost per scale. Just adding Hadoop to an Oracle Exadata installation, for example, doesn’t solve that problem for customers outside of the original NoSQL and Hadoop community: Yahoo, Google, Amazon, etc.

So what are your criteria for a database or data warehouse platform to qualify as a “Big Data Warehouse”? Here are a few that I use:

  1. MPP scale-out nodes
  2. Column-oriented compression and data stores
  3. Distributed programming framework (e.g. MapReduce)
  4. In-memory options
  5. Built-in analytics
  6. Parallel and fast-load data loading options

To me, the “pure-play” Big Data Analytics “warehouses” are Vertica (HP), Greenplum (EMC) and Aster (Teradata). But the next generation of platforms, with improved distributed access and programming better than today’s MapReduce and Hive, will be Microsoft with PDW & Polybase, Teradata’s appliance with Aster & SQL-H, and Cloudera’s Impala if you like Open Source Software.
