Friday, May 8, 2009

Analysis Services Partition Creation in SSIS

On my return from a short holiday last week (South Africa had two public holidays in one week, Monday and Friday, so taking three days off gave me the whole week ; ) ) I found an email from a blog reader (Ashkan) asking me to add a blog about SSAS partition creation in SSIS. I had promised in my previous blog that I would, so I thought I had better get to work.

So here it is…

For this example, I am going to discuss how I usually set up SSAS partitioning on a new cube. In most cases, this is at the stage when the warehouse is fully populated with history data and I need to create a large number of SSAS partitions to represent all this data. So what we need to start with is a cube that has been deployed to an SSAS server. (This example adds partitions to an existing cube on a server; it does not add partition metadata to a cube in an Analysis Services project.) This deployed cube will have at least one measure group in it and just the default partition. What I then need to do is change the default partition’s query definition to point to my first data set, usually the first month in my warehouse (assuming we are partitioning by month), and then use the following code to create all the partitions that I need.

For this example, I am going to use the sample SSAS database Adventure Works DW and, more specifically, the default partition Internet_Sales, which is part of the measure group Internet Sales in the cube Adventure Works.

This partition’s source is set to table binding and is mapped to a specific source table in the cube’s DSV. For this example, I want to partition the measure group against a single fact table aiming each partition at a specific month in that table. Therefore I need to change the partition to query binding and set the source value to:

SELECT fis.[ProductKey],fis.[OrderDateKey],fis.[DueDateKey],fis.[ShipDateKey],fis.[CustomerKey],fis.[PromotionKey],fis.[CurrencyKey],fis.[SalesTerritoryKey],fis.[SalesOrderNumber],fis.[SalesOrderLineNumber],fis.[RevisionNumber],fis.[OrderQuantity],fis.[UnitPrice],fis.[ExtendedAmount],fis.[UnitPriceDiscountPct],fis.[DiscountAmount],fis.[ProductStandardCost],fis.[TotalProductCost],fis.[SalesAmount],fis.[TaxAmt],fis.[Freight],fis.[CarrierTrackingNumber],fis.[CustomerPONumber],
CONVERT ( CHAR ( 10 ), SalesOrderNumber )  + 'Line '  + CONVERT ( CHAR ( 4 ), SalesOrderLineNumber ) AS [SalesOrderDesc]
FROM [dbo].[FactInternetSales] fis
INNER JOIN dbo.DimTime dt
    ON fis.OrderDateKey = dt.TimeKey
WHERE Cast(Convert(varchar(6), FullDateAlternateKey, 112) as int) =  200107

(I realise that this is not the best SQL query in the world but it illustrates the point)
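One reason the query above is not ideal is that wrapping FullDateAlternateKey in CONVERT and CAST generally stops SQL Server from using an index on that column. A possible alternative is a half-open date range per month. The sketch below is in Python purely for illustration (the Script Task itself is VB.NET), the helper names month_bounds and month_predicate are hypothetical, and it assumes FullDateAlternateKey is a datetime column:

```python
from datetime import date

def month_bounds(key):
    """Return (first day of month, first day of next month) for a YYYYMM key."""
    year, month = divmod(key, 100)
    start = date(year, month, 1)
    # December rolls over to January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def month_predicate(key, column="dt.FullDateAlternateKey"):
    """Build a WHERE predicate that compares the date column directly."""
    start, end = month_bounds(key)
    return f"{column} >= '{start:%Y%m%d}' AND {column} < '{end:%Y%m%d}'"

print(month_predicate(200107))
```

The resulting text could replace the WHERE condition in the partition query, one month key per partition.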

The partition Internet_Sales should then be renamed to something that makes more sense. I usually name partitions according to the measure group to which they belong and the specific month they are assigned to. e.g. InternetSales200107

Once this has been done we have a complete partition that we can use to clone more… and you thought cloning was illegal! ; )

In my previous blog (Analysis Services Partition Processing), I went into a lot of detail explaining the script task and how I prefer to capture the XMLA and execute it against the server using an Analysis Services DDL Task. This SSIS package is very similar, so I am not going to repeat all that detail here but rather provide the code that is different and follow up with an explanation.

The Script Task’s code would look like this:

Imports System, System.Data, System.Math, Microsoft.SqlServer.Dts.Runtime
Imports AMO = Microsoft.AnalysisServices
Imports Microsoft.AnalysisServices.QueryBinding

Public Class ScriptMain

    Public Sub Main()

        Dim amoServer As AMO.Server, amoMeasureGroup As AMO.MeasureGroup
        Dim amoPartition As AMO.Partition, oVariables As Variables = Dts.Variables
        Dim strXMLAScript As String, strTaskName As String
        Dim amoQueryBinding As AMO.QueryBinding, amoNewPartition As AMO.Partition

        Try
            Dts.VariableDispenser.LockOneForRead("System::TaskName", oVariables)
            strTaskName = oVariables.Item("System::TaskName").Value.ToString

            amoServer = New AMO.Server()
            amoServer.Connect("Data Source=LOCALHOST;Initial Catalog=Adventure Works DW;" & _
                "Provider=MSOLAP.3;Integrated Security=SSPI;Impersonation Level=Impersonate;")
            amoMeasureGroup = amoServer.Databases.FindByName(amoServer.ConnectionInfo.Catalog _
                .ToString).Cubes.FindByName("Adventure Works").MeasureGroups. _
                FindByName("Internet Sales")

            ' Capture the XMLA instead of executing it against the server
            amoServer.CaptureXml = True

            ' Get first partition (the one edited earlier) to use as the template
            amoPartition = amoMeasureGroup.Partitions(0)

            ' Loop through the month keys (YYYYMM) that need partitions
            For iMonth As Integer = 200108 To 200112

                ' Clone the template and overwrite ID and Name, e.g. InternetSales200108
                amoNewPartition = amoPartition.Clone()
                amoNewPartition.ID = "InternetSales" & iMonth
                amoNewPartition.Name = "InternetSales" & iMonth
                amoNewPartition.Slice = "" ' Add as needed

                ' Create new query binding
                amoQueryBinding = New AMO.QueryBinding

                ' Set new query binding
                amoQueryBinding.QueryDefinition = GetQueryBindingText(iMonth.ToString)
                amoQueryBinding.DataSourceID = amoPartition.DataSource.ID
                amoNewPartition.Source = amoQueryBinding

                ' Add to measure group and update partition
                amoMeasureGroup.Partitions.Add(amoNewPartition)
                amoNewPartition.Update()

                ' Report progress
                Dts.Events.FireInformation(0, strTaskName, "Added partition - " & _
                    amoNewPartition.Name & " to create partition list.", String.Empty, _
                    0, True)

            Next

            ' Set in transaction and in parallel properties and capture
            strXMLAScript = amoServer.ConcatenateCaptureLog(True, False)

            ' Write to package variable
            Dts.VariableDispenser.LockOneForWrite("User::strXMLAScript", oVariables)
            oVariables.Item("User::strXMLAScript").Value = strXMLAScript

            Dts.TaskResult = Dts.Results.Success
        Catch ex As Exception
            ' Raise the error and mark the task as failed
            Dts.Events.FireError(0, strTaskName, ex.Message, String.Empty, 0)
            Dts.TaskResult = Dts.Results.Failure
        End Try

    End Sub

    ' Builds the partition query for one month key (YYYYMM)
    Private Function GetQueryBindingText(ByVal strMonth As String) As String

        Return "SELECT fis.[ProductKey],fis.[OrderDateKey],fis.[DueDateKey],fis.[ShipDateKey]," & _
            "fis.[CustomerKey],fis.[PromotionKey],fis.[CurrencyKey],fis.[SalesTerritoryKey]," & _
            "fis.[SalesOrderNumber], fis.[SalesOrderLineNumber], fis.[RevisionNumber]," & _
            "fis.[OrderQuantity],fis.[UnitPrice],fis.[ExtendedAmount]," & _
            "fis.[UnitPriceDiscountPct], " & _
            "fis.[DiscountAmount], fis.[ProductStandardCost], fis.[TotalProductCost], " & _
            "fis.[SalesAmount], fis.[TaxAmt], fis.[Freight], fis.[CarrierTrackingNumber], " & _
            "fis.[CustomerPONumber], " & _
            "CONVERT ( CHAR ( 10 ), SalesOrderNumber ) + 'Line ' + CONVERT ( CHAR ( 4 ), " & _
            "SalesOrderLineNumber ) AS [SalesOrderDesc] " & _
            "FROM [dbo].[FactInternetSales] fis " & _
            "INNER JOIN dbo.DimTime dt " & _
            "ON fis.OrderDateKey = dt.TimeKey " & _
            "WHERE Cast(Convert(varchar(6), FullDateAlternateKey, 112) as int) = " & strMonth

    End Function

End Class


This code loops through a list of your choice (I have simply chosen to loop through five months to prove the concept), creating new partitions. It does this by cloning the existing partition (the one we edited earlier) and then overwriting the properties that need to change: the partition ID, the Name and the query binding. If your partitions use a slice, you could set the Slice property too; I have left it empty in this example.
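One thing to watch when you extend the loop: a plain integer range over YYYYMM keys only works within a single calendar year, because counting past 200112 gives 200113 rather than 200201. Below is a minimal sketch of a month-key generator that rolls over at December. It is written in Python for brevity rather than the Script Task's VB.NET, and month_keys and partition_name are hypothetical helpers, not part of the package:

```python
def month_keys(start, end):
    """Yield YYYYMM integer keys from start to end inclusive."""
    year, month = divmod(start, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month key: {start}")
    while year * 100 + month <= end:
        yield year * 100 + month
        month += 1
        if month > 12:
            # Roll over from December to January of the next year.
            year, month = year + 1, 1

def partition_name(measure_group, key):
    """Name a partition after its measure group and month, e.g. InternetSales200107."""
    return f"{measure_group.replace(' ', '')}{key}"

names = [partition_name("Internet Sales", k) for k in month_keys(200111, 200202)]
print(names)
```

The same rollover logic could drive the For loop in the Script Task, with each key passed to GetQueryBindingText.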



Please note that, compared with the example in my previous blog, I needed to change the parallel argument of the ConcatenateCaptureLog method to False.



strXMLAScript = amoServer.ConcatenateCaptureLog(True, False)


Once this is complete, you can execute the package and it will create all the partitions you would like (or at least those covered by your loop). Alternatively, with a few alterations, it can create new partitions as and when they are needed, at the end of your ETL but before the dimensions and measure groups are processed. This obviously will need to be approved by your SSAS DBA. I have on the odd occasion found that creating SSAS objects like partitions dynamically through code in a scheduled job, without letting your DBA know, can get you into a little bit of hot water…
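For the "as and when they are needed" case, the main alteration is deciding which partitions do not exist yet. Here is a rough sketch of that check, again in Python for illustration only. It assumes you already have the list of existing partition names and the month keys loaded by the ETL; missing_partitions is a hypothetical helper:

```python
def missing_partitions(existing_names, month_keys, prefix="InternetSales"):
    """Return names for months that have no partition yet, in month order."""
    existing = set(existing_names)
    wanted = [f"{prefix}{key}" for key in sorted(set(month_keys))]
    return [name for name in wanted if name not in existing]

existing = ["InternetSales200107", "InternetSales200108"]
loaded_months = [200107, 200108, 200109]
print(missing_partitions(existing, loaded_months))
```

Only the names returned would then need to be cloned and added, so re-running the package does not try to create a partition that is already there.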



Hope you find this helpful!



Happy Deving!



(I have added a zip file to SkyDrive with the sample package. It can be found here).

9 comments:

Jason Campbell said...

Thanks for this buddy!

Turns out the first task I need to implement in the new job is to automate the cube partitioning.

Jason

Colin Kirkby said...

Nice! Glad I could help, mate!

Anonymous said...

Your blog keeps getting better and better! Your older articles are not as good as the newer ones; you have a lot more creativity and originality now. Keep it up!

Unknown said...

Hi,

Very useful information. I have implemented the same logic for processing the partition incrementally using ProcessAdd.

But I am getting an "Unexpected Error" for incremental loading.

Please help me.

Thanks in advance,
Anantha

Colin Kirkby said...

Hi Anantha,
Does this ProcessAdd issue only occur when you use the supplied code or also when you try processing through management studio?
Kind Regards,
Colin.

Unknown said...

Hi Colin,

Thanks for the reply.

Yes, it happens when I use the code, not in Management Studio.


Code Sample ( Logic Alone)

If dt.Rows.Count <> 0 Then
    For Each dim_name In amoServer.Databases.FindByName(database).Dimensions()
        If dim_name.State = AnalysisState.Unprocessed Then
            dim_name.Process(AMO.ProcessType.ProcessFull)
        Else
            dim_name.Process(AMO.ProcessType.ProcessUpdate)
        End If
    Next
    For Each row In dt.Rows
        imonth = row("REVIEW_DATE").ToString()
        amoQueryBinding_old = New AMO.QueryBinding
        For Each amoPartition In amoMeasureGroup.Partitions()
            If amoPartition.Name.Contains(imonth) Then
                amoQueryBinding_old.QueryDefinition = GetQueryBindingText(imonth.ToString())
                amoQueryBinding_old.DataSourceID = amoPartition.DataSource.ID
                amoPartition.Source = amoQueryBinding_old
                amoPartition.Update()
                If amoPartition.State = AnalysisState.Unprocessed Then
                    amoPartition.Process(AMO.ProcessType.ProcessFull)
                Else
                    amoPartition.Process(AMO.ProcessType.ProcessAdd)
                End If
                GoTo NextRow
            End If
        Next

        amoPartition = amoMeasureGroup.Partitions(0)
        amoNewPartition = amoPartition.Clone()
        amoNewPartition.ID = Partition_Name + "_" + imonth.ToString()
        amoNewPartition.Name = Partition_Name.ToString() + "_" + imonth.ToString()
        amoNewPartition.EstimatedRows = "1000000"
        amoNewPartition.Slice = ""

        'Just another check: if the partition already exists then skip
        If Not (amoMeasureGroup.Partitions.Contains(amoNewPartition.Name)) Then
            'Create new query binding
            amoQueryBinding = New AMO.QueryBinding

            amoQueryBinding.QueryDefinition = GetQueryBindingText(imonth.ToString)
            amoQueryBinding.DataSourceID = amoPartition.DataSource.ID
            amoNewPartition.Source = amoQueryBinding
            amoMeasureGroup.Partitions.Add(amoNewPartition)
            amoNewPartition.Update()
            amoNewPartition.Process(AMO.ProcessType.ProcessAdd)
        End If
NextRow: Next
End If

Unknown said...

Hi Colin,

Some more information below:

Managing Processing
a. Dimension Processing
b. Partition Processing

During Initial Loading
a. Dimension --> ProcessFull
b. Partition --> Create New partition and Will Do --> ProcessFull

During Incremental Loading
a. Dimension --> ProcessUpdate/Add
b. Partition --> Add records to existing partition and Will Do --> ProcessAdd

Issue Description
-----------------
1. Initial loading works fine through Management Studio (both dimension and partition).

2. The incremental loading logic works fine for the dimensions but fails while processing the partition.

Hope this helps to explain the issue more clearly.

Thanks and Regards,
Anantha

Unknown said...

Hi Colin,

Any suggestions from your side?

Thanks in advance,
Anantha

Unknown said...

Hi Colin,

This article is a great help! I have yet to implement it, but I will let you know the outcome once I do, for sure!

Thanks a ton Col.

Regards

Anup Nair.