Wednesday, March 7, 2012

Checkpoints problem in parallel tasks

Hi,

I have a master package with a sequence container with around 10 execute package tasks (for child packages), all in parallel. Checkpoints has been enabled in the master package. For the execute package tasks FailParentOnFailure is set to true and for the sequence container FailPackageOnFailure is set to true.

The problem i am facing is as follows. One of the parallel tasks fails and at the time of failure some of the parallel tasks (say set S1) are completed succesfully and few are still in execution (say set S2) which eventually complete successfully. The container fails after all the tasks complete execution and fails the package. When the package is restarted the task which failed is not executed, but the tasks in set S2 are executed.

If FailPackageOnFailure is set to true and whatever be the FailParentOnFailure value for the execute package task, in case of restart the failed package is executed but the tasks in set S2 are also executed.

Please let me know if there is any setting that only the failed task executes on restart.

Thanks in advance

Essentially, you want to track the outcome of parallel execute package tasks, and only re-execute those which have failed. The problem with using checkpoint files to accomplish this, is that checkpoint files don't track the status of parallel containers after the first task failure associated with FailPackageOnFailure happens.

What that means, if you have 10 parallel EPTs (execute package tasks), and any one of them has a task failure, none of the subsequently completed EPT tasks, whether they succeed or fail, have their outcomes written to the checkpoint file. So, on package restart, "post last checkpoint file write" tasks will run again.

An easier approach might be to put a for loop around each EPT, and loop until successful, using a variable scoped at the For Loop as a"Go/No Go" decision maker, and max retry count.

However, if you want to do it in SSIS using a restart mechanism, you could roll your own checkpointing mechanism.

Such a mechanism would mean creating an OnTaskFailed event handler which would track the failed EPT SourceID/SourceName's (read TaskID/TaskName).

Then add an OnPostExecute event handler to determine those EPTs which succeeded by inference (they didn't fail). Add in a final "On Completetion" script task to append the successful TaskIDs to a configuration file which would then be read in automatically on subsequent execution.

Lastly, you'd set the the Disable property on each EPT to something like FINDSTRING(@.SuccessfulTaskIDs,@.System::TaskID,1) > 0. You can do it that way, but its not point and click by any stretch.

No comments:

Post a Comment