SSIS Blowing Up Your Log?

Posted by Josh | Posted in SQL Server | Posted on 14-12-2009

1

If you have an SSIS package that is causing your transaction log to explode, one of the first places to look would be at the ‘FastLoadMaxInsertCommitSize’ property (this assumes you are using an OLEDB destination). As its name suggests, this property determines the size of batches that SSIS will contain within a transaction before committing. I.e., if you set this to 100,000, SSIS will attempt to insert your data in batches of 100,000 rows, with each being contained within it’s own transaction.

By default this is set to zero, which tells SSIS to attempt to contain the entire bulk insert within one transaction. In my case, that meant over ten million rows! Needless to say, my 2GB transaction log filled up in no time. By changing this value to a more manageable 1,000 rows, the package completed without issue.

Note that in some cases containing your operations within a transaction is a good thing. In this case, it was not necessary, as the package is performing a simple bulk transfer to a reporting database, no atomicity is not a concern.

Linked Servers – Good or Bad?

Posted by Josh | Posted in SQL Server | Posted on 05-12-2009

Tags: ,

0

From Linchi Shea’s blog:

In SQL Server, it is rather handy to retrieve data from a different SQL Server instance and use the result locally in another SQL statement for further processing. In theory and in the set purists’ fantasy land, it shouldn’t matter where you get your data or even how you get the data as long as you can use them to further compose a set-based solution.

That is all fine. After all, in a distributed environment the data you want may not be on the same instance where the processing takes place. Well, that is fine until it comes to performance and troubleshooting. If you use linked servers indiscriminately, sooner or later you’ll run into these issues, and they are not pretty.

We have used linked server functionality at various points at work. As Linchi points out, it can be quite useful when you have data distributed across multiple systems. But I have seen cases where it’s caused issues, such as excess network traffic, authentication problems (anyone actually been able to get Kerberos auth to work for a two-hop Windows auth’d linked server?), and general performance malaise.

So what are some alternatives?

  • Replication – true, it may not solve the network issue, and depending upon the type used, it might cause a delay in data transfer. But it would allow for queries to execute against a single server.
  • Static data transfer using SSIS – I would say this would be fine for offline use of data such as reporting. There have been very few cases in my experience where true real-time data has been a hard business requirement, though at times the point has been, shall we say, vigorously argued.