Using SSIS to transfer data from multiple SQL tables by executing join query that writes result to CSV

Problem: Using SSIS to transfer data from multiple tables using SQL join query into a single CSV table

So after the data was transferred to SQL Server, the next step was to analyze it using Weka. But Weka understands only Arff format file and that can be converted if the data is in CSV (Comma Separated Value) format. Now the problem is that SQL Server does not provide any explicit means to export data to CSV format. The most you can have it is in Excel spreadsheet format. So the solution is to use SSIS (SQL Server Integration Services) or BIDS. One can use the Import/Export Data feature of SSMS but it works well for small dataset size. I’m dealing here with a hundred thousand million rows of data and I found Import/Export if used to simply hang the system. Therefore, my suggestion is to use SSIS for such voluminous data transfer quickly and efficiently. Today, I will show how to use SQL Query that joins two tables and then writes the data to a CSV file dynamically.

Solution

  1. Launch SSIS and create an Integration Services Project
  2. Launch MS Excel and create a new blank worksheet. Save it as CSV file. Provide the header column names to this file. Save and exit (On saving the file in CSV format you will be prompted that something blah blah will be lost.. just ignore it and click on save)
  3. Drag a Data Flow Task and drop it in the control flow area.data flow task
  4. Right click in the connection managers pane on the bottom and create an OLE DB Source connection and a Flat File Connectionconnections
  5. Point the OLE DB Source connection to your database in SQL Server and configure it accordingly configure ole db source
  6. Point the Flat File connection manager to the CSV file created above in Step 2. Make sure you place a check mark on the box at the bottom that says “column names in the first data row” configure flat file destination
  7. Drag and drop an OLE DB Source object and a Flat File Destination object into the data flow task area. data flow task1
    1. Configure the OLE DB Source as given below
    2. In Connection manager choose the OLE DB connection that we created in Step 5
    3. In Data access mode, click on the drop down and choose SQL Command
    4. In SQL Command Text box type your SQL query. (To be on the safe side, please ensure that you have parsed and executed this query in SQL Server. This is to validate that your SQL Query is correct)
    5. Optionally, you may click on Parse Query button and this will tell you if your SQL query is syntactically correct or not.
    6. Next, click on Columns and ensure that you see only those columns that you have mentioned in your SQL query. An interesting feature, I will share with you here. Assume your database table has six columns and Click on Ok button now. OLE DB Configuration is completed. If all went Ok then the red x on it would vanish. ole db source editor configuration
    1. Now double click on Flat File Destination and configure it as follows:
    2. First ensure that have configured the Flat File Connection manager as stated above in Step 6. If not then configure it first.
    3. Assuming that the Flat File Connection manager has been configured so now ensure that from the drop down under flat file connection manager, the connection manager is listed.
    4. Next click on Mappings and ensure that all the requisite columns are mapped properly. If all is correct then finally click on OK button flat file dest editor-1
  8. Save your package and then run it.Success
  9. Viola, package execution is successful and you shall have your required data in CSV format which you can now easily convert it into Arff format for Weka to parse.
Advertisements