How to move data from Excel to SQL Server2008- Act 1 Scene 1

So, I have undertaken a massive BIG project in Educational Data Mining which the first step is data collection. After collecting the data and most of it is in excel format. So to make any sense out of it was imperative that I convert it to a relational database format. Now this gets interesting, for I have a couple of choices to accomplish this feat;

a. Manually copy and paste the data which is like “Yucks”… and really I have no intention to go insane at such a young stage of my life :-)

b. Programmatically, but how….? Million dollar question.. Read on,

  1. Ensure that you have first installed Visual Studio 2008 and then download and Install MS SQL Server 2008 Enterprise Edition R2 and install Service Pack 1 for the same. The benefit of installing SQL Server 2008 is that it comes bundled up with Business Intelligence Studio. You must install Visual Studio should first followed by installing SQL Server 2008. I have wasted several productive hours on trying to integrate higher versions of VS and SQL Server has led me to believe that this is the correct method to avoid throwing your computer out of sheer frustration.
  2. After the installation is completed you can restart your computer.

In this example project, I will describe on how to import data from excel table into SQL Server 2008 by using the SQL Server Integration Services (SSIS) tool.

Let the fun begin

Assume, you have an excel spreadsheet of student records that you want to import into SQL Server. Given below is a sample screenshot. Student records Sample

 Step 1: Create a SSIS Project

Go to SQL Server Data Tools or Visual Studio and then choose to create a New Project. Ensure that you choose the right project template as shown.New Project

SSIS will now create a new project, and also (by default) a new package too, Package.dtsx that is the default naming convention. There are two ways you can see that this is what’s happened. One is that you can see the package in Solution Explorer:packageName

(if you can’t see Solution Explorer, choose‘View’ from the top menu bar and then choose ‘SolutionExplorer’ from the menu to show it. Alternatively you can also press the keys ‘Ctrl’, ‘Alt’ and ‘L’ together on the keyboard and release them together to show the solution explorer window.)

The other one which you see shown to you on the screen packageName1Please note that by default you are put in Control Flow view, which is like a flow diagram showing the sequence in which tasks that you create will execute.default view Step 2: Create a connection to the SQLServer Database

Method 1: If you look at the solution explorer, You will see ‘Data Sources’. Right Click on it and choose ‘New Data Source’. Add new data Source Wizard

b. When you click the button ‘Next’, you will see the following: New Data sourceBy default the radio button titled ‘Create a data source based on existing or new connection’ will be selected. Click the button ‘New’ to create a new data source. When you click the New button you will see this connectionMgr10

c. As shown in the above screenshot if you have chosen the server name and the database name correctly and when you click on the button ‘Test Connection’ you should see the following ‘Test Connection succeeded’ message box. connectionMgrSucess

d. Finally, if you look at the Solution Explorer, under Data Source you should see the database connection as shown: connectionMgrSucess1

Hope you were able to accomplish this much. If you have encountered any problem at any step, please leave your comment and I will be happy to answer it for you. In the next post I will show you how to create tables in SQL by using the Business Intelligence Development Studio (BIDS) 2008, populate data from excel files into sql server database. Till then stay safe and be well.

Thank you for reading.

SSIS unable to bulk copy data you may need to run this package as administrator

I have started on a project work. One of the preliminary tasks is to extract data from excel tables and populate them into SQL Server Tables. One method was to do a copy and paste but the data size that I’m dealing with is in several millions of rows.SSIS

Therefore the alternative and easier solution is to use SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications.

So I installed MS SQL Server 2014 Enterprise edition with Business Intelligence package in it. In another post I will detail the steps to create a simple ETL package but for now I will stick to the solution of this problem.

The solution is instead of using SQL Server object use the OLEDB Destination instead.  Otherwise you may need to edit your local policy to grant the “Create global objects” permission in order for the SQL Server Destination to work properly i.e.

“Users who execute packages that include the SQL Server destination require the “Create global objects” permission. You can grant this permission to users by using the Local Security Policy tool opened from the Administrative Tools menu. If you receive an error message when executing a package that uses the SQL Server destination, make sure that the account running the package has the “Create global objects” permission.”

SQL Server Destination Reference http://msdn.microsoft.com/en-us/library/ms141095.aspx.

Error: Python version 2.7 required, which was not found in the registry- Solved

Python errorToday while installing, scipy for Python ver 2.7 I came across this error message on my Windows 7 OS 64 bit system.
Please note, I already have Python 2.7 installed.
The solution is;

1. Open notepad and copy paste the following code in it.

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python]
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore]
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7]
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\Help]
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\Help\Main Python Documentation]
@="C:\\Python27\\Doc\\python276.chm"
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\InstallPath]
@="C:\\Python27\\"
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\InstallPath\InstallGroup]
@="Python 2.7"
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\Modules]
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Python\PythonCore\2.7\PythonPath]
@="C:\\Python27\\Lib;C:\\Python27\\DLLs;C:\\Python27\\Lib\\lib-tk"

2. Save the file on desktop as key.reg

3. Click Start menu on desktop then click on Start search and type regedit and press enter key.

4. Registry editor window opens. Click on File menu and then click on Import. Navigate to desktop and then import the file named key.reg. You will see a message that will say something like the key was sucessfully imported.

5. try installing scipy again. It will work

Clustering With K-Means in Python

Ashish Dutt:

Very interesting read.

Originally posted on The Data Science Lab:

A very common task in data analysis is that of grouping a set of objects into subsets such that all elements within a group are more similar among them than they are to the others. The practical applications of such a procedure are many: given a medical image of a group of cells, a clustering algorithm could aid in identifying the centers of the cells; looking at the GPS data of a user’s mobile device, their more frequently visited locations within a certain radius can be revealed; for any set of unlabeled observations, clustering helps establish the existence of some sort of structure that might indicate that the data is separable.

Mathematical background

The k-means algorithm takes a dataset X of N points as input, together with a parameter K specifying how many clusters to create. The output is a set of K cluster centroids and a labeling…

View original 898 more words

Data Processing with Weka (Part II)

Today, I will discuss and elaborate on data processing in Weka 3.6 (it’s the same in version 3.7 too). This post is the second part in the series of “Data pre-processing with Weka”. If you have not seen my earlier post, you are directed to see that first.

Continuing further, assuming that you have cleaned the data at hand and its now “noise free”. The next step is to process it. Now how do we go about doing this?

Often when we have a dataset, the most common question that comes to our mind is that which attributes are the most closely related to each other such that a relationship between them could be defined? Attribute is synonymous to column heading and instance is synonymous to a row of record. Let’s take the soybean dataset that comes free with Weka to answer these questions.

Click on Edit as shown in the picture to see if the data has missing values in it or not.

Missing values

As it’s evident in the screenshot here, that this soyabean dataset is noisy. The same can be seen in this screenshot when it’s opened in Weka.

Missing values in Weka

So first step is to clean it. So either you can check my previous post on data cleaning or else the other option is to manually clean the dataset. If you want to manually clean the data for that, you will first have to save the dataset in .CSV (comma separated value) first. This is shown below.

csv format

And then open it in Microsoft excel and then manually search for outliers in the data and remove it.

How to manually delete outliers in MS Excel

In case you don’t know how to do this, here is a short tutorial on the same.

Step 1: Open the dataset in Microsoft Excel

Step 2: Ensure that the column headings row is selected. column headings

Step 3: From the Editing ribbon on the top of screen, click on the drop down besides “Sort & Filter” as shown in screenshot and click on “Filter”. filter poistion

You will see that each of your column heading has a filter to it now.

filter added

Step 4: click on the filter, check if any column has an outlier value like a ? mark or any other value that you think is incomplete, then delete it as shown in figure.delete outliers1

Note: Please ensure that you are deleting the rows and not pressing the delete button on the keyboard. Because that will only delete the values and your data will still have the blank values. Therefore to delete the rows that contain outliers, first select the rows then right click on the selected rows and from the drop down menu, click on the option “Delete rows” to delete the rows.

Step 5: Save the file

Step 6: Now open this file in Weka you will see as shown below in the figure that there are no missing values in the dataset. Save it in .ARFF format and you are done No missing values

Now, we are ready to answer the question that we asked in the beginning. To reiterate, how to find out the attributes that are related to each other and can constitute a relationship. Looking at this soybean dataset, we have 36 attributes and 562 instances. attribute&instances count

Next, to find the relationship between the attribute, in the Weka Explorer tab, click on “Select Attributes” and then under Attribute Evaluator, click on button “choose” and select the option, “FilteredAttributeEvaluator”, you will see a dialog box, click on Yes. Below that, make sure from the drop down menu you choose “No class” as shown in the figure.filter attribute

Click on Start to find attributes related to each other. If you have followed the aforementioned steps correctly then you should see an output similar to the one as shown.attribute list

Hope you learnt something meaningful and I was able to bring a smile to you, if yes do leave a comment. See you soon, until than take care.

Java IO Exception when using Weka CSVLoader

Well today, I was trying to load a csv file in Weka when I got the dreaded error message . I had seen this java error message “java.io.IOException: wrong number of values. Read 28, expected 18, read Token[EOL], line 25 “on an earlier occasion too and that time having no clue on how to fix it I ended up deleting most of the columns of the dataset. Perhaps a brief background first that causes this problem message. Look, if you have very noisy data especially if you are trying to consolidate a dataset from a datasource then typically you will be coping several columns and pasting them into a .xls or CSV format file. Well, I was attempting to do something similar too.


Java Error

So when I fired up Weka and wanted to convert the .csv file to .arff format using Weka Arff viewer, I got the above error. In order to solve it all you have to do is to check for any trailing comma signs at the end of the text or any double quote sign in between or anywhere within the text or simply put any forms of punctuation marks within your datajava error causeset. The cause for this error is shown in the picture.

As you can see this text in data contains all the punctuation marks. So to resolve this java error, Remove any punctuation marks in your dataset and then try loading the data again in Weka. it will work.

Hope this helped you.

 

Fix: Arff file not recognised or Unable to load data in Weka

If you are a beginner to Weka then one of the most common problems that you might face is that you are unable to load your dataset into Weka because either it gives you an error such as ” Reason: java.io:IOException premature end of file, read Token [EOF], line 1″ or “Arff file not recognised” or some other similar error.
 
In one of my previous post, I had provided the link to a blog that had answered this question. However, it was today when I was helping out someone else with the similar problem I realised that the most important step (Step 2 given below) was missing from the instructions on this blog. Therefore, I decided to write this post for the benefit of all.
To solve this error the solution is given below.
 
Step 1: Open your datafile in Excel/Access or whatever is its format
Step 2: Save the dataset file in CSV format. To save the file in .csv format do the following steSave asps,
Click on File—>Save As—> Click on the drop down menu besides ‘Save as type’ and change it to CSV. You will see a dialogue saying something like “Some features will be lost blah blah blah”… Just click on Yes.
Step 3: Now go to Weka and then click on Tools–> Arff Viewer 
Step 4: In Arff Viewer window–> Click on File–> click on Open —> go to the location where you saved the datafile in .CSV format, choose the datafile and then –> Click on OpenfileOpened
Step 5: Viola, your datafile will be open in Arff viewer. Now all you have to is to save it in Arff format by changing its extension to arff and in save as type choose arff.
 
Now the reason why you were getting that java error, because, the instructions on that blog which I had guided you too did not state that you first have to change the datafile format which is in excel or access or whatever format to .CSV (Comma Separated Value) format. Always remember, for Weka to open your data file, your dataset should first be converted into CSV format. Only then Weka will be able to load it in the ARFF viewer and subsequently will let you save it to .arff format
Hope you found this useful.
Follow

Get every new post delivered to your Inbox.

Join 32 other followers