Sunday, February 20, 2011

Tutorial 2: How to establish a connection to SQL Server using Excel 2007-2010.

Most people know how to use excel, but usually the data is provided to them in a file. What if the data is not provided and it is only accessible by connecting to a SQL server instance? This is common in businesses where a data warehouse is set up, and business users are forced to analyze the data by connecting to either an OLAP cube (click for more info) or a table in SQL server. I am not going to talk about OLAP cubes today, so I am just going to focus on how to use Excel with a direct SQL Server source.

A few things to consider:
  1. Make sure you have access to the SQL Server. Usually, Database Administrators will limit the access of the databases to a few key people. Typically, business users will have access only to the data warehouse database for deeper analysis, so make sure you communicate with the DBA to ensure you have rights.
  2. While most data warehouses are designed with the business user in mind, sometimes table names and column names are not named in an easy to understand manner. Make sure you understand what the definition of every measure and dimension is in the database.
  3. Excel always needs to be refreshed to see new data, so if you establish a connection to SQL Server and then save the file, that will be data at the time you initially created the file. If you need to see current data, then you will need to refresh.
So let's get started!

First, open Excel and click on the Data tab. Now click on "From Other Sources," and click on "From SQL Server." You should see a dialog box that looks like this:



This is always the standard connection box for establishing a connection to SQL. 

Now put in the information for the server name and the authentication, and if you do not know this information, please get with your Database Administrator to find this out.

Next, it will ask for what database and table that you wish to connect to. This information should be provided to you by the Database Administrator. Once you know, go ahead and make the appropriate selections and click next. For my example, I am connecting the DataWarehouse database and am accessing the rpt_PLRV2_ALLSites table.





Now, it will ask where you would like to save the connection file. I recommend storing it in the default location it brings up: C:\Users\richecker\Documents\My Data Sources.

Once saved, you will then be prompted with the following dialog box:


If you wish to perform analysis on aggregations of the data, I would recommend using a pivot table, but if you just need to export the data, the table option is all you will need. For purposes of this tutorial, I am going to use the Pivot table option

Select Pivot table, and now it will start transferring all of the data from the server to Excel. Depending on how large the table is, this could take some time. I usually recommend using this approach on small data tables.

Now you will be presented with the usual pivot table editor and now you can start massaging the data you best see fit.

I wanted the report by Product ID, Product Code ID, and dates of how many confirmed leads we had. Therefore, I made the selections and this produced the final product:


Note: you can do the same manipulation through excel as if it were just a regular excel document where all the data was located on a worksheet. The only difference is the data is not located on a worksheet; it is located on a SQL Server instance.

**WARNING**

This data only updates when you choose to refresh. If you want the latest data available when working with this sheet, you must go to the data tab and click the refresh button:


Please note that in Excel 2010, PowerPivot is available as a plugin for excel, and I would recommend using this plug in if you have Excel 2010. For more information on PowerPivot and how to get started, please check out this video below:


*Pictures provided by Richard Ecker, 2010.
*Video provided by YouTube user: sqlserver 

Sunday, February 13, 2011

Excel Project

Today, I completed an excel analysis project that addressed the effectiveness of a particular workout. The data was broken out by age and gender, and the goal was to see if the heart rate of these people ever hit the target heart rate. To do this type of analysis, I needed some data such as max heart rate, target heart rate, target achieved, and the percentage increase. The problem was, I only had the raw data so I had to calculate the data. To get the max heart rate, all I had to do was enter in the cell =max(startingcell: endingcell) and I copied that formula for all rows. To get target heart rate, I just had to enter the target percentage on the worksheet and then come up with a formula that addressed that cell absolutely. So under the target heart column I entered in my calculation as = $E$1*D3. Also, I learned to make a cell reference absolute instead of relative, all you have to do is hit the F4 in the formula right before the cell. This inserts the $ signs automatically. I copied this cell for the rest of the entries to get the target heart rate of each individual. All that was left was to come up with a check to see if the max heart rate was greater than or equal to the target heart rate. I learned that this statement (=IF(F3>=E3,"Yes","No")). This was copied to the rest of the cells. To get average increase, I just made the formula =(F3-I3)/I3 where F3 is the max column, and I3 is the starting column, and then I just copied that formula for the rest of the entries.
Now we have the data that we want such as target heart rate, max heart rate, and whether or not the workout was effective. However, it was only for each person and there was no aggregations being done with the data. I learned that a pivot table will provide all of the different aggregations for this type of data. It was as simple as clicking on the Insert tab and selecting Pivot table. It defaulted to the correct range of data, and I hit ok. It now was in a new worksheet, and all I had to do was drag and drop the fields I wanted to either columns, rows, or aggregation. Since we wanted it broken down by gender, that can be inserted into the rows, the target achieved field could be dragged into the column portion, and the id column can be dragged to the aggregate block since we want a count of people. The end result of the pivot table options should look like the following:

Please note that you have to click on the aggregate field in the pivot table options and click value field settings. This needs to be changed to a count instead of a sum. Once this is performed, we now get a table that shows the number of people broken down by gender of whether or not they achieved their target heart rate.

The final result of the analysis is displayed here:









The last piece of analysis was to create another pivot table which was done the same way as before. Only, the pivot table options should look like this instead:

 Now, on the pivot table itself, we want the data grouped by age by decade, so just right click the age column, click group, and put in the number 10 to break it down into decades. I also learned to get rid of the grand total column, you just have to right click the pivot table, click on pivot table options, select the totals and filters options, and then just uncheck the grand totals boxes.
The end result looks like this:











Overall, I now feel a lot more comfortable working with pivot tables, and I learned that in some cases it does not matter if you put the fields you want to group by in either the columns tab or the row tab. I also learned that in the page layout options, you can put in a header or footer there instead of having to put the header on the actual worksheet and specifying that particular row to repeat every page. This project has definitely boosted my business analysis capabilities.

Sources for pictures:

Richard Ecker via Snipping tool