New Power Query Feature; Small Detail

There a few enhancements to Power Query that were released this month. See this video from Microsoft Server and Cloud for an overview.

I tested out #3, the additional DateTime filters and noticed some behavior that isn’t explicitly stated nor very intuitive (at least for me anyway). Let’s assume today is 6/28/2014; the following M functions will return the following time ranges:

= Table.SelectRows(ChangedType1, each Date.IsInPreviousNDays([DATE], 250))    returns 10/21/2013 to 6/28/2014
= Table.SelectRows(ChangedType1, each Date.IsInPreviousNWeeks([DATE], 40))    returns 09/15/2013 to 6/28/2014
= Table.SelectRows(ChangedType1, each Date.IsInPreviousNMonths([DATE], 7))    returns 11/01/2013 to 6/28/2014
= Table.SelectRows(ChangedType1, each Date.IsInPreviousNQuarters([DATE], 3))  returns 07/01/2013 to 6/28/2014
= Table.SelectRows(ChangedType1, each Date.IsInPreviousNYears([DATE], 1))        returns 01/01/2013 to 6/28/2014

I would have assumed that these functions would have returned the dates filtered at the date level. But that is not these case, each function returns the first day of that level. For example, if you want the last 3 weeks, you’re going to get all the dates starting from the first day of the week, 3 weeks back, not 21 days back.

Also note that the sub functions in the queries above (like Date.IsInPreviousNWeeks) return only true or false; which is then iterated for “each” row in the ChangedType1 set.

Give it a shot for yourself and Happy Power Querying!

Categories: Power Query Tags: , , ,

Power Query Fill Down; Practical Example

January 30, 2014 Leave a comment

Links related to this post:
Retro Sheet
Microsoft Office Fill Down Reference

Power Query Overview
The guts behind Power Query is the “M” Language. Once we specify our source data, the Power Query engine iterates through user-defined steps/formulas to transform it into a useful data set. The process is much like building a macro where each step depends on the previous step. Each step is a formula written in the “M” language that can be automatically generated from Power Query Designer toolbar buttons and drop down menus or, the sequence of steps can be written by hand if you need functionality not available in the user interface.

Practical Problem
You’re tasked with extracting, transforming and loading flat files of loosely structured data into a Data Warehouse. Typically, a flat file will have a header row and each detail record represents the same type of data as the rest until the end of the file. Our flat file has no header row and each row can be of a different type. Luckily, the data is consistent and the first column in each row tells us the type of row it is. In the image below, an “id” record tells us the identifier for 1 baseball game. The “version” record tells us the data format in which the game was recorded. And, the “info” record gives us 1 piece of information about the game specified in the closest “id” record above. There are multiple games in the 1951.EDA file, so we need to somehow match the game id “DET195106102”, to each “info” record below it, but not past the next “id” record:

And we have to do this using Power Query.

Practical Solution
The key to the solution here is the Table.FillDown function. This function can also be automatically written via drop down menu when right-clicking on a column:

Practical Example

The FillDown function is the key piece of code in our solution, but there are many steps needed beforehand to get us setup. In the link at the top of the page, you can download these files from RetroSheet.org, just make sure you read their terms and conditions. There are many files on this site and this particular problem file is an Event file. Once you have those downloaded somewhere you can access them from your Excel 2013 instance, follow these steps to play along at work (or at home if you’re REALLY bored):

1. Browse to your Power Query tab in Excel and click “From File”>”From Folder”:

2. Specify the folder in which your baseball files are located (yes, Power Query iterates through multiple files!) and click OK:

3. You should see a list of all files in the folder appear in a tabular format. We only want to concern ourselves with 4 different types of file extensions, but before we filter, we need to click the “Load more” link in the filter/sort dialog to get all distinct file extensions. Otherwise, we could get unexpected results. See link below:

4. Now that we have all of our distinct file extensions, we can feel confident about select the 4 file extensions that we are concerned with “.EDA”, “.EDN”, “.EVA” and “.EVN”:

5. Now that we have filtered the files, let’s clean up a bit removing all of the unnecessary columns (yes, we can remove the filtered column without losing our filter condition, don’t worry, it’s all stored in memory ;)):

6. This leaves us with the Content columns which, by default, stores the content of each file in a binary column. We can expand each row in each file by simply clicking the double-down arrow in the right corner of the Content column:

7. Voila! Now we have a table with every record… from every file. Next, we need to separate out the record type column by delimiting by the leftmost comma:

8. Set your delimiter parameters as such:

9. Since we’re trying to build some referential integrity into our data set, we need to add a new column that will explicitly link all of our “info” and “version” records types to the proper game id. To do that, we’ll need to click on Insert Custom Column from the Home ribbon and fill in the following formula (if [Column1.1] = “id” then [Column1.2] else null):

10. It’s important to specify “null” for the non “id” records to allow the FillDown function to work it’s magic. Click on the Fill Down menu item:

11. Voila again. If we scroll down the file, we can see that the Fill Down function fills null values with the last non empty value in that column sorted from 1st record to last:

Now you can proceed with loading a more properly structured data warehouse.

Here is the “M” language formula code that was generated by our user interface actions:

let
Source = Folder.Files(“D:\Data\RetroSheet\RawData”),
FilteredRows = Table.SelectRows(Source, each ([Extension] = “.EDA” or [Extension] = “.EDN” or [Extension] = “.EVA” or [Extension] = “.EVN”)),
RemovedColumns = Table.RemoveColumns(FilteredRows,{“Name”, “Extension”, “Date accessed”, “Date modified”, “Date created”, “Attributes”, “Folder Path”}),
CombinedBinaries = Binary.Combine(RemovedColumns[Content]),
ImportedText = Table.FromColumns({Lines.FromBinary(CombinedBinaries,null,null,1252)}),
SplitColumnDelimiter = Table.SplitColumn(ImportedText,”Column1″,Splitter.SplitTextByEachDelimiter({“,”}, null, false),{“Column1.1”, “Column1.2”}),
ChangedType = Table.TransformColumnTypes(SplitColumnDelimiter,{{“Column1.1”, type text}, {“Column1.2”, type text}}),
InsertedCustom = Table.AddColumn(ChangedType, “Custom”, each if [Column1.1] = “id” then [Column1.2] else null),
FillDown = Table.FillDown(InsertedCustom,{“Custom”})
in
FillDown

Cheers,
Nick.

SQL Server PDW Replace Special Characters

September 3, 2013 Leave a comment

If you run across special characters (such as tabs, carriage returns and line feeds) and need to manipulate them in some manner in PDW, you need to use Hex representation since Decimal representation is not available in version (Microsoft SQL Server 2012 – 10.0.4176.0 (X64) Apr 12 2013 19:03:46 Copyright (c) Microsoft Corporation Parallel Data Warehouse (64-bit) on Windows NT 6.2 <X64> (Build 9200: )). See the sql below to better understand:

/*
CREATE TABLE [dbo].[PDWTest] WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = REPLICATE) AS
SELECT CONVERT(varchar(255), 0x746573740A) [TestColumn] UNION ALL
SELECT CONVERT(varchar(255), 0x74657374) [TestColumn] UNION ALL
SELECT CONVERT(varchar(255), 0x7465737409) [TestColumn] UNION ALL
SELECT CONVERT(varchar(255), 0x746573740D) [TestColumn] UNION ALL
SELECT CONVERT(varchar(255), 0x746573740D0A) [TestColumn]
*/

/*
How to translate varbinary to sql representation (Hex = Ascii Character = Decimal):
0x74 = 't' = CHAR(116)
0x65 = 'e' = CHAR(101)
0x73 = 's' = CHAR(115)
0x74 = 't' = CHAR(116)
0x0A = [tab] = CHAR(9)
0x0D = [line feed] = CHAR(10)
0x09 = [carriage return] = CHAR(13)
*/
SELECT TestColumn AS [Standard View]
	,'*' + TestColumn + '*' AS [Standard View With Asterisks]
	,CONVERT(varbinary(8000), TestColumn) AS [Binary View]
  --,REPLACE(TestColumn, CHAR(10), '') AS [This Fails in PDW] --'CHAR' is not a recognized built-in function name.
    ,REPLACE(TestColumn, 0x0A, '') AS [Use Hex Code in PDW]
	,REPLACE(TestColumn, 0x0D0A, '') AS [Multiple Hex codes] --Think of (0x) as a single quote (') used to specify the beginning of a hex code, there is just no closing equivalent in hex
	,REPLACE(REPLACE(REPLACE(TestColumn, 0x0D, ''), 0x0A, ''), 0x09, '') AS [Nested Example]
	,'*' + REPLACE(REPLACE(REPLACE(TestColumn, 0x0D, ''), 0x0A, ''), 0x09, '') + '*' AS [Standard View of fixes]
	,CONVERT(varbinary(8000), REPLACE(REPLACE(REPLACE(TestColumn, 0x0D, ''), 0x0A, ''), 0x09, '')) AS [Hex View of fixes]
FROM PDWTest

The sql statement above produces these results:
Capture

Download SQL

ASCII Table Reference

SSIS 2012 Unquoted Semicolons

If you’re getting this error:

Error saving YourPackage.dtsx: The connection string components cannot contain unquoted semicolons. If the value must contain a semicolon, enclose the entire value in quotes. This error occurs when values in the connection string contain unquoted semicolons, such as the InitialCatalog property.

you may have a semicolon in your password. If you’re trying to implement Project Parameters in 2012 SQL Server Data Tools, simply encapsulate your password parameter value in double quotes, such as “my;Password”:

Categories: SSIS

Hello world!

March 23, 2011 1 comment

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

Categories: Uncategorized