Categories: Nerd Stuff, SQL Posted by NateChurch on 12/15/2011 11:21 AM | Comments (0)

I have been on a team where we are creating a solution for a corporation that will be deployed to different business units around the world as a standardized solution. In this case the solution is for oil field work and will help each business unit optimize the production of wells based on the injection of water into other wells, a process called waterflooding. The customers for this application will be the engineers who will monitor the metrics in order to make decisions on how to optimize the process by controlling the variables available to that engineer. For instance, they can increase water flow to an adjacent well if they see production drop off from another well or they can even recommend drilling new wells.

                As you can imagine this application is very data intensive. The process of injecting water creates many standard metrics which we are going to measure with our application. Some of these could be water rate, water pressure, or oil out for instance. With a user base that is largely technical and mathematically driven, they want as many different numbers as they can get to make their decision. This can present a problem because the information they want may be outside of the standard dataset we will provide.

                This need for additional data is also driven by each business unit’s differing geography and geology. For instance, in West Texas it is often cheaper to drill a new well if a greater injection rate is needed, where an offshore rig in deep water would find the same solution cost prohibitive. Other the other hand, offshore wells may take more expensive measurements, but performing these same measurements hundreds of times in a West Texas field is too expensive to do. 

                We are tasked with creating a solution that has as many of the basic attributes as possible for our first release so the engineers can make the nest decision, but giving the ability to expand those data sets as needed. We will hand the finished product to each business unit’s IT staff and they will then take over ownership. We will send out patches and upgrades to the solution, which they will then install. Our finished product will consist of a database, ETL, and a user interface, so the extensibility is needed throughout.

                With these goals in mind, I was asked to come up with different solutions for creating an extensible dataset. I have detailed some of these solutions below. My goal was to create a dataset that was both modular, meaning that it was standardized for flexibility, but also extensible, or able to be expaned. Since we are handing the tables over to the business unit IT staff, it is important to maintain logical divisions between the added code. This will make code we have developed less likely to break when extended. It will also facilitate smooth upgrades and patches.

 

Here are the solutions I provided to my client. Obviously certain choices are preferred over others.

1.       Add columns to the original source tables.

a.       Positives: This is arguably the simplest

b.      Negatives:  Presents problems when upgrading the solution and can affect any tuned performance you have set up.

In the sample, our original columns are in blue and the added columns are in red. So, we have added Latitude and Longitude

 

2.       Create custom tables and join them via views to the source tables.

a.       This is another simple solution where you create one table preferably in another schema for every table where you wish to have custom values. In our sample, the solution tables are in the dbo schema and the user tables are in the bu schema.

b.      Positives:

                                                               i.      This table is owned and managed by the users

                                                             ii.      Can be put into separate schema for increased security (which is also nice for use on alternate filegroups or alternate backup strategies if needed)

                                                            iii.      View can auto-extend

Note:  (This can be done by using SELECT dbo.col1, dbo.col2, bu.* …)

                                                           iv.      Helps with upgrades

                                                             v.      Does not affect performance of the original tables outside of the view

                                                           vi.      Users create their own ETL to populate these tables.

                   c.       Negatives:

                                                               i.      Must account for multiples in the custom attribute tables, so row counts stay the same.  For instance, if there are two rows for WellID BH1 then there will be more rows in the final view. Two ways you can handle this via a “TOP 1” or by adding a date added column and taking the most recent.

 In the sample, you can see we joined the table with a simple outer join and the view is populated with values for both tables. Be sure to use an outer join so you have the flexibility to keep your interface running when there are no custom attributes provided by the users. This is especially important because there may be two ETL processes(solution and user) updating a single row in the view. It is also important your interface can handle NULLs.


 

3.       Create one custom value table  and join those via views to the source tables

a.       This solution creates a common attribute table so there are less new tables. It basically creates a virtual or dynamic tables, that you can create via a PIVOT.

b.      Positives:

                                                               i.      Custom value tables are great at keeping all values in one location for copying during upgrades

                                                             ii.      These are helpful because you have them created and added dynamically via the interface. 

                                                            iii.      Users can update the custom valued table using their own ETL.

                                                           iv.      Can be put into separate schema for increased security

                                                             v.      Stored Procedures can be written so they don’t have to change.

c.       Negatives:

                                                               i.      Doesn’t allow for multiples of the same attribute.

                                                             ii.      Requires modification of the views to add new attributes.

                                                            iii.      If you extend to far (too many attributes) you will start to see performance degrade.

 

In the sample you can see the custom table contains the custom values for all the tables. In this case, you would want a Primary, Clustered Index on the first three columns (Table, RowID, Attribute). The values are then added to the original values using a PIVOT clause. Optionally you could create a view that does the PIVOT before the final view for the end to maintain(see dotted line in sample).

 

 

4.       Create an XML column on a lookup table and store the additional information there.

a.       This would involve creating an XML schema and creating XML documents for each row. You could choose to store this in the original table, a custom table for each domain, or one custom table for all domans (similar to the designs of Options 2 And 3 but with XML)

b.      Positives:

                                                               i.      It can be indexed as XML as well.

                                                             ii.      Customer can update these using their own ETL, separate from those provided.

c.       Downside:

                                                               i.       It is difficult to update without an interface to do so.

                                                             ii.      User’s ETL coding is more complex.

                                                            iii.      Views must be manually changed for each new attribute

 

In the sample you can see that the there is a table that contains custom XML attributes. This option could also be created one table for the whole solution (similar to the last option) or it could be created as an additional column on the original table. I have gone away from that, because I don’t like BLOBs attached on the original tables. In this case I wouldn’t so that I can move them to other filegroups if needed.

 

 

Sample Scripts:

I have included some sample SQL scripts for option 3 and 4 in case you wish to see how these might work code wise.

ModularExtensibleDataSets-DynamicTables.sql (2.69 kb)

ModularExtensibleDataSets-XMLColumn.sql (764.00 bytes)

 




Categories: Nerd Stuff Posted by NateChurch on 5/25/2010 11:34 AM | Comments (0)

So from time to time I will write some code. It isn't all that often and I am not that great at it, but I can get it done normally with only a few drinks, clumps of hair and broken objects afterwards. I have graduated over time to C# because it helps if a BI professional can read a developer's poorly written C# code. Today I was dinking around teaching myself how to use the super simple "Web Site Administration Tool". It is a fairly basic interface that allows amateurs like me to add security without too much fuss. Well I was trying to point this app to use a database backend and could not for the life of me get it to work. So in hopes of saving you some time I will show you how I fixed it. Here is the error message:

There is a problem with your selected data store. This can be caused by an invalid server name or credentials, or by insufficient permission. It can also be caused by the role manager feature not being enabled. Click the button below to be redirected to a page where you can choose a new data store.

The following message may help in diagnosing the problem: An error occurred while attempting to initialize a System.Data.SqlClient.SqlConnection object. The value that was provided for the connection string may be wrong, or it may contain an invalid syntax. Parameter name: connectionString

This message of course is bullshizzle because I am using that same connection string to connect my webservice to obtain information. I read all the articles, post and every developer on the planet and I couldn't find a solution. So I had to (GASP!) figure it out for myself. As I suspected I felt pretty stupid when I figured it out, but I will post my solution so you don't spend nearly as much time before you feel stupid. Then again it could be said that if the connection string is good enough for the OleDBConnection object it should be fine for this too. I have wasted too much time to actually sit here and think about this little paradigm anymore, so with all the time you save by reading my solution you can put some thought into this. Just hit digg on your way out. Without further ado, here is the solution.

<connectionStrings>

<remove name="LocalSqlServer"/>

<add name="LocalSqlServer" connectionString="Data Source=blah\devdb;Initial Catalog=superdb;Persist Security Info=True;User ID=User1;Password=nunyabusiness;" providerName="System.Data.SqlClient"/>

<add name="SuperDBConnectionString" connectionString="Data Source=blah\devdb;Initial Catalog=superdb;Persist Security Info=True;User ID=User1;Password=neveryoumind;Provider=SQLOLEDB;" providerName="System.Data.SqlClient"/>

</connectionStrings>

So here are my connection strings, as found in the web.config for those of you playing along at home(the passwords and databases have been changed to protect the innocent, duh!). The first connection string is the one that works and the second is the one I was using for everything else. It seems that the SqlMembershipProvider chokes when you have the clause, "Provider=SQLOLEDB;". You don't need it, so if you leave it out it should work. I have it because I do switch back and forth between other oleDB providers, but I will cross that bridge later. The fact that it says larger than life "providerName" is not lost on me, but if you use this connection string with an OleDBprovider it doesn't seem to care whether you have that there or not. In fact I have changed it to a MySQL database just to see if it would work and it did. I know you aren't reading still but if you are, Thanks.