Archive for: July 7th, 2017

Assume that the red line is the regression model learned from the training data set. It fits the training data perfectly, yet it fails to generalize to data not included in the training set. There are several ways to avoid this problem of overfitting.

To remedy this problem, we could:

Get more training examples.

Use a simpler predictor.

Select a subsample of features.

In this blog post, we focus on the second and third ways to avoid overfitting by introducing regularization on the parameters βi of the model.
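As a quick illustration (a sketch for intuition, not taken from the original post), an L2 (ridge-style) penalty adds the squared size of the coefficients to the least-squares objective:

$$\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

A larger λ shrinks the βj toward zero, which effectively yields a simpler predictor; an L1 (lasso-style) penalty on |βj| instead drives some coefficients exactly to zero, which amounts to selecting a subset of the features.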

One of the simplest concepts when computing graph-based values is that of centrality, i.e. how central a node or edge is in the graph. As this definition is inherently vague, a lot of different centrality scores exist that all treat the concept of “central” a bit differently. One of the most famous is the PageRank algorithm that was powering Google Search in the beginning. tidygraph currently has 11 different centrality measures, and all of these are prefixed with centrality_* for easy discoverability. All of them return a numeric vector matching the nodes (or edges in the case of centrality_edge_betweenness()).

This is a big project and is definitely interesting if you’re looking at analyzing graph data.

The possibility of using both technologies together is very interesting. Using graph objects, we can store relationships between elements, for example, relationships between forum members. Using R scripts, we can build a cluster graph from the stored graph information, illustrating the relationships in the graph.

The script below creates a database for our example with a subset of the objects used in my article and a few more relationship records between the forum members.
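The original script isn't reproduced here, but as a rough sketch of the kind of objects involved (table and column names are illustrative, not the author's), SQL Server 2017 graph objects look like this:

```sql
-- Node table for forum members and an edge table for their relationships
CREATE TABLE dbo.ForumMember
(
    MemberID   INT           PRIMARY KEY,
    MemberName NVARCHAR(100) NOT NULL
) AS NODE;

CREATE TABLE dbo.RepliesTo AS EDGE;

-- Which members has Alice replied to?
SELECT m2.MemberName
FROM dbo.ForumMember AS m1,
     dbo.RepliesTo   AS r,
     dbo.ForumMember AS m2
WHERE MATCH(m1-(r)->m2)
  AND m1.MemberName = N'Alice';
```

The edge table holds the relationship records, which is the piece an R script can then read and turn into a cluster graph for plotting.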

In this module you will learn how to use the Drilldown Choropleth Custom Visual. The Drilldown Choropleth is a map visual that displays divided regions, highlighted to indicate the relative value in each location.

Overall, the script is longer, at nearly double the lines, but where it shines is when adding new columns. To include new columns, just add them to the table; to exclude them, just add in a filter clause.

So, potentially, if every column in this table is to be tracked and we add columns all the way up to 1,024 columns, this code will not increase. Old way: at least 6,144. New way: at least 2,048. Dynamic: no change.
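As a rough sketch of the dynamic idea (the table name and excluded column here are hypothetical, not from Shane's script), the column list is built from sys.columns at run time, so adding a column to the table requires no code change and the filter clause handles exclusions:

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the column list from metadata; the WHERE clause acts as the filter
-- that excludes columns from tracking.
SELECT @cols = STUFF((
        SELECT N', ' + QUOTENAME(c.name)
        FROM sys.columns AS c
        WHERE c.object_id = OBJECT_ID(N'dbo.MyTrackedTable')
          AND c.name NOT IN (N'RowVersion')
        ORDER BY c.column_id
        FOR XML PATH('')), 1, 2, N'');

SET @sql = N'SELECT ' + @cols + N' FROM dbo.MyTrackedTable;';

EXEC sys.sp_executesql @sql;
```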

Read on for that script. Even though his developer ended up not using his solution, Shane has made it available for the rest of the world so that some day, someone else can have the maintenance nightmare of trying to root out a bug in the process.

Those who have studied In-Memory OLTP are aware that in the event of “database restart”, durable memory-optimized data must be streamed from disk to memory. But that’s not the only time data must be streamed, and the complete set of events that cause this is not intuitive. To be clear, if your database had to stream data back to memory, that means all your memory-optimized data was cleared from memory. The amount of time it takes to do this depends on:

the amount of data that must be streamed

the number of indexes that must be rebuilt

the number of containers in the memory-optimized database, and how many volumes they’re spread across

how many indexes must be recreated (SQL 2017 has a much faster index rebuild process, see below)

the number of LOB columns

the BUCKET_COUNT being properly configured for HASH indexes

Read on for the list of scenarios that might cause a standalone SQL Server instance to need to stream data from disk into memory to re-hydrate memory-optimized tables and indexes.
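For context, here is a minimal sketch of the kind of object these factors apply to (the table name, columns, and bucket count are illustrative only): a durable memory-optimized table whose data has to be streamed back into memory, with a LOB column and a hash index whose BUCKET_COUNT affects rebuild time.

```sql
-- Assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup
CREATE TABLE dbo.SessionState
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   VARBINARY(MAX) NULL   -- LOB column; more of these means slower recovery
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```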

I ran this several times to see if there was a pattern to the madness, and it turned out there was. All waits were concentrated in database ID 2 – TEMPDB. Many people perk up by now and jump to the conclusion that this is your garden-variety SGAM/PFS contention – easily remedied with more TEMPDB files and a trace flag. But, alas, this was further inside TEMPDB. The output from the query above gave me the exact page number, and plugging that into DBCC PAGE gives the metadata object ID.
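For reference, that last step looks roughly like this (the file and page numbers below are placeholders, not the values from the investigation):

```sql
DBCC TRACEON(3604);        -- send DBCC PAGE output to the client instead of the error log
DBCC PAGE (2, 1, 128, 2);  -- database_id (2 = tempdb), file_id, page_id, print option
-- The Metadata: ObjectId value in the page header identifies the owning object.
```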

His conclusion is to reduce temp table usage and/or use memory-optimized tables. We solved this problem by replacing temp tables with memory-optimized TVPs in our most frequently-used procedures.
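A minimal sketch of that replacement (the type and column names are hypothetical): define a memory-optimized table type once, then declare table variables of that type inside the procedures instead of creating temp tables.

```sql
-- Requires a database with a MEMORY_OPTIMIZED_DATA filegroup
CREATE TYPE dbo.OrderLineTVP AS TABLE
(
    OrderID INT NOT NULL INDEX ix_OrderID NONCLUSTERED,
    Qty     INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @lines dbo.OrderLineTVP;
INSERT INTO @lines (OrderID, Qty) VALUES (1, 10), (2, 5);
```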

A .Net tick is a duration of time lasting 0.1 microseconds. When you look at the Ticks property of DateTime, you’ll see that it represents the number of ticks since January 1st 0001. But why 0.1 microseconds? According to stackoverflow user CodesInChaos, “ticks are simply the smallest power-of-ten that doesn’t cause an Int64 to overflow when representing the year 9999”.
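A quick back-of-the-envelope check of that claim (my own arithmetic, not from the original post): years 1 through 9999 cover about 3,652,059 days, and at 864 billion ticks per day that works out to roughly 3.2 × 10^18 ticks, comfortably under the Int64 maximum of about 9.2 × 10^18, whereas a unit ten times finer would overflow.

```sql
-- Days in years 1-9999 times ticks per day, compared with the bigint (Int64) maximum
SELECT CAST(3652059 AS BIGINT) * 864000000000 AS ApproxTicksThroughYear9999,
       9223372036854775807                    AS BigIntMax;
```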

Even though it’s an interesting idea, just use one of the datetime data types; that’s what they’re there for. I avoid ticks whenever I can.