SEO

This article will show how we used the approach of genetic algorithms to optimize the distribution of links within
web sites to maximize the average rank of individual web pages.

The theory

We begin by explaining how about a link analysis algorithm ranking (LAR), which are used by search engines to make
an 'order (or ranking) of web pages.
It infers importance to a web page based on the topological structure of the graph, extracted by analyzing the links
of the site in question, analyze and along the arches outgoing and incoming arcs, that is, outgoing links and
incoming links of pages that make up the graph.
Based on this information is a value associated with each page, which will be used for sorting.

The predecessor of link analysis algorithms ranking is InDegree which calculates the popularity of a page, taking
into account the number of pages that have a link to it.
Algorithms latest and most refined are:

PageRank

Kleinberg (hereinafter called HITS)

Salsa

PageRank follows a random graph of the web, where every page spreads its weight to the pages to which it links,
resulting in an array of weights appointed following weights of authority.
The complete algorithm for calculating the PageRank relies on the use of the theory of Markov processes.

Kleinberg proposed schema of a spread of weights at two levels, leading to the values of authority and also the
values of hub.
A page with high hub, is a page that contains links to pages of quality (ie with high authority) and symmetrically
a page with a high authority, will be a page from many good bet hub.

Salsa is a hybrid between the two previous algorithms.

To approximate these algorithms take into account the formula originally developed by the founders of Google.

Rank[A] = ( 1 - d ) + d ∑i (Rank[Pi] / Link[Pi])

Rank[A] is the value of Page Rank What we want to calculate

P[i] are the pages containing a link to A

Rank[Pi] are the values of Rank pages P1 ... Pn

Link[Pi] are the total number of links on the page that offers links

d (damping factor) is a factor that Google decided that the original documentation becomes 0.85.
It can be adjusted by Google to determine the percentage of PageRank, which must pass from one page
to another and the value of the minimum rank assigned to each page in the database.

The algorithms in question are obviously much more complex and our aim is not to replicate faithfully, but to
obtain a mathematical model that can approach for a "Our value of Rank that will enable us to determine the
goodness of the topological structure calculated the 'genetic algorithm after it was applied.

Let us now an overview on genetic algorithms.

The genetic algorithm is an algorithm to analyze the data and belongs to a particular class of algorithms used
in various fields, including the' artificial intelligence.
It is a heuristic search and optimization, based on the principle of natural selection of Charles Darwin
that governs the development.

Operating Principles

A typical genetic algorithm from a number of possible solutions (individuals) calls population and arrange for
their evolve during the execution: each iteration, it makes a selection of members of the public is used to
generate new elements of the population , Which will replace an equal number of workers already present, and
thus constitute a new iteration for the population (or generation) below.
The succession of generations to evolve an optimal solution of the problem assigned.

The evolution is obtained through a recombination of partial solutions, each individual transmit part of its
genetic heritage to their descendants.
The introduction of mutations in the population of random departure, occasionally generates new workers, with
features not included among those present in the genetic of the original species.

Once the stage of evolution, population generated at each iteration, is analyzed and are required only solutions
that best solve the problem: individuals with the qualities most suitable environment in which they find are more
likely to survive and reproduce.
These solutions will suffer a new phase of development and so on.

In the end, we expect to find a population of solutions that can adequately solve the problem.
There is no way to determine in advance whether the algorithm will actually be able to find an acceptable solution.
As a rule, genetic algorithms are used for optimization problems for which no one knows algorithms complexity or
linear polynomial.

Details of the operation

The solution of the problem is encoded in a data structure, usually a string, called gene.

Initially creates a number of genes at random and you define a function that returns the "goodness" of a gene
as a solution to the problem, said fitness function.

The algorithm consists of the operations that tend to alter the population of genes in an attempt to improve them
so that a better solution.

The evolution proceed in steps, for each of these is first performed a sort of genes on the basis of the result
of the fitness function.
They then performed the operations on a number of genes parameters established by the algorithm, which generally
determine how many genes must undergo crossover and mutation, and to what extent.

The algorithm then evolves through the following points:

Generation, in a random initial population

Creation of a sequence of new populations, or generations.
In each iteration, the current members of the public are used to create the next generation and to this
end, make the additional steps:

Each member of the current population is estimated calculating the value of their fitness

It determines an appropriate order of those individuals on the basis of the values of fitness

The most promising individuals are selected as parents

From these individuals will generate an equal number of individuals of the next generation, and this
can be done in two ways, ie by making random changes on a single parent (mutation) or appropriately
combining the characteristics of a pair of parents (crossing )

Individuals thus be generated to replace the parents by allowing the formation of the next generation

Finally, the algorithm s'interrompe when one of the criteria for arrest is satisfied.

According to a coefficient determined initially, some parts of the genes better results are exchanged, assuming
that this can improve the outcome of the function of fitness in the next "evolutionary step".

Single point crossover

There are various techniques of crossover. One of the simplest is the "single point crossover" which is to
take two workers and cut their strings encoding in a case in point.
It creates two heads and two tails. Now exchange the heads and tails, getting two new genes.
The crossover is not always applied, but with a probability pc.
Where is not applied children are simply copies of the parents.

Experimentally you can see that the improvement is welcome after a number of steps.This in cases of less fortunate, of course.

Mutation

The mutation is the random changes to portions of genes with a value of fitness lower, based on factors defined
initially.
These changes aim to improve the value of the function for the gene in question.

In reality it is not proper to think of changing only the chromosomes with lower fitness, in order to provide
greater capacity exploratory the algorithm (and not end up in "holes" in excellent local) are also considered
useful mutations of chromosomes with the same high fitness.
Ultimately the mutations mainly served to explore the area of research, did not order improvements.

Application

A website is in fact a graph. And it is the graph of the site under consideration that will apply the 'genetic algorithm.

The representation is used for adjacency matrix.

The matrix of a nearby structure particular data commonly used in the representation of graphs.
In particular, is widely used in the preparation of algorithms that operate on graphs and their
representation in the general computing.

Since any graph its matrix of adjacent consists of a binary square matrix that has the indices of rows and
columns of the names of the top of the graph.
In the place (i, j) of the matrix is a 1 if and only if there is an arc in the graph that goes from summit to
summit the j, otherwise there is a 0.

If instead of 1 in the matrix are numbers, these are interpreted as the weight given to each arc.
For example, if all the vertices of the graph represents a series of dots on a map, the weight of strings
can be interpreted as the distance of the points they connect.
In our case indicate the quantity of links.

In the case of representation of non-oriented graphs, the matrix is symmetrical than the main diagonal.

One of the features of this matrix is allowed to obtain the number of paths from one node to node i j who must
cross n nodes.
To achieve all this is sufficient to power the same n-matrix and see the number that appears instead i, j.

As previously explained a gene is a binary string, then a sequence of 0 and 1. The idea behind this algorithm is
a matrix that can be seen as a binary string and then a gene. If we want is access to the position in X, Y of a
matrix organized as a string, just apply the following formula:

Value(x,y) = Matrix[ (y*matrixsize) +x ]

Applying a genetic algorithm to graph the site before you can get a better topological structure that improves
the rank of individual pages.

Have been studied and implemented other processes of evolution than to crossover previously explained, specially
designed for the SEO problem.

The fitness function is quite complex and among the parameters that take into account are the average rank of
goal pages, the average number of links per page and number of pages void.

The goal pages are pages of the site that are our special interest. Typically, in most cases, these pages leaf.

For anything page is a page with the same rank equal to 0.15 (it is easily deduced from the formula of rank above),
or the value determined by the user.

The algorithm is able to take account of the editorial constraints and evolve as a function of them.

The results

Case 1

The application has allowed us to make new considerations on those before, perhaps, were considered certainties
of the SEO.
Speaking with various seo specialist, reading the forums, one of the considerations that makes the most is that
we must "balancing" the distribution of links. Or (as regards internal links) if a page from a link to an 'other,
it is appropriate that the spare parts, or fancedo accounts of serve for each page, outgoing link has many, many
must have entrants.
The model matetematico, however, seems to say completely 'opposite. The "goal" page, we want to maximize is the node 5.

As we can see in the complex model, the page you wanted to build was even penalized.
Clearly this example is rather forced, but it is a simplification of the structure of many websites.
The intention of this test is obviously not to suggest a circular pattern for your sites, but to show how it
is not discounted the validity of the structures of which we are convinced.

Case 2

Following the results of 'application of' genetic algorithm at: http://www.fratellileonelli.it
You can see the representation of 3D graph before and after the trial. The size of green areas indicate the
value of page rank, while the strings are represented in purple.

Before the application

After the application

We see at a glance the difference between before and after treatment.
The balls (Individual web pages) have a larger diameter, which means higher rank, and are much more interconnected.

An extract of the algorithm genetic results

Url

Rank Before

Rank After

Change in %

/

20.869984

13.694373

-34.3824

/come-raggiungerci/

1.923949

1.051139

-45.3655

/foto/

3.078781

1.464144

-52.444

/eventi/

1.982173

1.14884

-42.0414

/progettazione/

20.345295

13.828813

-32.0294

/realizzazione/

20.464703

13.718434

-32.9654

/manutenzione/

20.574158

13.976818

-32.0661

/potatura/

20.674488

14.024247

-32.1664

/erbacee-perenni/

29.898375

22.935853

-23.2873

/arbusti/

26.296431

16.967085

-35.4776

/foto/azienda-florovivaistica-leonelli-0007/

0.215424

1.110144

415.33

/foto/azienda-florovivaistica-leonelli-0008/

0.215424

0.298621

38.6201

/foto/azienda-florovivaistica-leonelli-0009/

0.215424

0.322093

49.5158

/foto/azienda-florovivaistica-leonelli-0010/

0.215424

0.518017

140.464

/eventi/mostra-mercato/

0.318485

1.077961

238.466

/eventi/open-day/

0.318485

0.400233

25.6678

/erbacee-perenni/agastache/rupestris/

0.288162

0.380497

32.0427

/erbacee-perenni/ajuga-reptans/atropurpurea/

0.288162

0.353591

22.7053

/erbacee-perenni/alchemilla/mollis/

0.288162

0.439853

52.6408

/erbacee-perenni/allium/karataviense/

0.288683

0.538787

86.6362

/erbacee-perenni/allium/tuberosum/

0.288683

0.363862

26.0422

/erbacee-perenni/allium-karataviense/ivory-queen/

0.288162

0.31456

9.16076

/erbacee-perenni/alyssum/montanum/

0.289121

0.456966

58.0534

/erbacee-perenni/alyssum/saxatile/

0.289121

1.010271

249.429

/foto/azienda-florovivaistica-leonelli-0001/

0.215424

0.291642

35.3805

/foto/azienda-florovivaistica-leonelli-0002/

0.215424

0.268323

24.5558

/foto/azienda-florovivaistica-leonelli-0003/

0.215424

0.281249

30.5559

/foto/azienda-florovivaistica-leonelli-0004/

0.215424

0.335848

55.9007

/foto/azienda-florovivaistica-leonelli-0005/

0.215424

0.353533

64.1102

/foto/azienda-florovivaistica-leonelli-0006/

0.215424

0.317864

47.5525

/erbacee-perenni/alyssum/saxatile-compactum/

0.289121

1.260897

336.114

/erbacee-perenni/amsonia/tabernaemontana/

0.288162

0.542984

88.4299

/erbacee-perenni/anemone/honorine-jobert/

0.290098

0.390125

34.4802

/erbacee-perenni/anemone/konigin-charlotte/

0.290098

0.351422

21.139

/erbacee-perenni/anemone/prinz-heinrich/

0.290098

1.071951

269.513

/erbacee-perenni/anemone/richards-ahrends/

0.290098

0.435242

50.0325

/erbacee-perenni/anemone/rosenschale/

0.290098

0.391896

35.0908

/erbacee-perenni/anemone/whirlwind/

0.290098

0.33464

15.354

/erbacee-perenni/aquilegia/vulgaris/

0.288162

1.104393

283.254

/erbacee-perenni/aquilegia-cerualea/red-hobbit/

0.288162

1.041284

261.353

/erbacee-perenni/aquilegia-cerulea/rose-queen/

0.288162

0.346796

20.3473

/erbacee-perenni/aquilegia-flabellata-pumila/ministar/

0.288162

0.391123

35.7299

/erbacee-perenni/aquilegia-hybride/biedermeier/

0.288162

0.45022

56.2383

/erbacee-perenni/arabis-caucasica/snowball/

0.288162

0.390517

35.5197

/erbacee-perenni/arenaria-montana/alba/

0.288162

0.437306

51.7567

/erbacee-perenni/artemisia/powis-castle/

0.288683

0.340346

17.896

/erbacee-perenni/artemisia/stelleriana/

0.288683

1.002906

247.407

/erbacee-perenni/asclepias/tuberosa/

0.288162

0.404723

40.4497

/erbacee-perenni/aster/snow-flurry/

0.288162

0.496032

72.1363

/erbacee-perenni/aster-dumosus/kippenberger/

0.288683

0.348915

20.8643

/erbacee-perenni/aster-dumosus/schneekissen/

0.288683

0.345664

19.7384

/erbacee-perenni/aster-laterifolius/lady-black/

0.288162

0.359595

24.789

/erbacee-perenni/astilboides/tabularis/

0.288162

0.404522

40.38

/erbacee-perenni/aubretia/blue-cascade/

0.288162

0.348454

20.9227

/erbacee-perenni/begonia/envasiana-bianca/

0.288162

0.405793

40.8208

/erbacee-perenni/belamcanda/chinensis/

0.288162

0.331347

14.9863

/erbacee-perenni/brunnera/macrophilla/

0.288162

0.447869

55.4226

/erbacee-perenni/callirhoe/involucrata/

0.288162

0.550039

90.878

/erbacee-perenni/campanula/loddon-anna/

0.289121

0.423717

46.5537

/erbacee-perenni/campanula/porscharskyana-lisduggan/

0.289121

0.441346

52.6511

/erbacee-perenni/campanula/poscharskyana-stella/

0.289121

0.351554

21.5942

/erbacee-perenni/campanula-persicifolia/telham-beauty/

0.288162

0.476057

65.2042

/erbacee-perenni/campanula-poscharskyana/eh.-frost/

0.288162

0.420136

45.7984

/erbacee-perenni/campanula-punctata/rubriflora/

0.288162

0.441126

53.0825

/erbacee-perenni/centaurea/pulcherrima/

0.288162

0.440109

52.7295

/erbacee-perenni/centranthus/ruber/

0.288162

0.385745

33.8636

/erbacee-perenni/ceratostigma/plumbaginoides/

0.288162

1.198788

316.011

/erbacee-perenni/chrisantemum-coccineus/robin-red/

0.288162

0.387029

34.3093

/erbacee-perenni/chrysantemum/weirichii/

0.288162

1.040353

261.03

/erbacee-perenni/cimicifuga/racemosa/

0.288162

0.364126

26.3612

/erbacee-perenni/convallaria-japonica/nana/

0.288162

0.482505

67.4419

/erbacee-perenni/coreopsis/grandiflora-aureo-marginata/

0.289121

0.477275

65.0779

/erbacee-perenni/coreopsis/verticillata-grandiflora/

0.289121

1.244313

330.378

It is important to note that the algorithm has tried to distribute in a more homogeneous the rank of the site.
The table with the final results are highlighted in red the most significant.

The pages are in the menus as you can imagine are those with higher rank, both before and after.
However, their rank was lowered for the goal pages.

Of course, this behavior can be driven by need. In this particular case the customer was important to raise the
rank of pages that contain the description of the plant, therefore, l 'algorithm has evolved in this direction.

As a result you can see how they were positioned additional links calculated by the process:

If you are interested to apply (for free) the software described, simply downloadgenesis here