A couple of days ago there was a post on the lookstat blog titled "top search keywords for energy". It compared search terms in Google in an attempt to estimate the popularity of energy images on microstock sites. Earlier this year I did something similar in a post about seasonal stock images, and at the time I made the point that I wasn't exactly sure how well Google search terms related to searches on microstock sites.

So that set me thinking... (yes, be very afraid). Just how closely does Google Trends/AdWords data match what people are searching for at microstock sites? Clearly there will be some relationship, but I'd also guess that there are lots of popular terms that will not have a proportionate number of microstock searches. It's difficult to know how similar the two are. Is it reasonable to assume that popular keywords in Google are more likely to lead to more microstock sales, because those keywords are popular subjects and so there will be related businesses in need of such images? As they say, "assume makes an ass out of u and me".

Say, for example, we see that holiday flights is a popular subject in Google, and that exchange rates is equally popular. For me it's a stretch to say that images representing exchange rates will sell in the same proportions as ones depicting holiday flights. Surely there are too many variables?

Another Data Set

I have access to the keywords that people used while browsing a free stock photo site (similar to stockxchange but nowhere near as big). Of about 1 million searches in 2008 and late 2007 there were some 160,000 unique key phrases searched. The vast majority of them only got one search (just 48k had 2 searches or more). This is just one of the places in microstock where we see the 'long tail / exponential decay graph'; see my post how long images continue to sell and, more recently, Microstock Diaries revisited the long tail. A plot of the top 100 follows; the full table of the data is below, as only every 4th keyword would fit on the graph:

Graph of search terms vs search volume for top 100 searches.

The keyphrases were counted exactly as entered, so typos, word stems and variants such as "cat." or "cats" have not been grouped with all the other "cat" searches. Likewise, the total for "people" does not include searches for phrases containing it, like "young people"; these are listed separately.
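If you wanted to group those variants, a minimal sketch might look like the following. This is not what I did for the table below; the function names, the sample searches and the very crude "drop a trailing s" rule are all assumptions for illustration (that rule will mangle words like "christmas", so a proper stemmer would be a better choice on real data).

```python
import re
from collections import Counter


def normalise(phrase):
    """Lower-case, strip punctuation and collapse repeated whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", phrase.lower())
    return " ".join(cleaned.split())


def naive_singular(word):
    # Crude assumption: drop one trailing "s" from longer words.
    # This is NOT real stemming and gets words like "christmas" wrong.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def group_key(phrase):
    """Key under which variants such as "cat." and "Cats" are counted together."""
    return " ".join(naive_singular(w) for w in normalise(phrase).split())


def count_grouped(searches):
    return Counter(group_key(s) for s in searches)


def total_containing(counts, word):
    """Total searches for any phrase that contains `word` as a whole word."""
    return sum(n for phrase, n in counts.items() if word in phrase.split())


# Made-up sample searches, purely for illustration.
sample = ["cat", "cat.", "Cats", "young people", "people", " "]
counts = count_grouped(sample)
people_total = total_containing(counts, "people")
```

Here the three "cat" variants end up under one key, the blank search is counted under the empty string, and `people_total` adds "young people" to the plain "people" count.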

Comparison

Armed with this data (in the slowest pivot table known to man) I decided to do a bit of analysis to see how these matched the results in the lookstat post:

Left: Google search analysis from the lookstat post and Right: results from the free stock photo site data set.

Looks like they match quite well! Anything below 10 could perhaps be considered error and could easily be skewed by some other factor. I was convinced I was going to be able to prove that energy jobs was popular in Google but not a popular stock search. It seems that way, but sadly I don't think I have a large enough data set to be certain.

What's quite interesting is that only 45 out of a million searches were made for our 'top' energy keywords (there were also 6 similar phrases with one search each, such as "solar energy farm" and "solar energy panel", plus many more for the single keywords solar and energy and their related synonyms).
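I only compared the two charts by eye, but one way to put a number on "match quite well" would be a rank correlation between the two sets of counts. The sketch below computes Spearman's rank correlation in plain Python. The keyword names and counts are placeholders I made up to show the mechanics, not the lookstat figures or my own data, and with counts this small the result should be treated with the same caution as the charts.

```python
def shares(counts):
    """Convert raw counts into each keyword's share of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def average_ranks(values):
    """Rank values with 1 = largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Assumes at least two values that are not all tied in either list.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Placeholder counts, invented for illustration only.
google = {"term a": 100, "term b": 60, "term c": 30, "term d": 10}
stock = {"term a": 20, "term b": 12, "term c": 3, "term d": 5}

keys = sorted(google)
g, s = shares(google), shares(stock)
rho = spearman([g[k] for k in keys], [s[k] for k in keys])
```

A value near 1 means the two sources order the keywords the same way; it says nothing about whether the absolute volumes are proportional.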

The Top 100

For extra comparison, the keywords in my data set look a lot like the top 100 keywords searched on Shutterstock, although I have a definite English language bias. I have also not removed from the top 100 several keywords like 'nude' and 'sex' that probably do not come from image buyers. There is quite a lot more variability in the ordering, and plenty of the keywords Shutterstock has in its top 100 only made it into my top 200.

| Rank | Keyword | Frequency | Rank | Keyword | Frequency | Rank | Keyword | Frequency |
|---|---|---|---|---|---|---|---|---|
| 1 | people | 18818 | 34 | doctor | 2059 | 67 | animals | 1455 |
| 2 | (blank) | 16332 | 35 | nude | 1998 | 68 | fish | 1448 |
| 3 | music | 8971 | 36 | party | 1977 | 69 | construction | 1422 |
| 4 | fruit | 8922 | 37 | fire | 1905 | 70 | flower | 1412 |
| 5 | christmas | 7589 | 38 | medical | 1887 | 71 | fruits | 1410 |
| 6 | business | 5508 | 39 | hands | 1880 | 72 | dancing | 1379 |
| 7 | food | 5413 | 40 | child | 1864 | 73 | cat | 1341 |
| 8 | woman | 5008 | 41 | kids | 1818 | 74 | rose | 1341 |
| 9 | family | 4787 | 42 | tree | 1818 | 75 | sky | 1327 |
| 10 | computer | 4647 | 43 | education | 1798 | 76 | heart | 1317 |
| 11 | children | 3731 | 44 | golf | 1787 | 77 | home | 1306 |
| 12 | baby | 3708 | 45 | sports | 1772 | 78 | camera | 1284 |
| 13 | car | 3452 | 46 | wine | 1758 | 79 | sun | 1281 |
| 14 | dance | 3433 | 47 | massage | 1745 | 80 | birthday | 1254 |
| 15 | house | 3191 | 48 | coffee | 1737 | 81 | shopping | 1249 |
| 16 | money | 3162 | 49 | hand | 1663 | 82 | paper | 1243 |
| 17 | school | 3119 | 50 | fashion | 1650 | 83 | girls | 1235 |
| 18 | sex | 3093 | 51 | earth | 1643 | 84 | eye | 1217 |
| 19 | wedding | 3058 | 52 | face | 1629 | 85 | students | 1202 |
| 20 | book | 2977 | 53 | health | 1623 | 86 | beauty | 1189 |
| 21 | girl | 2845 | 54 | horse | 1621 | 87 | world | 1177 |
| 22 | football | 2707 | 55 | phone | 1597 | 88 | winter | 1161 |
| 23 | women | 2611 | 56 | snow | 1587 | 89 | pizza | 1107 |
| 24 | beach | 2586 | 57 | nature | 1587 | 90 | computers | 1076 |
| 25 | water | 2510 | 58 | student | 1556 | 91 | film | 1076 |
| 26 | apple | 2459 | 59 | smile | 1549 | 92 | spa | 1075 |
| 27 | love | 2406 | 60 | globe | 1532 | 93 | law | 1066 |
| 28 | dog | 2384 | 61 | hair | 1531 | 94 | dogs | 1063 |
| 29 | books | 2349 | 62 | fitness | 1530 | 95 | chocolate | 1049 |
| 30 | man | 2264 | 63 | soccer | 1521 | 96 | beer | 1027 |
| 31 | sport | 2193 | 64 | guitar | 1509 | 97 | tv | 1022 |
| 32 | office | 2177 | 65 | flowers | 1473 | 98 | space | 1020 |
| 33 | cars | 2101 | 66 | sexy | 1471 | 99 | cake | 995 |
| | | | | | | 100 | london | 994 |

Note: "blank" searches are probably robots, mistaken users, or users just seeing what an empty search does. Interestingly, if you run a web site with a photo search, a blank search should most likely not be allowed, or should perhaps return a "nothing found" message ALONGSIDE a selection of random or popular images.
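As a rough idea of how that blank-search handling could look, here is a minimal sketch. Everything in it is hypothetical: `handle_search`, the dictionary standing in for a search index, and the file names are illustrative assumptions, not how any particular site works.

```python
def handle_search(query, index, popular, limit=3):
    """Return a message, results and fallback suggestions for a search.

    `index` maps a lower-cased keyword to a list of image names and
    `popular` is a list of popular images; both are hypothetical stand-ins
    for whatever a real site would query.
    """
    cleaned = query.strip()
    if not cleaned:
        # Blank search: do not run it, but still show the visitor something.
        return {
            "message": "Please enter a search term.",
            "results": [],
            "suggestions": popular[:limit],
        }
    results = index.get(cleaned.lower(), [])
    if not results:
        return {
            "message": f"Nothing found for '{cleaned}'.",
            "results": [],
            "suggestions": popular[:limit],
        }
    return {"message": "", "results": results, "suggestions": []}


# Made-up data for illustration.
index = {"cat": ["cat1.jpg", "cat2.jpg"]}
popular = ["people1.jpg", "music1.jpg", "fruit1.jpg", "xmas1.jpg"]

blank = handle_search("   ", index, popular)
found = handle_search("Cat", index, popular)
missing = handle_search("tofu", index, popular)
```

Both the blank and the "nothing found" cases return popular images as suggestions, so the visitor never lands on an empty page.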

Quite an eclectic little bunch and I think this is the first time synergy, tofu and andy warhol have been used in the same sentence. Quite a few of the searches are not what you would call 'traditional stock subjects'.

Conclusion

It seems reasonable that a comparison of relative terms in Google Trends/AdWords will match the relationships between searches on a stock photo site, but I still think there are a lot of keyphrases for which that is not the case. I plan to analyse the data some more to see if I can pick out a few obvious "search engine popular" keywords that don't match image searches. It would be really great if Google would let us see their "image search" volume alone. I did previously look at using the Google data by combining keywords of interest with the keyword "photos", "images" or "pictures"; it works for very popular single-word searches but not for most key phrases. We have thus far ignored which images actually sell! See picniche for more about that.

I should be able to set up something where you can query this data and my more recent 2009 dataset, if anyone is interested?

Great post, Steve! I definitely think there are going to be situations where the google data & the microstock data is going to diverge. Still, it was impressive to see the correlation on the energy related items. Very cool stuff. Look forward to seeing more :)