What I Learned from Scraping SEOmoz’s Active User Base

Many moons ago, when Moz was SEOmoz, I had the idea to scrape all its publicly available profile data on active users just to see what I could learn about the community. Quantitative market research is an incredibly powerful method to quickly grab insights on a brand’s users. Using those insights, we can develop strong content strategies and link-building campaigns, as well as develop competitive insights.

What easier way than scraping the data from a brand’s user profiles?

In Soviet iAcquire, the web crawl you.

Oh, you may have heard of Gary and Cogswell, the Russian-coded robots that escaped the Ministry of Education and Science and sought asylum in iRank (our homegrown targeting and reporting technology for scaled content marketing). They were originally assigned some very menial tasks, but I’ve since reprogrammed them to aid us in better marketing. They are here to lend a hand to their idol, Roger Mozbot, in the hunt for Red October. As the Russian saying goes, “Many hands make light work.”

Special thanks to our creative director Robb Dorr for capturing them in the act.

So we built (and by “we” I mean I had our Manager of Research and Development Joshua Giardino build) a multi-threaded crawler in Python, and we fired it at all of the profiles of Moz users who had logged in during the previous 60 days, the people I’ll call “active users.” For those who have forgotten what their Moz profile looks like, these profiles contain a lot of great info ripe for the plucking. I personally don’t know what Moz uses them for, but with this post I hope to touch on some potential use cases. Your profile looks (or at least looked) like this, and has all of the following data points in it if you provide them.

How SEOmoz profiles once looked

Full Name

User Name

Email

Title

Company

Type of Job

Location

Favorite Thing About SEO

Bio

Favorite Topics

Instant Messenger Handles

MozPoints

Level

Membership type

Rank

# of Comments & Responses

Length of Membership

Links to other sites

Social Media profiles

So, now that we got this treasure trove of data on SEOs in a highly engaged community, let’s see exactly what we have.

Crawl stats

Crawl date 2/15/13 – Yep, Casey, that was us.

14,036 out of 14,872 profiles were successfully crawled – It wasn’t a polite crawl at all.
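For anyone curious what a multi-threaded profile crawler might look like, here is a minimal sketch using only Python's standard library. To be clear, this is not Josh's code: the function names `fetch_profile` and `crawl`, the worker count, and the `delay` parameter are all my own assumptions for illustration. Unlike our crawl, it leaves room for a pause before each request so you can be more polite than we were.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_profile(url, timeout=10):
    """Download one profile page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetch=fetch_profile, workers=8, delay=0.0):
    """Fetch every URL on a thread pool; return (pages, failures)."""
    pages, failures = {}, {}

    def task(url):
        time.sleep(delay)  # a per-request pause keeps the crawl polite
        return fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as error:  # record the failure and keep crawling
                failures[url] = error
    return pages, failures
```

You would call it as `pages, failures = crawl(profile_urls, delay=0.5)`, where `profile_urls` is a hypothetical list of profile URLs you have already collected. Tracking failures separately is how you end up with a count like "14,036 out of 14,872."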

Methodology

Scrape as many users as we can

Cross-tab everything until we find useful insights

Run linear regressions to test the validity of correlations
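The cross-tab step is simple enough to sketch in a few lines of Python. This is an illustration under assumptions, not our actual analysis: the field names (`work`, `membership`) and the sample records are made up, and the helper names `cross_tab` and `row_shares` are hypothetical. Note that it counts only respondents who filled in both fields.

```python
from collections import Counter


def cross_tab(profiles, row_field, col_field):
    """Count profiles for each (row, column) pair, skipping blank answers."""
    counts = Counter()
    for profile in profiles:
        row, col = profile.get(row_field), profile.get(col_field)
        if row and col:  # only respondents who filled in both fields
            counts[(row, col)] += 1
    return counts


def row_shares(counts):
    """Convert raw counts into each cell's share of its row total."""
    totals = Counter()
    for (row, _), n in counts.items():
        totals[row] += n
    return {cell: n / totals[cell[0]] for cell, n in counts.items()}


# Made-up records purely for illustration; not the scraped data.
sample = [
    {"work": "Agency", "membership": "Pro"},
    {"work": "Agency", "membership": "Basic"},
    {"work": "Independent", "membership": "Basic"},
    {"work": "In-House", "membership": "Pro"},
    {"work": "In-House", "membership": None},
]
table = cross_tab(sample, "work", "membership")
shares = row_shares(table)
```

Row shares (rather than raw counts) are what let you say things like "a higher percentage of in-house users are Pro" even when the segments differ in size.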

Limitations of the data

According to the About page, Moz had over 15,000 subscribers in February of 2013, but you can be a user without being a subscriber. I’ve asked Mozzers in passing how many users the site has, and have gotten much bigger numbers than that. After I originally submitted this post, it was revealed to me that Moz has more than 250K user accounts. So the issue with this data is that it is just a sample. However, sampling is inherently a part of market research; after all, you can’t survey everybody. The more important point is that the users we scraped had all been active within the previous 60 days, and were therefore likely more reflective of the needs of those who are highly engaged in the product.

Also, many users have not completely filled out their profiles, so when performing cross-tabulations we are often dealing with samples of slightly different sizes. Therefore, all of the insights presented only account for respondents. That is to say, we don’t mention the number of people that have not filled out a given data point. Again, for those who want to know, the base number of total respondents for this study is 14,036, which makes for an approximate 5.6% sample of all users (but presumably a much larger percent of active users). Feel free to check our work.
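If you do want to check our work, the arithmetic is quick. The snippet below uses only the figures quoted in this post; treating "250K+" as exactly 250,000 is my simplifying assumption, which means the real share of all accounts is at most the number printed.

```python
crawled = 14_036        # profiles successfully crawled
attempted = 14_872      # active profiles we attempted
all_accounts = 250_000  # "250K+" total accounts, assumed to be exactly 250,000

crawl_success = crawled / attempted
sample_share = crawled / all_accounts

print(f"Crawl success rate: {crawl_success:.1%}")    # 94.4%
print(f"Share of all accounts: {sample_share:.1%}")  # 5.6%
```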

I’ve talked a lot about market research and how SEO as an industry doesn’t value it. Many SEOs I’ve encountered prefer taking shots in the dark or the guess-and-check method. This line of thinking is why the erosion of keyword data in analytics matters so much to SEOs. Market research is why it doesn’t matter to channels like social media or (ugh) display.

In fact, for enterprise clients it is only about “are we capturing the right people,” and “how many are we getting through each channel?” This way of thinking allows marketers to think bigger and be involved in conversations beyond meta tags and links. For those that are leery of the application to small-business marketers, you can easily leverage canned market segmentation provided by Nielsen, Experian, and others, or you can leverage segmentation in other ways.

So first, let’s go over some high-level insights. Our Inbound Marketing Analyst, Jiafeng Li, ultimately cross-tabbed the data a ton of different ways, and the entire analysis that we’ve performed is available for download at the bottom of this post in the “Parting gifts” section.

Membership type

The Membership Type field in the Moz Profile refers to the type of Moz subscription that a user has. For the purposes of this study we basically care whether the user is “basic” or not. Basic means they are a Moz user without a paying account, while any of the other six user types denotes a paying customer.

As the histogram indicates, the majority of active users are Pro members. Roughly 60% of this group has an active subscription. While interesting, this data doesn’t tell us much until we bring it into context of other data points that we will examine shortly. It should be noted that this field is set programmatically, so all “respondents” have this field filled out in their profiles.

Most active users are either basic (unsubscribed) or Pro (standard subscription) users: 42% basic and 49% Pro. Therefore, a large segment of these users are active subscribers paying at least the regular rate of $99/month. This also means most users are genuinely affected when the product has issues. However, it’s notable that Moz does a great job of being transparent when this happens.

Moz Insight: There’s no real actionable insight here without looking at data in context of other data points that we will examine later in the post.

Competitive Insight: Nearly half of Moz’s active user base doesn’t subscribe to the product. It would be worthwhile to segment further and reach out to these people to understand why.

Years of membership

The profile also tells us when the member signed up for her account. This is interesting to get a picture of the retention of the Moz active user base. The actual data point is the number of years since signup which shows that year over year Moz has retained more active users.

Note: Remember this data was collected in February of 2013 so that explains the small negative delta between years one and zero.

Congrats to Moz for their sustained user retention. Based on the sample they’ve retained more active users every year (not including year 0 which had just started).

From the outside looking in this is a clear indicator of a growing and thriving community. When researching viable opportunities this is far more important to me than any link metric. To be clear, though, this data is limited in that we don’t know exactly how many users signed up and ultimately canceled altogether. Nor do we know how many users have switched user types over time. Therefore the data is a jigsaw puzzle with a couple of middle pieces missing.

This is also how we realized this is just a sample of the user base because Moz reports its subscriber growth on the About page as:

2009 – 5K

2011 – 10K+

2012 – 15K+

However, because there is an account base of more than 250K, this is clearly not indicative of all user accounts. Also, in a recent conversation with Rand I learned that the subscriber base has continued to grow well beyond the number displayed on the About page at the time of this writing.

Time spent on SEO

One of the more interesting data points requested in the user profile is the amount of time a given user spends on SEO per week. This is particularly interesting because we can use this as an indicator of savvy or engagement in the space, especially in context with job titles.

The biggest segment (20% of respondents) spend more than 50 hrs/week on SEO, and as you might imagine, the active user base is mostly made up of people that spend a ton of time on SEO. However, there are also very large segments that spend smaller amounts of time on SEO.

Insight: As a content creator, there is space for really advanced content, but there’s likely an even more lucrative opportunity for basic content built for people with a shorter attention span for SEO.

Level/MozPoints

Moz has a rudimentary system of gamification that comes into play based on how active a user is on the blog or in Q&A. Points are awarded for (you guessed it) filling out your profile, publishing blog posts on YouMoz or being promoted to the main blog, commenting, and acquiring thumbs up.

This value is set by the system and the data indicates that 90% of active users are lurkers. There’s only a handful of Gianlucas out there. Based on how MozPoints are awarded, this histogram helps me understand how many users are engaged enough to be “thought leaders” as defined by the Journeyman, Authority, Guru and Oracle levels. These are the influencers I would reach out to if I wanted to place links or I wanted to get buy-in before I posted on YouMoz and wanted to ensure I got traction.

Moz Insight: Moz’s gamification needs work, and actually isn’t very TAGFEE. There are more actions that are beneficial to Moz that should also award points to users. For example, sharing a post on Twitter should result in a point for the sharer and the author. The rewards are also not that compelling. With all the Mozperks and free swag Moz gives away they would be well served to build a marketplace where users can redeem their points for fun stuff.

Note the change in the level names since the change to Moz. Guru has become Expert, and Journeyman has become Specialist.

Competitive Insight: 90.16% of Moz’s active users are not that engaged in the blog, Q&A or comments. While the community thrives in different ways on different channels there is an opportunity for another site to spring up that rewards user engagement in a more in-depth and (dare I say it?) transparent way.

Type of work

Users self-identify the classifications of their work, and with this data point Moz better understands how well they are capturing their targets.

Moz speaks to all segments of the audience with its offering and content, but as Rand mentioned enthusiastically at MozCon, they are focused on helping small business owners do better marketing. However, the active user base is 25.7% agency or independents that are likely floating across many clients.

The remaining big segments are:

16.69% Business Owners

15.65% In-House

Moz Insight: Moz’s active user base is not primarily made up of their core target. The real question that needs answering is, why is that? I believe cross-tabbing a little further gives us some more clues later in this analysis.

Competitive Insight: Moz’s user base is full of people that make great targets for agencies and enterprise products. Product brands that serve the enterprise like Conductor or Brightedge; and agencies like Distilled, SEER, Portent, and (ahem) iAcquire are obviously well served by being featured here or at Moz events.

Years of membership vs. membership type

Since we don’t have any indication of how user account types change over time, the best we can do is look at account types in context of account age to try and understand if there are any trends.

For users with membership less than a year, a higher percentage are basic users; while at more than 1 year, a higher percentage of users are pro users, indicating possible conversion to pro users after 1 year. The data indicates that the longer people are engaged with Moz, the more likely they are to subscribe to Pro.

Competitive Insight: The best time to convince users to try another product is in their first year of using Moz. The data indicates that Year 0 members aren’t quite convinced this is the product for them. A competitor would be well-served to offer a longer free trial than Moz does, and actively engage the user with how-to content via email to keep them actively engaged throughout their free trial so they can understand the value of the product.

Moz Insight: The data indicates that Moz does a good job of keeping these active members happy, if they can keep them around. Users are likely kept due to Moz’s investment in upgrades and remarkable content. The real question is which types of content lead to those initial conversions and which types reduce the churn? Don’t worry, I’ve got some ways to figure that out as well.

Naturally, Moz would also be well-served to develop ways to keep users highly engaged during their free trial process with “Did You Know” weekly emails based on app usage and non-usage.

Type of work vs. membership type

We wanted to understand how the type of work correlates to membership type. What types of users own what type of membership?

Pro usage is dominated by in-house professionals, and independents are the only segment that is mostly basic users.

Moz Insight: The hypothesis I’ve drawn here based on the data about these active users is that independents either don’t see the value in subscribing to Moz or they can’t afford it. Moz should consider a certification program similar to that of HubSpot, which would allow independents to generate leads. Once certified, these independents can enjoy a cheaper subscription rate. After all, independents are even smaller-business owners.

Competitive Insight: There is an independent market worth tapping with a tool suite that costs less than $1,188 per year. It would be worth performing exploratory research to understand what type of tools independents believe are worth investing in.

Time spent on SEO (heavy users) vs. membership type

We wanted to know what types of memberships the most engaged SEO practitioners have as these people are likely the hardest to please and may have the most influence of the bunch.

For heavy SEO users in the active user base (those who spend more than 50 hours/week on SEO), agency users and in-house users have a higher percentage of Pro subscribers, while business owners and other types of users comprise higher percentages of basic users.

Moz Insight: The data about these active users indicates that a large portion of business owners that are heavy SEO users are basic users of Moz. Moz may be too expensive for the people it wants to serve most, or even worse, these people may not truly see the value of Moz. This may be the most useful insight to Moz, and is definitely worth exploring further through interviews of this segment.

Competitive Insight: The independent and small-business owner is the battleground for those competing with Moz. Agencies and in-house professionals typically have access to bigger budgets and a variety of tools, whereas independents and small business owners often have to choose. Therefore, this may be where all-in-one products like RavenTools and HubSpot outperform Moz. It’s worth following up with exploratory research and examining any publicly available data on their users.

Level/MozPoints vs. years of membership

We wanted to see if there was any correlation between the number of years of membership and the amount of contribution to the community, wondering if it would be possible to predict when the next John Doherty or Tom Critchlow would pop up.

Among the “aspirant” users, who are less active, most of them are comparatively newer members; while among “contributor,” “journeyman,” and “authority,” most of them are comparatively older members.

The data indicates that the insight is obvious: The longer you’re with the Moz community, the more likely you are to become more engaged. The biggest group of contributors lies at the two-year mark. It would appear that Moz is already proactively cherry picking the best-of-breed posters to add to the Associate program. Competitors looking to quickly identify people for potential guest posting could look here, but again this is obvious, because if someone is good their posts tend to get tons of visibility anyway.

Regressions on membership type

There have been many discussions as of late on the value of correlation in SEO. Rand has already gone in-depth as to why correlation studies are worthwhile, but I will briefly say that while correlation != causation, it does bring up some interesting insights. That said, we ran linear regressions on the data that we cross-tabbed in the last few charts as follows:

When X = “time spent on SEO,” “type of work,” and “level,” the adjusted R-square is low.

The results of our regression indicate how strongly membership type correlated with time spent on SEO, type of work, and level. We found that membership is not strongly correlated with any one of those given metrics, which means that while there are a lot of happy “coincidences” here, they don’t necessarily mean any given factor is a driving force behind that correlation.
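For those who haven't run one of these, here is a rough sketch of what "adjusted R-square is low" means in practice, written in plain Python. It is not our actual model: encoding membership as 1 for Pro and 0 for basic is an assumption I'm making for illustration, the data is invented, and the function `adjusted_r_squared` is a hypothetical helper for a single predictor.

```python
def adjusted_r_squared(x, y, predictors=1):
    """Fit y = a + b*x by least squares and return (R^2, adjusted R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors relative to the sample size.
    adjusted = 1 - (1 - r2) * (n - 1) / (n - predictors - 1)
    return r2, adjusted


# Made-up data: hours per week on SEO vs. 1 = Pro, 0 = basic.
hours = [5, 10, 20, 30, 40, 50, 55, 60]
is_pro = [0, 1, 0, 1, 0, 1, 1, 0]
r2, adj = adjusted_r_squared(hours, is_pro)
```

With data like this, where Pro and basic users are scattered across every level of hours, both values come out close to zero (the adjusted one can even dip below it), which is the shape of result we saw: the predictor explains very little of the variation in membership type.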

Job titles

Users have the ability to enter their job titles in their Moz profile. However, free-form text fields are difficult to analyze, since everyone’s answer is very different. Enter: the word cloud.
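Underneath, a word cloud is just a word-frequency count with the font size scaled to the count. A minimal sketch of that counting step, assuming a tiny hand-picked stop-word list and some invented job titles (neither comes from our dataset, and `word_frequencies` is a hypothetical helper):

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"and", "of", "the", "at", "for", "in"}


def word_frequencies(texts, top=10):
    """Return the most common words across a list of free-form answers."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top)


# Made-up job titles for illustration only.
titles = ["Director of Marketing", "SEO Manager", "Marketing Manager", "Head of SEO"]
top_words = word_frequencies(titles, top=3)
```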

Perhaps I am innumerate, but I’ve never really been a fan of Word Clouds. Bigger words, bigger value. Big whoop. That said, this one would be pretty useful if I didn’t already know a lot about the Moz community. If I’m looking to create content it’s probably not best to go with code-heavy stuff. This word cloud tells me that I’m mostly speaking to people that are pretty far in their SEO careers, such as marketing directors and managers. As the marketing lead for an SEO and social media agency, I could quickly verify that my exact audience is here.

Moz Insight: There is a large opportunity for higher-level or big-picture content such as what Rand delivers on his personal blog. Since the majority of the active audience appears to be pretty far in their careers, this content may prove more valuable to them.

Competitive Insight: This data further indicates that Moz is a great place to get in front of enterprise professionals, especially in a less “sales-y” capacity. Two words: Case. Studies.

Users’ favorite things about SEO

Users also have the ability to share what it is they love about SEO in an in-depth free-form text area within their profiles. Again we leverage a word cloud due to the difficulty of segmenting responses otherwise.

This word cloud is also pretty helpful in understanding what content will resonate with the audience. One of the highest occurring ideas is that users love to get results or see their work on the first page of the SERPs. That in context with users loving the constant challenge and, to a lesser extent, the creativity required to get there leads me to believe this is an audience that will be very receptive to new approaches with proven results.

Insight: The active Moz audience is far more interested in results (and therefore case studies) rather than just ideas. This is an insight for both Moz and other marketers looking to appeal to this audience. Bring data or go home.

Users’ favorite topics

This section of the user profile is somewhat of a more succinct version of the last field. Users are given options to choose from which makes it a lot easier to analyze. Even so we’ve leveraged the word cloud here to see what really stands out for the Moz community.

Optimization, content, analytics, research, and link building appear to be the hits with the active users in the community. It looks like I’ve covered them all in this post, but how the post performs will be truly indicative of how well these types of content reach those people. And that’s a good point worth raising right now. How people say they act is not necessarily how they actually act. It will always be up to analytics to prove these insights right or wrong, but the point is to start out with an educated guess backed by data.

Moz Insight: As Moz is expanding its offering to be more about inbound marketing rather than just SEO this will be a good data point to measure to determine whether they are capturing more of that broad audience. However the choices are still reflective of Moz’s historical SEO focus as seen in the screenshot below.

Now would be a good time to update this to reflect more of the granular facets of Inbound.

Competitive Insight: This data really drills in the ideas of what you should focus on if you’re trying to get Moz users to come to you. Case studies and how-tos on optimization, content, analytics, research, and link building are the way to go, and a quick look at post analytics seems to back this up.

The real purpose of this post isn’t just to show Moz how they can do better marketing; it’s to show you how you can leverage your competitors’ user profiles to your advantage for a variety of initiatives.

Lead Generation – A lot of Moz profiles show email addresses publicly, but they’re rendered with JavaScript (darn you, Casey). I could have easily fired a headless browser at the site, pulled in email addresses, and sent our sales team after them. (Don’t worry, I didn’t.)

Content Strategy – As noted in the analyses, the data makes it crystal clear what the audience wants in the form of content. A lot of content marketing programs take shots in the dark at what users want, while this type of research allows a marketer to make a strong case for the content they would build. It’s far easier to convince a client of a creative content approach tied to an audience with data than with just keywords and links.

Link Building – This data is basically a personalized Followerwonk. I can slice and dice features of the dataset and grab their social URLs and sites, then combine them with Domain Authority and Social Authority. That would give me a highly personalized list of link-building prospects that I could segment and target by interest. Say, for example, I only want links from people who’ve been down with Moz since the beginning: I could just filter by the users that have had accounts for seven years. Done.

This is quantitative research with the qualitative insights coming out of my own experiences with the Moz community. Moz has, in the past, done a great job of quantitative research in the form of surveys they run on their community and user base. In fact, we could have layered that data over the data we’ve collected to get a more complete picture of the user base, including demographics with data from GlassDoor and Payscale to figure out salaries by title. We also could have leveraged Moz’s transparent analytics feature to show how content of the different types performs by subject and use those insights to get closer to what actually works for Moz.

We could have also performed qualitative research, much like Moz does with its various initiatives wherein they watch users using their products and ask questions. As a part of Moz’s Customer Advisory Board (CAB), the product team often reaches out to me to get my thoughts about using Moz Analytics and get specific feedback. The next step would be to pull out a set of users that are representative of the most valuable segments and similarly have question and answer sessions.

Exploratory Research – I’ve mentioned it several times, but this is the process of speaking to people in small groups with open-ended conversations to understand how your audience is thinking about your product. This process is usually performed in focus groups or open-ended surveys to help define what needs to be answered by more data.

More Quantitative Research based on those findings – Once we collect findings from exploratory research we could then send out survey questions based on those findings to get a bigger sample of the segment or find those people through other channels like LinkedIn.

In other words, insights can always be understood further or fine-tuned when used as a basis to determine or answer new questions.

The mad scientists at Moz could also pull all 250K+ users and perform the same analysis. However, I think the analysis of the active users proves to be more actionable, as it limits the research to just those that are actively engaged. That said, an analysis of all users may lead to insights into why certain user segments have become completely inactive.

Moz could also layer this data with app usage data for a more complete picture of what content keeps users using the product.

Measurement and targeting applications

This slide below from my MozCon 2012 presentation may have been forgettable at the time, but this is the foundation for what I believe is the future of digital marketing. This is the framework by which arbitrage and dynamic targeting become stronger, more viable solutions.

The concept is actually called cohort analysis. Before your eyes glaze over, this is nowhere near as complicated as the Keyword-Level Demographics methodology I developed at the end of 2011. With cohort analysis we segment users based on their shared features and track them accordingly. With Keyword-Level Demographics we’ve done that using Facebook data to match the relevant user data to features we’ve identified as relevant to our predefined personas. With cohort analysis we’re doing it from the other direction by first collecting data and then defining segments based on actual usage rather than just panels and surveys.

That is to say that Moz doesn’t have to go as far as building personas complete with demographics and user stories; they can stop at segments. Much like your Google Analytics segments, Moz could develop affinity segments to see what content resonates with which user types throughout the site. With all the data provided in the user profile Moz can segment any number of ways and may choose to go with membership types as the base since it is one of the lowest common denominators between users. However for the sake of understanding let’s use the Time Spent on SEO as our defining characteristic.

Moz could define high level segments as follows:

Super Heavy Users – Time spent on SEO over 50 hours/week

Heavy Users – Time spent on SEO 35-50 hours/week

Medium Users – Time spent on SEO 20-35 hours/week

Light Users – Time spent on SEO 5-20 hours/week
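In code, those four segments boil down to a handful of comparisons. This sketch assumes the hours are available as a number and that a value sitting exactly on a shared boundary (35 or 20 appear in two ranges) goes to the lower segment; both are my assumptions, and `time_segment` is a hypothetical helper name.

```python
def time_segment(hours_per_week):
    """Map weekly hours spent on SEO to one of the four segments above."""
    if hours_per_week > 50:
        return "Super Heavy"
    if hours_per_week > 35:
        return "Heavy"
    if hours_per_week > 20:
        return "Medium"
    if hours_per_week >= 5:
        return "Light"
    return None  # under 5 hours/week falls outside the listed segments
```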

We know Moz wants to target business owners. From the high-level insights, we have identified business owners that are super-heavy users as a segment of opportunity, since many of them are currently basic users. Now, to drill down into one of those segments we could target basic users that have “Link Building” listed as one of their interests, and spend more than 50 hours a week on SEO. Let’s call this segment “Basic-50-LB.” Based on the data this is indeed a valid segment:

We now know a lot about what this segment is interested in, so we can then test and optimize against it.

Now let’s compare this to the interests of the business owners that are heavy SEO users and have Pro accounts. It appears to be somewhat different.

The question we want to answer is, why? And how do we push those basic users to become Pro users? There are a lot of things worth testing on the basic users to see if we can discover what affects their perception of Moz’s value.

With that segment defined, Moz could track what type of content performs and then dynamically surface that type of content for that user when they log in. Moz could also track how many times that user type has to see a specific type of post before they are likely to become a Pro user. This is where geniuses like Dr. Matt Peters and Dr. Pete Meyers come in and build predictive models, and Moz’s entire digital marketing mix starts to make Target’s pregnancy prediction tactic look old school.

Further, Moz could see which products a given segment likes using the most and use that to inform their product roadmap. Did this segment become a Pro user once Followerwonk was released? Did signups increase once the Social Authority API rolled out? And finally, Moz could get more aggressive with these tests and segmented emails to users that cancel in hopes of bringing them back to Pro. For example a user very interested in link building would get emails with all of the recent link building posts, Q&A and discussion.

But to do this we first need to set up Google Analytics for measurement of cohorts. To do so we need to create a new custom segment that looks for the Custom Variable that we’ll be setting when a user starts their sessions.

Steps to do so are as follows:

Click the Down Arrow below your Segment name

Click Create New Segment

Click Conditions under Advanced

Select Users and Include next to Filter

Select the Custom Variable you will be setting under the drop-down that gives you the dimensions to choose from

Choose Contains and then type in the value which would be the segment name Basic-50-LB

We’d also do this for the segment we’d like to compare it to as well as capture the higher level segment “Basic-50” for bigger-picture insights.

This is actually something we do in the measurement planning phase with our clients here at iAcquire. It’s actually incredibly simple: when a user logs in, just pull their profile, identify which segment they belong to, and then fire off a custom variable like so:

_gaq.push(['_setCustomVar', 1, 'userSegment', userSegmentName, 1]);

The steps leading up to firing the custom variable will require some custom programming, but I promise you that it’s nothing more than a bunch of if-then statements. Tell your developer to relax.
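To show just how few if-then statements we're talking about, here is a server-side sketch in Python that builds the segment name and renders the tracking call shown above. The profile keys (`membership`, `hours_per_week`, `favorite_topics`), the function names, and the choice to label every paying account "Pro" are assumptions for illustration, not Moz's schema or our production code.

```python
def segment_name(profile):
    """Build a segment label such as 'Basic-50-LB' from a profile dict."""
    # Assumption: every paying membership type is lumped together as "Pro".
    parts = ["Basic" if profile.get("membership") == "Basic" else "Pro"]
    if profile.get("hours_per_week", 0) > 50:
        parts.append("50")
    if "Link Building" in profile.get("favorite_topics", []):
        parts.append("LB")
    return "-".join(parts)


def custom_var_snippet(name):
    """Render the classic Google Analytics call shown above for this segment."""
    return f"_gaq.push(['_setCustomVar', 1, 'userSegment', '{name}', 1]);"
```

Because the label is built left to right, “Basic-50-LB” contains “Basic-50,” so a Contains condition on “Basic-50” in the segment builder would capture the higher-level segment as well.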

Ultimately, what you’ll get is these segments in context with your analytics data, allowing you some very precise user insights that are completely relevant to you. In some ways this approach is actually better than Keyword-Level Demographics because it doesn’t require a user to be logged into Facebook and it leverages the data within the user profiles.

I know what you’re thinking, “How does this apply to my site or my clients? It will be impossible for my site to get users to create a profile and fill it out.” Well, can you get just a social handle or an email address? Ok, then I’ve got a couple solutions to that as well: FullContact and RapLeaf.

It turns out that FullContact does more than just give Paid Paid Vacation; they are also a contact data provider. Both RapLeaf and FullContact allow you to pass minimal information on a user and get a ton back. Here is some high-level information from their respective sites.

FullContact

RapLeaf

So remember when I said the email was difficult to scrape? The social handle was not. I’d be all set for lead gen with just a few API calls.

Using one of these solutions you could pull their data when they sign up, use it to determine their segment or persona, save that to your database, and cookie them. This way there’s no need for them to create a profile or opt in in any way aside from the initial signup. Also, as long as they don’t kill their cookies, the user doesn’t even have to explicitly sign in. Sometimes the Internet feels like magic.

You guys know I can’t give you a good idea without leaving you a way to use it.

Josh’s scraper code

Since SEOmoz became Moz, there have been more than enough changes to the structure of the site that this code will not work anymore. However, it’s a good starting point if you’d like to build a scraper for competitor user profiles in the future. You can find it (and some other cool things) on the iAcquire GitHub repository.

More market research resources from J-Li

We take market research pretty seriously here at iAcquire. Here are two posts you shouldn’t miss from our Inbound Marketing Analyst Jiafeng Li.

Cohort analytics stuff

Nudge Spot – An easy-to-use dynamic targeting platform based on cohorts

At iAcquire search is our craft, and this post is just another example of an element of the new SEO process at work. This is the type of stuff my team incorporates into SEO on a daily basis in addition to the creative technical ideas we come up with. The fact is, we live in the information age where big data reigns supreme, but let’s not forget smaller data like we’ve just examined.

So it looks like Roger, Gary, and Cogswell are ready to do better marketing. Are you?

And yes, it feels amazing to be back on the blog.
