Recently, while I was searching Google for some technical topics, a peculiar link came up among the other search results. I visited the site, and it looks like an exact clone of http://stackoverflow.com.

I still don't understand one thing: how do they make an exact copy of the entire site? Is it some kind of saved/cached pages? I'm a bit confused.
–
Rahul Aug 4 '14 at 20:07

@Rahul: in this case they are copying the HTML, leaving intact the stylesheets and image references. As a result, your browser styles the HTML exactly the same way Stack Overflow is styled.
–
Martijn Pieters Aug 4 '14 at 20:34

LOL... then they have put in a fair amount of effort to do this sort of smuggling :). BTW, the site is totally down now.
–
Rahul Aug 4 '14 at 20:37

A link from the SO homepage to this question sends enough traffic to bring the site down. :)
–
James Lawruk Aug 6 '14 at 19:18

This doesn't appear to be a scraper, but a proxy of some kind, which seems to run about 10 minutes behind the main site. Since the site appears to be hosted somewhere in China, this could be a way someone devised to make Stack Overflow accessible there. I don't know yet; we're looking into it some more.

I would not attempt to log into that site to poke and explore it, for obvious reasons.

Me neither. I don't trust these hacky-wacky sites. They might be full of trojans/malware.
–
Der Golem Aug 4 '14 at 12:50

The sign up and log in links in the top bar are linking to https://stackoverflow.com/, perhaps because the proxy replaces all instances of http://stackoverflow.com but not https links.
–
Stijn Aug 4 '14 at 12:57
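Stijn's hypothesis can be illustrated with a minimal sketch, assuming the proxy does a plain string substitution on every page it relays. The mirror origin `http://mirror.example` and the function name `rewrite_links` are hypothetical; the real proxy's code is unknown. Because `"https://stackoverflow.com"` does not contain the substring `"http://stackoverflow.com"` (the `s` breaks the match), HTTPS links such as the sign-up and log-in URLs would pass through unchanged:

```python
# Hypothetical mirror origin; the real proxy's host and code are unknown.
MIRROR_ORIGIN = "http://mirror.example"


def rewrite_links(html: str) -> str:
    """Naively point plain-http Stack Overflow URLs at the mirror.

    Only the literal prefix "http://stackoverflow.com" is matched, so
    "https://stackoverflow.com/..." links are left pointing at the real site.
    """
    return html.replace("http://stackoverflow.com", MIRROR_ORIGIN)


if __name__ == "__main__":
    page = (
        '<a href="http://stackoverflow.com/questions">Questions</a> '
        '<a href="https://stackoverflow.com/users/login">Log in</a>'
    )
    # The first link is rewritten; the HTTPS log-in link is not.
    print(rewrite_links(page))
```

A substitution this blunt would explain the symptom exactly: anything the origin emits with an `https://` prefix escapes the rewrite and sends the visitor back to the real site.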

If the proxy runs 10 minutes behind, then it is still a scraper, as it must have scraped a copy 10 minutes earlier to store it. :-P I know, pedantry will not get me anywhere.
–
Martijn Pieters Aug 4 '14 at 15:41

@MartijnPieters There's a line between caching (as a typical squid proxy does) and scraping (saving a copy as a copy, not something that expires).
–
Tim Post♦ Aug 4 '14 at 15:42
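The distinction Tim Post draws can be sketched as a cache whose entries expire. This is a minimal illustration, not the mirror's actual implementation: the class name `TTLCache`, the injected `fetch` callable, and the 600-second default (chosen to match the "about 10 minutes behind" estimate) are all assumptions. A scraper in Tim's sense corresponds to a copy that never expires, i.e. `ttl=float("inf")`:

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Caching-proxy behaviour: a stored page expires after `ttl` seconds
    and is then fetched again from the origin.

    Hypothetical sketch; `fetch` stands in for an HTTP GET to the origin.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = 600.0,  # assumed ~10 minutes, per the observed lag
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: serve the stored copy, possibly up to `ttl` seconds stale.
            return entry[1]
        # Missing or expired: go back to the origin and remember the new copy.
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


# Caching proxy: refreshes roughly every 10 minutes.
#   proxy = TTLCache(fetch_from_origin, ttl=600)
# "Scraper" in the sense above: the first copy is kept forever.
#   scraper = TTLCache(fetch_from_origin, ttl=float("inf"))
```

With `ttl=600`, a page can be served up to ten minutes stale before the next request triggers a refresh from the origin, which would produce the kind of lag described in this thread.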

Sure, I am not being serious here. Glad this is taken seriously, btw.
–
Martijn Pieters Aug 4 '14 at 15:45

@TimPost You say "a way to make Stack Overflow accessible", do you mean that the real SO is normally blocked in China?
–
Mr Lister Aug 5 '14 at 5:07

What's that you say? 10 minutes in the past? A new frontier of time-travel-enabled fastest-gun-in-the-west lies before us.
–
Timothy Shields Aug 5 '14 at 5:38

@MrLister SO in China was having problems with the JavaScript in the recent past due to the Google blockage (but was still semi-usable), but everything has been a-ok for a while now.
–
Xiaofu Aug 5 '14 at 5:46

If this is truly something nefarious and/or commercial AND is being hosted in the PRC, then you could report them to the relevant authorities (I can't tell you who exactly). Sites hosted in China must be registered and display their corresponding ICP number at the bottom of the homepage; see Baidu for an example: ICP证030173号 (ICP license No. 030173). Since this site doesn't display one, it could potentially be shut down. And if it did have one, you could track the owners down...
–
Xiaofu Aug 5 '14 at 5:55

@zyboxinternational It's definitely a proxy - just one that has a hard time keeping up (and well, considering our volume, that's sort of expected). Devs are looking into it.
–
Tim Post♦ Aug 6 '14 at 16:42

@TimPost As I mentioned in my answer, it's probably a low-end VPS that's hosting the site. It would make a LOT of sense in this instance...
–
ʎǝʞuoɯɹǝqʎɔ Aug 6 '14 at 17:41

I must say, leaving a comment as an answer on MSO is so ironic I almost have to upvote it. I won't bother with the semi-rant that tells you to be patient until you have enough rep to comment...so, I suppose I will...
–
Chief Two Pencils Aug 5 '14 at 5:54

I figured it was important to tell people that the website is still up.
–
Bretsky Aug 5 '14 at 5:56

No, it's not. Since last night I've tried almost 10 times and can't access it. I think the site is not accessible from India.
–
Rahul Aug 5 '14 at 9:46

It is accessible as of right now, in Canada, but only some parts of it work.
–
Bretsky Aug 5 '14 at 19:51

It's still accessible from that link in the US, and the homepage works for me as well.
–
Izkata Aug 5 '14 at 20:06

Why is it important to frequently report if another site is up or down on MSO?
–
Martin Capodici Aug 5 '14 at 20:12

This proxy site obviously has limited bandwidth and since this thread was posted, it's been going down quite frequently.
–
ʎǝʞuoɯɹǝqʎɔ Aug 5 '14 at 20:17

@martin-capodici Because the site is a complete copy of SE.
–
Bretsky Aug 5 '14 at 20:19

@Bretsky It's not an exact copy; it's a proxy that seems to cache each page every 10 minutes or so. I'm timing how often it updates (by watching for a comment I made on a question to show up). Once done, I'll post an answer to this.
–
ʎǝʞuoɯɹǝqʎɔ Aug 5 '14 at 20:21

There. You now have enough reputation to comment.
–
Cᴏɴᴏʀ O'Bʀɪᴇɴ Dec 19 '14 at 3:01

Doing some more snooping and looking at the DNS name servers, it seems that the IP, and subsequently the sites, are hosted through Mongit / Host Virtual (Host Virtual seems to host VPS installations in Hong Kong, China (Source)).

ASN Lookup

The ASN lookup yields the same information: the IP is owned by Host Virtual, or rather hosted by them. (Source #1, Source #2)

And looking at the IP Block associated, we see that our returned IP is in fact hosted via Host Virtual through "China Mobile". (Source)

Conclusion

I might have found the party responsible. I've stumbled upon a person who writes, and I quote:

As Chinese government banned many foreign websites like youtube,blogger, facebook and so forth, i feel the crisis of human rights inChina!

I'm not going to publish the name or other identifying information here, but if a moderator would like to contact me to verify it, please do, unless you've already found out who is responsible. Just trying to help out.

Looks like a person with a good motive using the wrong means.
–
Infinite Recursion Aug 7 '14 at 6:51

@InfiniteRecursion Exactly what I was thinking. This person doesn't seem the slightest bit malicious from what I can find, but you can never judge a book by its cover.
–
Darren Aug 7 '14 at 6:51

I don't understand why there would be such a pressing need to provide a proxy for Stack Overflow. News and social media sites, sure, but is access to Stack Overflow really such a pressing human rights need?
–
Cupcake Aug 7 '14 at 6:53

@Cupcake I take it the Chinese government doesn't want its people to know how to code properly or solve any issues.
–
Darren Aug 7 '14 at 6:59

This is the response from the Stack Exchange team to the report I filed:

Thank you for reporting this content. I've passed the information
along to the person at our company who handles such issues. It's the
diligence of users like you that helps us stay valuable!

Please note, bringing these sites into compliance (or getting them to
no longer serve our content) is often a long and arduous process. You
may not see immediate results. However, rest assured that we're
working on it.

Thank you again, Stack Exchange Team

So from this we can understand that action is being taken against the target site, but we shouldn't expect it to be shut down right away.

I commented on a question, brought up the same question on heima588.com, and waited. The comment hasn't turned up on the site yet, and it's been 25 minutes.
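The manual measurement described above can be automated with a small polling loop. This is a sketch under assumptions: `fetch_mirror` stands in for whatever downloads the mirrored question page (e.g. an HTTP GET), `marker` is the text of the freshly posted comment, and the 60-second polling interval and one-hour timeout are arbitrary choices. The clock and sleep functions are injectable so the logic can be checked without actually waiting:

```python
import time
from typing import Callable, Optional


def measure_mirror_lag(
    fetch_mirror: Callable[[], str],  # hypothetical: returns the mirrored page's HTML
    marker: str,                      # e.g. the text of the comment just posted
    interval: float = 60.0,           # assumed polling period, in seconds
    timeout: float = 3600.0,          # give up after an hour (assumption)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll the mirror until `marker` appears.

    Returns the seconds elapsed since polling started, or None on timeout.
    The result is only accurate to within one `interval`.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if marker in fetch_mirror():
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)
```

Polling once a minute keeps the extra load on an already struggling host low; a shorter interval would give a more precise lag estimate at the cost of more requests.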

I browsed the site, and quite a few times I got a message saying there's no space left on the host, which appears to be a low-end VPS (judging by the error message and the site's abysmal performance).

It also appears to have hardly any resources available, since the above error is common and sometimes the site doesn't load at all.

As Tim Post mentioned, it's probably a badly designed (unauthorised) proxy that somebody has set up to get around a block in their country or ISP, most likely related to the issues with StackExchange using Google-hosted JavaScript code when China blocked access to Google.