
I am surprised that the comments haven't mentioned the role of SEO in Wikipedia's growth and defensibility.

Wikipedia's habit of deep interlinking helped it rank back in the early aughts when the SEO rules were rather simple. Add to that the subdomain-driven localization strategy and many other moves that were considered SEO best practices back in those years when the on-page factors used to matter.

But that was just the start. Wikipedia killed it in SEO when it was easy to do so, but it also did one other thing that most SEO-driven sites (e.g. About.com) didn't do correctly - it cared deeply about content quality and resisted running ads (anyone remember Jason Calacanis' articles on how they were leaving $100m on the table? See [1]). So when Panda came around, Google correctly rewarded Wikipedia with #1 rankings for over 50% of its terms (!!), and Jason Calacanis had to shut down Mahalo, which got destroyed by Panda.

Wikipedia's dominance continues because it's basically impossible to overcome its lead in inbound links and domain authority. Add to that a surprisingly under-the-radar company culture which has avoided any major blow ups despite its community wielding so much leverage over the world's education and having to make a lot of difficult calls on a daily basis.

Well done.

[1] https://calacanis.com/2006/10/28/wikipedia-leaves-100m-on-th...



It is somewhat sad that "caring about content quality" is considered SEO and not just making a good website.

I think more than half the things you mentioned are only good SEO because search engines want to send people to websites they will like reading. I think when that is the case, we should be crediting people for making good websites, not good SEO.


Yeah, looking at it from a purely SEO perspective always makes decisions look kinda shady, as in they're trying to game some algorithm in order to gain more exposure. Wikipedia's biggest SEO factor has to be the massive amount of backlinks it gets. Those are purely organic and happen simply because it is generally the best authority on most any topic. It's more of a testament to how genuinely good Google's algorithm has become rather than some masterplan by Wikipedia.


Once you have 150 million inbound links, the strategy choices are easy - focus on content quality! But you have to remember what Day 1 looked like: a tiny community, no inbound links, and a fair number of other encyclopedia competitors trying to attract authors. Now the strategy choices are quite interesting - at the beginning of a startup you have just enough energy/runway to "kill it" in one area. Which one do you focus on? Generating the world's best content alone, without the heavy lifting they've done on deep interlinking and other SEO-friendly moves, wouldn't have cut it.


What were the competitors in the early days?


Further down in the comments you'll find a research paper [1] that analyzed why Wikipedia succeeded where others failed. I am not sure I bought into the conclusion (which prompted my initial comment), but at least it has a comprehensive listing of all the main players at the time:

Interpedia, TDEP, Everything2, h2g2, TheInfo, Nupedia, GNE

[1] https://mako.cc/academic/hill-almost_wikipedia-DRAFT.pdf


Thank you for posting that. It made me realize that I have the wrong citation in a footnote of a book I'm about to proof :-)


Yes, I believe that Wikipedia and its authors never put much thought into SEO. They just think about how to best structure the information and make heavy use of links, which also happens to be a good strategy for SEO.

Google's search ranking algorithms have changed a lot more in the last 20 years than the overall structure of a good Wikipedia article has.


Have you ever encountered on Wikipedia a sentence like this:

"...because a <a>blue</a> <a>whale</a> did..."

rather than

"...because a <a>blue whale</a> did..."

Obviously, the latter version would have been more useful, and I find it difficult to believe that a human being would have made such a mistake. Don't get me wrong - such instances are rare, but they do happen and are an indicator that not all links are generated manually. I don't know what they are using today (if anything), but as someone else pointed out, in the early days they used UseModWiki to ensure a high level of deep interlinking. We can argue that this was done to improve the UX, but the level of ambition that went into it signals that they also saw it as a strategic move (and they would have been right to assume that - 20 years ago, a highly interlinked site was likely the best bang for the buck in SEO when it came to how to prioritize your time and resources).
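To illustrate how word-by-word auto-linking could produce exactly this split-link pattern, here's a minimal sketch. The functions and the sample title set are hypothetical, not Wikipedia's or UseModWiki's actual code; they just contrast naive per-word matching with longest-match linking:

```python
def autolink(text, titles):
    """Naive auto-linker: wraps each whitespace-separated word that
    matches a known page title in an <a> tag. Because it only looks
    at single words, "blue" and "whale" get separate links even when
    the page "blue whale" exists."""
    out = []
    for word in text.split():
        if word.lower() in titles:
            out.append(f"<a>{word}</a>")
        else:
            out.append(word)
    return " ".join(out)


def autolink_greedy(text, titles):
    """Longest-match auto-linker: tries the two-word phrase first,
    falling back to single words, so "blue whale" becomes one link."""
    words = text.split()
    out, i = [], 0
    while i < len(words):
        bigram = " ".join(words[i:i + 2]).lower()
        if i + 1 < len(words) and bigram in titles:
            out.append(f"<a>{' '.join(words[i:i + 2])}</a>")
            i += 2
        elif words[i].lower() in titles:
            out.append(f"<a>{words[i]}</a>")
            i += 1
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


titles = {"blue", "whale", "blue whale"}
print(autolink("because a blue whale did", titles))
# -> because a <a>blue</a> <a>whale</a> did
print(autolink_greedy("because a blue whale did", titles))
# -> because a <a>blue whale</a> did
```

The fix for the split-link artifact is just preferring the longest matching title, which is why such sentences read like machine output when they do slip through.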


There are actual humans placing such links, to the point that there is an explicit rule against doing that: https://en.wikipedia.org/wiki/MOS:SEAOFBLUE


Nice! Didn't know about such a rule.

Maybe in legitimate cases, it'd already help if Wikipedia underlined links, so you can see if it's one or multiple links.



SEO changed dramatically over the last 20 years. Thankfully, we are today exactly at the point you described (it all started with Panda in 2011). As you'll see below, getting there was not just a technological challenge, but also one of fixing misaligned incentives.

Prior to 2011, Google enjoyed a mutually beneficial relationship with content farms which splattered their pages with AdSense ads (and Google ranked them highly). Can you imagine how it must have sounded for the Panda engineers to pitch to Sergey and Larry that they wanted to replace all those highly monetized websites with Wikipedia?

Matt Cutts commented that "with Panda, Google took a big enough revenue hit via some partners that Google actually needed to disclose Panda as a material impact on an earnings call. But I believe it was the right decision to launch Panda, both for the long-term trust of our users and for a better ecosystem for publishers."


For anyone wondering what Panda is: https://en.m.wikipedia.org/wiki/Google_Panda

“Google Panda is a major change to Google's search results ranking algorithm that was first released in February 2011. The change aimed to lower the rank of "low-quality sites" or "thin sites", in particular "content farms", and return higher-quality sites near the top of the search results.”


Back in the early to mid 2000s, I learned web design/development by volunteering to create websites for charities/NGOs. In the process, I

* ensured that the code (HTML and CSS, only basic non-AJAX, JavaScript) was standards-compliant (at the time, XHTML [1] was “the big thing”)

* implemented basic usability guidelines as advocated by Jakob Nielsen [2] in his Alertbox newsletter and

* followed Mark Pilgrim’s suggestions in his Dive Into Accessibility [3]

Carrying out the above and simply focussing on quality content was enough to rank highly in Google’s search engine results and I never had the need nor inclination to do any research into SEO. Back then the mantra in the web development books was that “content is king” – and Google reflected this philosophy. Sadly, the Web has changed a lot in the intervening years.

1. https://en.wikipedia.org/wiki/XHTML

2. https://en.wikipedia.org/wiki/Jakob_Nielsen_(usability_consu...

3. https://web.archive.org/web/20110927131211/http://diveintoac...


Yea, if SEO is to have a useful meaning, it really ought to be "changes you make to improve search ranking while holding quality fixed".


Equivalently, the challenge in running a search engine is to decrease the divergence between “what makes a website good” and “what makes us rank you higher”.


> So when Panda came around, Google correctly rewarded Wikipedia with #1 rankings for over 50% of its terms (!!), and Jason Calacanis had to shut down Mahalo which got destroyed by Panda.

For context, this is the type of content that Mahalo was producing to try to game SEO:

https://www.youtube.com/watch?v=vdNk1xmDpxo


Game how? Pretty girl says she will show how to mix a drink. Then proceeds to show how to mix a drink. Not ending world hunger or anything, but nothing deceitful here at least. Maybe I'm missing something


> Game how? Pretty girl says she will show how to mix a drink. Then proceeds to show how to mix a drink. Not ending world hunger or anything, but nothing deceitful here at least. Maybe I'm missing something

IIRC, those videos are pretty famous because the pretty girl was not actually any good at mixing drinks:

https://www.esquire.com/food-drink/drinks/a30172952/viral-ol...

> JaNee Nyberg Once Made the World’s Worst Old Fashioned. Jim Beam Just Gave Her a Shot at Redemption.

> The world’s worst Old Fashioned was made on a quiet summer morning in 2010, in a dot com startup’s shoddily decorated conference room in Santa Monica. Mahalo.com had hired JaNee Nyberg to host a series of 50 cocktail tutorial videos that they would then upload to their YouTube channel. In the series’ most infamous video, the actress, model, and part-time bartender slops together an Old Fashioned using no bitters, a giant orange wedge, a ton of ice, and an entire pint glass of bourbon. Now, everyone makes an Old Fashioned a little bit differently—here’s Esquire’s official recipe—but Nyberg’s way was definitely wrong and totally hilarious.

Here's that video: https://www.dailymotion.com/video/xfhhjf


It's not awful, it's just not very high quality content. The instructions are more or less correct, but she does a lot of triggering things in the video, like not measuring the whiskey at all. If I ordered a mint julep in a bar and the bartender made it the way she does, I would not order one at that bar again. If it was expensive, I might ask for my money back or a simpler drink.

Compare with this video on the same drink: https://www.youtube.com/watch?v=uTKC9Ht4Erg

The guy explains why he does things the way he does them and why you might do it differently depending on your tastes. Unlike the first, this video manages to be more informative than simply reading its script as text.


Yeah, I'm not saying it's amazing content, I just fail to see how this is "gaming SEO"


You might accuse them of clickbaiting by using an attractive woman instead of someone who knows how to make a proper drink, but yeah, I don't really see this as "gaming SEO" per se.

Edit: I have now spent entirely too much time researching these videos and I feel bad for criticizing her. Apparently she was an actual bartender but had to make a hundred of these videos in two days without any of the proper tools or even a script. https://punchdrink.com/articles/where-is-she-now-janee-mahal...


Went back and read the comments on that video. YouTube comments are gold sometimes.

Edit: Watched a few more. I get the impression that these videos are actually made as gags. She says various measurements for the alcohol (like 2 ounces) but consistently just tops up the whole pint glass.

Edit 2: I may need to reevaluate the "not SEO gaming" stance.


I also went back to watch some more and I really hope you're right about them being gags. The old fashioned video is more like a "how not to" guide: gross cherries, muddling a whole orange slice with peel (wtf), spilling the drink everywhere in a needless mixing step, etc.


Feels like people would rather see the pretty girl. That's just giving people what they want.


Wikipedia's page views, though, are relatively flat since 2016. I suspect, in large part, because of Google's move to expose Wikipedia content on Google pages, removing the need to follow any links to Wikipedia for many queries.

https://stats.wikimedia.org/#/all-projects/reading/total-pag...


Google doesn't reliably put Wikipedia links in the results anymore because they're filling the first page with revenue generators.


This is definitely the reason. Whereas Wikipedia is always at the top of DuckDuckGo results.


I don't think they did any of those things because of SEO, but because it was the obvious way to do it.

Deep interlinking - originally it used software called UseModWiki, which would automatically make a link if a page name existed for the word you just used.

subdomains - if you want to make a separate site for each language, that is the obvious way to do it

good content - why would anyone intentionally want to make a site with shitty content unless you are making $$$ off it (and Wikipedia wasn't)?


> Add to that a surprisingly under-the-radar company culture which has avoided any major blow ups despite its community wielding so much leverage over the world's education and having to make a lot of difficult calls on a daily basis.

There's been and still is plenty of controversy regarding both the Wikimedia Foundation and its relationship to the community of volunteers. In fact I'd say the whole thing's kind of rotten, because of many complicated issues.


I am aware of many of their issues, especially on smaller international sites. But even so, on a risk-adjusted basis I think they've managed to avoid more drama than other teams would have pulled off given the environment they are in.


That is interesting. As someone who used to work for the wikimedia foundation, it felt like there was a constant stream of drama. I guess when you are in the middle of it, it feels more intense than it actually is.


I think it's that we get drama which is big to us, but it rarely splashes outside the Foundation and the community to become common knowledge... and the dedicated community remains fairly insular.

Maybe this will be the hidden drawback of the current work to improve talk pages -- all the wikidrama will become more visible to the world!


There’s also the matter of scale and money.

Wikipedia drama generally seems to happen less often and affect fewer “big” personalities and their money, than say Youtube and Google giving their content creators whiplash over whatever the new policy change is.


> Add to that a surprisingly under-the-radar company culture which has avoided any major blow ups despite its community wielding so much leverage over the world's education and having to make a lot of difficult calls on a daily basis.

That's because Wikimedia Foundation is quietly focusing on the tech and keeping the site up and running while mostly letting the community govern itself.

The few times the Foundation tried to override the community, it didn't go well.


That's one version. Another version, which I know will go over very well here, is that some have invested a lot of money to bring high quality, accessibly-written offline content by experts in the field to the Web, only for Wikipedia editors to poorly rewrite it in thousands of articles and outrank the original content in Google. And then Wikipedia started using nofollow links, so the original sites got no benefit whatsoever.


I find it often has the opposite problem. High quality, accessibly-written off-line content by experts in the field is synthesised on Wikipedia by someone with good understanding of the subject, but then other editors delete large swathes of it for not citing every line and replace it with considerably more dubious explanations of the subject sourced to news articles and partisan think tanks which put all their content online.


> high quality, accessibly-written off-line content by experts in the field to the Web

Where is this content? It sounds like you’re alluding to something obvious but I honestly have no idea, and would like to know where to find it if it does exist.


No kidding. I remember trying to find high quality educational content on the web before Wikipedia. For certain subjects it existed, but it was few and far between, and of very mixed quality (how do you know how much trust to put in some GeoCities page?).


> (how do you know how much trust to put in some GeoCities page?)

The same way you learn to "trust" anything, including Wikipedia - by verifying sources.

I love Wikipedia, but I don't blindly assume it to be the ground truth in anything (if such truth even exists), especially in the "long-tail" of subject matter.


There's different levels of "trust". With Wikipedia I know roughly what I am getting. I can make an informed decision as to how much to trust it and how much further research to do depending on the application I need it for. After all, sometimes I just need knowledge with a decent chance of being true, where other times I need to be really sure. Wikipedia provides a relatively consistent experience (varying somewhat with how obscure a page is). Random GeoCities sites do not give me that consistency, so I cannot make an informed guess as to how correct the page is.


There are fairly predictable quirks about Wikipedia:

- Articles in areas of math written in impenetrable jargon

- Encyclopedic articles about obscure, trivial subjects

- Stubs about relatively important individuals

- Tug-of-war entries about current events

- Random endless lists

- A lot of procedural fighting about original research, notability, etc.

But, as you say, a way to get pointers to or a quick take on a topic, it's pretty good. Am I going to take anything Wikipedia says to the bank without double-checking? Probably not. And, if you look deeply enough into some topics, you find a lot of circular references to some other single source of information. But overall, it's a good go-to reference.


With Wikipedia, you very soon develop a sense of gauging maturity of the article just from a quick glance. More often than not, the editors would even put maturity warnings for you.

If something looks dubious you can even dive into revision list to spot the problems.

This is more than can be said for nearly any other source out there.


"Where is this content? "

Scattered all over the web.

I do agree with that point, that for most topics there exists better quality content elsewhere. But finding it and verifying that it is not made up is the reason I also mainly use Wikipedia first when researching a new topic, and then proceed to more detailed pages, sometimes linked from Wikipedia.


If you want to understand suicide in the UK you need to know, at a minimum, about ONS, NCISH, Fingertips, and then coroners for England and Wales and whatever the equivalent is for Scotland and Northern Ireland.

Here's a list of links:

https://www.ons.gov.uk/peoplepopulationandcommunity/birthsde...

https://sites.manchester.ac.uk/ncish/

https://fingertips.phe.org.uk/profile-group/mental-health/pr...

https://www.judiciary.uk/wp-content/uploads/2013/09/guidance...

The best way to find out about these is to speak to someone who works in suicide prevention, so that would be people working for local authority suicide prevention partnership boards (they can have different names in different areas), or people working for NCISH or MASH or ONS, or people on Twitter. But if you can't do that you can sort of get some of the information from Wikipedia. It's a struggle though because the page is a poorly laid out mishmash of information, mostly written by people who don't understand the subject. https://en.wikipedia.org/wiki/Suicide_in_the_United_Kingdom


I hate to be that guy, but if the content on Wikipedia is wrong, why not fix it? Unlike other profit-driven community sites (cough, Fandom, cough) you'll actually be helping other people.


It's not possible to fix information on Wikipedia by using primary sources (the Judiciary website, the ONS data, the NCISH reports); you have to use secondary sources such as newspaper reports. Since newspapers get this stuff wrong too, Wikipedia will only allow incorrect information.

And that's Wikipedia working as intended. If you're unfortunate you'll run up against someone who i) doesn't know anything at all about the topic, ii) has misunderstood some poorly reported document, and iii) has more free time than you. It's exhausting dealing with these people and I simply have better things to do with my time.


That doesn't appear to be true, but you're right, getting into an argument with Wikipedians can be exhausting.

https://en.wikipedia.org/wiki/Wikipedia:Identifying_and_usin...


> It's not possible to fix information on Wikipedia by using primary sources

You're confusing "original research" (something you've personally researched and have not published elsewhere) with primary sources. Primary sources are absolutely suitable for Wikipedia, it's only original research that is disallowed.


Well, that kind of shows the issue, right? I’m talking about what happened many years ago (nofollow links were added in 2005) - companies learned their lesson and wouldn’t try to pay expert writers and editors for reference-type content for web use anymore. Here is content that’s somewhat similar: encyclopedia.com.

By the way, I think Google and its easily-gamed algorithms that rewarded regurgitated content and mega-sites is more to “blame” here than Wikipedia itself for how it went down.


I don't like Wikimedia nor the Wikipedias, but I don't get your point and I think you're incorrect (unless you're talking about Wikipedia's early days).

The way Wikipedia should work is by sourcing verifiable facts from reputable sources, and copyright violations are not allowed. What are you referring to with "high quality, accessibly-written off-line content"? Britannica isn't high-quality, and journalism isn't written by experts in the field.


You’re mistaken: rewriting content is not a copyright violation and is allowed. I’m not talking about Britannica but more in-depth content.


What does "nofollow links" mean?



Jason Calacanis, such a “that guy”.

Wikipedia is a fascinating story of well-earned success and growth without the reins of VC dominating its trajectory. I imagine they have had a lot of tricky decision making. I’m curious what their process has been (as someone who uses Wikipedia but really doesn’t know how things are run behind the scenes).


It's sort of complicated -- there's an ideological nonprofit called the Wikimedia Foundation that hosts all the various WikiProjects and maintains the software that's used (mostly mediawiki and some associated services). When you see those banner ads on wikis for donations, that's who you're donating to.

However, the Foundation is very hands-off about the content of wikis, which tend to run on "consensus"[0] with the editing-community for that wiki. That establishes the policies for the wiki, and often influences the technical decision-making for specific wikis. Also, the Foundation writes mediawiki extensions for a bunch of non-core behavior, but the individual wiki communities take a strong hand in whether they're enabled for that wiki. It's why the WYSIWYG editing environment (VisualEditor) is so inconsistently available between wikis, for instance.

Some wiki communities have a fairly fraught relationship with the Foundation, generally if they feel like they're being pushed into things. There have been controversies about things like the Foundation banning abusive users project-wide, or Foundation employees editing wikis from their staff-affiliated accounts. It's generally very inside-baseball though, and if you're outside the community it's hard to hear about.

[0]: https://en.wikipedia.org/wiki/Wikipedia:Consensus


You mean they tried to make a useful site. During that period search engines were new, and people hadn't yet started search-result optimization hacking. That meant ranking was still a good proxy for site quality and not for optimization hacking.

The conscious choice to leave the money on the table is exactly the same. Instead of optimizing for cash value which is just a proxy for real value, wikipedia optimized for quality encyclopedic knowledge distribution.


I hate to disagree, but SEO as an industry was booming pretty early on. I remember going to a conference in 2001 that was massively popular. The type of stuff you'd learn there was pretty outrageous and very very black hat (so much so, that people sharing those tips were too self-conscious to reveal their identities, hence the term black hat).

My point is - you could very much do very well back in those days regardless of your content quality (which consequently trended down and gave rise to content farms). It's only in the last 10 years (and particularly starting in 2014 based on my experience with my own content) that the content quality became a true proxy for ranking, and vice versa.

As a side note, these days, you still have SEO conferences, but the stuff you learn there is so diversified that people have started calling it content marketing and other names. The perhaps most useful gathering is SEOktoberfest, it's invite only and they admit only 30 attendees. Never been there, but I've heard it's worth the six grand that it costs to get in as a first-time attendee (I am not affiliated with it in any shape or form).


I agree. You go back 10 years and SEO was pretty much synonymous with black hat SEO.

These days, there still are a fair number of mostly low quality content farms. But there's also high quality content marketing. The latter still definitely is aware of things like page views, how far people read through an article, what type of headlines seem to be most effective, and so forth. It starts with good content that readers are interested in though.


For those who don’t closely follow SEO, can you explain what Panda is/was?


I know nothing about SEO either, but a quick search gives: https://moz.com/learn/seo/google-panda


I’m not sure how it’s done now, but Wikipedia had a special promotion in Google’s rankings (which would probably have been manually set lower if it had run ads, while people would have stopped contributing content).

That guy who wants to run ads on it sounds like a really evil person: he doesn’t get that Wikipedia has already changed the world; it doesn’t need to do “more good”.


He's not evil, he just can't see past making money.


Well, the love of money is the root of all evil...


You've got it backwards; they didn't "kill it" on SEO by making good content, good content was rewarded with high-ranking search results. Even this is questionable now that Google et al. purposely present just enough of Wikipedia's content directly in the search results to discourage you from leaving Google. So in the end they (a) display ads and (b) don't get the revenue. This is a win?


Wikipedia does not display ads.


>>"[..] cared deeply about the content quality and also resisted to run ads [..]"

Do I remember wrong? because I remember they (the Wikipedia organization) were going to run ads at some point but it was strongly rejected by the community. I think there was even some fork because of that.


I don't think Wikipedia ever seriously considered running ads, but Wikitravel did fork over this, and the ad-free fork (Wikivoyage) eventually joined Wikipedia's parent Wikimedia.


If you say some site is good at SEO, it generally implies that it gets a better ranking in search engines than its content quality alone would warrant. Here, what you are saying is that the quality really is better, and it is a good thing that SEO is the same as having good quality content.



