Aiming to provide the “sum of all human knowledge,” Wikipedia is one of today’s most highly trafficked websites. Of its content, Katherine Maher, CEO of the Wikimedia Foundation (the nonprofit that hosts Wikipedia), writes: “We believe in ‘knowledge equity,’ which we define as the idea that diverse forms of knowledge should be recognized and respected.”
But does the encyclopedia live up to this vision, or is it playing a part in perpetuating and entrenching long-standing biases? The gender imbalances in the editor population and in the site’s biographies (each more than 80 percent male) are well documented. But what of the backbone of the encyclopedia: the sources cited within its pages?
The linked citations that appear at the bottom of Wikipedia pages provide both verifiability to the page content and visibility to the sources themselves — potentially a lot of visibility. A 2010 study — which opens with the line “Want to stir up a room full of college faculty and librarians? Mention Wikipedia” — found that in a survey, 85 percent of college students reported browsing Wikipedia during the early stages of research projects, and more than half noted linked citations as a reason for turning to the site.
Librarian Merrilee Proffitt, quoted on the Wikimedia blog, observes that citations can “lead end users to libraries where they can find those trusted sources and others like them — for free.” After the University of Washington added links to its digital collections, Wikipedia directed more than 11,000 visitors to their collections over the course of one year. Ball State University measured a seven-fold increase in annual page views after adding links to its digitized sheet music collection. And a very recent study, available in preprint, found that readers of the English Wikipedia click an external link once for every 147 page views. (Wikipedia receives about 20 billion page views a month.) Meanwhile, the Internet Archive has been scanning Wikipedia’s source documents to make them easily accessible to students working late into the night. The potential for knowledge discovery via Wikipedia is enormous.
Even self-citation, a practice men engage in more often than women, attracts the attention of browsing scholars. According to one study, the average self-citation will receive one additional, independent citation within a year, and three independent citations within five years. Visibility has value. It’s far easier to miss a person’s work if it’s not there.
But how do citations get into Wikipedia to begin with? And whose work is being cited? A cursory glance at Wikipedia’s 10 most cited sources reveals that they are almost exclusively authored by men. (Nine were authored by men and one was authored by “The MGC Project Team,” which has female members.) The most-cited source, “Updated world map of the Köppen-Geiger climate classification,” referenced a whopping 2.8 million times, was properly lauded for being useful and open access. But less attention was paid to the fact that a bot inserted the bulk of these references.
That “bot factor” puts people like Jane Darnell, a Wikipedia editor who strives to make women more visible on the site, at a technical disadvantage. Darnell, who focuses on paintings and their documented catalogs, notes that women rarely appear as lead authors. The catalogs are credited to the person who writes the introduction, generally a museum director, generally a man. Whenever possible, she adds the names of additional contributors, often women, to the catalog descriptions on Wikidata, a crowd-sourced database that acts as an information hub for Wikipedia and its sister sites.
“This helps improve the visibility for women researchers,” Darnell says. But her additions often require considerable digging through exhibition catalogs and other primary sources. Her careful work, and that of like-minded editors seeking to correct for Wikipedia’s gender bias, does not match the blistering pace of automated editing. Technology can easily propagate bias, or even amplify it.
Of course, not all Wikipedia articles are created equal. To get a more complete picture of the site’s sourcing inequities, I took some time to look at how women authors are cited on what Wikipedia editors consider to be the site’s most important pages. I started with the site’s mathematics project, one of several projects created to help interested editors collaborate and monitor coverage of specific subject areas. As part of the project, mathematics articles are assigned a priority (top, high, mid, or low) indicating “how important it is that Wikipedia should have a high quality article on the subject.” Top priority articles are a “must-have for any reasonable mathematical encyclopedia,” and these pages are more likely to be included in fixed versions of Wikipedia distributed in print, on flash drives, or on memory cards.
Of the books cited in top priority math pages, I looked for those written by a single author or editor, per the source’s metadata, and I matched each author to their gender using online biographical records. For authors whom I couldn’t match, I guessed the gender based on name. I concluded that just 77 of the 1,753 sources, roughly 4 percent, were authored by women. The share of female authors was similarly low, about 5 percent, for sources cited in high priority pages, the next highest tier.
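For readers who want to try a similar tally, the counting step described above might be sketched in Python as follows. This is an illustration only, not the script used for this article: the record layout, the field names (`authors`, `gender`), and the sample entries are all hypothetical, and the sketch assumes each author has already been assigned a gender by hand. Only the final figures, 77 and 1,753, come from the text.

```python
from collections import Counter


def single_author_sources(sources):
    """Keep only sources whose metadata lists exactly one author or editor."""
    return [s for s in sources if len(s["authors"]) == 1]


def gender_shares(sources):
    """Return {gender: fraction} across single-author sources."""
    singles = single_author_sources(sources)
    counts = Counter(s["authors"][0]["gender"] for s in singles)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


# Hypothetical sample records; real data would come from citation metadata.
sample = [
    {"authors": [{"name": "Author A", "gender": "female"}]},
    {"authors": [{"name": "Author B", "gender": "male"}]},
    {"authors": [{"name": "Author C", "gender": "male"},
                 {"name": "Author D", "gender": "female"}]},  # multi-author, skipped
    {"authors": [{"name": "Author E", "gender": "male"}]},
]
sample_shares = gender_shares(sample)

# The counts reported in the text for top priority math pages.
reported_share = 77 / 1753
print(f"{reported_share:.1%}")  # 4.4%
```

Restricting the count to single-author sources avoids having to decide how to weight mixed-gender author teams, at the cost of ignoring co-authored work entirely.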
Mathematics has long been known for its abysmal underrepresentation of women. Although women earn just under one-third of math and statistics doctorates, a 2016 study found that only 8.5 percent of single-authored publications in top mathematics journals are by women. Might Wikipedia pages on other subjects show more gender parity?
To probe that question, I turned to Wikipedia’s project on literature, a subject area in which women are better represented. I found that only about 15 percent of the single-author books cited in top priority literature pages were written by women. (Women authored roughly 22 percent of sources cited in high priority pages and 27 percent of those cited in mid priority pages.)
One factor in the gender imbalance is the content of the pages themselves. Among a list of top priority pages that includes James Joyce, Franz Kafka, Gustave Flaubert, George Orwell, and Marcel Proust, there wasn’t a single page devoted to a woman. Because a Wikipedia page about a literary figure will typically reference that figure’s canon, the imbalance in citations is partly driven by the number of texts authored by the subjects of the pages themselves.
The skewed perception of what counts as important is itself worthy of note. Of the 31 authors, playwrights, and poets that Wikipedia recommends for inclusion in every language version of the site, none are female. Where is Toni Morrison? Ursula Le Guin? Virginia Woolf? Alice Munro? Nadine Gordimer? Gabriela Mistral? Emily Dickinson? Anne Frank? Mary Shelley? Gertrude Stein? Jane Austen? Sappho? Only in the “expanded” list of 246 essential writers do women begin to appear, and they remain outnumbered, even counting Charlotte, Emily, and Anne Brontë, whom Wikipedia conveniently groups into a single entity, “the Brontë family.”
The Royal Society of Chemistry recently released a plan to eliminate gender bias from research publishing. Among the findings, the Society’s report notes that papers with a female corresponding author — a role traditionally reserved for the most senior member of an authorship team — receive fewer citations than those with male corresponding authors. Also, men are less likely than women to cite a paper with a female corresponding author. In light of the gender imbalance among Wikipedia editors, it’s worth examining whether the same patterns hold true on Wikipedia.
Digital libraries like Wikipedia can provide vast access to knowledge, but they can also further diminish voices that are already marginalized on traditional shelves. Whether Wikipedia serves to empower and engage all people or to perpetuate and entrench long-standing biases is up to us.
* * *
Kirsten Menger-Anderson is a writer and researcher based in San Francisco. She is the author of “Doctor Olaf van Schuler’s Brain.”