The Star Wars universe has been mapped, and it’s massive

The Star Wars universe is massive, just in case you were wondering. Sure, it feels like it’s all about the Skywalker line and their interactions with the Force, but against the backdrop of the rise and fall of a Galactic Empire, a lot of individual characters end up being represented. Then there was the Star Wars Expanded Universe. Lucasfilm has since declared most of it non-canonical, but those books, video games, and comics added even more characters to Star Wars lore.

All of this is extensively documented on “Wookieepedia,” the wiki site for the Star Wars universe. A Swiss researcher and his team recently fed the entirety of Wookieepedia into a computer program that sorted the data and applied graph theory to it. The results reveal the staggering size of the Star Wars story.

The study was run by Signal Processing Laboratory 2 at the Ecole Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. Professor Vandergheynst directed the study, but a member of the team named Kirell Benzi blogged some results in celebration of Star Wars geekery, and that’s how we’re getting our first look at the data.

The EPFL team scraped all of Wookieepedia into a graph database, starting with the Category pages (like all the Jedi, all the Sith, all the Senators, etc.) and then moving through all the subcategory pages. This gives a complete list of every character in the Star Wars universe. According to Benzi, it contains 21,647 characters; once you remove the characters whose names start with “Unidentified” (Unidentified Wookiee, for example), the actual number is 19,612.
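The category walk described above can be sketched in a few lines of Python. This is only an illustration, not the EPFL team’s code: the tiny `CATEGORY_TREE` dictionary stands in for the real wiki, and the root category name “Individuals” and the helper names are assumptions made up for this example.

```python
from collections import deque

# Toy stand-in for the wiki's category tree (hypothetical data, not scraped).
CATEGORY_TREE = {
    "Individuals": {"subcategories": ["Jedi", "Sith"], "pages": ["Unidentified Wookiee"]},
    "Jedi": {"subcategories": ["Jedi Masters"], "pages": ["Luke Skywalker"]},
    "Jedi Masters": {"subcategories": [], "pages": ["Yoda", "Luke Skywalker"]},
    "Sith": {"subcategories": [], "pages": ["Darth Vader", "Unidentified Sith Lord"]},
}


def collect_characters(tree, root):
    """Walk the categories breadth-first from root, collecting each page title once."""
    seen_categories = {root}
    queue = deque([root])
    characters = set()
    while queue:
        category = queue.popleft()
        node = tree.get(category, {"subcategories": [], "pages": []})
        characters.update(node["pages"])
        for sub in node["subcategories"]:
            # Skip categories already queued so shared subcategories are visited once.
            if sub not in seen_categories:
                seen_categories.add(sub)
                queue.append(sub)
    return characters


def drop_unidentified(characters):
    """Remove placeholder entries whose names start with 'Unidentified'."""
    return {name for name in characters if not name.startswith("Unidentified")}


all_characters = collect_characters(CATEGORY_TREE, "Individuals")
named_characters = drop_unidentified(all_characters)
print(len(all_characters), len(named_characters))
```

Using a set means a character listed under several categories is only counted once, which matters when a wiki files the same page under many overlapping categories.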

The computer program then went through the Wookieepedia page for each character and created a link between two characters if they were mentioned on each other’s pages. Scraping everything from the internet and processing it into the database took two full days. Using the graph database, you can start rendering representations of the data and discovering some cool things about the Star Wars universe.
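Here is a rough sketch of that linking step, assuming a link requires a mention in both directions (the description above could also be read as either direction). The page texts are invented, and a plain substring check stands in for whatever the team actually used, which was more likely the wiki’s own hyperlinks.

```python
from itertools import combinations

# Invented page texts keyed by character name (hypothetical data).
PAGES = {
    "Luke Skywalker": "Trained by Yoda; son of Darth Vader.",
    "Yoda": "Trained Luke Skywalker on Dagobah.",
    "Darth Vader": "Father of Luke Skywalker.",
    "Greedo": "A Rodian bounty hunter.",
}


def build_links(pages):
    """Return undirected edges between characters who mention each other."""
    edges = set()
    for a, b in combinations(sorted(pages), 2):
        # Assumption: both pages must name the other character for a link to count.
        if b in pages[a] and a in pages[b]:
            edges.add((a, b))
    return edges


links = build_links(PAGES)
for a, b in sorted(links):
    print(a, "<->", b)
```

Comparing every pair of pages grows quadratically with the number of characters, so a real crawl over thousands of pages would more plausibly follow each page’s outgoing links instead of testing all pairs.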

For one, it is overwhelmingly human. Humans represent 78% of the characters in the Star Wars galaxy. Second place goes to the Twi’lek (the aliens with the tails on their heads), then the Rodians (Greedo was a Rodian), and finally Wookiees, but all are much smaller pieces of the pie.

The Star Wars story takes place over 36,000 years of lore. Most of the characters pop up around the eras of the two movie trilogies, the Rise of the Empire era and the Rebellion era. A surprising third spike happens around the Old Republic era, which was never featured in the movies but is the subject of the Knights of the Old Republic video games and supplemental material.

When the team graphed out character relations, the Top 15 most connected characters weren’t a huge surprise, except for Revan (Darth Revan), who represents the Expanded Universe’s Old Republic era on yet another list.
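The article doesn’t say how “most connected” was measured. The simplest reading is degree, meaning the number of links each character has, and the sketch below ranks characters that way. The edge list is invented for illustration and the function name is hypothetical.

```python
from collections import Counter

# Invented undirected links between characters (hypothetical data).
EDGES = [
    ("Luke Skywalker", "Yoda"),
    ("Luke Skywalker", "Darth Vader"),
    ("Luke Skywalker", "Han Solo"),
    ("Han Solo", "Greedo"),
    ("Darth Vader", "Yoda"),
]


def most_connected(edges, top_n):
    """Rank characters by degree, i.e. how many links touch them."""
    degree = Counter()
    for a, b in edges:
        # Each undirected link adds one to the count of both endpoints.
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top_n)


ranking = most_connected(EDGES, 3)
for name, links in ranking:
    print(name, links)
```

On the full graph, calling `most_connected(edges, 15)` would give a Top 15 under this degree assumption.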

While putting all the character relations on a graph based on the era in which they appear, Benzi and the team noticed gaps in the era data for some of the characters. Using the data about the characters that interact with an era-less character, they were able to estimate what era that character appeared in.

Han Solo is only going to meet people in the Rebellion era or later (during a normal human lifetime), so if you’ve met Han Solo, that narrows down what era you are in. In the graph below, black nodes represent missing values, red nodes are the Rise of the Empire era (prequel trilogy), blue nodes are the Rebellion era (original trilogy), and green nodes are characters appearing in both eras.

And here it is after the computer program estimated the missing data points. The estimated accuracy of this step is 60%.
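The team’s actual estimation method isn’t described here, so as a stand-in, this sketch fills in a missing era with a simple majority vote among a character’s linked neighbors, which is the Han Solo logic above in code. The characters “Mystery Smuggler” and “Mystery Loner” and the function name are invented for the example.

```python
from collections import Counter

# Known eras, with None marking a missing value (hypothetical data).
ERAS = {
    "Han Solo": "Rebellion",
    "Luke Skywalker": "Rebellion",
    "Qui-Gon Jinn": "Rise of the Empire",
    "Mystery Smuggler": None,
    "Mystery Loner": None,
}

# Who each era-less character is linked to in the graph.
NEIGHBORS = {
    "Mystery Smuggler": ["Han Solo", "Luke Skywalker", "Qui-Gon Jinn"],
    "Mystery Loner": [],
}


def estimate_era(character, eras, neighbors):
    """Guess a missing era as the most common era among linked characters."""
    votes = Counter(
        eras[n] for n in neighbors.get(character, []) if eras.get(n) is not None
    )
    if not votes:
        # No neighbor has a known era, so there is nothing to base a guess on.
        return None
    return votes.most_common(1)[0][0]


print(estimate_era("Mystery Smuggler", ERAS, NEIGHBORS))
print(estimate_era("Mystery Loner", ERAS, NEIGHBORS))
```

A vote like this can be wrong whenever a character’s neighbors span several eras, which is one reason an approach of this kind would be far from perfectly accurate.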

Although this is all being applied to Star Wars characters, the real science behind the project is the code and bots that allow such a massive amount of information to be scraped from the web and arranged in a graph database, with corresponding connections, in a relatively short period of time. Two days to connect every Star Wars character ever to each other isn’t much time when you think about the same programs and graphing techniques being applied to scientific and medical studies.

Until then, it’s cool to see just how big Star Wars is in the two posts Benzi has graced the internet with. Also, they should do a Memory Alpha scrape for Star Trek. That’d be cool.