My most recent project lets you explore your fediverse connections with a breakdown of the servers your followers and the people you follow are on.
A new feature I added last week lets you download the data, giving you an opportunity to do some fun things with it.
For example, I wonder how many of my connections use Mastodon. How many of them host their own instance? And how old are the domains they use? I bet a lot of them were registered pretty recently.
Let’s start by visiting the project page, signing into our fediverse instance, and giving it a moment to process the data. Once the data is ready, let’s save it as connections.csv.
To figure out what software each domain runs, we will need to visit its /.well-known/nodeinfo endpoint, which contains links to where we can find this information. So, for example, for mastodon.social, we will visit https://mastodon.social/.well-known/nodeinfo, which responds with the following JSON data.
{
  "links": [
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
      "href": "https://mastodon.social/nodeinfo/2.0"
    }
  ]
}
Some platforms, like Calckey, might give you a few links.
{
  "links": [
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.1",
      "href": "https://calckey.social/nodeinfo/2.1"
    },
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
      "href": "https://calckey.social/nodeinfo/2.0"
    }
  ]
}
Visiting one of these will show you the information we need.
{
  "version": "2.0",
  "software": {
    "name": "mastodon",
    "version": "4.1.2+nightly-20230517"
  },
  "protocols": [
    "activitypub"
  ],
  "services": {
    "outbound": [],
    "inbound": []
  },
  "usage": {
    "users": {
      "total": 1126636,
      "activeMonth": 228529,
      "activeHalfyear": 467959
    },
    "localPosts": 55648204
  },
  "openRegistrations": true,
  "metadata": {}
}
Specifically, it’s the value of software.name. Now that we have all the pieces together, let’s write some code that will do the work for us.
I am going to use Python for this little data exercise, so a basic familiarity with it will help. Or you can follow along in whichever language you are more comfortable with.
Let’s create a file called connections.py. We’ll start by opening the connections.csv file we downloaded earlier and seeing what’s inside.
import csv

filename = "connections.csv"

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        print(row)
When we run this script with python connections.py, we will see the contents of our CSV file. So far, so good.
We only really care about the domain, so let’s change the last line to print(row[0]). Run the script again, and that’s exactly what we’ll see.
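For reference, the little script should now look something like this:

import csv

filename = "connections.csv"

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        # The domain is the first column in the exported file.
        print(row[0])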
Now, going back to our notes from earlier, we’ll remember that to get the information about each server’s software, we have to start at the /.well-known/nodeinfo endpoint.
Let’s try this. Instead of just printing the domain name, we can make a request to it.
First, we’ll need to install the requests package.
pip install requests
Now we can start fetching some data. One piece of advice: you might want to make a backup of your original connections.csv file and keep only 1-3 domains in it while we work on the code, to make things faster.
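If you’d rather not edit the CSV by hand, a short sketch like this writes a trimmed copy to work against (connections-sample.csv is just a name I made up; keep your original file safe either way):

import csv

# Copy the header plus the first three data rows into a small sample file.
with open("connections.csv", "r") as src, open("connections-sample.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for i, row in enumerate(reader):
        writer.writerow(row)
        if i >= 3:
            break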
Here’s a quick test.
import requests
import csv
import json

filename = "connections.csv"

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header
    for row in datareader:
        domain = row[0]
        url = f"https://{domain}/.well-known/nodeinfo"
        try:
            response = requests.get(url)
            response_json = response.json()
            response_formatted = json.dumps(response_json, indent=2)
            print(f"response from {domain}: {response_formatted}")
        except requests.exceptions.RequestException as e:
            print(f"error accessing {domain}")
Running this will show the nodeinfo links for the first few of our domains.
response from mastodon.social: {
  "links": [
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
      "href": "https://mastodon.social/nodeinfo/2.0"
    }
  ]
}
response from mstdn.social: {
  "links": [
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
      "href": "https://mstdn.social/nodeinfo/2.0"
    }
  ]
}
response from hachyderm.io: {
  "links": [
    {
      "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
      "href": "https://hachyderm.io/nodeinfo/2.0"
    }
  ]
}
For simplicity, we can just use the first link from each nodeinfo response.
import requests
import csv
import json

filename = "connections.csv"

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header
    for row in datareader:
        domain = row[0]
        url = f"https://{domain}/.well-known/nodeinfo"
        try:
            response = requests.get(url)
            response_json = response.json()
            nodeinfo_url = response_json["links"][0]["href"]
            response = requests.get(nodeinfo_url)
            response_json = response.json()
            software_name = response_json["software"]["name"]
            print(f"{domain} uses {software_name}")
        except requests.exceptions.RequestException as e:
            print(f"error accessing {domain}")
If you run into any issues, you might need to check the rel attribute of the first response_json object and make sure it contains the correct schema information, for example http://nodeinfo.diaspora.software/ns/schema/2.0.
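Here is roughly what that check could look like, wrapped in a small helper (pick_nodeinfo_url is just a name I made up); it prefers a link whose rel points at a nodeinfo schema and falls back to the first link otherwise:

def pick_nodeinfo_url(links):
    # Prefer a link whose rel points at a nodeinfo schema; otherwise fall
    # back to the first link, which is what the script above does.
    for link in links:
        if link.get("rel", "").startswith("http://nodeinfo.diaspora.software/ns/schema/"):
            return link["href"]
    return links[0]["href"]

# In the loop, this would replace the hard-coded first link:
# nodeinfo_url = pick_nodeinfo_url(response_json["links"])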
I’ll let the script run for just a few of the domains.
mastodon.social uses mastodon
hachyderm.io uses mastodon
botsin.space uses mastodon
calckey.social uses calckey
error accessing mastodon.lol
kolektiva.social uses mastodon
friend.camp uses hometown
pixelfed.social uses pixelfed
outerheaven.club uses akkoma
error accessing masthead.social
kolektiva.media uses peertube
As expected, Mastodon and its forks, like Hometown, show up pretty often. It’s also interesting to see that my instance still keeps track of people from servers that no longer exist, like mastodon.lol.
Alright, so now we just need to figure out the distribution of each platform in our dataset, which is pretty straightforward.
import requests
import csv
import json

filename = "connections.csv"
platforms = {}

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header
    for row in datareader:
        domain = row[0]
        url = f"https://{domain}/.well-known/nodeinfo"
        try:
            response = requests.get(url)
            response_json = response.json()
            nodeinfo_url = response_json["links"][0]["href"]
            response = requests.get(nodeinfo_url)
            response_json = response.json()
            software_name = response_json["software"]["name"]
            print(f"{domain} uses {software_name}")
            if software_name in platforms:
                platforms[software_name] += 1
            else:
                platforms[software_name] = 1
        except requests.exceptions.RequestException as e:
            print(f"error accessing {domain}")

platforms_formatted = json.dumps(platforms, indent=2)
print(platforms_formatted)
In addition to the previous output, we will now see a list of detected platforms and how many servers use them.
{
  "mastodon": 4,
  "calckey": 1,
  "hometown": 1,
  "pixelfed": 1,
  "akkoma": 1,
  "peertube": 1
}
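As a side note, the counting in the loop could also be done with collections.Counter from the standard library, which would replace the if/else block; a quick sketch with made-up platform names:

from collections import Counter
import json

# Counter handles missing keys for us, so the if/else bookkeeping above
# collapses into a single increment. The names below are just sample values.
platforms = Counter()
for software_name in ["mastodon", "calckey", "mastodon", "pixelfed"]:
    platforms[software_name] += 1

# Counter is a dict subclass, so json.dumps still works on it.
print(json.dumps(platforms, indent=2))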
Great. We can now start thinking about doing something with this data. And for that, let’s install pandas, which you might know is a popular Python library for analyzing and manipulating data.
pip install pandas
Here’s an updated version of our script that saves the platform data into a new CSV file. I also added some extra code to keep track of the progress.
import requests
import csv
import json
import pandas as pd

filename = "connections.csv"
platforms = {}

# Count the rows first so we can show progress; subtract one for the header.
with open(filename, "r") as csvfile:
    domains_count = len(list(csv.reader(csvfile))) - 1

print(f"found {domains_count} domains, processing...")
step = 0

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header
    for row in datareader:
        step += 1
        domain = row[0]
        url = f"https://{domain}/.well-known/nodeinfo"
        try:
            response = requests.get(url)
            response_json = response.json()
            nodeinfo_url = response_json["links"][0]["href"]
            response = requests.get(nodeinfo_url)
            response_json = response.json()
            software_name = response_json["software"]["name"]
            print(f"{step}/{domains_count}: {domain} uses {software_name}")
            if software_name in platforms:
                platforms[software_name] += 1
            else:
                platforms[software_name] = 1
        except requests.exceptions.RequestException as e:
            print(f"error accessing {domain}")

platforms_formatted = json.dumps(platforms, indent=2)
print(platforms_formatted)

df = pd.DataFrame(platforms.items(), columns=['platform', 'servers'])
df.to_csv('platforms.csv', encoding='utf-8', index=False)
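One optional tweak while we’re at it: since the script makes two network requests per domain, you might want to pass a timeout to requests.get so a single unresponsive server doesn’t stall the whole run. A timeout error is a subclass of RequestException, so the existing except block would already catch it. A minimal sketch, with an arbitrary 10-second limit:

import requests

# The same kind of call as in the script above, just with a timeout in seconds.
# The value 10 is an arbitrary choice; tune it to taste.
response = requests.get("https://mastodon.social/.well-known/nodeinfo", timeout=10)
print(response.status_code)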
We can now run this script using the original connections.csv file.
Let’s install one more package.
pip install matplotlib
And here’s our platforms.py script.
import pandas as pd
import matplotlib.pyplot as plt
filename = "platforms.csv"
data = pd.read_csv(filename)
df = pd.DataFrame(data)
df_sorted = df.sort_values(by=["servers"], ascending=False)
chart = df_sorted.plot.bar(x="platform", y="servers", rot=45)
chart.bar_label(chart.containers[0])
plt.title("Popularity of fediverse platforms")
plt.xlabel("Platform")
plt.ylabel("Number of servers")
plt.savefig("platforms.png", bbox_inches='tight', dpi=100)
Pretty basic, but gives us a solid result.

I mentioned earlier wanting to see when each domain was registered. There are a few ways we can go about this. First, let’s make a copy of our connections.csv file and call it domains.csv.
Open it in Excel, or a similar program, and delete the connections and percentage columns, leaving just a list of domains.
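If you’d rather skip the spreadsheet step, a small pandas sketch can produce the same domains.csv (I’m assuming the domain is the first column, which matches the row[0] we used earlier):

import pandas as pd

# Keep only the first column (the domain) and write it out as domains.csv.
df = pd.read_csv("connections.csv")
df.iloc[:, [0]].to_csv("domains.csv", index=False)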
We will process this new CSV file with the following script; let’s name it domains.py. Note that I’m using the command-line tool whois, which is available on Linux and macOS by default, but some operating systems, like Windows, may not have it, and you will have to install it.
import os
import csv
import json
import re
import pandas as pd
import matplotlib.pyplot as plt

filename = "domains.csv"

# Count the rows first so we can show progress; subtract one for the header.
with open(filename, "r") as csvfile:
    domains_count = len(list(csv.reader(csvfile))) - 1

print(f"found {domains_count} domains, processing...")
step = 0
domain_info = {}

with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header
    for row in datareader:
        step += 1
        domain = row[0]
        print(f"{step}/{domains_count}: {domain}")
        try:
            stream = os.popen(f"whois {domain}")
            output = stream.read()
            match = re.search(r"Creation Date: (\d{4}-\d{2}-\d{2})", output)
            creation_date = match.group(1)
            domain_info[domain] = creation_date
            print(creation_date)
        except AttributeError as e:
            print("creation date not available")

print(json.dumps(domain_info, indent=2))

df = pd.DataFrame(domain_info.items(), columns=["domain", "creation_date"])
df.to_csv("domain-info.csv", encoding="utf-8", index=False)
After you run this script, you will have a new file called domain-info.csv. A few domains might be missing if the whois tool couldn’t find them.
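Part of that is down to how much whois output varies: not every registry labels the field “Creation Date”, and date formats differ too. If you want to recover a few more domains, the lookup could be wrapped in a small helper that tries more than one label (find_creation_date is just a name I made up); a sketch that deliberately sticks to YYYY-MM-DD dates so the next script can still parse them:

import re

def find_creation_date(whois_output):
    # Best-effort: registries differ in both the label and the date format,
    # so only patterns that yield YYYY-MM-DD dates are tried here.
    patterns = [
        r"Creation Date: (\d{4}-\d{2}-\d{2})",
        r"created:\s+(\d{4}-\d{2}-\d{2})",
    ]
    for pattern in patterns:
        match = re.search(pattern, whois_output, re.IGNORECASE)
        if match:
            return match.group(1)
    return None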
Time to create one more file to process this new data, domain-info.py.
import datetime
import pandas as pd
import matplotlib.pyplot as plt
filename = "domain-info.csv"
data = pd.read_csv(filename)
df = pd.DataFrame(data)
# We will need to convert the dates so that matplotlib understands them.
dates = list(map(lambda date: datetime.datetime.strptime(date, "%Y-%m-%d"), df["creation_date"]))
plt.plot_date(dates, df["domain"])
plt.title("Age of fediverse domains")
plt.xlabel("Domain creation date")
# Let's remove the Y axis for a cleaner look.
plt.tick_params(axis='y', which='both', left=False, right=False, labelleft=False)
plt.savefig("domain-info.png", bbox_inches='tight', dpi=100)
Again, very straightforward. And here’s how the result looks for me.

Pretty much what I expected, but it’s cool to see people using domains they registered a while ago as well.
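If you want to keep digging, one idea is a histogram of registration years built from the same domain-info.csv; a rough sketch (the output filename is just a suggestion):

import pandas as pd
import matplotlib.pyplot as plt

# Group the same data by registration year instead of plotting each domain.
df = pd.read_csv("domain-info.csv")
years = pd.to_datetime(df["creation_date"]).dt.year

plt.hist(years, bins=range(years.min(), years.max() + 2))
plt.title("Fediverse domain registrations by year")
plt.xlabel("Year")
plt.ylabel("Domains")
plt.savefig("domain-years.png", bbox_inches="tight", dpi=100)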
I hope these examples will inspire you to dig deeper into the data, and feel free to share any interesting findings with me!