Stache-istics!

A week ago we launched www.mymoustache.net, a fun website that uses machine learning to measure your facial hair and promote men’s health.

We’ve had some nice mentions in the media, at venturebeat.com, PC World, Movember, Business Insider and a few others.

And now we’re starting to look at the data behind the scenes and learn some interesting, amusing facts.

How do we look at the data?

Every time a photo is analyzed, we generate a small text record with details such as how many faces we found, their average age, gender, the size of the moustache/beard found, and the approximate location where the photo was submitted (very high level and anonymous, so there’s no personally identifiable data at all). We send this to some big pipes we have behind the scenes called Azure Event Hubs. These are meant for super high scale data pumping, such as IoT scenarios. From there, we persist that data into our cloud storage. Then we read it back using Azure Stream Analytics, where we can parse it and send it to our reporting tool, Power BI.
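To make that text record a bit more concrete, here is a minimal Python sketch of what one event could look like. The field names are taken from the query further down in this post; the nesting, the `build_event` helper and all sample values are assumptions for illustration, not the site's actual code. Sending the text to Event Hubs is left out on purpose.

```python
import json
from datetime import datetime, timezone


def build_event(faces, latitude, longitude, country, submission_method):
    """Build the JSON text for one analyzed photo (hypothetical helper)."""
    event = {
        "Timestamp": datetime.now(timezone.utc).isoformat(),
        # Coarse location only, so nothing personally identifiable.
        "Latitude": latitude,
        "Longitude": longitude,
        "Country": country,
        "AnalyzeResults": {
            "SubmissionMethod": submission_method,
            "Faces": faces,  # one entry per face found in the photo
        },
    }
    return json.dumps(event)


# All values below are made up for illustration.
face = {
    "faceId": "face-1",
    "donate": False,
    "attributes": {
        "gender": "male",
        "age": 34,
        "MoustacheLength": 0.28,
        "MoustacheConfidence": 0.9,
        "BeardLength": 0.27,
        "BeardConfidence": 0.8,
    },
}
payload = build_event([face], -23.5, -46.6, "Brazil", "upload")
```

The resulting `payload` string is the kind of small JSON document that would then be pushed into the pipeline.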

More or less like this:

stacheistics architecture

 

This architecture allows us to analyze data in real time, aggregate and report on it in all sorts of ways. For example, in Stream Analytics I take that text data (saved in JSON format) and transform it this way:

with data as (
 select
  GetArrayLength(logs.AnalyzeResults.Faces) AS faceCount,
  arrayElement.ArrayValue.faceId as faceId,
  arrayElement.ArrayValue.attributes.gender as gender,
  arrayElement.ArrayValue.attributes.age as age,
  arrayElement.ArrayValue.attributes.beardPercentile as beardPercentile,
  arrayElement.ArrayValue.attributes.moustachePercentile as moustachePercentile,
  arrayElement.ArrayValue.attributes.BeardLength as beardLength,
  arrayElement.ArrayValue.attributes.BeardConfidence as beardConfidence,
  arrayElement.ArrayValue.attributes.MoustacheLength as moustacheLength,
  arrayElement.ArrayValue.attributes.MoustacheConfidence as moustacheConfidence,
  arrayElement.ArrayValue.donate as donate,
  logs.AnalyzeResults.SubmissionMethod as submissionMethod,
  cast(logs.Timestamp as datetime) as eventDateTime,
  cast(DATETIMEFROMPARTS (DATEPART ( yyyy , logs.Timestamp ),
  DATEPART ( mm , logs.Timestamp ),
  DATEPART ( dd , logs.Timestamp ), 0,0, 0,0) as datetime) as eventDate,
  cast(DATETIMEFROMPARTS (2015, 11, 07,
  DATEPART ( hh , logs.Timestamp ),
  DATEPART ( mi , logs.Timestamp ),
  DATEPART ( ss , logs.Timestamp ),0) as datetime) as eventTime,
  logs.Latitude as Latitude,
  logs.Longitude as Longitude,
  logs.Country as Country
FROM
  logs as logs
  CROSS APPLY GetArrayElements(logs.AnalyzeResults.Faces) AS arrayElement
)

select * into output from data

If you know SQL you will understand most of this. But we’re not running this against a database. Instead, we’re running it against streaming data, parsing and remodeling that data according to our needs. For example, in this particular case I have a JSON payload that may contain an array of faces (a single photo may contain many faces), so I need to turn each face into its own record by doing a CROSS APPLY against that array.
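If the CROSS APPLY part is not obvious, this small Python sketch mimics what it does to a single event: one flat record per face, each carrying the photo-level fields along. It is only an illustration of the idea, not how Stream Analytics works internally; the `flatten_faces` name and the sample event are made up, and only a subset of the query's columns is shown.

```python
def flatten_faces(event):
    """Return one flat record per face, like CROSS APPLY GetArrayElements."""
    faces = event["AnalyzeResults"]["Faces"]
    rows = []
    for face in faces:
        attributes = face.get("attributes", {})
        rows.append({
            # Photo-level values are repeated on every face row.
            "faceCount": len(faces),
            "Country": event.get("Country"),
            # Face-level values come from the array element itself.
            "faceId": face.get("faceId"),
            "gender": attributes.get("gender"),
            "age": attributes.get("age"),
            "moustacheLength": attributes.get("MoustacheLength"),
            "beardLength": attributes.get("BeardLength"),
            "donate": face.get("donate"),
        })
    return rows


# Made-up event with two faces in the same photo.
sample_event = {
    "Country": "Brazil",
    "AnalyzeResults": {
        "Faces": [
            {"faceId": "a", "donate": True,
             "attributes": {"gender": "male", "age": 40,
                            "MoustacheLength": 0.6, "BeardLength": 0.1}},
            {"faceId": "b", "donate": False,
             "attributes": {"gender": "female", "age": 35,
                            "MoustacheLength": 0.0, "BeardLength": 0.0}},
        ]
    },
}
rows = flatten_faces(sample_event)
```

A photo with two faces becomes two rows, and a photo with no faces produces no rows at all, which is also what CROSS APPLY does with an empty array.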

Then I set up Stream Analytics to push this to Power BI, where I get this super nice reporting tool that tells us a lot. And what have we learned?

  • Total faces analyzed in a week: 49,254
  • Total men: 37,534
  • Total women: 11,720 (you might wonder why women would care about analyzing their faces on a mustache site, but it turns out we have an auto-stache feature that adds mustaches to them)
  • Average mustache length from all photos (on a scale from 0 to 1): 0.28
  • Average beard length from all photos (on a scale from 0 to 1): 0.27

Countries with the biggest mustaches on average (I was hoping to find Mexico right at the top so I could have some nice jokes with my friends, but that didn’t happen at all. Quite unexpectedly, Brazil, my home country, was one of the top ones):

biggestmoustaches

And here are the countries with the shortest mustaches, which seem to include a few Asian ones:

Smallmoustaches

And also average mustache length by age:

age

Not many 10-year-olds with beards, as you would expect 🙂 A couple of bars there were quite off, I wonder why…

And that’s it…

 

Ah, I also decided to write a custom dashboard that shows some of this data across the world in 3D. Because everything looks more fun in 3D: http://stacheistics.azurewebsites.net

The taller the bars are, the higher the number is (they also look lighter):

stacheistics

One interesting fact I got from there: how many people checked the “donate to science” checkbox and authorized us to use their photos to improve our machine learning, per region?

science

It turns out (I don’t know why) some regions were off the charts. The northeast of Brazil, for example, is one. Egypt also shows very high. The north of Japan and Belarus also seem pretty high.

A disclaimer: Don’t take this data very seriously. It could very well be a programming error on my side. But looking at this data is still pretty cool! 🙂

 

  • David Keller

    Nice read! 🙂 So far I’ve seen PowerBi only on presentations… Sadly, your world is broken 😉 http://stacheistics.azurewebsites.net/

    • Yes, I took that demo down a good while ago 🙂 I should have added some update in this post, thanks for pointing that out