6 reasons to (re)watch Buffy – how I created my dataviz

The idea

This year, the topic of IronViz, Tableau's annual dataviz competition, was 'something that brings you joy'. Since I had already worked on Eurovision and Hamilton (which, one day, I will update), I was left with creating a dataviz on either my cat or Buffy. I went for the latter, with a very clear goal: share 6 reasons to (re)watch Buffy the Vampire Slayer.

Here's how I did it. I hope to share some tricks you can apply to your next dataviz projects.

#1 Getting the data

#Python #Webscraper.io #OpenRefine #Github

  • The rest of the quantitative data came from IMDb and Buffy's Wiki. I used webscraper.io, a free tool I tend to call "Beautiful Soup for people who prefer to visualise". I also used OpenRefine to clean the CSV extracts.
  • There was still one data point missing: how to prove the show is feminist?
    I could have manually tagged which episodes pass the Bechdel–Wallace test; since I'm pretty sure it's 100% of them, that data point would have had little wow effect…

    I got insanely lucky that Julian Freedland shared the full episode transcripts on GitHub, including the total time spoken per character.
    All I had to do was manually tag the gender of more than 200 characters.
    I excluded any character with fewer than 10 seconds of dialogue in the whole show, as well as those whose gender is difficult to assign without rewatching the whole show (for instance, when the transcript just says "doctor" or "vampire"). A rough sketch of this step in code follows this list.
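
Here is that minimal pandas sketch of the idea: merge the per-character speaking time with a manual gender-tagging file, drop the short or untaggable characters, and compute the share of dialogue per gender. The file and column names are my own assumptions for illustration, not the actual files from the project.

```python
import pandas as pd

# Hypothetical inputs: speaking_time.csv has one row per character with total
# seconds of dialogue; gender_tags.csv is the manual tagging file.
speaking = pd.read_csv("speaking_time.csv")   # columns: character, seconds_spoken
genders = pd.read_csv("gender_tags.csv")      # columns: character, gender

df = speaking.merge(genders, on="character", how="left")

# Drop characters with fewer than 10 seconds of dialogue in the whole show,
# plus the ones left untagged because their gender couldn't be assigned.
df = df[(df["seconds_spoken"] >= 10) & (df["gender"].notna())]

# Share of total speaking time per gender, ready to drop into Tableau.
share = (
    df.groupby("gender")["seconds_spoken"].sum()
      .pipe(lambda s: s / s.sum())
      .round(3)
)
print(share)
```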

#2 Format & software

#mobilefirst #mobilesecond #Tableau

My priority is to create with mobile usage in mind, which is not always easy in Tableau. The viz will never be as pleasant to use on a smartphone as on desktop, but I want to give at least a sense of the content without the whole layout breaking completely.

The format was constrained by the maximum slide length in PowerPoint. Oh yes, I used PowerPoint for the background! Which leads me to the design part:

#3 Design

#PowerPoint #Adobe

  • Designers, please don't hyperventilate. I'm afraid I don't (yet) have strong Adobe Illustrator skills, so I mainly used PowerPoint.
  • I really struggled with finding a good title font though. I needed a web-safe font for the text created in Tableau (I went with Georgia), but I wanted something both youthful and iconic for the headers. I settled on Mason Sans, which meant I had to create the text in Adobe, then save it and use it in PowerPoint.
  • Icons: I have a paid subscription to Noun Project. The advantage is that you can change the colour of the icons directly on the website (and you don't have to credit the source of the icons).

#4 Technical challenges

#arcSankey #network #actions #Tableau

1/ Creating an arc diagram was made very easy by Ken Flerlage's workbook and the instructions shared on his blog.

Close-up of an arc diagram intended to show how well written Buffy is: the show contains plenty of foreshadowing and payoffs.

2/ Trend line appearing / disappearing depending on selection:

Close-up of a line chart showing the ratings of Buffy's episodes across 7 seasons. The intention is to show that the ratings remain good overall and don't decline over time, and that each season's finale always receives a high rating, because the show always makes sure to end with a bang and a sense of closure.

I also got inspired by

3/ Creating a network graph was much more complex. I ended up following 3 tutorials in parallel. The first one is totally amazing, but the following ones explain certain steps in a bit more depth. So, in order of creation, I would recommend viewing
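
For readers who prefer to prepare the network data in code rather than inside Tableau, here is a minimal sketch of one common approach (not necessarily the one used in the tutorials above): let Python compute the node coordinates with a force-directed layout and export node/edge CSVs for Tableau. The file and column names are assumptions for illustration.

```python
import pandas as pd
import networkx as nx

# Hypothetical edge list: one row per relationship between two characters.
edges = pd.read_csv("edges.csv")   # columns: source, target

G = nx.from_pandas_edgelist(edges, source="source", target="target")

# Force-directed layout computes x/y coordinates for each node;
# Tableau can then join these coordinates back to the edge list to draw the links.
pos = nx.spring_layout(G, seed=42)

nodes_out = pd.DataFrame(
    [(node, xy[0], xy[1]) for node, xy in pos.items()],
    columns=["character", "x", "y"],
)
nodes_out.to_csv("network_nodes.csv", index=False)
edges.to_csv("network_edges.csv", index=False)
```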

#5 Getting feedback

#vizofficehours #datafam

One day, I will write a whole blogpost about how to get feedback early and often. Long story short: I got feedback from both industry experts (in this case, my friends who are Buffy fans) and dataviz experts.

Like I did with my Eurovision viz, I went to Michelle Frayman and Zak Geis' weekly #vizofficehours; they provided great tips and guidance on what to focus on. If you are in an "I know there's something wrong but I cannot put my finger on it" situation, I would encourage you to attend.

In the context of IronViz, Sarah Bartlett and the datafam organised additional ad hoc office hours, so I also got feedback from Jacqui Moore and Zach Bowers.

I illustrated how I used their feedback in this Twitter thread.

That's all folks, drop a comment if you have any questions 🙂

Analysing jury bias in the Eurovision Song Contest

https://public.tableau.com/views/Eurovision_Song_Contest_biases_2009_2019/EUROVISION?:language=en&:display_count=y&publish=yes&:origin=viz_share_link&:showVizHome=no#6

How I created this data visualisation

The idea

I have watched Eurovision since I was a kid; every year, my parents would root for their native country. And yet, it came as a surprise to me that Portugal won in 2017. There's this idea that blocs of countries vote for each other.

Deprived of a multitude of neighbouring countries, and snubbed by the other Mediterranean/Latin countries, how could Portugal get enough points from other countries to win?

We kept losing, therefore the competition was unfair, right? How can we explain 2017?

The question and the methodology

How do you quantify the impact of bloc voting and bias? It's a tough question, and fortunately for me, many clever people with solid statistical knowledge have already studied the topic.

I thought it more interesting to focus on jury voting, because the 2009 re-introduction of juries was an attempt to limit bias. We know that televoting is biased, but what about juries of music professionals?

I reached out to Alexander V. Mantzaris, who, with Samuel R. Rein & Alexander D. Hopkins, wrote the 2018 paper Preference and neglect amongst countries in the Eurovision Song Contest. Not only did Alexander Mantzaris kindly help me understand the statistical model, he also refreshed the data with the 2018 and 2019 results.

The visual inspiration

From the beginning, I thought a hex map was the way to represent the different countries.

The following visualisations were great inspirations:
– the Eurovision voting alliances piece from Delayed Gratification is a little jewel
– Maarten Lambrechts' work on Eurovision and Google searches, and his making-of, are well worth a read
– The Economist already covered the paper I mentioned. I absolutely love the simplicity they chose for their chart, but I wanted to show more data, and only recent data, so I went for another visualisation.

The dataviz challenge

I wouldn't have been able to create this viz without
– Daniel Rowlands' Tableau Tilemap generator for the hex map
– Bryce Larsen on the Tableau community forum, who patiently answered all my questions and fixed my issues regarding unioning tables and the required calculations.

After two months of staring at the same viz, I needed external feedback. Michelle Frayman and Zak Geis run a weekly #vizofficehours; they provided great tips and guidance on what to focus on. If you are in an "I know there's something wrong but I cannot put my finger on it" situation, I would encourage you to attend.

Among other life savers, special mentions to
– Andy Kriebel's tutorial Using Parameter Actions to Choose a Chart Type (to allow switching from the hex map to a classic map)
– Kevin Flerlage's Use Cases for Transparent Shapes & Images

Let me know if you have any feedback or questions!

A resume a little less classic

For the SWD January 2021 challenge, I freshened up my resume and tried some interactivity.

I used only PowerPoint and Tableau.


https://public.tableau.com/profile/anne.sophie5083#!/vizhome/Dataviz_Resume_ASPDS/myresume?publish=yes

My inspirations were:

Francisco Cardoso for the uncluttered resume and the bubbles

Christian Felix, for managing to include a significant amount of written text while keeping everything looking light

Varun, for great signposting

Hamilton: an exploration of motifs

Strong storytelling, quirky dataviz and quoting Hamilton are 3 of my favourite hobbies. I mixed the 3 and shared a tribute to my favourite musical in the form of a visualisation project celebrating its clever catchy phrases aka motifs.

>>> Click here to explore themes & motifs in Hamilton <<<

 


 

So how did I do it? Let me drop some knowledge!

  1. Get the lyrics

I scraped (extracted) the lyrics and singers from Genius.com with the tool Webscraper.io and reworked the export to clean it with OpenRefine.
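
For anyone more comfortable with code than with OpenRefine, here is a minimal pandas sketch of the kind of clean-up involved: dropping empty rows, trimming whitespace and collapsing spelling variants of the same singer. The file name, column names and variants are made up for illustration, not taken from my actual export.

```python
import pandas as pd

# Hypothetical webscraper.io export with columns: song, singer, line.
raw = pd.read_csv("hamilton_lyrics_export.csv")

clean = (
    raw.dropna(subset=["line"])                 # drop empty scrape results
       .assign(
           song=lambda d: d["song"].str.strip(),
           # Collapse spelling variants of the same singer, OpenRefine-cluster style.
           singer=lambda d: d["singer"].str.strip().str.upper()
                             .replace({"A.HAM": "HAMILTON", "A. HAM": "HAMILTON"}),
       )
       .drop_duplicates()
)
clean.to_csv("hamilton_lyrics_clean.csv", index=False)
```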

  2. Find the themes

I started looking at text analysis tools such as Voyant Tools and then realised there was no need: everything clever about Hamilton's lyrics and motifs has already been written. I focused on a qualitative analysis.

  3. They deserve credit for all the credit they gave me
  4. Hungry for more?
  5. I took a pen

The visualisation and navigation were all created in Tableau, except the hand-sketched ones. Oh, you haven't seen them? The bunny must have hidden them…

My progress in Tableau: MakeOver Monday exercises

Content Warning: this blogpost mentions the topic of suicide. If you are at risk, please stop here and consult a list of crisis lines.

Following the completion of several great Tableau courses (Data Visualization and Communication with Tableau on Coursera, Master Tableau in Data Science on Udemy, and the free Tableau Certification Challenge by Olga Tsubiks), I decided that the best way to master this dataviz tool is to practise regularly. This is where my participation in the weekly MakeOverMonday exercise comes in.

Here is a description of my first 3 attempts at this dataviz exercise.

Attempt #1: Ironman championship medalists – MakeOver Monday week #42

For my first attempt, I wanted to highlight how the countries with the most medalists changed over time, but I used 3 graphics instead of 1 to make my point. I think this could have gone better, but I was proud to try anyway!

Attempt #2: Suicides in England and Wales – MakeOver Monday week #43

A very sensitive topic; the most important thing I learnt this week is how to approach sensitive topics (thanks to Bridget Cogley and her blogpost on Data Ethics, Dashboards, and Presenting Death). I made sure to provide help contacts (as above in this blogpost) and to choose my wording carefully.

Attempt #3: World Cities Ranked by Annual Sunshine Hours – MakeOver Monday week #44

One of the lessons from my 2 previous attempts was to stop cramming my dashboard with too much info. For this one, I simplified a lot, which helped me spend loooooooooooooots of time manually entering the geo coordinates of almost 70 cities. Oh, what a fun exercise!

Conclusion: so far, the experience has been good and I've learnt a lot. I feel that I've made progress, but seeing how others dealt with the same data also inspires me to go further. Stay tuned to see more MakeOver Monday attempts in my Tableau gallery!

Outside Insight: how social data could contribute to a new vision for decision-making

Several interesting mergers and partnerships in the social listening and analytics space have occurred in the last couple of months: in October 2018 alone, research firm Ipsos bought Synthesio, Brandwatch merged with Crimson Hexagon and Linkfluence announced the acquisition of Scoop.it. The consolidation of this industry is accelerating, although there have been movements for a couple of years; for instance, in 2017, the acquisition of BuzzSumo by Brandwatch made a lot of sense from an SEO/content marketing perspective.

 

Call me biased, but the acquisition of Sysomos (the company I work for as I publish this) by Meltwater, announced in April 2018, made even more sense to me.

 

The Meltwater project: Outside insight

Over recent years, this company, best known for its media monitoring software, has been acquiring data science startups and social data companies such as DataSift. To what purpose? CEO Jørn Lyseggen has an ambitious long-term plan for a new type of software that would connect as much unexplored yet relevant data as possible, make sense of it and bring a "big transformation when it comes to corporate decision-making".

He promotes this vision in the book Outside Insight, and I'm going to share some highlights of it.

Outside Insight: a shift in the paradigm of decision-making

As of now, companies rely on internal data to make business decisions: quarterly sales, website traffic, CRM stats, etc.
But internal data is by essence
– Historical, thus lagging, like "driving a car looking in the rear-view mirror"
– Insular: the focus shouldn't be on how the company is doing but on where the industry is heading.

To summarise, “external online data is the biggest blind spot in corporate decision-making today”.

A personal comment on that: there are tons of websites and articles out there listing companies that failed because they didn't adapt to their markets. My favourite example is how Nokia missed the cultural trend in China that switched consumers from mobile phones to smartphones, as explained by the technology ethnographer Tricia Wang in her TED Talk: if you haven't seen her presentation on the human insights missing from big data, go watch it immediately and come back after!

A summary of the decision-making paradigm shift suggested by the Meltwater Outside Insight approach

Outline of the shift to an external data focus

In this approach, monitoring your industry is essential, and benchmarking is "the most honest measure of success". What matters is not that your performance improved, but how much more it improved compared to your competitors'.

That’s why the book also provides guidance and a framework to incorporate the Outside Insight methodology.
It includes understanding the competitive landscape you need to monitor:

Outside Insight: framework to incorporate the OI methodology

And the last phase, the most ambitious, would be to convert your focus to new indicators, in line with the Outside Insight paradigm.


What role for online data in this Outside Insight approach?

Some of the KPIs regarding your competitors are just not available to you: hence the need to look at new indicators and interpret them efficiently.

News coverage, social media comments and likes, product reviews, job postings from competitors, industrial patents are some of the ‘digital breadcrumbs’ that inform us about how our company and our competitors are doing. By tracking those breadcrumbs, we can make more sense of how the industry is evolving.

This aligns with 3 of my personal beliefs regarding social data:
1. Social media data is a goldmine of insights, mostly used by the marketing industry but untapped by many other sectors, such as business strategy, product development or risk management
2. Far from dismissing the biases and flaws of social data (I'll dedicate a blogpost to the topic), I recommend combining social data with additional online and offline data types to generate a fuller picture, representing reality from different perspectives
3. Most online data points are by nature qualitative, unstructured, messy and/or far too numerous to be analysed manually: connecting all that data constitutes a serious challenge. So how do you cope with volume, velocity and unstructured data?

“To the rise of an entirely new software category”

As the data is different by nature, Jørn Lyseggen suggests the solution would be a new type of software, complex, relying on NLP, AI and machine learning.
While BI is focused on internal data and operational metrics, OI is “concerned with a real-time understanding of the ebb and flow of the competitive landscape in order to anticipate future threats and opportunities”.
The plans are ambitious, but Meltwater is making the move to connect all the required dots and provide such software.
Exciting times ahead…

If you want to know more about the software launched by Meltwater, please visit https://outsideinsight.com/

 

Learning SQL for data analysis in 4 online courses & certifications


Over the past 18 months, I have taken more than 15 online courses and certifications in the realm of data science and data visualisation. Although I found this absolutely awesome list of The best Data Science courses on the internet, ranked by reviews, I have to admit that I struggled to find a learning path for SQL for data analysis.

My aim today is to suggest a learning path from basic to more advanced, one that includes lots of exercises. Also, it's free or super cheap.

Without any further ado, here is the list that I suggest if you are interested in SQL for data analysis:

1/ An intro on the go: Sololearn mobile app

With 27 bite-sized lessons, SQL Fundamentals is a good way to start and will keep you busy on public transport (without the guilt of wasting your time on your smartphone).


2/ An intro course:

In the past, I used to recommend DataCamp, but I no longer do. Codecademy or Khan Academy are perhaps worth checking out instead.

3/ Time to practise: SQL Zoo exercises

SQL Zoo is less a series of tutorials than a set of exercises, and they greatly helped me practise.

Also, I’m fond of those cute emojis that appear when you get the reply right 😊


4/ Gain real experience: Coursera

One challenge while learning SQL is that the examples provided are always a bit academic: the data is perfect, with no missing entries and no mistakes. Not really like real life.

The Coursera Specialization Managing Big Data with MySQL is by far the most challenging and hands-on SQL course I have enrolled in over the past year.

You will learn

1/ how to interpret & create ER diagrams and relational schemas

2/ how to write MySQL and Teradata queries to retrieve data from real business (= messy!) databases that contain over 1 million rows of data

3/ how to translate business questions into SQL queries that create business value (a hedged example of what that can look like follows this list).
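
To make that last point concrete, here is a tiny, self-contained sketch of turning a business question into a query, with the defensive NULL filtering that messy data forces on you. The table, columns and rows are invented for illustration; nothing here comes from the course itself. I'm wrapping the SQL in Python's built-in sqlite3 so the example runs as-is.

```python
import sqlite3

# A tiny in-memory stand-in for a messy business table (all names and rows
# are made up for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, country TEXT, revenue REAL);
INSERT INTO orders VALUES
  (1, 'UK', 120.0), (2, 'UK', NULL), (3, 'FR', 80.5),
  (4, NULL, 60.0), (5, 'FR', 95.0);
""")

# Business question -> SQL: which country brings in the most revenue,
# ignoring rows with a missing country or revenue?
query = """
SELECT country,
       COUNT(*)               AS nb_orders,
       ROUND(SUM(revenue), 2) AS total_revenue
FROM orders
WHERE country IS NOT NULL AND revenue IS NOT NULL
GROUP BY country
ORDER BY total_revenue DESC;
"""
for row in con.execute(query):
    print(row)
```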

Anything missing from this list? A suggestion for a 5th and more challenging course? Drop a comment!

Dataviz: what the Dear data challenge taught me in 10 weeks

Dear Data is a project I warmly recommend to anyone: data analysts, aspiring data analysts like myself, but also teachers, children, artists, and everyone else. With this blogpost, I hope to encourage you not only to read about it, but to run the challenge yourself.


Never heard of it?

Dear data is a data drawing project run by Giorgia Lupi and Stefanie Posavec.

For a year, every week, these 2 talented designers, living an ocean apart, sent each other a data visualisation based on observations of their daily lives. They shared their experience in the book Dear Data and encouraged their readers to join the Dear Data challenge.

My friend Lynda, who lives in Paris, and I ran the Dear Data challenge over 10 weeks, and I'd like to share with you today why you should find yourself a pen pal and run it too.

What would I need?

You need

  • A pen pal; if you can't find one, there is a Google group for that (easy)
  • Felt pens or colouring pencils (although you can try with one pen and one colour)
  • 10 stamps and blank postcards (probably the most challenging items to obtain in this day and age)

Check, check, check: now, what are the instructions?

1/ At the beginning of each week, both pen pals agree on a topic to collect data on: a week of food, a week of weather, a week of passing through doors, whatever you want!

Also think about the metadata you are going to collect; for instance, for a week of food, are you going to collect the time you eat? The place? The people you are eating with?

Dear data: discuss and choose your weekly topic

2/ During the week, collect the data the way you prefer: when I asked her for tips, Stefanie Posavec recommended the app Reporter to me, but I'm so old school I kept track in a notebook


3/ At the end of the week, each party analyses the data she/he has collected, organises it and sketches a data visualisation.


4/ Draw your data visualisation on a postcard.

Very important rule: you have to draw by hand; don't cheat and use Tableau or some nonsensical dataviz tool!

On the address side of the postcard, you draw the legend and can add your comment / analysis.


5/ Send the postcard on its way and, a couple of days later, catch up with your friend to compare your work and visualisations.


What will I gain from the experience?

1/ No matter where you are starting from, you will gain data literacy

Whether you are already on a data path, considering a career in data science or journalism, or just curious, this project is accessible to everyone.

There are no intimidating technical requirements, no need to learn R, SQL or Tableau beforehand. As Giorgia put it: "Starting small is how we hope to increase data literacy."

2/ Metadata collection & data visualisation: you will learn by comparing the outcome

You might agree on a topic, but you will be amazed by how different the final results look between your work and your friend's.

It boils down to 2 main reasons:

  • (Meta)data collection: each party has a different interpretation of the topic and will therefore collect different metadata, or shall we say, contextual information
  • Even when you collect the same data, there are a thousand ways of representing it

Here is an example with a 'week of food':

 

  • Lynda — data collected: concise (main meals); metadata collected: size of portion, provenance (homemade vs takeaway vs leftovers), 4 main food types, company during the meals
  • Anne-Sophie — data collected: exhaustive (all meals and beverages); metadata collected: main food types, hours, provenance (homemade vs ready meals)

 

3/ In short, you will learn about what type of data visualiser you are (strengths, challenges)

For instance, we learnt that Lynda is great at summarising the data in a curated list of symbols and criteria, and leans towards the figurative.

I try to represent as much data as possible, sometimes too much, and am obsessed with finding figurative representations.

For instance, for 'a week of pleasures', I realised that small pleasures are like coins you collect: the more you collect, the better your mood. I was reading Ready Player One that week, and it made me think that a (retro) gamification representation would be best!

4/ You will focus on what matters the most: storytelling

By sticking to basic tools such as pen and paper, you will actually focus on the rawer aspects of data communication and visualisation.

You will spend time finding and drawing a story, even on topics that might seem trivial, such as a 'week of weather'.

For instance, when Lynda and I chose the topic "a week of my hair", I had no idea what story I was going to tell. Whilst gathering the data, I realised that my behaviour towards my hair depended highly on the number of days since my last wash (twice a week)! I told a story of care and neglect, a perpetual twice-weekly cycle with a restart button in the middle.


5/ A personal exploration

One of the most basic tips you hear when you go on a diet: write down everything you eat, so you can keep track.

I know that Dear Data could look like a prescriptive project (like an analog Fitbit) but, as Giorgia and Stefanie put it: “We’ve always conceived Dear Data as a “personal documentary” rather than a quantified-self project which is a subtle – but important – distinction. Instead of using data just to become more efficient, we argue we can use data to become more humane and to connect with ourselves and others at a deeper level.”

The main idea is that you learn a lot about yourself while collecting the data, and it does not have to be prescriptive; it is more of a mindfulness project.

For instance, I forced Lynda into running 'a week of smartphone': how many times do we reach for our phones? Why? What do we do with them?

I thought it would help me switch off and work on my dependence issue, but I realised that I usually have a very legitimate reason to reach for my phone: the issue is the distracting notifications. So instead of blaming myself for reaching for it too much, I discovered that the smartphone is a great tool; all I need is to delete some apps and restrict notifications to the bare minimum.


I hope I convinced you to find a friend and start this project, don’t hesitate to share the outcome with the hashtag #deardata!

A week of learnings


I hope I have encouraged you to take the challenge yourself!

Learn more about the Dear Data project

Check out Lynda's and my Dear Data pictures.