In August 2011, Australian Minister for the Arts, Simon Crean, issued a discussion paper
seeking public input to the National Cultural Policy.
This policy will present a 10-year strategic vision for the arts, cultural and creative
endeavours in Australia.
The Office of Senator Kate Lundy, in collaboration with the Office of Minister Crean, ran a Digital
Culture Public Sphere consultation to gather public input to this policy, and used Palantir
to integrate and analyse the contributions from across a range of media.
The aim was to elicit visions for success from relevant sectors on the digital culture
landscape and also to seek specific ideas for how to reach these goals.
Using Palantir, we integrated contributions that people made to a range of sites including
Twitter, Facebook, YouTube, the Public Sphere wiki, and IdeaScale, a site for posting and
voting on ideas.
Let's take a look at how we import this data into the Palantir platform.
Here we've got a spreadsheet containing information about the people who registered to attend
the live Public Sphere event. We simply drag and drop this onto the Palantir workspace
to begin importing this data.
Next, we map the spreadsheet columns to properties and objects in the dynamic ontology, which
represents the conceptual model that's been defined for the Public Sphere event.
In this case we're mapping properties like names, email addresses and digital culture
sector affiliations to people objects so we can model the contributions from different
sectors and different people across media types.
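Outside of Palantir, the column mapping step can be pictured with a small Python sketch. This is only an illustration of the idea, not Palantir's import mechanism: the column names, the `Person` class and the sample row are all invented for the example, since the actual spreadsheet and ontology definitions are not shown here.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical spreadsheet column names mapped to person properties.
COLUMN_TO_PROPERTY = {
    "Full Name": "name",
    "Email": "email",
    "Sector": "sector",
}


@dataclass
class Person:
    name: str
    email: str
    sector: str


def load_people(csv_text: str) -> list[Person]:
    """Turn each spreadsheet row into a Person using the column mapping."""
    people = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = {prop: row[col].strip() for col, prop in COLUMN_TO_PROPERTY.items()}
        people.append(Person(**props))
    return people


# Invented sample row, standing in for the registration spreadsheet.
sample = "Full Name,Email,Sector\nAlex Example,alex@example.org,Digital arts\n"
people = load_people(sample)
print(people[0])
```

Keeping the mapping in one dictionary means a differently laid out spreadsheet only needs a new mapping, not new loading code.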
The result of this process is a collection of people objects that represent the people
who registered to attend the live event. These people objects have properties associated
with them: the email addresses, names and digital culture sector affiliations we just
saw.
Having seen one way we can load data, let's take a look at an overview of all the data
that was contributed during the Public Sphere consultation.
The object explorer application gives us a top-down view of the almost 6000 contributions
across the different media types.
Within the object explorer, the timeline helper gives us a temporal look at this data. We
can also explore it and slice it by contributions to specific media sites. We can see for example
small numbers of contributions to YouTube and Facebook, a larger number of blog comments
and wiki edits, and a peak of IdeaScale votes towards the end of the consultation period
once ideas had come to fruition.
The largest peak on the graph represents more than 2000 tweets that were registered during
the live event on October 6th.
One of the challenges when gathering contributions from different sources is to identify those
contributions that were actually from the same person, even when that person has used
different user accounts with different names across these different systems.
Palantir provides entity resolution tools that can be used to help with this process.
These rule-based approaches can be used to merge user accounts which share an email address
or a name, or any other properties, into a single underlying person entity.
Further manual resolution can also be done to allow tacit knowledge and human intuition
to be used to further resolve different user identities that represent the same underlying
person.
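To make the rule-based merging concrete, here is a minimal Python sketch of the general technique. It is not Palantir's entity resolution tooling: it assumes a simple rule (accounts that share a normalised email address or name belong to the same person) and uses a union-find structure so that matches chain together. The account records and the `resolve` function are invented for illustration.

```python
from collections import defaultdict

# Invented accounts for illustration only.
accounts = [
    {"id": "twitter:1", "name": "Alex Example", "email": "alex@example.org"},
    {"id": "ideascale:7", "name": "A. Example", "email": "ALEX@example.org"},
    {"id": "wiki:3", "name": "a. example", "email": ""},
    {"id": "blog:9", "name": "Sam Sample", "email": "sam@example.org"},
]


def resolve(accounts, keys=("email", "name")):
    """Cluster account ids that share any non-empty value for the given keys."""
    parent = {a["id"]: a["id"] for a in accounts}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for a in accounts:
        for key in keys:
            value = a.get(key, "").strip().lower()
            if not value:
                continue  # Empty properties never trigger a merge.
            token = (key, value)
            if token in first_seen:
                parent[find(a["id"])] = find(first_seen[token])
            else:
                first_seen[token] = a["id"]

    clusters = defaultdict(set)
    for a in accounts:
        clusters[find(a["id"])].add(a["id"])
    return sorted(sorted(c) for c in clusters.values())


resolved = resolve(accounts)
print(resolved)
```

Matching on names alone can wrongly merge different people who share a name, which is one reason a manual review step remains useful.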
Now that we've resolved our user accounts to the underlying people, we can begin to
explore the contributions from each sector of the digital culture landscape.
These sectors are defined as digital arts, film and animation, media and music, games
development, cultural institutions, and big picture contributions that cut across
these sectors. One way to approach this sector analysis is to begin with the people entities,
that is, the people who have made contributions.
The first step is to map each person into the sector or sectors that they represent.
18% of contributors attended the live event and explicitly provided this information at
registration.
For everyone else we must analyse the available information to determine which sectors they
represent. We can do this a number of ways. One approach is to use the text cloud helper
that shows frequent terms that occur in the profile information that's associated with
people who contributed.
We can use this information to identify and group users. For example, those who mentioned
galleries, library collections, museums or archives in their profile are likely to represent
the cultural institution sector.
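The keyword grouping just described could be sketched in Python roughly as follows. This is an assumption-laden illustration rather than how the text cloud helper works: the keyword list is based on the terms mentioned above plus their singular forms, and `suggest_sectors` is a hypothetical helper.

```python
import re

# Keywords based on the narration; singular forms are an added assumption.
# Other sectors would need their own keyword sets.
SECTOR_KEYWORDS = {
    "cultural institutions": {
        "gallery", "galleries", "library", "libraries",
        "museum", "museums", "archive", "archives",
    },
}


def suggest_sectors(profile: str) -> set[str]:
    """Suggest sectors whose keywords appear as whole words in a profile."""
    words = set(re.findall(r"[a-z]+", profile.lower()))
    return {
        sector
        for sector, keywords in SECTOR_KEYWORDS.items()
        if words & keywords
    }


print(suggest_sectors("Curator at a regional museum"))
```

The result is only a suggestion per person, intended to feed a manual check rather than replace it.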
This grouping of matching users can then be manually checked to make sure that the sector
information that is assigned to each person is of high quality. Another approach is to
look at the domains associated with people's email addresses. Many cultural institutions
for example have readily recognisable email domains that make it relatively easy to identify
contributors from those institutions and again to allocate them to the relevant sector.
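A small Python sketch of the email domain approach might look like this. The domain list is a placeholder (the `example.org` names are not real institutions), and the helper names are invented for the example.

```python
# Hypothetical institution domains; real ones would be compiled by hand.
INSTITUTION_DOMAINS = {"museum.example.org", "library.example.org"}


def domain_of(email: str) -> str:
    """Return the lower-cased part of an address after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def is_institution(email: str) -> bool:
    """True if the address is on a listed domain or one of its subdomains."""
    domain = domain_of(email)
    return any(
        domain == known or domain.endswith("." + known)
        for known in INSTITUTION_DOMAINS
    )


print(is_institution("jo@staff.library.example.org"))
```

Checking for a leading dot before the known domain avoids matching unrelated domains that merely end with the same characters.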
There are of course many other ways of analysing the data.
Once we've analysed and allocated sectors to the people who contributed, we can then
search from our contributing people to find all the documents and ideas that they've contributed.
The links on this graph represent the authoring links, that is, the links between a person
and the contributions that they've made across the different sites.
We can safely remove the orphan nodes from this graph. These represent people who for
instance have registered for the live event but have not attended, or at least have not
contributed to any of the digital sources.
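In plain Python terms, removing the orphan nodes amounts to keeping only the people who appear in at least one authoring link. The names and links below are invented for illustration.

```python
# Invented people and (person, contribution) authoring links.
people = {"alex", "sam", "jo"}
authored = [("alex", "tweet:1"), ("alex", "idea:4"), ("sam", "comment:2")]


def active_contributors(people, links):
    """Keep only people with at least one authoring link."""
    linked = {person for person, _ in links}
    return people & linked


active = active_contributors(people, authored)
print(sorted(active))
```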
The remaining set of people represent everyone who made a contribution to the various digital
sites from which data was contributed during the Public Sphere consultation.
Defining this set of users allows us to drill down on only the active contributing users
when we start exploring contributions across the different sectors.
We begin by histogramming the properties of all users who contributed and we can see the
sector affiliations that we've previously associated with people.
We can drill down on any specific sector, for example, cultural institutions, and then
look at the documents that were contributed by people representing this sector.
We can see more than 1500 contributions were made from the cultural institutions sector,
and can timeline these across the Public Sphere consultation period to explore temporal patterns
in the data.
The most obvious peak is that of around 900 Tweets during the live event on October 6th,
and this pattern correlates with the larger pattern of all contributions across all sectors.
We can repeat this process for the other digital culture sectors. For example here we're showing
it for the digital arts sector, and again we can timeline the results and see there
is a similar distribution of contributions across the period with a peak of Tweets during
the live event.
Our final analysis will explore the Tweets contributed during the consultation period,
and focus in on those during the live event.
We'll begin by exploring the proportion of retweets to novel tweets, and we can use a
filter in the workspace to achieve this. We can organise the results and see the relative
proportion of tweets versus retweets. Across the consultation period 43% of Tweets were
retweets.
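The retweet proportion is a simple ratio, and a rough Python equivalent is sketched here. How the workspace filter actually identifies retweets is not stated, so this assumes the common convention of a tweet whose text starts with "RT @". The sample tweets are invented.

```python
def retweet_share(tweets):
    """Fraction of tweets whose text starts with 'RT @' (assumed convention)."""
    if not tweets:
        return 0.0
    retweets = sum(1 for text in tweets if text.lstrip().upper().startswith("RT @"))
    return retweets / len(tweets)


# Invented sample tweets.
sample_tweets = [
    "Great session on digital archives #publicsphere",
    "RT @someone: Great session on digital archives #publicsphere",
    "Games deserve a place in cultural policy #publicsphere",
    "Streaming the live event now #publicsphere",
]
share = retweet_share(sample_tweets)
print(f"{share:.0%} of sample tweets are retweets")
```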
Next using our timeline helper, we can provide a temporal filter that allows us to focus
in on just the tweets from the period during the live event on Thursday the 6th of October.
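As an illustration, the same kind of temporal filter could be written in Python like this. The record layout with a `created` timestamp is an assumption, and time zones are ignored for simplicity; the year comes from the 2011 consultation period.

```python
from datetime import date, datetime

LIVE_EVENT_DAY = date(2011, 10, 6)  # Thursday the 6th of October 2011


def during_live_event(items):
    """Keep items whose 'created' timestamp falls on the live event day."""
    return [item for item in items if item["created"].date() == LIVE_EVENT_DAY]


# Invented sample records.
sample_items = [
    {"id": "tweet:1", "created": datetime(2011, 10, 5, 23, 59)},
    {"id": "tweet:2", "created": datetime(2011, 10, 6, 10, 30)},
    {"id": "tweet:3", "created": datetime(2011, 10, 6, 16, 5)},
]
live_items = during_live_event(sample_items)
print([item["id"] for item in live_items])
```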
From here we can search around to retrieve the linked entities, that is, the people who
represent the authors of these Tweets. This will allow us to discover who was Tweeting
during the live event.
Of the 556 people who Tweeted with the #publicsphere hashtag, 65% did so during the live event.
When we lay out the graph, we can see some dense clusters representing people who've
Tweeted many times.
Our histogram helper can help us identify the top three tweeters during the live event:
in this case, Kathryn Greenhill, Fee Plumley and Pia Waugh.
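Counting tweets per author and taking the top few is the kind of tally a histogram provides. A minimal Python sketch follows, using invented placeholder handles rather than the real data.

```python
from collections import Counter


def top_authors(tweets, n=3):
    """Return the n authors with the most tweets, as (author, count) pairs."""
    return Counter(tweet["author"] for tweet in tweets).most_common(n)


# Invented placeholder handles: user_a tweets 4 times, user_b 3, user_c 2, user_d 1.
handles = ["user_a"] * 4 + ["user_b"] * 3 + ["user_c"] * 2 + ["user_d"]
sample = [{"author": handle} for handle in handles]
top_three = top_authors(sample)
print(top_three)
```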
Thanks for watching this presentation about how Palantir was used to analyse contributions
to the Digital Culture Public Sphere consultation.