Analysing The Digital Culture Public Sphere with Palantir

In August 2011, the Australian Minister for the Arts, Simon Crean, issued a discussion paper seeking public input to the National Cultural Policy. This policy will present a 10-year strategic vision for the arts, cultural and creative endeavours in Australia. The Office of Senator Kate Lundy, in collaboration with the Office of Minister Crean, ran a Digital Culture Public Sphere consultation to gather public input to this policy, and used Palantir to integrate and analyse the contributions from across a range of media. The aim was to elicit visions for success from relevant sectors of the digital culture landscape and also to seek specific ideas for how to reach these goals. Using Palantir, we integrated contributions that people made to a range of sites including Twitter, Facebook, YouTube, the Public Sphere wiki, and IdeaScale, a site for posting and voting on ideas.
Let's take a look at how we import this data into the Palantir platform. Here we've got a spreadsheet containing information about the people who registered to attend the live Public Sphere event. We simply drag and drop this onto the Palantir workspace to begin importing this data. Next, we map the spreadsheet columns to properties and objects in the dynamic ontology, which represents the conceptual model that's been defined for the Public Sphere event. In this case we're mapping properties like names, email addresses and digital culture sector affiliations to people objects so we can model the contributions from different sectors and different people across media types. The result of this process is a collection of people objects that represent the people who registered to attend the live event. These people objects have properties associated with them: the email addresses, names and digital culture sector affiliations we just saw.
Having seen one way we can load data, let's take a look at an overview of all the data that was contributed during the Public Sphere consultation.
The object explorer application gives us a top-down view of the almost 6000 contributions across the different media types. Within the object explorer, the timeline helper gives us a temporal look at this data. We can also explore it and slice it by contributions to specific media sites. We can see, for example, small numbers of contributions to YouTube and Facebook, a larger number of blog comments and wiki edits, and a peak of IdeaScale votes towards the end of the consultation period once ideas had come to fruition. The largest peak on the graph represents more than 2000 tweets that were registered during the live event on October 6th.
One of the challenges when gathering contributions from different sources is to identify those contributions that were actually from the same person, even when that person has used different user accounts with different names across these different systems. Palantir provides entity resolution tools that can be used to help with this process. These rule-based approaches can be used to merge user accounts which share an email address, a name, or any other property into a single underlying person entity. Further manual resolution can also be done, allowing tacit knowledge and human intuition to be used to resolve different user identities that represent the same underlying person.
Now that we've resolved our user accounts to the underlying people, we can begin to explore the contributions from each sector of the digital culture landscape. These sectors are defined as digital arts, film and animation, media and music, games development, cultural institutions, and big picture contributions that cut across these sectors. One way to approach this sector analysis is to begin with the people entities, that is, the people who have made contributions. The first step is to map each person into the sector or sectors that they represent.
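To make the rule-based merging of user accounts more concrete, here is a minimal sketch in plain Python. It is not Palantir's API: the `resolve_accounts` function, the record fields and the sample accounts are all hypothetical, and it assumes that two accounts belong to the same person whenever they share a normalised email address or name. Real data would still need the manual review step described above.

```python
from collections import defaultdict


def resolve_accounts(accounts, keys=("email", "name")):
    """Group account records that share a non-empty value for any of `keys`.

    Illustrative only: matching is exact after trimming and lower-casing.
    """
    parent = list(range(len(accounts)))

    def find(i):
        # Follow parent pointers to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (key, normalised value) -> index of first account seen
    for idx, account in enumerate(accounts):
        for key in keys:
            value = (account.get(key) or "").strip().lower()
            if not value:
                continue  # a missing property never triggers a merge
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    groups = defaultdict(list)
    for idx, account in enumerate(accounts):
        groups[find(idx)].append(account)
    return list(groups.values())


# Hypothetical sample accounts from different sites.
accounts = [
    {"source": "twitter", "name": "Alex Example", "email": None},
    {"source": "wiki", "name": "A. Example", "email": "alex@example.org"},
    {"source": "ideascale", "name": "alex example", "email": "Alex@Example.org"},
    {"source": "facebook", "name": "Sam Sample", "email": "sam@example.org"},
]
people = resolve_accounts(accounts)
```

In this sample the IdeaScale account shares an email address with the wiki account and a name with the Twitter account, so all three collapse into one person, while the Facebook account remains separate.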
18% of contributors attended the live event and explicitly provided this information at registration. For everyone else we must analyse the available information to determine which sectors they represent. We can do this in a number of ways. One approach is to use the text cloud helper, which shows frequent terms that occur in the profile information associated with people who contributed. We can use this information to identify and group users. For example, those who mentioned galleries, library collections, museums or archives in their profile are likely to represent the cultural institutions sector. This grouping of matching users can then be manually checked to make sure that the sector information assigned to each person is of high quality. Another approach is to look at the domains associated with people's email addresses. Many cultural institutions, for example, have readily recognisable email domains that make it relatively easy to identify contributors from those institutions and again to allocate them to the relevant sector. There are of course many other ways of analysing the data.
Once we've analysed and allocated sectors to the people who contributed, we can then search from our contributing people to find all the documents and ideas that they've contributed. The links on this graph represent the authoring links, that is, the links between a person and the contributions that they've made across the different sites. We can safely remove the orphan nodes from this graph. These represent people who, for instance, registered for the live event but did not attend, or at least did not contribute to any of the digital sources. The remaining set of people represents everyone who made a contribution to the various digital sites from which data was gathered during the Public Sphere consultation. Defining this set of users allows us to drill down on only the active contributing users when we start exploring contributions across the different sectors.
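The two sector allocation approaches mentioned above, profile terms and email domains, can be sketched as follows. This is plain Python rather than anything Palantir provides, and every name in it is an assumption: the lookup tables, the example domains and the `suggest_sectors` function are hypothetical. It only produces candidate sectors, which would still be checked by hand.

```python
# Hypothetical lookup tables; real entries would come from reviewing the data.
SECTOR_BY_DOMAIN = {
    "museum.example.org": "cultural institutions",
    "library.example.org": "cultural institutions",
}
# Word stems, so that "gallery"/"galleries" and "library"/"libraries" both match.
SECTOR_KEYWORDS = {
    "cultural institutions": ("galler", "librar", "museum", "archiv"),
}


def suggest_sectors(person):
    """Return candidate sectors for manual review, not a final allocation."""
    sectors = set()

    # Approach 1: recognisable email domains.
    email = (person.get("email") or "").lower()
    domain = email.rpartition("@")[2]
    if domain in SECTOR_BY_DOMAIN:
        sectors.add(SECTOR_BY_DOMAIN[domain])

    # Approach 2: frequent terms in the profile text.
    profile = (person.get("profile") or "").lower()
    for sector, stems in SECTOR_KEYWORDS.items():
        if any(stem in profile for stem in stems):
            sectors.add(sector)

    return sectors


# Hypothetical sample people.
sample_people = [
    {"name": "A", "email": "a@museum.example.org", "profile": ""},
    {"name": "B", "email": "b@example.com", "profile": "Curator of digital archives"},
    {"name": "C", "email": None, "profile": "Indie game developer"},
]
suggestions = {p["name"]: suggest_sectors(p) for p in sample_people}
```

Person A is matched by email domain, person B by a profile term, and person C is left unallocated for a reviewer to handle.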
We begin by histogramming the properties of all users who contributed, and we can see the sector affiliations that we've previously associated with people. We can drill down on any specific sector, for example cultural institutions, and then look at the documents that were contributed by people representing this sector. We can see that more than 1500 contributions were made from the cultural institutions sector, and we can timeline these across the Public Sphere consultation period to explore temporal patterns in the data. The most obvious peak is that of around 900 tweets during the live event on October 6th, and this pattern correlates with the larger pattern of all contributions across all sectors. We can repeat this process for the other digital culture sectors. For example, here we're showing it for the digital arts sector, and again we can timeline the results and see that there is a similar distribution of contributions across the period, with a peak of tweets during the live event.
Our final analysis will explore the tweets contributed during the consultation period, and focus in on those during the live event. We'll begin by exploring the proportion of retweets to novel tweets, and we can use a filter in the workspace to achieve this. We can organise the results and see the relative proportion of tweets versus retweets. Across the consultation period, 43% of tweets were retweets. Next, using our timeline helper, we can provide a temporal filter that allows us to focus in on just the tweets from the period during the live event on Thursday the 6th of October. From here we can search around to retrieve the linked entities, that is, the people who represent the authors of these tweets. This will allow us to discover who was tweeting during the live event. Of the 556 people who tweeted with the #publicsphere hashtag, 65% did so during the live event. When we lay out the graph, we can see some dense clusters representing people who've tweeted many times.
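For readers who want to reproduce this kind of tweet analysis outside Palantir, the retweet proportion and the live-event filter can be approximated with a short Python sketch. The record layout, the sample tweets and the function names are hypothetical, and two assumptions are baked in: a retweet is any tweet whose text starts with "RT @", and timestamps are already in the event's local time zone.

```python
from collections import Counter
from datetime import date, datetime

LIVE_EVENT_DAY = date(2011, 10, 6)


def is_retweet(text):
    """Assumed heuristic: old-style retweets begin with 'RT @'."""
    return text.lstrip().upper().startswith("RT @")


def retweet_share(tweets):
    """Fraction of tweets that are retweets (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return sum(is_retweet(t["text"]) for t in tweets) / len(tweets)


def top_authors_on(tweets, day, n=3):
    """Count tweets per author on one calendar day and return the top n."""
    counts = Counter(t["author"] for t in tweets if t["created_at"].date() == day)
    return counts.most_common(n)


# Hypothetical sample tweets.
tweets = [
    {"author": "user_a", "created_at": datetime(2011, 10, 6, 10, 0),
     "text": "Great session #publicsphere"},
    {"author": "user_b", "created_at": datetime(2011, 10, 6, 10, 5),
     "text": "RT @user_a: Great session #publicsphere"},
    {"author": "user_a", "created_at": datetime(2011, 10, 6, 11, 0),
     "text": "Ideas for games funding #publicsphere"},
    {"author": "user_c", "created_at": datetime(2011, 9, 20, 9, 0),
     "text": "RT @user_b: Have your say #publicsphere"},
]
share = retweet_share(tweets)
top = top_authors_on(tweets, LIVE_EVENT_DAY, n=1)
```

Counting per author on the event day is the same idea as applying the temporal filter and then histogramming the linked authors.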
Our histogram helper can help us identify the top three tweeters during the live event: in this case, Kathryn Greenhill, Fee Plumley and Pia Waugh.
Thanks for watching this presentation about how Palantir was used to analyse contributions to the Digital Culture Public Sphere consultation.