The ultimate goal of the TwiNL project is to offer language researchers a resource that they have always dreamed of.
We archive Twitter. This archive is a collection of language that can be traced to the individual language user.
We can also infer from the data how old they are, what gender they are, and where they were when they tweeted.
And this is crucial data to answer crucial theoretical questions.
There is also a TwiNL project website where everybody can look up statistics about Dutch tweets.
We could do this before, but what is new is that, since we have an archive, you can also look up tweets
from a longer time period.
An advantage of this for researchers is that we can now build models which are more fine-grained.
So instead of collecting information about the Netherlands we can zoom in to smaller locations like a municipality.
The average person on Twitter is young.
So sociolinguists, the people who study the use of language in groups and the formation of languages within groups,
are very interested in this new archive.
Until now they have only been able to do small studies, on the street, talking to people.
And this is just sitting in your chair and searching in this big archive.
eScience has two roles in the TwiNL project.
The first is finding algorithms for searching through large quantities of tweets,
which you do with eScience algorithms on a parallel machine.
The second is finding visualization techniques that are well suited to showing large volumes of data to users.
An interesting example is following the weather.
In January 2013 Holland was hit by a snowstorm.
So what we did was, we searched on Twitter for the Dutch word for snow (sneeuw), and we checked where it was mentioned.
Another interesting application would be to follow diseases like the flu or hay fever.
What you could do is check where in the Netherlands people use these words,
and then you could see exactly where the disease starts, and you could also follow it over time,
or by location, to see how it develops.
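The keyword tracking described here (searching for a word such as "sneeuw" and checking where and when it is mentioned) can be sketched roughly as follows. This is only an illustration, not the TwiNL implementation: the record fields `text`, `municipality` and `date`, the function name `count_mentions`, and the sample tweets are all assumptions made up for the example.

```python
import re
from collections import Counter
from datetime import date


def count_mentions(tweets, keywords):
    """Count tweets mentioning any keyword, grouped by (date, municipality).

    `tweets` is an iterable of dicts with the hypothetical keys
    'text', 'municipality' and 'date'.
    """
    # Match any keyword as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for tweet in tweets:
        if pattern.search(tweet["text"]):
            counts[(tweet["date"], tweet["municipality"])] += 1
    return counts


# Invented sample data, for illustration only.
sample_tweets = [
    {"text": "Zoveel sneeuw vandaag!", "municipality": "Nijmegen", "date": date(2013, 1, 15)},
    {"text": "Sneeuw op de A12, file.", "municipality": "Utrecht", "date": date(2013, 1, 15)},
    {"text": "Nog steeds sneeuw hier", "municipality": "Nijmegen", "date": date(2013, 1, 15)},
    {"text": "Lekker weer vandaag", "municipality": "Utrecht", "date": date(2013, 1, 16)},
]

counts = count_mentions(sample_tweets, ["sneeuw"])
for (day, place), n in sorted(counts.items()):
    print(day, place, n)
```

The same function could be called with words for flu or hay fever instead of "sneeuw"; sorting the counts by date then gives the time path, and sorting by municipality gives the location path.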
A concrete example of using the TwiNL approach is predicting events.
Predicting events that are not planned, not known beforehand.
But clearly, a group of people, like a group of hooligans, is planning, by sending tweets to each other,
to cause some problems somewhere.
They will not be explicit, they will make indirect references: Who is going to drive? Who is going to pick up?
And it is these various types of implicit language clues that we try to extract from these massive Twitter streams.
In the end we can use all of these clues to make fairly exact predictions of when the event is going to happen.
The TwiNL project was proposed by three partners:
the Netherlands eScience Center, SURFsara, and Radboud University.
It has already inspired researchers because we solved a Big Data problem for them.
This has inspired research in linguistics.
For example, people who want to know the spread of dialect words: where particular dialect words are used
in the Netherlands and in Belgium.
eScience, in TwiNL for example, is crucial for moving our science forward.
Because we are faced with lots of digital data; a lot of digital data is coming our way. Language data.
And we need to handle those large amounts with supercomputing.
The next step is to analyse these data with algorithms that do natural language processing.
And we need supercomputers for that as well.
These big streams of social media data and other textual archives are coming out of our ears, so to speak.
So we need big computational power and analysis, and storage, and memory. We need it all.