This video demonstrates interaction with the Prolix system using two example
predictions for the opening weekend gross.
Please refer to the textual submission for details about the individual views
of the system.
The first example is "Now You See Me". We select the prediction candidate movie
from a list of soon-to-be-released movies,
or by entering its name in an autocomplete box. As soon as the movie is recognized,
the other views load and present their appropriate data. In the view for
selecting related movies,
we use IMDb metadata to connect the movie to related ones.
Using the plot keywords, we receive a ranked list of similar movies.
Here, we find other "heist" plots such as "The Dark Knight" and "Ocean's Eleven",
and magic shows like "The Illusionist" or "The Prestige".
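For illustration, a keyword-based ranking of this kind could look as follows; the Jaccard similarity measure and the toy keyword sets are assumptions for the sketch, not necessarily the measure Prolix uses:

```python
def rank_related_movies(candidate_keywords, movies):
    """Rank movies by Jaccard similarity of their plot-keyword sets.

    `movies` maps title -> set of IMDb plot keywords; the actual
    Prolix similarity measure is not specified in the video.
    """
    candidate = set(candidate_keywords)
    scored = []
    for title, keywords in movies.items():
        overlap = candidate & keywords
        union = candidate | keywords
        score = len(overlap) / len(union) if union else 0.0
        scored.append((score, title))
    return sorted(scored, reverse=True)  # highest similarity first

movies = {
    "The Dark Knight": {"heist", "vigilante", "crime"},
    "Ocean's Eleven": {"heist", "casino", "con"},
    "The Prestige": {"magician", "illusion", "rivalry"},
}
print(rank_related_movies({"heist", "magician", "illusion"}, movies))
```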
We complete this list by manually adding other parts of the "Ocean's" series
and the "Mission Impossible" movies. This selection now helps us to better judge
the outcome of our machine learning-based predictions. In the view for parameterizing
and evaluating the machine learning-based predictors,
we use the default configuration for an initial prediction. However, the resulting
predictor has used only 19 training instances
and displays very poor error measures. We can now use different features
that exclude fewer training instances due to missing values.
In the feature selection view, one column depicts how many instances
we lose by using a certain feature. We de-select all Twitter features in order
to train an additional predictor
on the large IMDb dataset, and another one
with frequently missing attributes,
such as writer, additionally de-selected. Alternatively, we can fill in missing attributes in the training set
by using a median value. These new predictors have
much better error measures but miss the insights of the Twitter data.
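The trade-off between de-selecting features and imputing missing values can be sketched as follows; the pandas layout and column names are illustrative assumptions, not the Prolix implementation:

```python
import numpy as np
import pandas as pd

# Toy training set; NaN marks attributes missing on IMDb or Twitter.
train = pd.DataFrame({
    "budget":        [20e6, 35e6, np.nan, 90e6],
    "writer_score":  [np.nan, 0.7, np.nan, 0.9],
    "tweet_volume":  [1200, np.nan, 300, np.nan],
    "opening_gross": [30e6, 45e6, 8e6, 110e6],
})

# How many instances would each feature exclude?
# (the per-feature column shown in the feature selection view)
print(train.drop(columns="opening_gross").isna().sum())

# Option 1: de-select frequently missing features, keep more instances.
kept = train.drop(columns=["writer_score", "tweet_volume"]).dropna()

# Option 2: keep the features and fill gaps with the column median.
imputed = train.fillna(train.median())
```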
We therefore train two more predictors using
a combination of Twitter features and frequent IMDb attributes
as well as solely using the Twitter features. To help analysts in choosing a
final prediction
we included a combination predictor calculating
a weighted average of selected base predictors.
Here, analysts can
adjust the weight for each predictor, based on their contextual knowledge
and the resulting error measures.
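Since the combination predictor is a weighted average, it can be sketched in a few lines; the predictor names and weight values below are made up for illustration:

```python
def combine(predictions, weights):
    """Weighted average of the selected base predictors' outputs."""
    total = sum(weights.values())
    return sum(predictions[name] * w for name, w in weights.items()) / total

predictions = {"imdb_only": 22e6, "imdb_frequent": 28e6, "twitter_only": 35e6}
# Weights an analyst might set from error measures and contextual knowledge.
weights = {"imdb_only": 0.2, "imdb_frequent": 0.3, "twitter_only": 0.5}
print(f"Combined prediction: ${combine(predictions, weights):,.0f}")
```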
The second example is "The Conjuring",
showcasing an interesting Twitter behavior. We have already selected the candidate
movie and some related movies for reference.
If we train with only IMDb data, "The Conjuring" is predicted extremely low.
However, the Twitter-based prediction evolution forecasts an excellent income for the horror genre.
We try to understand this difference by inspecting the attribute value distribution
of certain prediction ranges using a vertical lens
in the actual vs. predicted scatterplot. Here, we can see that
"The Conjuring" shares a low budget
and a partially inexperienced cast with other low-predicted movies.
De-selecting these features indeed increases the prediction value.
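Conceptually, the vertical lens selects all movies whose prediction falls in a band and summarizes their attribute distributions; a rough sketch under an assumed DataFrame layout:

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["The Conjuring", "Movie A", "Movie B"],
    "budget": [20e6, 15e6, 150e6],
    "cast_experience": [0.3, 0.2, 0.9],  # assumed 0..1 score
    "predicted": [9e6, 7e6, 95e6],       # predictor output per movie
})

def lens(df, low, high):
    """Attribute distributions of movies whose predicted gross
    falls inside the vertical lens [low, high]."""
    band = df[df["predicted"].between(low, high)]
    return band[["budget", "cast_experience"]].describe()

# Low-prediction band: low budgets and inexperienced casts cluster here.
print(lens(df, 0, 10e6))
```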
The Twitter volume of the movie has
a distinct peak on Tuesday. We first try to rule out hype artifacts by excluding retweets,
allowing only one tweet per user and day, and using
only tweets classified as positive interest. The peak remains.
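These three hype filters could be sketched roughly like this; the tweet fields and the "positive interest" label are assumptions about the data layout, not the actual Prolix pipeline:

```python
import pandas as pd

tweets = pd.DataFrame({
    "user":       ["a", "a", "b", "c"],
    "day":        ["Tue", "Tue", "Tue", "Wed"],
    "is_retweet": [False, False, True, False],
    "sentiment":  ["positive interest", "positive interest",
                   "positive interest", "neutral"],
})

def filter_hype(t: pd.DataFrame) -> pd.DataFrame:
    """Apply the three hype filters described in the video."""
    t = t[~t["is_retweet"]]                        # exclude retweets
    t = t.drop_duplicates(subset=["user", "day"])  # one tweet per user and day
    return t[t["sentiment"] == "positive interest"]

print(filter_hype(tweets).groupby("day").size())  # does the peak remain?
```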
An examination of the related word cloud reveals a strong "want to watch"
and "scary" opinion. We restrict the set of messages to those containing the word "trailer"
and see that there was a quite successful trailer commercial on television.
Using this knowledge, we create predictors comparing these dynamic
message counts
amongst the movies and, again, create a final combination predictor.
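One simple way such a dynamic-count comparison could work is a nearest-neighbor lookup on normalized daily tweet-count curves; the counts and grosses below are invented, and this is only a guess at the mechanism:

```python
import numpy as np

# Assumed daily tweet counts for the week before release.
counts = {
    "The Conjuring": np.array([200, 250, 900, 400, 350, 500, 600]),
    "Related A":     np.array([180, 220, 260, 300, 320, 450, 550]),
    "Related B":     np.array([ 50,  60,  70,  80,  90, 120, 150]),
}
grosses = {"Related A": 40e6, "Related B": 9e6}  # known opening grosses

def predict(candidate):
    """Predict via the related movie with the most similar count dynamics."""
    c = counts[candidate] / counts[candidate].sum()  # compare curve shapes
    def dist(other):
        return np.linalg.norm(c - counts[other] / counts[other].sum())
    return grosses[min(grosses, key=dist)]

print(predict("The Conjuring"))
```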
Due to the Twitter-related insights, we add more weight to the Twitter predictors
for our final score.