Hi there! In this demonstration, I'm going to show you
how to create a new categorical column to name groups of points in a graph.
This trick is totally cool; it allows you to stay right in the flow of your analysis
and add meaningful category labels to your data. Let me show you how...
I'm looking at the Boston housing data, and here in the graph, notice that there is
one obvious subset of points, here where this little break is. These points are the
high-value homes with low crime. To create a new column and label these, I'll select
Rows > Row Selection > Name Selection in Column. In the dialog, I'll create a name for the
new column: Value/Crime. If I wanted to, I could just leave these fields (the zero and
the one) and use this as a quick way to create binary variables, but I'd like to name these
specifically. So, for the selected points, I'll name those High/Low, for high value,
low crime. For the unselected points, I'll label those Middle for now.
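
For readers who think in code, here is a rough Python sketch of what this first step does to the table. It is not JMP or JSL; the function `name_selection`, the column names `value` and `crime`, the toy numbers, and the selection thresholds are all assumptions made up for illustration.

```python
# Toy rows standing in for the housing table; the values are made up.
rows = [
    {"value": 50.0, "crime": 0.02},
    {"value": 22.0, "crime": 0.30},
    {"value": 10.0, "crime": 25.0},
    {"value": 48.0, "crime": 0.05},
]


def name_selection(rows, selected, column, selected_label=1, unselected_label=0):
    """Write selected_label to the selected rows and unselected_label to the rest.

    The defaults (1 and 0) mimic leaving the dialog fields as they are,
    which yields a binary indicator column.
    """
    for i, row in enumerate(rows):
        row[column] = selected_label if i in selected else unselected_label
    return rows


# Hypothetical rule standing in for selecting the points in the graph.
selected = {
    i for i, r in enumerate(rows) if r["value"] >= 45 and r["crime"] < 0.1
}

# Quick binary variable: keep the default 1 and 0.
name_selection(rows, selected, "Is High/Low")
flags = [r["Is High/Low"] for r in rows]

# Named categories, as in the demonstration.
name_selection(rows, selected, "Value/Crime", "High/Low", "Middle")
labels = [r["Value/Crime"] for r in rows]
```

With these toy rows, the first and last rows are the selected ones, so they get High/Low and the other two get Middle.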
Next, I'll do this all again, except I want to select the lower-valued homes with
high crime. Again, I'll select Rows > Row Selection > Name Selection in Column. The
column name appears in the dialog automatically, so I'll enter Low/High, for low value, high
crime. Instead of labeling those that are not selected, I'll remove the zero. This
keeps the previous labels (High/Low and Middle) rather than overwriting them with
another value. I'll click OK to add this new label.
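
The effect of removing the zero can be sketched in Python as well. Again, this is only an illustration under assumptions, not JMP's implementation: the helper `name_selection_keep` and the three toy rows are hypothetical, and `None` stands in for the field with the zero removed.

```python
def name_selection_keep(rows, selected, column, selected_label, unselected_label=None):
    """Label the selected rows; leave the others alone when unselected_label is None."""
    for i, row in enumerate(rows):
        if i in selected:
            row[column] = selected_label
        elif unselected_label is not None:
            row[column] = unselected_label
        # Otherwise the row keeps whatever label it already had.
    return rows


# Toy table after the first pass: one High/Low row and two Middle rows.
rows = [
    {"Value/Crime": "High/Low"},
    {"Value/Crime": "Middle"},
    {"Value/Crime": "Middle"},
]

# Second pass: only the last row is selected, and no unselected label is given.
name_selection_keep(rows, {2}, "Value/Crime", "Low/High")
result = [r["Value/Crime"] for r in rows]
```

Only the selected row changes to Low/High; the existing High/Low and Middle labels stay as they were.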
See, now in the data table, I have this new column with rows in these categories: High/Low,
Middle, and Low/High, and I can use these for continued analysis and exploration.
Thanks for watching!