Welcome to this demonstration of IBM InfoSphere DataStage,
which is part of the IBM InfoSphere Information Server platform.
In this series of videos, I'm going to show you how to build and run a DataStage
parallel job.
Here you see the job we are going to build,
shown in the DataStage Designer client.
It is an example of what is often called an ETL job,
that is, an extraction,
transformation,
and load job.
With DataStage,
it is easy to build ETL jobs.
And DataStage utilizes parallel technologies
to enable these jobs to process huge amounts of data with amazing speed.
DataStage jobs have built-in components called "stages"
for extracting
data from and loading data to many different types of data resources,
including files,
database tables, and enterprise application data resources.
DataStage also has built-in stages for transforming data, including stages for
joining, sorting, and aggregating data and for implementing business logic.
Now let's take a look at the job we're going to build.
It's a simple job but it will give you a good idea of how to build many
DataStage jobs.
Most DataStage jobs are just more complex variations,
containing more stages and links and using other types of stages.
This job has three stages
connected by links.
The links are like pipes through which data flows
from one stage to another.
The first stage here
is called the DB2 connector stage.
It is used to extract data from DB2 tables.
We will use it in our job to extract data from a DB2 table named
EMPLOYEE
that sits in the database named SAMPLE.
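As a side note, under the covers the DB2 connector issues SQL against the database. Here is a minimal sketch of the kind of SELECT it might generate for this job; the column names are assumptions based on the standard EMPLOYEE table in DB2's SAMPLE database, and the actual job could select different columns.

-- Sketch of the extraction query the DB2 connector might generate.
-- Column names (EMPNO, FIRSTNME, LASTNAME, JOB) are assumptions
-- based on DB2's standard SAMPLE schema.
SELECT EMPNO, FIRSTNME, LASTNAME, JOB
FROM EMPLOYEE;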
The data extracted from the EMPLOYEE table then flows to the transformer
stage.
This stage is used to implement business transformations
and to specify data constraints.
In this example,
we will use the transformer stage to transform values going into a column
called
"employee name".
We will also implement
a constraint that selects
just employees who are not managers.
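To make the transformer's role concrete, here is a rough SQL equivalent of the logic just described. In the actual job this is expressed as a transformer derivation and a constraint rather than SQL, and the FIRSTNME, LASTNAME, and JOB columns are assumptions based on DB2's SAMPLE schema.

-- Rough SQL equivalent of the transformer logic (not what the job runs).
SELECT EMPNO,
       FIRSTNME || ' ' || LASTNAME AS EMPLOYEE_NAME  -- derive "employee name"
FROM   EMPLOYEE
WHERE  JOB <> 'MANAGER';  -- constraint: keep only non-managers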
The transformed data then flows to the sequential file stage, which is used to
write the data to a sequential file.
You'll also see in this job
several
annotation stages.
These are used to help document the job.
The GUI design along with these annotation stages provides a clear
picture of the job design and specification.
After we build the job, we will then execute it using the DataStage
high-performance parallel engine.
So let's get started.
The first thing we need to do is to open an empty canvas for the job
where we will lay out the stages and links.
I'm going to close down this job
and then open a new canvas.
File > New > Parallel Job.
And I'm going to save this
job
in a folder called "DS essentials",
which is going to store all of our objects.
I'll call
the job
"employee info".
The first thing we need to do is to lay out the job's stages and links.
We'll drag the stages from the palette
in the lower left-hand corner
over to the canvas.
The DB2 connector stage we'll drag from the Database folder
over to the canvas.
The transformer stage
we'll drag from the Processing folder
over to the canvas.
And the sequential file stage we'll drag from the File folder
over to the canvas.
Next, we'll draw links between these stages
to represent the flow of data from one to the other.
We'll start with the DB2 connector stage ... right-click and then just
move the mouse cursor over to the target ... that's how we draw the link.
You see the arrow here indicating that data flows from the
DB2 connector to the transformer.
And we'll do the same here.
The next thing we'll do is to rename the links and stages from their default names,
as you see here,
to more meaningful names.
The DB2 connector stage is used to extract data from the EMPLOYEE table, so
we'll call
this stage "employee".
Just click on the stage
and then start typing
to change the name.
And I'm going to copy this.
And I'm also going to name this link "employee" ... because the data flowing
through this link is employee data, I'll name the link the same thing.