Welcome to this demonstration of IBM InfoSphere DataStage,
which is part of the IBM InfoSphere Information Server platform.
In this series of videos, I'm going to show you how to build and run a DataStage
parallel job.
Here you see the job we are going to build,
shown in the DataStage Designer client.
It is an example of what is often called an ETL job,
that is, an extraction,
transformation,
and load job.
With DataStage,
it is easy to build ETL jobs.
And DataStage utilizes parallel technologies
to enable these jobs to process huge amounts of data with amazing speed.
DataStage jobs have built-in components called "stages"
for extracting
data from and loading data to many different types of data resources,
including files,
database tables, and enterprise application data resources.
DataStage also has built-in stages for transforming data, including stages for
joining, sorting, and aggregating data and for implementing business logic.
Now let's take a look at the job we're going to build.
It's a simple job but it will give you a good idea of how to build many
DataStage jobs.
Most DataStage jobs are just more complex variations,
containing more stages and links and using other types of stages.
This job has three stages
connected by links.
The links are like pipes through which data flows
from one stage to another.
The first stage here
is called the DB2 connector stage.
It is used to extract data from DB2 tables.
We will use it in our job to extract data from a DB2 table named
EMPLOYEE
that sits in the database named SAMPLE.
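As a side note, under the covers the DB2 connector issues SQL against the database. Here is a minimal sketch of the kind of SELECT it might generate for this job; the column names are assumptions based on the standard EMPLOYEE table in DB2's SAMPLE database, and the actual job could select different columns.

-- Sketch of the extraction query the DB2 connector might generate.
-- Column names (EMPNO, FIRSTNME, LASTNAME, JOB) are assumptions
-- based on DB2's standard SAMPLE schema.
SELECT EMPNO, FIRSTNME, LASTNAME, JOB
FROM EMPLOYEE;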
The data extracted from the EMPLOYEE table then flows to the transformer
stage.
This stage is used to implement business transformations
and to specify data constraints.
In this example,
we will use the transformer stage to transform values going into a column
called
"employee name".
We will also implement
a constraint that selects
just employees who are not managers.
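To make the transformer's role concrete, here is a rough SQL equivalent of the logic just described. In the actual job this is expressed as a transformer derivation and a constraint rather than SQL, and the FIRSTNME, LASTNAME, and JOB columns are assumptions based on DB2's SAMPLE schema.

-- Rough SQL equivalent of the transformer logic (not what the job runs).
SELECT EMPNO,
       FIRSTNME || ' ' || LASTNAME AS EMPLOYEE_NAME  -- derive "employee name"
FROM   EMPLOYEE
WHERE  JOB <> 'MANAGER';  -- constraint: keep only non-managers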
The transformed data then flows to the sequential file stage, which is used to
write the data to a sequential file.
You'll also see in this job
several
annotation stages.
These are used to help document the job.
The GUI design along with these annotation stages provides a clear
picture of the job design and specification.
After we build the job, we will then execute it using the DataStage
high-performance parallel engine.
So let's get started.
The first thing we need to do is to open an empty canvas for the job
where we will lay out the stages and links.
I'm going to close down this job
and then open a new canvas.
File > New > Parallel Job.
And I'm going to save this
job
in a folder called "DS essentials",
which is going to store all of our objects.
I'll call
the job
"employee info".
The first thing we need to do is to lay out the job's stages and links.
We'll drag the stages from the palette
in the lower left-hand corner
over to the canvas.
The DB2 connector stage we'll drag from the Database folder
over to the canvas.
The transformer stage
we'll drag from the Processing folder
over to the canvas.
And the sequential file stage we'll drag from the File folder
over to the canvas.
Next, we'll draw links between these stages
to represent the flow of data from one to the other.
We'll start with the DB2 connector stage ... right-click and then just
move the mouse cursor over to the target ... that's how we draw the link.
You see the arrow here indicating that data flows from the
DB2 connector to the transformer.
And we'll do the same here.
The next thing we'll do is to rename the links and stages from their default names,
as you see here,
to more meaningful names.
The DB2 connector stage is used to extract data from the EMPLOYEE table, so
we'll call
this stage "employee".
Just click on the stage
and then start typing
to change the name.
And I'm going to copy this.
And I'm also going to name this link "employee" ... because the data flowing
through this link is employee data, I'll name the link the same thing.