Complex Flat File Stage Datastage Example Programs
The Complex Flat File (CFF) stage is a file stage. You can use the stage to read a file or write to a file, but you cannot use the same stage to do both. For full details, see Chapter 10 of the Parallel Job Developer Guide, 'Complex Flat File Stage', and take your time understanding your data.

The Data Set stage is a file stage. It allows you to read data from or write data to a data set. The stage can have a single input link or a single output link. It can be configured to execute in parallel or sequential mode.

Dynamic Relational Stage (DRS)
The Dynamic Relational Stage can:
Read data from any DataStage stage.
Read data from any supported relational database.
Write to any DataStage stage.
Write to any supported relational database.

PeopleSoft-delivered ETL jobs use the DRS stage for all database sources and targets. The stage is represented in the Database group as 'Dynamic RDBMS.' When you create jobs, it is advisable to use the DRS stage rather than a database-specific type such as DB2, because a DRS stage dynamically handles all PeopleSoft-supported database platforms. The following example shows a DRS database stage in a delivered Campus Solutions Warehouse job.
Image: DRS Stage Output Window

This example illustrates the DRS Stage Output Window. In this example, the table name listed is the source of the data that this stage uses. The Columns window shown below enables you to select which columns of data you want to pass through to the next stage. When you click the Load button, the system queries the source table and populates the grid with all the column names and properties. You can then delete rows that are not needed. The following example shows the Columns window.

Processing Stages
Transformer: Transformer stages perform transformations and conversions on extracted data.
Aggregator: Aggregator stages group data from a single input link and perform aggregation functions such as COUNT, SUM, AVERAGE, FIRST, LAST, MIN, and MAX.
FTP: FTP stages transfer files to other machines.
Link Collector: Link Collector stages collect partitioned data and piece it together.
InterProcess: An InterProcess (IPC) stage is a passive stage that provides a communication channel between WebSphere DataStage processes running simultaneously in the same job. It allows you to design jobs that run on SMP systems with great performance benefits.
Pivot: Pivot, an active stage, maps sets of columns in an input table to a single column in an output table.
Sort: Sort stages allow you to perform sort operations.
Transformer Stages
Transformer stages enable you to:
Add, delete, or move columns.
Apply expressions to data.
Use lookups to validate data.
Filter data using constraints.
Edit column metadata and derivations.
Define local stage variables, and before-stage and after-stage subroutines.
Specify the order in which the links are processed.
Pass data on to either another Transformer stage or to a target stage.
The following is an example of a delivered Transformer stage (TransAssignValues stage).

Creating Transformer Stages
You create a Transformer stage by opening the Processing group in the palette, selecting the Transformer stage, and clicking in the Diagram window. After creating links to connect the transformer to a minimum of two other stages (the input and output stages), double-click the Transformer icon to open the Transformer window. In the example above, two boxes are shown in the upper area of the window, each representing a link. Transformer stages can have any number of links, with a minimum of two, so there can be any number of boxes in the upper area of the window.
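As an illustration only, here is a minimal Python sketch of what a Transformer conceptually does: a constraint filters incoming rows, and derivations compute output columns. All column names and data are invented for the sketch.

```python
# Minimal sketch of Transformer-style processing: a constraint filters rows,
# and derivations compute output columns (all names are hypothetical)
input_rows = [{"qty": 5, "price": 2.0}, {"qty": 0, "price": 9.9}]

output_rows = []
for row in input_rows:
    if row["qty"] > 0:                            # constraint: only valid rows pass
        output_rows.append({
            "qty": row["qty"],                    # column passed through unchanged
            "amount": row["qty"] * row["price"],  # derived column
        })

print(output_rows)  # [{'qty': 5, 'amount': 10.0}]
```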
Labeling your links appropriately makes it easier to work in the Transformer Stage window. The lines that connect the links define how the data flows between them. When you first create a new transformer, link it to other stages, and then open it for editing, there are no lines connecting the link boxes yet. These connections can be created manually by clicking and dragging from a column of one link to a column in another link, or by selecting the Column Auto-Match button on the toolbar.
Transformer Toolbar Buttons
Stage Properties: Define stage inputs and outputs when you link the transformer with other stages. Specify before-stage and after-stage subroutines (optional). Define stage variables. Define the order in which input and output links are processed when there is more than one input or output link.
Constraints: Enter a condition that filters incoming data, allowing only the rows that meet the constraint criteria to flow to the next stage.
Show All or Selected Relations: If you have more than two links in the transformer, you can select one link and click this button to hide all connection lines except those on the selected link. With only two links present, clicking this button hides or displays all connections.
Show/Hide Stage Variables: Show or hide a box that displays local stage variables that can be assigned values in expressions, or be used in expressions.
Cut, Copy, Paste, Find/Replace: These are standard Windows buttons.
Load Column Definition: Load a table definition from the repository, or import a new one from a database.
Save Column Definition: Save a column definition in the repository so that it can be used in other stages and jobs.
Column Auto-Match: Automatically sets columns on an output link to be derived from matching columns on an input link. You can then go back and edit individual output link columns where you want a different derivation.
Input Link Execution Order: Order the reference links. The primary data link is always processed first.
Output Link Execution Order: Order all output links.
DataStage parallel stage groups
DataStage and QualityStage stages are grouped into the following logical sections:
General objects.
Data Quality stages.
Database connectors.
Development and Debug stages.
File stages.
Processing stages.
Real Time stages.
Restructure stages.
Sequence activities.
Refer to the list below for a description of the stages used in DataStage and QualityStage. All stages are classified in order of importance and frequency of use in real-life deployments (and on certification exams). The most widely used stages are marked in bold, or a link to a subpage with a detailed description and examples is provided.
DataStage and QualityStage parallel stages and activities

General elements
Link indicates a flow of data. There are three main types of links in DataStage: stream, reference, and lookup.
Container (private or shared) - the main purpose of containers is to visually simplify a complex DataStage job design and keep the design easy to understand.
Annotation is used for adding floating DataStage job notes and descriptions on a job canvas. Annotations provide a great way to document the ETL process and help explain what a given job does.
Description Annotation shows the contents of the job description field. Only one description annotation is allowed in a DataStage job.

Debug and development stages
Row Generator produces a set of test data which fits the specified metadata (values can be random or cycled through a specified list). Useful for testing and development. Click for more.
Column Generator adds one or more columns to the incoming flow and generates test data for those columns.
Peek prints record column values to the job log, which can be viewed in Director. It can have a single input link and multiple output links. Click for more.
Sample samples an input data set. It operates in two modes: percent mode and period mode.
Head selects the first N rows from each partition of an input data set and copies them to an output data set.
Tail is similar to the Head stage; it selects the last N rows from each partition.
Write Range Map writes a data set in a form usable by the range partitioning method.

Processing stages
Aggregator joins data vertically by grouping the incoming data stream and calculating summaries (sum, count, min, max, variance, etc.) for each group. The data can be grouped using two methods: hash table or pre-sort (see the sketch below). Click for more.
Copy copies input data (a single stream) to one or more output data flows.
FTP uses the FTP protocol to transfer data to a remote machine.
Filter filters out records that do not meet specified requirements. Click for more.
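For intuition, here is a minimal Python sketch of the hash-table grouping the Aggregator performs conceptually; the data and column names are invented.

```python
from collections import defaultdict

# Hypothetical input stream: (department, salary) records
rows = [
    ("sales", 100), ("sales", 200),
    ("hr", 150), ("hr", 50), ("it", 300),
]

# Hash-table grouping, as the Aggregator's "hash" method does conceptually
groups = defaultdict(list)
for dept, salary in rows:
    groups[dept].append(salary)

# One output row per group with the requested summaries
for dept, values in groups.items():
    print(dept, "count:", len(values), "sum:", sum(values),
          "min:", min(values), "max:", max(values))
```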
Funnel combines multiple streams into one. Click for more.
Join combines two or more inputs according to the values of key column(s). The concept is similar to a relational DBMS SQL join (with the ability to perform inner, left, right, and full outer joins). It can have one left input and multiple right inputs (all of which need to be sorted) and produces a single output stream with no reject link. Click for more.
Lookup combines two or more inputs according to the values of key column(s). The Lookup stage can have one source and multiple lookup tables. Records do not need to be sorted; the stage produces a single output stream and a reject link. Click for more.
Merge combines one master input with multiple update inputs according to the values of key column(s). All inputs need to be sorted, and unmatched secondary entries can be captured in multiple reject links. Click for more.
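To make the Lookup behavior concrete, here is a small Python sketch of a keyed lookup with a reject stream; all data, column names, and link handling are invented for illustration.

```python
# Hypothetical source rows and a lookup table keyed on customer_id
source = [(1, "order-a"), (2, "order-b"), (4, "order-c")]
lookup_table = {1: "Alice", 2: "Bob", 3: "Carol"}   # no sorting required

matched, rejects = [], []
for customer_id, order in source:
    name = lookup_table.get(customer_id)
    if name is None:
        rejects.append((customer_id, order))        # goes to the reject link
    else:
        matched.append((customer_id, order, name))  # enriched output stream

print("output:", matched)
print("rejects:", rejects)
```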
Modify alters the record schema of its input data set. Useful for renaming columns, non-default data type conversions, and null handling.
Remove Duplicates needs a single sorted data set as input. It removes all duplicate records according to a specification and writes to a single output.
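A rough sketch of why Remove Duplicates needs sorted input: with duplicates adjacent, each record only has to be compared with the previous one. The data is invented.

```python
# Minimal sketch: removing duplicates from a pre-sorted stream by
# comparing each record's key with the previous one (hypothetical data)
sorted_rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]

output, prev_key = [], object()  # sentinel that matches no key
for key, value in sorted_rows:
    if key != prev_key:          # first record of each key group is retained
        output.append((key, value))
    prev_key = key

print(output)  # [(1, 'a'), (2, 'c'), (3, 'd')]
```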
Slowly Changing Dimension automates the process of updating dimension tables where the data changes over time. It supports SCD type 1 and SCD type 2 (see the sketch below). Click for more.
Sort sorts input rows by one or more key columns. Click for more.
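For illustration, here is a hedged Python sketch of an SCD type 2 update (type 1 would simply overwrite the attribute in place): the current dimension row is closed and a new version is appended. The table layout, keys, and dates are assumptions, not the stage's actual implementation.

```python
from datetime import date

# Hypothetical dimension row and an incoming change
dim = [{"key": 1, "cust": "C1", "city": "Oslo",
        "valid_from": date(2020, 1, 1), "valid_to": None, "current": True}]
change = {"cust": "C1", "city": "Bergen"}

# SCD type 2: close the current row and append a new version
for row in dim:
    if row["cust"] == change["cust"] and row["current"]:
        row["valid_to"] = date(2021, 6, 1)
        row["current"] = False
dim.append({"key": 2, "cust": "C1", "city": change["city"],
            "valid_from": date(2021, 6, 1), "valid_to": None, "current": True})

for r in dim:
    print(r)
```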
Transformer handles extracted data and performs data validation, conversions, and lookups. Click for more.
Change Capture captures the before and after state of two input data sets and outputs a single data set whose records represent the changes made.
Change Apply applies the change operations to a before data set to compute an after data set. It gets data from a Change Capture stage.
Difference performs a record-by-record comparison of two input data sets and outputs a single data set whose records represent the difference between them. Similar to the Change Capture stage.
Checksum generates a checksum from the specified columns in a row and adds it to the stream. Used to determine whether there are differences between records (see the sketch below).
Compare performs a column-by-column comparison of records in two presorted input data sets. It can have two input links and one output link.
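A rough Python illustration of the Checksum idea: hash selected columns to detect changed records. The column choice and the use of MD5 here are assumptions for the sketch.

```python
import hashlib

def row_checksum(row, columns):
    """Concatenate the selected column values and hash them,
    mimicking how a checksum column can flag changed records."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

before = {"id": 7, "name": "Alice", "city": "Oslo"}
after  = {"id": 7, "name": "Alice", "city": "Bergen"}

cols = ["name", "city"]  # hypothetical tracked columns
print(row_checksum(before, cols) == row_checksum(after, cols))  # False: record changed
```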
Encode encodes data with an encoding command, such as gzip.
Decode decodes a data set previously encoded with the Encode stage.
External Filter permits specifying an operating system command that acts as a filter on the processed data.
Generic allows users to call an OSH operator from within a DataStage stage, with options as required.
Pivot Enterprise is used for horizontal pivoting. It maps multiple columns in an input row to a single column in multiple output rows. Pivoting data produces a data set with fewer columns but more rows.
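A small Python sketch of horizontal pivoting in this sense, turning one wide row into several narrow rows; all field names are invented.

```python
# Hypothetical wide input row: one row with a column per quarter
wide_rows = [
    {"store": "S1", "q1": 10, "q2": 20, "q3": 30, "q4": 40},
]

# Horizontal pivot: each quarter column becomes its own output row
narrow_rows = []
for row in wide_rows:
    for quarter in ("q1", "q2", "q3", "q4"):
        narrow_rows.append({"store": row["store"],
                            "quarter": quarter,
                            "sales": row[quarter]})

for r in narrow_rows:
    print(r)  # fewer columns, more rows than the input
```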
Surrogate Key Generator generates surrogate keys for a column and manages the key source.
Switch assigns each input row to an output link based on the value of a selector field. It provides a concept similar to the switch statement in most programming languages.
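As an analogy for the Switch stage, here is a minimal Python sketch that routes rows to named output links based on a selector field; the link names and default handling are hypothetical.

```python
# Hypothetical rows with a 'region' selector field
rows = [{"id": 1, "region": "EU"},
        {"id": 2, "region": "US"},
        {"id": 3, "region": "APAC"}]

# One list per output link, plus a default for unmatched selector values
links = {"EU": [], "US": []}
default_link = []

for row in rows:
    links.get(row["region"], default_link).append(row)  # switch on the selector

print(links, default_link)
```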
Compress packs a data set using the gzip utility (or the compress command on LINUX/UNIX).
Expand extracts a previously compressed data set back into raw binary data.
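For intuition, compressing and expanding a serialized byte stream with Python's gzip module mirrors the Compress/Expand pairing, though it is not the stages' actual mechanism; the data is arbitrary.

```python
import gzip

records = b"row1|row2|row3"          # arbitrary serialized records
packed = gzip.compress(records)      # Compress stage analogue
restored = gzip.decompress(packed)   # Expand stage analogue
assert restored == records
```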
File stage types
Sequential File is used to read data from or write data to one or more flat (sequential) files. Click for more.
Data Set allows users to read data from or write data to a data set. Data sets are operating system files, each of which has a control file (.ds extension by default) and one or more data files (unreadable by other applications). Click for more info.
File Set allows users to read data from or write data to a file set. File sets are operating system files, each of which has a control file (.fs extension) and data files. Unlike data sets, file sets preserve formatting and are readable by other applications.
Complex Flat File allows reading from complex file structures on a mainframe machine, such as MVS data sets, header-and-trailer structured files, files that contain multiple record types, and QSAM and VSAM files. Click for more info.
External Source permits reading data that is output from multiple source programs.
External Target permits writing data to one or more programs.
Lookup File Set is similar to the File Set stage. It is a partitioned hashed file which can be used for lookups.
Database stages
Oracle Enterprise allows reading data from and writing data to an Oracle database (database versions from 9.x to 10g are supported).
ODBC Enterprise permits reading data from and writing data to a database defined as an ODBC source. In most cases it is used for processing data from or to Microsoft Access databases and Microsoft Excel spreadsheets.
DB2/UDB Enterprise permits reading data from and writing data to a DB2 database.
Teradata permits reading data from and writing data to a Teradata data warehouse. Three Teradata stages are available: Teradata Connector, Teradata Enterprise, and Teradata MultiLoad.
SQL Server Enterprise permits reading data from and writing data to Microsoft SQL Server 2005 and 2008 databases.
Sybase permits reading data from and writing data to Sybase databases.
Stored Procedure supports Oracle, DB2, Sybase, Teradata, and Microsoft SQL Server. The Stored Procedure stage can be used as a source (returns a rowset), as a target (passes a row to a stored procedure to write), or as a transform (to invoke procedure processing within the database).
MS OLEDB helps retrieve information from any type of information repository, such as a relational source, an ISAM file, a personal database, or a spreadsheet.
Dynamic Relational Stage (Dynamic DBMS, DRS stage) is used for reading from or writing to a number of different supported relational DB engines using native interfaces, such as Oracle, Microsoft SQL Server, DB2, Informix, and Sybase.
Informix (CLI or Load).
DB2 UDB (API or Load).
Classic Federation.
RedBrick Load.
Netezza Enterprise.
iWay Enterprise.

Real Time stages
XML Input makes it possible to transform hierarchical XML data into flat relational data sets (see the sketch below).
XML Output writes tabular data (relational tables, sequential files, or any DataStage data streams) to XML structures.
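As a rough analogy for XML Input, here is a short Python sketch that flattens hierarchical XML into relational-style rows; the document and element names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchical XML input
doc = """<orders>
  <order id="1"><customer>Alice</customer><total>100</total></order>
  <order id="2"><customer>Bob</customer><total>250</total></order>
</orders>"""

# Flatten each <order> element into one relational-style row
root = ET.fromstring(doc)
rows = [(o.get("id"), o.findtext("customer"), o.findtext("total"))
        for o in root.findall("order")]
print(rows)  # [('1', 'Alice', '100'), ('2', 'Bob', '250')]
```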
XML Transformer converts XML documents using an XSLT stylesheet.
WebSphere MQ stages provide a collection of connectivity options to access IBM WebSphere MQ enterprise messaging systems. There are two MQ stage types available in DataStage and QualityStage: WebSphere MQ Connector and WebSphere MQ plug-in stage.
Web Services Client.
Web Services Transformer.
Java Client can be used as a source stage, as a target, and as a lookup. The Java package consists of three public classes: com.ascentialsoftware.jds.Column, com.ascentialsoftware.jds.Row, and com.ascentialsoftware.jds.Stage.
Java Transformer supports three links: input, output, and reject.
WISD Input - Information Services Input stage.
WISD Output - Information Services Output stage.

Restructure stages
Column Export exports data from a number of columns of different data types into a single column of data type ustring, string, or binary. It can have one input link, one output link, and a reject link. Click for more.
Column Import is complementary to the Column Export stage. Typically used to divide data arriving in a single column into multiple columns (see the sketch after this list).
Combine Records combines rows which have identical keys into vectors of subrecords.
Make Subrecord combines specified input vectors into a vector of subrecords whose columns have the same names and data types as the original vectors.
Make Vector joins specified input columns into a vector of columns.
Promote Subrecord promotes input subrecord columns to top-level columns.
Split Subrecord separates an input subrecord field into a set of top-level vector columns.
Split Vector promotes the elements of a fixed-length vector to a set of top-level columns.
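As promised above, a tiny Python sketch of the Column Export/Import pairing: packing several columns into one delimited string and splitting it back. The delimiter and column names are assumptions.

```python
# Column Export analogue: pack several columns into one delimited string
row = {"first": "Ada", "last": "Lovelace", "year": 1815}
exported = ";".join(str(row[c]) for c in ("first", "last", "year"))
print(exported)                # 'Ada;Lovelace;1815'

# Column Import analogue: split the single column back into columns
first, last, year = exported.split(";")
print(first, last, year)
```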
Data Quality (QualityStage) stages
Investigate analyzes the data content of specified columns of each record from the source file. It provides character and word investigation methods.
Match Frequency takes input from a file, database, or processing stages and generates a frequency distribution report.
MNS - Multinational Address Standardization.
QualityStage Legacy.
Reference Match.
Standardize.
Survive.
Unduplicate Match.
WAVES - Worldwide Address Verification and Enhancement System.
Sequence activity stage types
Job Activity specifies a DataStage server or parallel job to execute.
Notification Activity is used for sending emails to user-defined recipients from within DataStage.
Sequencer is used for synchronizing the control flow of multiple activities in a job sequence.
Terminator Activity permits shutting down the whole sequence once a certain situation occurs.
Wait For File Activity waits for a specific file to appear or disappear and launches the processing.
EndLoop Activity.
Exception Handler.
Execute Command.
Nested Condition.
Routine Activity.
StartLoop Activity.
UserVariables Activity.