Talend Big Data Training
Duration: 30 hours
Course Content:
- Introduction to Talend Big Data
- Why Talend?
- Talend Editions and Features
- Talend Data Integration Overview
- Talend Environment
- Repository and Palette
- Talend Design and Views
- Start Talend Open Studio for Data Integration
- Create a Talend project to contain tasks
- Create a Talend Job to perform a specific task
- Add and configure components to handle data input, data transformation, and data output
- Run a Talend Job and examine the results
- Process different types of files using Talend
- Connect to a database from a Talend Job
- Use a component to create a database table
- Write to and read from a database table from a Talend Job
- Write data to an XML file from a Talend Job
- Write an XML document to a file
- Use components to create an archive and delete files
- Assignment
- Store configuration information centrally for use in multiple components
- Execute Job sections conditionally
- Create a schema for use in multiple components
- Create variables for component configuration parameters
- Run a Job to access specific values for the variables
- Troubleshoot a join by examining failed lookups
- Use components to filter data
- Generate sample data rows
- Duplicate output flows
- Perform aggregate calculations on rows
- Extend data from one source with data extracted from a second source
- Assignment
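The transformation topics above (aggregating rows, joining a main flow against a lookup, troubleshooting failed lookups) are built graphically in Talend, but the underlying logic is easy to state in plain Java. The sketch below is illustrative only, not Talend API: the `Order` record and the region lookup table are hypothetical stand-ins for a main flow and a second source.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch of two row operations the course covers:
// per-group aggregation (as in tAggregateRow) and a lookup join (as in tMap).
// The Order record and region data are illustrative, not Talend API.
public class RowOps {
    public record Order(String region, double amount) {}

    // Sum of amount per region -- what an aggregate component computes.
    public static Map<String, Double> totalsByRegion(List<Order> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Order::region, Collectors.summingDouble(Order::amount)));
    }

    // Lookup table standing in for the second data source in a join.
    private static final Map<String, String> REGION_NAMES =
            Map.of("EU", "Europe", "US", "United States");

    // Extend a row with looked-up data; unmatched keys are the
    // "failed lookups" the troubleshooting topic refers to.
    public static String lookupName(String region) {
        return REGION_NAMES.getOrDefault(region, "UNKNOWN");
    }

    public static void main(String[] args) {
        List<Order> rows = List.of(new Order("EU", 10.0),
                                   new Order("EU", 5.0),
                                   new Order("US", 7.5));
        totalsByRegion(rows).forEach((r, t) ->
                System.out.println(r + " -> " + t + " (" + lookupName(r) + ")"));
    }
}
```

In the Studio the same steps are configured in component dialogs rather than coded, but tracing the logic this way helps when examining a Job's generated code or debugging a join.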
- Log data rows in the console rather than storing them
- Employ mechanisms to kill a Job under specific circumstances
- Include Job elements that change the behavior based on the success or failure of individual components or subjobs
- Build a visual model of a Talend Job or project
- Copy an existing Job as the basis for a new Job
- Add comments to document a Job and its components
- Generate HTML documentation for a Job
- Export a Job
- Run an exported Job independently of Talend Open Studio
- Create a new version of an existing Job
- Assignment
- Environment – Overview
- Repository and Palette
- Design and Views
- Connect to a Hadoop cluster from a Talend Job
- Store a raw Web log file to HDFS
- Write text data files to HDFS
- Read text files from HDFS
- Read data from a SQL database and write it to HDFS
- List a folder’s contents and operate on each file separately (Iteration)
- Move, copy, append, delete, and rename HDFS files
- Read selected file attributes from HDFS files
- Conditionally operate on HDFS files
- Develop and run MapReduce Jobs
- Convert a standard Job into a MapReduce Job
- Create Metadata for your Hadoop cluster connection
- Configure context variables
- Retrieve the schema of a file using Talend Wizard
- Send data to Hadoop HDFS
- Load multiple files into HDFS
- Sort and aggregate data using MapReduce components
- Filter data using MapReduce components
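Talend generates the MapReduce code from the graphical Job, so the sort, aggregate, and filter components above map onto the classic map-shuffle-reduce pattern. The following is a conceptual sketch of that pattern in plain Java with no Hadoop dependency; the word-count example and the length filter are illustrative choices, not part of any Talend component.

```java
import java.util.*;
import java.util.stream.*;

// Conceptual sketch of what MapReduce components compile down to:
// map emits (key, value) pairs, the shuffle groups and sorts them by key,
// and reduce aggregates each group. Pure stdlib, no Hadoop dependency.
public class MapReduceSketch {
    // Map phase: one input line -> (word, 1) pairs, with a filter
    // (drop short words) applied before aggregation.
    public static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> w.length() > 2)            // filter step
                .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce phase: group by key (sorted) and sum the values.
    public static SortedMap<String, Integer> run(List<String> lines) {
        return lines.stream()
                .flatMap(MapReduceSketch::map)
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        TreeMap::new,                    // sorted keys = sort step
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "the cat ran")));
    }
}
```

On a real cluster the shuffle and sort happen across nodes between the map and reduce phases; the single-JVM version above only shows the data flow a Talend MapReduce Job expresses graphically.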
- Develop and run Pig Jobs using Talend components
- Sort, join, and aggregate data using Pig components
- Filter data in multiple ways using Pig components
- Replicate Pig data streams
- Small Project / Case study
- Miscellaneous topics
- Run Talend Jobs with the Apache Oozie Job Manager
- Check data with Data Viewer
- Read and write HBase tables
- Write data to an HTML file
- Talend Data Quality and MDM Overview
- Performance tuning techniques
- Best practices
- Coding guidelines
- Small project – Case study