USA: +1-703-445-4802
India: +91-8143111555 / +91-8790216888
WhatsApp: +91-8143110555

Google Data Engineer Training in Hyderabad, India

Google Data Engineer

A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.

The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.

Duration: 30-35 hours

Course Content:

Section 1: Designing data processing systems
1.1 Designing flexible data representations. Considerations include:
·         future advances in data technology
·         changes to business requirements
·         awareness of current state and how to migrate the design to a future state
·         data modeling
·         tradeoffs
·         distributed systems
·         schema design (see the schema sketch after this list)
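To make schema design and the data modeling tradeoffs above concrete, here is a minimal sketch of a denormalized table with nested, repeated fields using the BigQuery Python client. The project, dataset, table, and field names are placeholders chosen for illustration, not part of the official exam guide.

```python
from google.cloud import bigquery

# A denormalized customer table: orders are nested, repeated records,
# trading extra storage for fewer joins at query time.
# All names below are illustrative placeholders.
customer_schema = [
    bigquery.SchemaField("customer_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField(
        "orders", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("amount", "NUMERIC", mode="NULLABLE"),
            bigquery.SchemaField("ordered_at", "TIMESTAMP", mode="NULLABLE"),
        ],
    ),
]

client = bigquery.Client()
table = bigquery.Table("my-project.my_dataset.customers", schema=customer_schema)
table = client.create_table(table)  # raises Conflict if the table already exists
```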
1.2 Designing data pipelines. Considerations include:
·         future advances in data technology
·         changes to business requirements
·         awareness of current state and how to migrate the design to a future state
·         data modeling
·         tradeoffs
·         system availability
·         distributed systems
·         schema design
·         common sources of error (e.g., removing selection bias)
1.3 Designing data processing infrastructure. Considerations include:
·         future advances in data technology
·         changes to business requirements
·         awareness of current state and how to migrate the design to a future state
·         data modeling
·         tradeoffs
·         system availability
·         distributed systems
·         schema design
·         capacity planning
·         different types of architectures: message brokers, message queues, middleware, service-oriented architecture (see the messaging sketch below)
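To illustrate the message-broker / message-queue style listed above, below is a minimal, hedged sketch of publishing an event to a Google Cloud Pub/Sub topic with the Python client. The project ID, topic name, and payload are placeholders, and error handling is omitted.

```python
import json

from google.cloud import pubsub_v1

# Placeholders: replace with a real project ID and topic name.
PROJECT_ID = "my-project"
TOPIC_ID = "ingest-events"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

# Pub/Sub messages are raw bytes, so the event is serialized to JSON first.
event = {"user_id": "u-123", "action": "click"}
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
print("published message id:", future.result())  # blocks until the broker acknowledges
```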
Section 2: Building and maintaining data structures and databases
2.1 Building and maintaining flexible data representations
2.2 Building and maintaining pipelines. Considerations include:
·         data cleansing
·         batch and streaming (see the batch pipeline sketch after this list)
·         transformation
·         acquire and import data
·         testing and quality control
·         connecting to new data sources
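A minimal sketch of a batch pipeline with the Apache Beam Python SDK, covering acquisition, cleansing, transformation, and output. The bucket paths, column layout, and cleansing rule are placeholders; the same pipeline shape runs on Dataflow by switching the runner.

```python
import csv

import apache_beam as beam


def parse_and_clean(line):
    """Parse one CSV line and drop records with a missing amount (placeholder rule)."""
    fields = next(csv.reader([line]))
    if len(fields) < 3 or fields[2] == "":
        return None
    return {"user_id": fields[0], "country": fields[1], "amount": float(fields[2])}


with beam.Pipeline() as pipeline:  # defaults to the local DirectRunner
    (
        pipeline
        | "Read raw CSV" >> beam.io.ReadFromText("gs://my-bucket/raw/*.csv")
        | "Parse and clean" >> beam.Map(parse_and_clean)
        | "Drop bad rows" >> beam.Filter(lambda row: row is not None)
        | "Format output" >> beam.Map(str)
        | "Write results" >> beam.io.WriteToText("gs://my-bucket/clean/part")
    )
```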
2.3 Building and maintaining processing infrastructure. Considerations include:
·         provisioning resources
·         monitoring pipelines
·         adjusting pipelines
·         testing and quality control
Section 3: Analyzing data and enabling machine learning
3.1 Analyzing data. Considerations include:
·         data collection and labeling
·         data visualization
·         dimensionality reduction
·         data cleaning/normalization (see the sketch after this list)
·         defining success metrics
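A short, hedged sketch of data cleaning and normalization with pandas; the input file and column names are placeholders.

```python
import pandas as pd

# Placeholder input file and columns.
df = pd.read_csv("events.csv")

# Cleaning: drop rows missing key fields and remove exact duplicates.
df = df.dropna(subset=["user_id", "amount"]).drop_duplicates()

# Normalization: z-score the numeric column so features are on a comparable scale.
df["amount_z"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()

print(df.describe())
```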
3.2 Machine learning. Considerations include:
·         feature selection/engineering
·         algorithm selection (see the sketch after this list)
·         debugging a model
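A minimal sketch of algorithm selection and basic model debugging with scikit-learn, using feature importances as one simple feature-selection signal. The dataset is synthetic and the two candidate models are only examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data standing in for real features and labels.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Algorithm selection: compare candidates with cross-validation on the training set.
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(name, "cv accuracy:", scores.mean())

# Debugging: a large train/test gap suggests overfitting.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))

# Feature selection signal: importances of the fitted forest.
print("top importances:", sorted(model.feature_importances_, reverse=True)[:5])
```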
3.3 Machine learning model deployment. Considerations include:
·         performance/cost optimization
·         online/dynamic learning
Section 4: Modeling business processes for analysis and optimization
4.1 Mapping business requirements to data representations. Considerations include:
·         working with business users
·         gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
·         resizing and scaling resources
·         data cleansing, distributed systems
·         high performance algorithms
·         common sources of error (e.g., removing selection bias)
Section 5: Ensuring reliability
5.1 Performing quality control. Considerations include:
·         verification
·         building and running test suites (see the sketch after this list)
·         pipeline monitoring
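A hedged sketch of quality-control checks written as a small pytest test suite over a pipeline's output. The output path, column names, and row-count thresholds are placeholders to be adapted to the pipeline under test.

```python
import pandas as pd

# Placeholder: the cleaned output produced by the pipeline under test.
OUTPUT_PATH = "clean/events.csv"


def load_output():
    return pd.read_csv(OUTPUT_PATH)


def test_required_columns_present():
    df = load_output()
    assert {"user_id", "amount", "event_time"} <= set(df.columns)


def test_no_missing_keys():
    df = load_output()
    assert df["user_id"].notna().all()


def test_row_count_within_expected_range():
    df = load_output()
    assert 1_000 <= len(df) <= 1_000_000  # guards against silently truncated loads
```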
5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.
5.3 Recovering data. Considerations include:
·         planning (e.g., fault tolerance)
·         executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
·         stress testing data recovery plans and processes
Section 6: Visualizing data and advocating policy
6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
·         automation
·         decision support
·         data summarization (e.g., translation up the chain, fidelity, trackability, integrity); see the reporting sketch after this list
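A short sketch of data summarization and a simple decision-support chart using pandas and matplotlib. The input file, column names, and output path are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("sales.csv")  # placeholder input

# Summarize detail rows into a management-level view ("translation up the chain").
summary = (df.groupby("region", as_index=False)["revenue"]
             .sum()
             .sort_values("revenue", ascending=False))

# A simple bar chart saved as a reporting artifact.
ax = summary.plot(kind="bar", x="region", y="revenue", legend=False, title="Revenue by region")
ax.set_ylabel("Revenue")
plt.tight_layout()
plt.savefig("revenue_by_region.png")  # placeholder output path
```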
6.2 Advocating policies and publishing data and reports.
Section 7: Designing for security and compliance
7.1 Designing secure data infrastructure and processes. Considerations include:
·         Identity and Access Management (IAM); see the IAM sketch after this list
·         data security
·         penetration testing
·         Separation of Duties (SoD)
·         security control
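As one illustration of IAM review, a hedged sketch that lists role bindings on a Cloud Storage bucket with the Python client, which is one way to audit least privilege and separation of duties. The bucket name is a placeholder.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-data-lake-bucket")  # placeholder bucket name

# Policy version 3 is requested so conditional role bindings are visible too.
policy = bucket.get_iam_policy(requested_policy_version=3)

# Review who holds which role on the bucket.
for binding in policy.bindings:
    print(binding["role"], "->", sorted(binding["members"]))
```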
7.2 Designing for legal compliance. Considerations include:
·         legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), etc.)
·         audits