Google Data Engineer
A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.
The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.
Duration: 30-35 hrs
Course Content:
Section 1: Designing data processing systems
1.1 Designing flexible data representations. Considerations include:
· future advances in data technology
· changes to business requirements
· awareness of current state and how to migrate the design to a future state
· data modeling
· tradeoffs
· distributed systems
· schema design
1.2 Designing data pipelines. Considerations include:
· future advances in data technology
· changes to business requirements
· awareness of current state and how to migrate the design to a future state
· data modeling
· tradeoffs
· system availability
· distributed systems
· schema design
· common sources of error (e.g., removing selection bias)
1.3 Designing data processing infrastructure. Considerations include:
· future advances in data technology
· changes to business requirements
· awareness of current state and how to migrate the design to a future state
· data modeling
· tradeoffs
· system availability
· distributed systems
· schema design
· capacity planning
· different types of architectures: message brokers, message queues, middleware, service-oriented
Section 2: Building and maintaining data structures and databases
2.1 Building and maintaining flexible data representations
2.2 Building and maintaining pipelines. Considerations include:
· data cleansing
· batch and streaming
· transformation
· acquire and import data
· testing and quality control
· connecting to new data sources
2.3 Building and maintaining processing infrastructure. Considerations include:
· provisioning resources
· monitoring pipelines
· adjusting pipelines
· testing and quality control
Section 3: Analyzing data and enabling machine learning
3.1 Analyzing data. Considerations include:
· data collection and labeling
· data visualization
· dimensionality reduction
· data cleaning/normalization
· defining success metrics
3.2 Machine learning. Considerations include:
· feature selection/engineering
· algorithm selection
· debugging a model
3.3 Machine learning model deployment. Considerations include:
· performance/cost optimization
· online/dynamic learning
Section 4: Modeling business processes for analysis and optimization
4.1 Mapping business requirements to data representations. Considerations include:
· working with business users
· gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
· resizing and scaling resources
· data cleansing, distributed systems
· high-performance algorithms
· common sources of error (e.g., removing selection bias)
Section 5: Ensuring reliability
5.1 Performing quality control. Considerations include:
· verification
· building and running test suites
· pipeline monitoring
5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.
5.3 Recovering data. Considerations include:
· planning (e.g., fault tolerance)
· executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
· stress testing data recovery plans and processes
Section 6: Visualizing data and advocating policy
6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
· automation
· decision support
· data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
6.2 Advocating policies and publishing data and reports.
Section 7: Designing for security and compliance
7.1 Designing secure data infrastructure and processes. Considerations include:
· Identity and Access Management (IAM)
· data security
· penetration testing
· Separation of Duties (SoD)
· security control
7.2 Designing for legal compliance. Considerations include:
· legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA))
· audits