python-data-engineering
Comprehensive Python data engineering patterns for AWS Data Lake, including PySpark, Pandas, Apache Airflow, AWS Glue, ETL pipelines, data quality, schema management, performance optimization, FastAPI services, streaming with Kafka/Kinesis, data validation with Great Expectations, testing strategies, error handling, logging, and production deployment on AWS EMR and Glue.
Installation and usage
Comprehensive Python data engineering patterns for AWS Data Lake, including PySpark, Pandas, Apache Airflow, AWS Glue, ETL pipelines, data quality, schema management, performance optimization, FastAPI services, streaming with Kafka/Kinesis, data validation with Great Expectations, testing strategies, error handling, logging, and production deployment on AWS EMR and Glue.
После установки вы можете использовать этот skill, выполнив следующую команду в терминале:
skills use python-data-engineering