Hi, I reviewed the project details and understand your requirement (including each process flow) to create a data pipeline for a data lake. I can develop the pipeline end to end, in less time and in a cost-effective way. Let's connect and discuss it further.
Short Intro about Me:
I currently work at one of the Big Four companies, solving big data business problems using AWS Cloud and the PySpark engine.
I have expertise in the following tools/technologies:
- AWS Cloud Services (DynamoDB, S3, Glue, Lambda, EC2, ECS, Athena, Step Functions, Secrets Manager, KMS, SNS, SES, SQS, CloudWatch, etc.)
- Apache Spark (PySpark)
- Python 3, unit test scripts, NumPy, Pandas
- Creation of CloudFormation templates for various AWS services
- CI/CD Pipelines
- SQL/MySQL
- Git, GitHub, Bitbucket
- Big Data Pipelines
- IBM DB2
- MS Excel, MS Office
- PyCharm, VS Code, Databricks, Jupyter Notebook, PuTTY
Please share any additional requirements beyond what is in the project details. Thank you!
Thanks & Regards,
Prakash