Automating Your Data Science Tasks In Python (importing CSV files to database AUTOMATION TUTORIAL)

24 Comments



This is part 2 on automating the import of CSV files to a database – a common data science task . In this video, we’ll take our python script from part 1, which manually imports a CSV/Excel file to a postgres database, and completely automate the script so that it can import multiple files without the need to edit the code. You’d essentially be saving hours of work.

Importing CSV files to a database is a common task needed for data science and analytics and it can be done completely with python using pandas dataframes, numpy, os library, and the posgres database wrapper psycopg2.

This video focuses on automation. If you’re interested in build a script to import files to a database, check out the first video in our series: https://youtu.be/wqBFgaMgFQA.

Topics Covered:
– Automatically identifying your CSVs and handling multiple files using for loops, bash commands, and the os library
– Automatically cleaning the table name and column headers using methods like replace() and lower() using format() and %s variables
– Automatically converting your pandas dataframe to a CSV file using for loops and dictionaries
– Automatically creating multiple database tables and importing your data to a postgres database on Amazon Web Services (AWS) in python

______________________________________________________________________
Resources:

– How to setup Anaconda: https://docs.anaconda.com/anaconda/install/
– How to install psygopg2: https://anaconda.org/anaconda/psycopg2

๐Ÿ‘‰ Subscribe to my channel: https://bit.ly/2GsFxmA
๐Ÿ‘‰ Practice real data science interview questions: https://platform.stratascratch.com/coding?utm_source=youtube&utm_medium=click&utm_campaign=YT+description+link

______________________________________________________________________
Timestamps:

Intro: (0:00)
Identifying CSVs automatically: (2:20)
Automatically making a new directory and moving files: (6:55)
Converting CSV files to pandas dataframes: (13:13)
Automating the cleaning of table names and headers: (18:17)
Automatically connect to database and import files: (25:17)

______________________________________________________________________

Contact:

If you have any questions, comments, or feedback, please leave them here!
Feel free to also email me at nathan@stratascratch.com
______________________________________________________________________

source

24 Comments
    • blank
      StrataScratch
      May 13, 2022 08:35 am Reply

      Find Part 1 where I create the script's functionality before automating the work here (https://youtu.be/wqBFgaMgFQA).

      Let me know what you guys think. Is this useful?

    • blank
      David kiania
      May 13, 2022 08:35 am Reply

      Most helpful youtube video series I have seen in a long long time … thank you soo much.

    • blank
      Diaconescu Tiberiu
      May 13, 2022 08:35 am Reply

      I am on windows and mv command is not recognized what can i replace it with

    • blank
      Arthur Amanyire
      May 13, 2022 08:35 am Reply

      Hi Nathan, thanks alot for sharing this. I have tried to follow along but i have hit a wall. At the point where the datasets directory is created all is fine but the code to move the csv files to the datasets directory runs with no errors but the files are not moved. I'm using windows and i noticed you are using mac, could the issue be the os i'm using?

      One other thing, would you consider continuing this series by showing how to pull the data from db to tableau/power bi for realtime analysis

    • blank
      Chris Popp
      May 13, 2022 08:35 am Reply

      If you are using AWS services, you might want to import boto3 and use SDK api calls instead of using psycopg2. Also, if you are using AWS, you donโ€™t want to pass your database password as plaintext in a parameter. Itโ€™s better to use boto3 and pull the sensitive data from AWS Secrets Manager service.

    • blank
      Pulak Kabir
      May 13, 2022 08:35 am Reply

      bro, how to start with an (almost) empty CLASS with the headers of your data file?

    • blank
      Victoria
      May 13, 2022 08:35 am Reply

      This is absolutely amazing, Nathan. Can't thank you enough for sharing your talent in teaching across the community of DS. Would there be a "quick" way to change this code to open .xls instead of .csv? And if possible transform that .xls into a .csv file and proceed with the rest of your code? That would be super helpful if you have any advise (or other videos to refer to). Thanks!

    • blank
      Ivee Pendo
      May 13, 2022 08:35 am Reply

      hi!

      for csv in csv_files:

      mv_file = "mv '{0}' {1}".format(csv, dataset_dir)

      os.system(mv_file)

      this part didn't move my files into the new datasets directory. Am I missing something? Thanks!

    • blank
      XxShadowzeekxX
      May 13, 2022 08:35 am Reply

      what do you mean by running the os system and executing on the batch command? How do I do this because my files have not moved to the new datasets folder? Thank you!

    • blank
      Anthony O`Brien Vicencio Palla
      May 13, 2022 08:35 am Reply

      Hi nate,nice video ๐Ÿ™‚
      I have a questions… How I fix this?? COPY from stdin failed: error in .read() call: UnicodeDecodeError 'charmap' codec can't decode byte 0x81 in position 885: character maps to <undefined>

      CONTEXT: COPY tags, line 9260

    • blank
      Sudarshan VU
      May 13, 2022 08:35 am Reply

      Awesome learning platform.
      Here my question is Instead of pandas df can we do same project task in Pyspark Pyspark-SQL and AWS RDS???

    • blank
      Prateek
      May 13, 2022 08:35 am Reply

      Hey Nate, your videos are just too good. I love how your channel is so dedicated towards real word data science. By the way I noticed that you started a video series, "For your Data Science Project" and I really want you to continue making videos for this particular series because there's literally no one on YouTube with such guidance on DS projects and I have been looking for one since a very long time because I have my placements just after 12 months and I really want to make a full stack data science project. Thank you.

    • blank
      Harsha Madhwani
      May 13, 2022 08:35 am Reply

      Simply Superb… love the way you explain and make it sound so simple. Thank you so much!!

    • blank
      Stephen Sander
      May 13, 2022 08:35 am Reply

      Hi I love your videos, is it possible to make this script a rest API?

    • blank
      wanderborn
      May 13, 2022 08:35 am Reply

      Thanks a lot. It's just what I was looking for to learn-how.
      Just a though: wouldn't it be right/appropriate to clean the data in .csv and then move it to db?

    • blank
      javierjdaza
      May 13, 2022 08:35 am Reply

      dude why use copy instead of insert into sql???

    • blank
      Oluwaseun Atoyebi
      May 13, 2022 08:35 am Reply

      This is incredible my friend. Can you do a video where you load each of the 3 files into separate tables? I'd be curious to see how you dynamically handle that. Thanks for the great work.

    • blank
      abdo bourenane
      May 13, 2022 08:35 am Reply

      What an amazing code !! , i love your explanation , just if you can give us the github link to the data so we can practice , Thank you

    • blank
      Everett Witt
      May 13, 2022 08:35 am Reply

      this is an incredible channel – thanks a ton man

    • blank
      Md Abul Kalam Azad
      May 13, 2022 08:35 am Reply

      Love you teaching style.

    • blank
      Mysterious BD
      May 13, 2022 08:35 am Reply

      Awesome… Looking for a while for such code. Thanks for sharing it

    • blank
      Nargis Parvin
      May 13, 2022 08:35 am Reply

      Excellent work sir – you are making python so easy,great job and I respect for your work.

    • blank
      Grant Culp
      May 13, 2022 08:35 am Reply

      Good video! Just a preference on using .format vs. f strings?

    • blank
      BitLink TV
      May 13, 2022 08:35 am Reply

      Your content is mind-blowing Nate at StrataScratch, Join us @ http://www.bitlink.tv to claim your prize! we are a CryptoFriendly community

Leave us a comment