Read CSV and Keep as Matrix in Python

March 12, 2022

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without read_csv, importing a CSV file with plain Python is not straightforward. Pandas is a powerful Python package for data manipulation that supports various functions for loading and importing data from different formats. Here we cover how to deal with common issues when importing a CSV file.

Install and Load Pandas Package

Make sure you have the pandas package installed on your system. If you set up Python using Anaconda, it comes with pandas, so you don't need to install it again. Otherwise you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias for the pandas package; we will use it instead of the full name "pandas".

import pandas as pd

Create Sample Data for Import

The program below creates a sample pandas DataFrame which is used in the rest of the demonstration.

dt = {'ID': [11, 12, 13, 14, 15],
      'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
      'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
      'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])

The sample data looks like this:

   ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John       .      78

Save data as CSV in the working directory

Check the working directory before you save your data file.

import os
os.getcwd()

In case you want to change the working directory, you can specify it in the os.chdir( ) function. A single backslash starts an escape sequence in a Python string, so use two backslashes when specifying a Windows file location.

os.chdir("C:\\Users\\DELL\\Documents\\")
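As a small illustration of the backslash rule above, the sketch below compares three ways of writing the same Windows path. The path is the tutorial's own example; pathlib.PureWindowsPath is used here only so that the comparison runs on any operating system.

```python
from pathlib import PureWindowsPath

# Doubled backslashes: each pair stands for one literal backslash.
doubled = "C:\\Users\\DELL\\Documents"

# Raw string: backslashes are taken literally, so no doubling is needed.
raw = r"C:\Users\DELL\Documents"

# Forward slashes describe the same location on Windows.
forward = "C:/Users/DELL/Documents"

print(doubled == raw)                                    # identical strings
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # identical paths
```

Any of the three spellings can be passed to os.chdir( ); the raw string is often the easiest to paste from a file explorer.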

The following command tells python to write data in CSV format in your working directory.

mydt.to_csv('workingfile.csv', index=False)
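To see what index=False changes, here is a minimal sketch on a two-row stand-in for the sample data. It relies on to_csv returning the CSV text as a string when no file path is given, so nothing is written to disk.

```python
import pandas as pd

# Small stand-in for the tutorial's sample data (assumed values).
df = pd.DataFrame({"ID": [11, 12], "salary": [74, 76]})

with_index = df.to_csv()                # row labels written as a first, unnamed column
without_index = df.to_csv(index=False)  # only the real columns are written

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

Writing the index adds an unnamed leading column, which read_csv would later import as "Unnamed: 0", so index=False is usually what you want for a plain data file.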

Example 1 : Read CSV file with header row

This is the basic syntax of the read_csv() function. You just need to mention the filename. It assumes you have column names in the first row of your CSV file.

mydata = pd.read_csv("workingfile.csv")

It stores the data the way it should be, as we have headers in the first row of our data file. It is important to highlight that header=0 is effectively the default: the documented default, header='infer', behaves like header=0 when no names= are passed. Hence we don't need to mention the header= parameter. It means the header is taken from the first row, as indexing in Python starts from 0. The above code is equivalent to this line of code.

pd.read_csv("workingfile.csv", header=0)
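The equivalence can be checked directly. The sketch below keeps the same rows as workingfile.csv in an in-memory string (an assumption made so the example runs without the file on disk) and compares the two calls.

```python
import io
import pandas as pd

# Same content as workingfile.csv, held in memory for a self-contained example.
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

default_df = pd.read_csv(io.StringIO(csv_text))             # header inferred
explicit_df = pd.read_csv(io.StringIO(csv_text), header=0)  # header stated

# Both calls produce identical DataFrames.
print(default_df.equals(explicit_df))
print(list(default_df.columns))
```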

Inspect data after importing

mydata.shape
mydata.columns
mydata.dtypes

It returns 5 rows and 4 columns. Column names are ['ID', 'first_name', 'company', 'salary'].

See the column types of the data we imported. first_name and company are character variables. The remaining variables are numeric.

ID             int64
first_name    object
company       object
salary         int64

Example 2 : Read CSV file with header in second row

Suppose you have column or variable names in the second row. To read this kind of CSV file, you can submit the following command.

mydata = pd.read_csv("workingfile.csv", header = 1)

header=1 tells pandas to pick the header from the second row; the first row is discarded. It's not a realistic example with this file. I just used it for illustration so that you get an idea of how to solve it. To make it practical, you can add random values in the first row of the CSV file and then import it again.

   11    David     Aon  74
0  12    Jamie     TCS  76
1  13    Steve  Google  96
2  14  Stevart     RBS  71
3  15     John       .  78
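A more practical version of this example, along the lines suggested above, might look like the following sketch. The note on the first line is invented for illustration; the real column names sit on the second line.

```python
import io
import pandas as pd

# Hypothetical file whose first line is a note rather than column names.
csv_text = """exported from HR system
ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
"""

# header=1: use the second line as the header and drop everything before it.
df = pd.read_csv(io.StringIO(csv_text), header=1)

print(df.columns.tolist())
print(df.shape)
```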

Define your own column names instead of header row from CSV file

mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])

skiprows = 1 means we are ignoring the first row, and the names= option is used to assign variable names manually.

   CustID     Name Companies  Income
0      11    David       Aon      74
1      12    Jamie       TCS      76
2      13    Steve    Google      96
3      14  Stevart       RBS      71
4      15     John         .      78

Example 3 : Skip rows but keep header

mydata = pd.read_csv("workingfile.csv", skiprows=[1,2])

In this example, we are skipping the second and third rows while importing. Don't forget that indexing starts from 0 in Python, so 0 refers to the first row, 1 refers to the second row and 2 implies the third row.

   ID first_name company  salary
0  13      Steve  Google      96
1  14    Stevart     RBS      71
2  15       John       .      78

Instead of [1,2] you can also write range(1,3). Both mean the same thing, but the range( ) function is very useful when you want to skip many rows, as it saves the time of manually defining row positions.

Hidden secret of skiprows option

When skiprows = 4, it means skipping four rows from the top. skiprows=[1,2,3,4] means skipping rows from the second through the fifth. This is because when a list is specified in the skiprows= option, it skips rows at those index positions. When a single integer value is specified, it skips that many rows from the top.
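The difference between the two forms is easiest to see side by side. This sketch uses an in-memory copy of the sample data (an assumption so it runs without workingfile.csv) and shows that an integer also swallows the header line, while a list leaves it in place.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Integer: drop the first 2 lines, so the header line is lost and the
# third line ("12,Jamie,TCS,76") is treated as the header instead.
top_skipped = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# List: drop only the lines at positions 1 and 2; line 0 (the header) stays.
by_position = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

# range(1, 3) names the same positions as [1, 2].
by_range = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))

print(top_skipped.columns.tolist())
print(by_position.columns.tolist())
print(by_position.equals(by_range))
```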

Example 4 : Read CSV file without header row

If you specify "header = None", pandas assigns a series of numbers starting from 0 to (number of columns - 1) as column names. In this data file, we have column names in the first row, so they are read as the first row of data.

mydata0 = pd.read_csv("workingfile.csv", header = None)

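See the output shown below:

    0           1        2       3
0  ID  first_name  company  salary
1  11       David      Aon      74
2  12       Jamie      TCS      76
3  13       Steve   Google      96
4  14     Stevart      RBS      71
5  15        John        .      78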

Add prefix to column names

mydata0 = pd.read_csv("workingfile.csv", header = None, prefix="var")

In this case, we are setting var as the prefix, which tells pandas to put this keyword before each column name. Note that the prefix= parameter was deprecated in pandas 1.4 and removed in pandas 2.0, so this call only works on older versions.

  var0        var1     var2    var3
0   ID  first_name  company  salary
1   11       David      Aon      74
2   12       Jamie      TCS      76
3   13       Steve   Google      96
4   14     Stevart      RBS      71
5   15        John        .      78
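On pandas versions where prefix= is no longer accepted, the same column names can be produced after loading. The following is one possible sketch, not the original author's method: it reads with header=None and then renames the numeric column labels. The in-memory CSV is an assumption that mirrors workingfile.csv.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# header=None gives the columns the integer labels 0, 1, 2, 3.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Build var0, var1, ... from those integer labels.
raw.columns = [f"var{i}" for i in raw.columns]

print(raw.columns.tolist())
print(raw.shape)
```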

Example 5 : Specify missing values

The na_values= option is used to treat some values as blank / missing values while importing a CSV file.

mydata00 = pd.read_csv("workingfile.csv", na_values=['.'])

   ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John     NaN      78
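To confirm that the "." was really converted, you can count the missing values after loading. The sketch below also shows the dictionary form of na_values=, which limits the rule to named columns; the in-memory CSV is an assumption standing in for workingfile.csv.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Treat "." as missing in every column.
everywhere = pd.read_csv(io.StringIO(csv_text), na_values=["."])

# Treat "." as missing only in the company column.
company_only = pd.read_csv(io.StringIO(csv_text), na_values={"company": ["."]})

print(everywhere["company"].isna().sum())
print(company_only["company"].isna().tolist())
```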

Example 6 : Set Index Column

mydata01 = pd.read_csv("workingfile.csv", index_col ='ID')

   first_name company  salary
ID
11      David     Aon      74
12      Jamie     TCS      76
13      Steve  Google      96
14    Stevart     RBS      71
15       John       .      78

As you can see in the above output, the column ID has been set as the index column.
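One reason to set an index column is that rows can then be looked up by its values. A minimal sketch, assuming the same data as workingfile.csv held in memory:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# ID is now the row label, so .loc selects by ID value rather than position.
print(df.loc[13, "first_name"])
print(df.columns.tolist())
```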

Example 7 : Read CSV File from External URL

You can directly read data from a CSV file that is stored at a web link. It is very handy when you need to load publicly available datasets from GitHub, Kaggle and other websites.

mydata02 = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

This DataFrame contains 2311 rows and 8 columns. Using mydata02.shape, you can generate this summary.

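Example 8 : Skip Last 5 Rows While Importing CSV

mydata04 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python")

In the above code, we are excluding the bottom 5 rows using the skipfooter= parameter (spelled skip_footer= in very old pandas releases). skipfooter is not supported by the default C parser, so engine="python" is specified explicitly.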
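A typical use of this option is dropping a totals line at the end of an export. The sketch below uses a small invented file with one footer line, so it runs without network access.

```python
import io
import pandas as pd

# Hypothetical export that ends with a summary line.
lines = [
    "ID,salary",
    "11,74",
    "12,76",
    "13,96",
    "TOTAL,246",
]
csv_text = "\n".join(lines)

# Drop the last line; skipfooter needs the Python parsing engine.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(len(df))
print(df["ID"].tolist())
```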

Example 9 : Read only first 5 rows

mydata05 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5)

Using the nrows= option, you can load the top K rows.

Example 10 : Interpreting "," as thousands separator

mydata06  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",")
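The effect of thousands= is easier to see on a small file that actually contains grouped numbers. The data below is invented for illustration; note that the numbers must be quoted in the file, otherwise their commas would be read as field separators.

```python
import io
import pandas as pd

# Hypothetical data: numbers written with "," as the thousands separator.
csv_text = 'city,population\n"Springfield","1,234,567"\n"Shelbyville","45,678"\n'

plain = pd.read_csv(io.StringIO(csv_text))                  # values stay text
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")  # values become integers

print(plain["population"].tolist())
print(parsed["population"].tolist())
```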

Example 11 : Read only specific columns

mydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7])

The above code reads only the columns at index positions 1, 5 and 7, which are the second, sixth and eighth columns.

Example 12 : Read some rows and columns

mydata08 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7], nrows=5)

In the above command, we have combined the usecols= and nrows= options. It selects only the first 5 rows of the selected columns.
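usecols= also accepts column names, which tends to be more readable than positions. A short sketch on an in-memory copy of the sample data (assumed here so it runs offline) shows both forms selecting the same subset:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Select columns by name, and keep only the first 3 data rows.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["first_name", "salary"], nrows=3)

# The same selection by zero-based position.
by_pos = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3], nrows=3)

print(by_name.shape)
print(by_name.equals(by_pos))
```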

Example 13 : Read file with semicolon delimiter

mydata09 = pd.read_csv("file_path", sep = ';')

Using the sep= parameter of the read_csv( ) function, you can import a file with any delimiter other than the default comma. In this case, we are using a semicolon as the separator.
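Since "file_path" above is only a placeholder, here is a runnable sketch with a small semicolon-separated sample held in memory (the data is an assumption modelled on the tutorial's sample).

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited content.
csv_text = "ID;first_name;salary\n11;David;74\n12;Jamie;76\n"

df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df.columns.tolist())
print(df.shape)
```

Without sep=";", each line would be read as a single column, because no comma is found to split on.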

Example 14 : Change column type while importing CSV

Suppose you want to change a column's format from int64 to float64 while loading a CSV file into Python. We can use the dtype= option for this.

mydf = pd.read_csv("workingfile.csv", dtype = {"salary" : "float64"})
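The result can be verified by comparing the column types with and without dtype=. This sketch again assumes an in-memory copy of the sample rows rather than the file on disk.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
"""

inferred = pd.read_csv(io.StringIO(csv_text))
as_float = pd.read_csv(io.StringIO(csv_text), dtype={"salary": "float64"})

print(inferred["salary"].dtype)  # type chosen by pandas
print(as_float["salary"].dtype)  # type requested via dtype=
```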

Example 15 : Measure time taken to import big CSV file

With verbose=True, you can capture the time taken for tokenization, conversion and parser memory cleanup.

mydf = pd.read_csv("workingfile.csv", verbose=True)
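The verbose= option is deprecated in recent pandas releases (2.2 onward), so a version-independent alternative is to time the call yourself. The sketch below is one way to do that with the standard library; it reports a single wall-clock figure rather than the per-stage breakdown, and the generated data is an assumption standing in for a large file.

```python
import io
import time
import pandas as pd

# Build a moderately large CSV in memory (100,000 rows, 2 columns).
rows = "\n".join(f"{i},{i * 2}" for i in range(100_000))
csv_text = "a,b\n" + rows

start = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text))
elapsed = time.perf_counter() - start

print(f"read {len(df)} rows in {elapsed:.4f} seconds")
```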

Example 16 : How to read CSV file without using Pandas package

To import a CSV file in pure Python, you can submit the following command:

import csv
# newline="" is what the csv module recommends when opening files
with open("C:/Users/DELL/Downloads/nycflights.csv", newline="") as f:
    d = csv.DictReader(f)   # each row becomes a dict keyed by the header
    l = list(d)

You can also download and load a CSV file from a URL or external webpage.

import csv
import requests

response = requests.get('https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv').text
lines = response.splitlines()
d = csv.DictReader(lines)
l = list(d)
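To show what DictReader actually returns, the sketch below runs it on a few in-memory rows of the tutorial's sample data (an assumption, so that neither the local file nor the download is needed). Unlike pandas, the csv module does no type conversion: every value comes back as a string.

```python
import csv
import io

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
"""

# Each row is a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0])
# Values are strings, so numbers must be converted by hand.
salaries = [int(row["salary"]) for row in rows]
print(salaries)
```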

EndNote

After completing this tutorial, I hope you have gained confidence in importing CSV files into Python, along with ways to clean and manage them. You can also check out this tutorial which explains how to import files of different formats to Python. Once done, you should learn how to perform common data manipulation or wrangling tasks like filtering, selecting and renaming columns, and identifying and removing duplicates on a pandas DataFrame.

grossmestans1945.blogspot.com

Source: https://www.listendata.com/2019/06/pandas-read-csv.html
