Read CSV File in Python with pandas
This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without the read_csv function, it is not straightforward to import a CSV file into a Python object. Pandas is a powerful Python package for data manipulation and supports various functions to load and import data from many formats. Here we cover how to deal with common issues when importing a CSV file.
Install and Load Pandas Package
Make sure you have the pandas package already installed on your system. If you set up Python using Anaconda, it comes with pandas, so you don't need to install it again. Otherwise you can install it by using the command pip install pandas.
The next step is to load the package by running the following command. pd is an alias of the pandas package; we will use it instead of the full name "pandas".
import pandas as pd
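A few options shown later in this tutorial behave differently across pandas releases, so it can help to confirm which version is installed. A minimal check, assuming pandas imports successfully:

```python
import pandas as pd

# Print the installed pandas version string, e.g. "2.2.2"
print(pd.__version__)

# Extract the major version number (for example 2 for pandas 2.x)
major_version = int(pd.__version__.split(".")[0])
print("pandas major version:", major_version)
```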
Create Sample Data for Import
The program below creates a sample pandas DataFrame which can be used for the demonstrations that follow.
dt = {'ID': [11, 12, 13, 14, 15], 'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'], 'company': ['Aon', 'TCS', 'Google', 'RBS', '.'], 'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])
The sample data looks like this:
   ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John       .      78
Save data as CSV in the working directory
Check the working directory before you save your data file.
import os
os.getcwd()
In case you want to change the working directory, you can specify it in the os.chdir() function. In a regular Python string a single backslash starts an escape sequence, so use two backslashes (or forward slashes, or a raw string such as r"C:\Users\DELL\Documents") when specifying the file location.
os.chdir("C:\\Users\\DELL\\Documents\\")
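The escaping rule can be checked without touching the file system. This sketch reuses the example folder from the text only as a string; note that a plain "C:\Users" literal would not even compile, because \U begins a Unicode escape sequence.

```python
from pathlib import PureWindowsPath

# The example folder from the text, written three ways.
escaped = "C:\\Users\\DELL\\Documents"   # each \\ is one literal backslash
raw = r"C:\Users\DELL\Documents"          # raw string: backslashes kept as typed
forward = "C:/Users/DELL/Documents"       # forward slashes are also accepted on Windows

print(escaped == raw)  # True: the two literals produce identical strings

# PureWindowsPath treats / and \ as the same separator, on any operating system
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True

# Why a single backslash is risky: "\t" is silently read as a tab character
print(len("C:\temp"), len(r"C:\temp"))  # 6 7
```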
The following command tells Python to write the data in CSV format to your working directory.
mydt.to_csv('workingfile.csv', index=False)
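To see why index=False matters, the sketch below calls to_csv() without a path, in which case pandas returns the CSV text instead of writing a file. Without index=False, the row index is written as an extra unnamed column, which comes back as "Unnamed: 0" when the file is read again.

```python
import io
import pandas as pd

dt = {'ID': [11, 12, 13, 14, 15],
      'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
      'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
      'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns=['ID', 'first_name', 'company', 'salary'])

# No path given, so to_csv() returns the CSV content as a string
with_index = mydt.to_csv()
without_index = mydt.to_csv(index=False)

print(with_index.splitlines()[0])     # ,ID,first_name,company,salary
print(without_index.splitlines()[0])  # ID,first_name,company,salary

# Reading the indexed version back produces an extra column
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns[0])  # Unnamed: 0
```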
Example 1 : Read CSV file with header row
This is the basic syntax of the read_csv() function. You just need to mention the filename. It assumes you have column names in the first row of your CSV file.
mydata = pd.read_csv("workingfile.csv")
It stores the data the way it should, as we have headers in the first row of our data file. It is important to highlight that header=0 is effectively the default: the default value header='infer' behaves like header=0 when no names= are passed. Hence we don't need to mention the header= parameter. header=0 means the header starts from the first row, as indexing in Python starts from 0. The above code is equivalent to this line of code:
pd.read_csv("workingfile.csv", header=0)
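The equivalence can be confirmed directly. In this sketch, csv_text is a hypothetical in-memory copy of workingfile.csv (wrapped in io.StringIO, which read_csv accepts like a file), so the example runs without creating a file.

```python
import io
import pandas as pd

# In-memory stand-in for workingfile.csv
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

default_read = pd.read_csv(io.StringIO(csv_text))
explicit_read = pd.read_csv(io.StringIO(csv_text), header=0)

# Both calls take the column names from the first row
print(default_read.equals(explicit_read))  # True
print(list(default_read.columns))
```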
Inspect data after importing
mydata.shape
mydata.columns
mydata.dtypes
It returns 5 rows and 4 columns. The column names are ['ID', 'first_name', 'company', 'salary'].
Check the column types of the data we imported. first_name and company are character (text) variables, stored with the object dtype. The remaining variables are numeric.
ID             int64
first_name    object
company       object
salary         int64
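Instead of reading the dtypes by eye, select_dtypes() can split the columns into numeric and non-numeric groups. A small sketch using the same hypothetical in-memory copy of the file:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""
mydata = pd.read_csv(io.StringIO(csv_text))

print(mydata.shape)  # (5, 4)

# Columns pandas parsed as numbers vs. everything else (text here)
numeric_cols = list(mydata.select_dtypes(include="number").columns)
other_cols = list(mydata.select_dtypes(exclude="number").columns)
print(numeric_cols)  # ['ID', 'salary']
print(other_cols)    # ['first_name', 'company']
```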
Example 2 : Read CSV file with header in second row
Suppose you have column or variable names in the second row. To read this kind of CSV file, you can submit the following command.
mydata = pd.read_csv("workingfile.csv", header = 1)
header=1 tells Python to pick the header from the second row, i.e. it sets the second row as the header and discards the rows above it. It's not a realistic example; I used it only for illustration so that you get an idea of how to solve it. To make it practical, you can add random values in the first row of the CSV file and then import it again.
   11    David     Aon  74
0  12    Jamie     TCS  76
1  13    Steve  Google  96
2  14  Stevart     RBS  71
3  15     John       .  78
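The "practical" version described above can be sketched with an in-memory file whose first line is junk. The line "exported by some tool,,," is a hypothetical example of such a line; header=1 skips it and takes the real column names from the second row.

```python
import io
import pandas as pd

# Hypothetical file: one junk line, then the real header and data
csv_text = """exported by some tool,,,
ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

mydata = pd.read_csv(io.StringIO(csv_text), header=1)
print(list(mydata.columns))  # ['ID', 'first_name', 'company', 'salary']
print(mydata.shape)          # (5, 4)
```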
Define your own column names instead of the header row from the CSV file
mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])
skiprows=1 means we are ignoring the first row, and the names= option is used to assign variable names manually.
   CustID     Name Companies  Income
0      11    David       Aon      74
1      12    Jamie       TCS      76
2      13    Steve    Google      96
3      14  Stevart       RBS      71
4      15     John         .      78
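pandas also documents a second way to replace an existing header: pass header=0 together with names=. The sketch below, again using a hypothetical in-memory copy of the file, checks that both approaches give the same result.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""
new_names = ['CustID', 'Name', 'Companies', 'Income']

# Approach from the text: skip the header row, then supply names
via_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=new_names)

# Alternative: treat row 0 as the header but replace its names
via_header = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

print(via_skiprows.equals(via_header))  # True
```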
Example 3 : Skip rows but keep header
mydata = pd.read_csv("workingfile.csv", skiprows=[1,2])
In this example, we are skipping the second and third rows while importing. Don't forget that indexing starts from 0 in Python, so 0 refers to the first row, 1 refers to the second row and 2 refers to the third row.
   ID first_name company  salary
0  13      Steve  Google      96
1  14    Stevart     RBS      71
2  15       John       .      78
Instead of [1,2] you can also write range(1,3). Both mean the same thing, but the range() function is very useful when you want to skip many rows, as it saves you from listing each row position manually.
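A quick sketch comparing the list and range() forms. It also shows that skiprows= accepts a function of the row index, which is another way to skip many rows; the "skip every even data row" rule here is only an illustrative choice.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

as_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
as_range = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))
print(as_list.equals(as_range))   # True
print(as_list["ID"].tolist())     # [13, 14, 15]

# Callable form: return True for file rows to skip (row 0 is the header)
odd_rows = pd.read_csv(io.StringIO(csv_text),
                       skiprows=lambda i: i > 0 and i % 2 == 0)
print(odd_rows["ID"].tolist())    # [11, 13, 15]
```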
Hidden secret of the skiprows option
When skiprows=4, it skips the first four rows from the top, and the header row counts as one of them. skiprows=[1,2,3,4] skips the second through fifth rows and keeps the header. This is because when a list is specified in the skiprows= option, it skips the rows at those index positions, whereas a single integer skips that many rows from the top.
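The difference is easy to see on the sample data. With skiprows=4, the header line is among the skipped rows, so the next remaining line (Stevart's record) becomes the header; the list form keeps the original header.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Integer: drop the first 4 lines of the file, header included
top4 = pd.read_csv(io.StringIO(csv_text), skiprows=4)
print(list(top4.columns))  # ['14', 'Stevart', 'RBS', '71']

# List: drop lines 1-4 by position, header (line 0) kept
listed = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2, 3, 4])
print(list(listed.columns))  # ['ID', 'first_name', 'company', 'salary']
print(listed["first_name"].tolist())  # ['John']
```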
Example 4 : Read CSV file without header row
If you specify header=None, Python assigns a series of numbers from 0 to (number of columns - 1) as column names. In this data file we have column names in the first row, so with header=None they are read in as an ordinary data row.
mydata0 = pd.read_csv("workingfile.csv", header = None)
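A sketch of what header=None produces on the sample data, using a hypothetical in-memory copy of the file: integer column labels, and the original header line showing up as row 0.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

mydata0 = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(mydata0.columns))   # [0, 1, 2, 3]
print(mydata0.iloc[0].tolist())  # ['ID', 'first_name', 'company', 'salary']
print(len(mydata0))            # 6 rows: the old header plus 5 records

# Because the header text sits in each column, the numbers are read as text too
print(mydata0.iloc[1, 0])      # '11' (a string, not an integer)
```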
Add prefix to column names
mydata0 = pd.read_csv("workingfile.csv", header = None, prefix="var")
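In this case, we are setting var as the prefix, which tells Python to include this keyword before each column name. Note that the prefix= option was deprecated in pandas 1.4 and removed in pandas 2.0, so this line only runs on older versions.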
  var0        var1     var2    var3
0   ID  first_name  company  salary
1   11       David      Aon      74
2   12       Jamie      TCS      76
3   13       Steve   Google      96
4   14     Stevart      RBS      71
5   15        John        .      78
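On pandas versions where prefix= is no longer accepted, one way to get the same column labels is to read with header=None and then call DataFrame.add_prefix(). A sketch on a hypothetical in-memory copy of the file:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# Read without a header (columns 0..3), then prepend "var" to each label
mydata0 = pd.read_csv(io.StringIO(csv_text), header=None).add_prefix("var")
print(list(mydata0.columns))  # ['var0', 'var1', 'var2', 'var3']
```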
Example 5 : Specify missing values
The na_values= option is used to treat some values as blank / missing values while importing a CSV file. Here the "." placeholder in the company column becomes NaN.
mydata00 = pd.read_csv("workingfile.csv", na_values=['.'])
   ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John     NaN      78
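The "." placeholder is not one of the strings pandas treats as missing by default, so without na_values= it stays as ordinary text. A sketch comparing both reads on a hypothetical in-memory copy of the file:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

as_is = pd.read_csv(io.StringIO(csv_text))
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=['.'])

# Count missing values in the company column for each version
print(as_is["company"].isna().sum())    # 0: "." is kept as text
print(cleaned["company"].isna().sum())  # 1: "." became NaN
```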
Example 6 : Set Index Column
mydata01 = pd.read_csv("workingfile.csv", index_col ='ID')
   first_name company  salary
ID
11      David     Aon      74
12      Jamie     TCS      76
13      Steve  Google      96
14    Stevart     RBS      71
15       John       .      78
As you can see in the above output, the ID column has been set as the index column.
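With ID as the index, rows can be looked up by ID value with .loc. A short sketch on a hypothetical in-memory copy of the file:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

mydata01 = pd.read_csv(io.StringIO(csv_text), index_col='ID')
print(mydata01.index.name)               # ID
print(list(mydata01.columns))            # ['first_name', 'company', 'salary']
print(mydata01.loc[13, "first_name"])    # Steve (lookup by ID, not position)
```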
Example 7 : Read CSV File from External URL
You can directly read data from a CSV file stored at a web link. This is very handy when you need to load publicly available datasets from GitHub, Kaggle and other websites.
mydata02 = pd.read_csv("http://winterolympicsmedals.com/medals.csv")
This DataFrame contains 2311 rows and 8 columns. You can generate this summary using mydata02.shape.
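Example 8 : Skip Last 5 Rows While Importing CSV
mydata04 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python")
In the above code, we are excluding the bottom 5 rows using the skipfooter= parameter (older tutorials spell it skip_footer=, which recent pandas versions no longer accept). skipfooter= is supported only by the Python parsing engine, so engine="python" is passed explicitly to avoid a fallback warning.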
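The same option can be tried offline on the sample data. This sketch drops the last 2 records of a hypothetical in-memory copy of workingfile.csv:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# skipfooter requires the Python engine
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=2, engine="python")
print(trimmed["ID"].tolist())  # [11, 12, 13]
```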
Example 9 : Read only first 5 rows
mydata05 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5)
Using the nrows= option, you can load the top K rows.
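A sketch of nrows= on a hypothetical in-memory copy of the sample file; only the first 2 data rows are parsed, and the header is still taken from the first line.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(len(first_two))                      # 2
print(first_two["first_name"].tolist())    # ['David', 'Jamie']
```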
Example 10 : Interpreting "," as thousands separator
mydata06 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",")
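The thousands= option tells pandas to strip the separator before converting text to numbers. The data below is hypothetical and uses | as the field delimiter, so the commas inside numbers don't clash with field separators (in a comma-delimited file such values would need to be quoted).

```python
import io
import pandas as pd

# Hypothetical pipe-delimited data with comma thousands separators
data = "city|population\nA|1,234\nB|56,789\n"

as_text = pd.read_csv(io.StringIO(data), sep="|")
as_numbers = pd.read_csv(io.StringIO(data), sep="|", thousands=",")

print(as_text["population"].tolist())     # ['1,234', '56,789'] (strings)
print(as_numbers["population"].tolist())  # [1234, 56789] (integers)
```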
Example 11 : Read only specific columns
mydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7])
The above code reads only the columns at index positions 1, 5 and 7, which are the second, sixth and eighth columns.
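usecols= also accepts column names, and in both forms pandas ignores the order you list them in: columns come back in file order. If you need a specific order, select the columns again after reading. A sketch on a hypothetical in-memory copy of the sample file:

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

by_position = pd.read_csv(io.StringIO(csv_text), usecols=[3, 0])
print(list(by_position.columns))  # ['ID', 'salary'] (file order, not [3, 0])

by_name = pd.read_csv(io.StringIO(csv_text), usecols=['salary', 'ID'])
reordered = by_name[['salary', 'ID']]  # impose the order explicitly
print(list(reordered.columns))    # ['salary', 'ID']
```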
Example 12 : Read some rows and columns
mydata08 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7], nrows=5)
In the above command, we have combined the usecols= and nrows= options. It selects only the first 5 rows and the selected columns.
Example 13 : Read file with semicolon delimiter
mydata09 = pd.read_csv("file_path", sep = ';')
Using the sep= parameter in the read_csv() function, you can import a file with any delimiter other than the default comma. In this case, we are using a semicolon as the separator.
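A sketch with hypothetical semicolon-delimited data. Reading it with the default comma separator silently produces a single column, which is a quick way to spot a wrong delimiter.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited content
csv_text = "ID;first_name;salary\n11;David;74\n12;Jamie;76\n"

semi = pd.read_csv(io.StringIO(csv_text), sep=";")
wrong = pd.read_csv(io.StringIO(csv_text))  # default sep=","

print(semi.shape)   # (2, 3)
print(wrong.shape)  # (2, 1): each whole line ends up in one column
```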
Example 14 : Change column type while importing CSV
Suppose you want to change the column format from int64 to float64 while loading a CSV file into Python. We can use the dtype= option for this.
mydf = pd.read_csv("workingfile.csv", dtype = {"salary" : "float64"})
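A check of the result on a hypothetical in-memory copy of the file: only the column named in the dtype dictionary changes, and the others keep their inferred types.

```python
import io
import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

mydf = pd.read_csv(io.StringIO(csv_text), dtype={"salary": "float64"})
print(mydf["salary"].dtype)  # float64
print(mydf["ID"].dtype)      # int64 (not listed in dtype=, so inferred)
```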
Example 15 : Measure time taken to import big CSV file
With the use of verbose=True, you can capture the time taken for tokenization, type conversion and parser memory cleanup.
mydf = pd.read_csv("workingfile.csv", verbose=True)
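Recent pandas releases (2.2 onward) deprecate the verbose= option, so a version-independent alternative is to time the call yourself with time.perf_counter(). This sketch builds a hypothetical 100,000-row CSV in memory; the measured time will vary by machine.

```python
import io
import time
import pandas as pd

# Hypothetical larger CSV built in memory
rows = "\n".join(f"{i},name{i},{i % 100}" for i in range(100_000))
csv_text = "ID,name,score\n" + rows + "\n"

start = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text))
elapsed = time.perf_counter() - start

print(f"Read {len(df)} rows in {elapsed:.3f} seconds")
```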
Example 16 : How to read CSV file without using the pandas package
To import a CSV file in pure Python, you can submit the following commands:
import csv
with open("C:/Users/DELL/Downloads/nycflights.csv", newline="") as f:
    d = csv.DictReader(f)  # each row becomes a dict keyed by the header
    l = list(d)
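A self-contained sketch of what csv.DictReader returns, using a hypothetical in-memory copy of the sample file instead of nycflights.csv. Unlike read_csv, every value comes back as a string, so numeric columns need explicit conversion.

```python
import csv
import io

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])  # {'ID': '11', 'first_name': 'David', 'company': 'Aon', 'salary': '74'}

# Convert the salary strings to integers before doing arithmetic
salaries = [int(r["salary"]) for r in rows]
print(sum(salaries))  # 395
```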
You can also download and load a CSV file from a URL or external webpage.
import csv
import requests
response = requests.get('https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv').text
lines = response.splitlines()
d = csv.DictReader(lines)
l = list(d)
EndNote
After completing this tutorial, I hope you have gained confidence in importing CSV files into Python, along with ways to clean and manage them. You can also check out this tutorial which explains how to import files of different formats into Python. Once done, you should learn how to perform common data manipulation or wrangling tasks like filtering, selecting and renaming columns, and identifying and removing duplicates on a pandas DataFrame.
Source: https://www.listendata.com/2019/06/pandas-read-csv.html