How to Read Expressions From a File in C

Nearly of the data is available in a tabular format of CSV files. It is very popular. You can catechumen them to a pandas DataFrame using the read_csv function. The pandas.read_csv is used to load a CSV file as a pandas dataframe.

In this article, you will learn the dissimilar features of the read_csv role of pandas apart from loading the CSV file and the parameters which can be customized to go better output from the read_csv function.

pandas.read_csv

  • Syntax: pandas.read_csv( filepath_or_buffer, sep, header, index_col, usecols, prefix, dtype, converters, skiprows, skiprows, nrows, na_values, parse_dates)Purpose: Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking the file into chunks.
  • Parameters:
    • filepath_or_buffer : str, path object or file-like object Any valid string path is acceptable. The string could be a URL as well. Path object refers to os.PathLike. File-like objects with a read() method, such every bit a filehandle (e.g. via built-in open function) or StringIO.
    • sep : str, (Default ',') Separating boundary which distinguishes between any two subsequent information items.
    • header : int, list of int, (Default 'infer') Row number(southward) to use as the column names, and the start of the data. The default beliefs is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the get-go line of the file.
    • names : array-like List of column names to employ. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this listing are not allowed.
    • index_col : int, str, sequence of int/str, or False, (Default None) Cavalcade(south) to use as the row labels of the DataFrame, either given as string name or cavalcade index. If a sequence of int/str is given, a MultiIndex is used.
    • usecols : listing-like or callable Return a subset of the columns. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.
    • prefix : str Prefix to add together to cavalcade numbers when no header, e.g. 'X' for X0, X1
    • dtype : Type proper name or dict of cavalcade -> type Data blazon for data or columns. Eastward.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'} Utilize str or object together with suitable na_values settings to preserve and not interpret dtype.
    • converters : dict Dict of functions for converting values in sure columns. Keys can either be integers or cavalcade labels.
    • skiprows : list-like, int or callable Line numbers to skip (0-indexed) or the number of lines to skip (int) at the showtime of the file. If callable, the callable role volition be evaluated against the row indices, returning True if the row should be skipped and False otherwise.
    • skipfooter : int Number of lines at bottom of the file to skip
    • nrows : int Number of rows of file to read. Useful for reading pieces of large files.
    • na_values : scalar, str, list-like, or dict Additional strings to recognize every bit NA/NaN. If dict passed, specific per-column NA values. By default the post-obit values are interpreted as NaN: '', '#Northward/A', '#Northward/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '', 'N/A', 'NA', 'Null', 'NaN', 'n/a', 'nan', 'null'.
    • parse_dates : bool or listing of int or names or list of lists or dict, (default False) If set up to True, will try to parse the index, else parse the columns passed
  • Returns: DataFrame or TextParser, A comma-separated values (CSV) file is returned equally a two-dimensional data construction with labeled axes. _For full list of parameters, refer to the offical documentation

Reading CSV file

The pandas read_csv function tin can be used in dissimilar ways as per necessity like using custom separators, reading just selective columns/rows and so on. All cases are covered below one subsequently another.

Default Separator

To read a CSV file, call the pandas role read_csv() and laissez passer the file path as input.

Step 1: Import Pandas

                      import            pandas            as            pd        

Pace 2: Read the CSV

                      # Read the csv file            df            = pd.read_csv("data1.csv")            # First 5 rows            df.caput()        
read_csv file from pandas

Dissimilar, Custom Separators

Past default, a CSV is seperated past comma. Only yous can employ other seperators besides. The pandas.read_csvfunction is not limited to reading the CSV file with default separator (i.e. comma). Information technology can be used for other separators such every bit ;, | or :. To load CSV files with such separators, the sep parameter is used to pass the separator used in the CSV file.

Permit'due south load a file with | separator

          #            Read            the csv            file            sep='|'            df = pd.read_csv("data2.csv", sep='|') df                  
Custom Separators for read  _csv pandas file

Ready whatever row as column header

Let's see the information frame created using the read_csv pandas function without any header parameter:

                      # Read the csv file            df            = pd.read_csv("data1.csv") df.head()                  
Column header for read  _csv pandas file

The row 0 seems to be a better fit for the header. It can explain better about the figures in the tabular array. You can brand this 0 row as a header while reading the CSV by using the header parameter. Header parameter takes the value as a row number.

Note: Row numbering starts from 0 including column header

                      # Read the csv file with header parameter            df            = pd.read_csv("data1.csv",            header=1)            df.head()                  
Column header for read  _csv pandas file

Renaming cavalcade headers

While reading the CSV file, you tin can rename the column headers past using the names parameter. The names parameter takes the listing of names of the column header.

          # Read the csv            file            with names            parameter            df            = pd.read_csv(            "data.csv"            , names=[            'Ranking'            ,            'ST Proper name'            ,            'Popular'            ,            'NS'            ,            'D'            ])            df.caput()                  
Renaming Column header for read  _csv pandas file

To avert the sometime header being inferred every bit a row for the information frame, you can provide the header parameter which will override the onetime header names with new names.

          # Read the csv            file            with header            and            names            parameter            df            = pd.read_csv(            "data.csv"            , header=0, names=[            'Ranking'            ,            'ST Proper noun'            ,            'Popular'            ,            'NS'            ,            'D'            ])            df.caput()                  
Renaming Column header for read  _csv pandas file

Loading CSV without column headers in pandas

There is a chance that the CSV file yous load doesn't have whatever cavalcade header. The pandas will make the first row equally a cavalcade header in the default case.

                      # Read the csv file            df            = pd.read_csv("data3.csv") df.caput()                  
Default case without column header

To avoid whatever row being inferred as cavalcade header, you lot tin specify header equally None. Information technology will force pandas to create numbered columns starting from 0.

                      # Read the csv file with header=None            df            = pd.read_csv("data3.csv",            header=None)            df.caput()                  
Default case without column header

Adding Prefixes to numbered columns

You can also give prefixes to the numbered column headers using the prefix parameter of pandas read_csv function.

                      # Read the csv file with header=None and prefix=column_            df            = pd.read_csv("data3.csv",            header=None,            prefix='column_')            df.caput()                  

Ready whatsoever column(s) as Index

By default, Pandas adds an initial alphabetize to the information frame loaded from the CSV file. You can control this behavior and make any column of your CSV as an index by using the index_col parameter.

It takes the proper noun of the desired cavalcade which has to exist made equally an index.

Case 1: Making one cavalcade every bit index

          # Read the csv file            with            'Rank'            as            index df = pd.read_csv("data.csv", index_col='Rank') df.head()                  

Case 2: Making multiple columns as index

For two or more columns to be fabricated equally an index, pass them as a list.

          # Read the csv            file            with            'Rank'            and            'Date'            every bit            index            df = pd.read_csv("information.csv", index_col=['Rank',            'Date']) df.head()                  

Selecting columns while reading CSV

In do, all the columns of the CSV file are not of import. Y'all can select simply the necessary columns afterward loading the file but if yous're aware of those beforehand, you tin save the space and fourth dimension.

usecols parameter takes the list of columns you want to load in your data frame.

Selecting columns using list

          #            Read            the csv file            with            'Rank',            'Date'            and            'Population'            columns (listing) df = pd.read_csv("information.csv", usecols=['Rank',            'Date',            'Population']) df.head()                  
Selecting column for read_csv pandas file

Selecting columns using callable functions

usecols parameter can besides accept callable functions. The callable functions evaluate on column names to select that specific column where the office evaluates to True.

          # Read the csv file            with            columns            where            length            of            column proper name >            x            df = pd.read_csv("data.csv", usecols=lambda x: len(10)>x) df.head()                  
Selecting column for read_csv pandas file

Selecting/skipping rows while reading CSV

You tin skip or select a specific number of rows from the dataset using the pandas.read_csv function. At that place are 3 parameters that can do this task: nrows, skiprows and skipfooter.

All of them accept different functions. Let'southward discuss each of them separately.

A. nrows : This parameter allows you to control how many rows y'all desire to load from the CSV file. Information technology takes an integer specifying row count.

                      # Read the csv file with 5 rows            df            = pd.read_csv("information.csv",            nrows=five)            df                  
Selecting rows for read_csv pandas file

B. skiprows : This parameter allows you to skip rows from the beginning of the file.

Skiprows by specifying row indices

                      # Read the csv file with offset row skipped            df            = pd.read_csv("data.csv",            skiprows=1)            df.head()                  
Selecting rows for read_csv pandas file

Skiprows by using callback office

skiprows parameter tin likewise take a callable function every bit input which evaluates on row indices. This means the callable role will check for every row indices to decide if that row should be skipped or not.

                      # Read the csv file with odd rows skipped            df            = pd.read_csv("information.csv",            skiprows=lambda            x: ten%2!=0) df.caput()                  
Selecting rows for read_csv pandas file

C. skipfooter : This parameter allows you to skip rows from the stop of the file.

                      # Read the csv file with one row skipped from the end            df            = pd.read_csv("data.csv",            skipfooter=1)            df.tail()                  
Selecting rows for read_csv pandas file

Changing the data type of columns

You tin specify the information types of columns while reading the CSV file. dtype parameter takes in the dictionary of columns with their data types defined. To assign the information types, you can import them from the numpy package and mention them against suitable columns.

Data Blazon of Rank earlier change

                      # Read the csv file                        df            = pd.read_csv("data.csv")            # Display datatype of Rank            df.Rank.dtypes                  
                                    dtype              ('int64')                              

Data Blazon of Rank after modify

          #            import            numpy            import            numpy            equally            np  #            Read            the csv file with information            type            specified for            Rank.            df            = pd.read_csv("data.csv", dtype={'Rank':np.int8})  #            Brandish            datatype            of rank            df.Rank.dtypes                  
                                    dtype              ('int8')                              

Parse Dates while reading CSV

Appointment fourth dimension values are very crucial for data analysis. You lot can convert a column to a datetime blazon column while reading the CSV in two ways:

Method one. Make the desired column as an index and pass parse_dates=True

          # Read the csv file            with            'Date'            as            index and parse_dates=Truthful            df = pd.read_csv("data.csv", index_col='Date', parse_dates=True, nrows=5)  # Brandish index df.index                  
          DatetimeIndex(['2021            -02            -25', '2021            -04            -14', '2021            -02            -19', '2021            -02            -24',                '2021            -02            -13'],               dtype='datetime64[ns]', name='Date', freq=None)                  

Method 2. Pass desired column in parse_dates as list

          # Read the csv file            with            parse_dates=['Date'] df = pd.read_csv("information.csv", parse_dates=['Date'], nrows=5)  # Display datatypes            of            columns df.dtypes                  
                      Rank            int64            Country                          object                        Population                          object                        National            Share            (%)                          object                        Date            datetime64[ns] dtype:                          object                              

Calculation more NaN values

Pandas library can handle a lot of missing values. But in that location are many cases where the data contains missing values in forms that are not nowadays in the pandas NA values list. It doesn't empathize 'missing', 'not establish', or 'not available' as missing values.

So, y'all need to assign them as missing. To do this, use the na_values parameter that takes a listing of such values.

Loading CSV without specifying na_values

                      # Read the csv file            df            = pd.read_csv("data.csv",            nrows=five)            df                  
Adding NaN values

Loading CSV with specifying na_values

          # Read the csv file            with            'missing'            as            na_values df = pd.read_csv("data.csv", na_values=['missing'], nrows=v) df                  
Adding NaN values

Convert values of the column while reading CSV

You can transform, modify, or convert the values of the columns of the CSV file while loading the CSV itself. This tin be done by using the converters parameter. converters takes in a dictionary with keys equally the column names and values are the functions to be applied to them.

Let'due south convert the comma seperated values (i.e 19,98,12,341) of the Population column in the dataset to integer value (199812341) while reading the CSV.

                      # Function which converts comma seperated value to integer            toInt = lambda x:            int(10.replace(',',            ''))            if            x!='missing'            else            -one            # Read the csv file                        df = pd.read_csv("information.csv", converters={'Population': toInt}) df.head()                  

Practical Tips

  • Earlier loading the CSV file into a pandas data frame, always take a skimmed await at the file. It will help you gauge which columns yous should import and determine what information types your columns should have.
  • Y'all should besides lookout for the total row count of the dataset. A system with four GB RAM may not be able to load seven-8M rows.

Exam your knowledge

Q1: You cannot load files with the $ separator using the pandas read_csv office. True or Fake?

Answer:

Answer: Faux. Because, you can use sep parameter in read_csv function.

Q2: What is the use of the converters parameter in the read_csv part?

Respond:

Reply: converters parameter is used to change the values of the columns while loading the CSV.

Q3: How will you make pandas recognize that a detail column is datetime type?

Answer:

Answer: By using parse_dates parameter.

Q4: A dataset contains missing values no, not available, and '-100'. How volition you lot specify them as missing values for Pandas to correctly translate them? (Presume CSV file name: example1.csv)

Answer:

Reply: Past using na_values parameter.

                          import              pandas              as              pd  df = pd.read_csv("example1.csv", na_values=['no',              'not available',              '-100'])                      

Q5: How would you read a CSV file where,

  1. The heading of the columns is in the tertiary row (numbered from 1).
  2. The last 5 lines of the file have garbage text and should be avoided.
  3. Simply the column names whose kickoff letter starts with vowels should be included. Presume they are one give-and-take merely.

(CSV file name: example2.csv)

Reply:

Answer:

                          import              pandas              every bit              pd  colnameWithVowels = lambda              x:              x.lower()[0]              in              ['a',              'e',              'i',              'o',              'u']  df = pd.read_csv("example2.csv", usecols=colnameWithVowels, header=two, skipfooter=five)                      

The commodity was contributed by Kaustubh Chiliad and Shrivarsheni

roushdervive.blogspot.com

Source: https://www.machinelearningplus.com/pandas/pandas-read_csv-completed/

0 Response to "How to Read Expressions From a File in C"

Postar um comentário

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel