How to rename all the columns in a DataFrame in one go?

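In Python, there is usually more than one way to do the same thing, and which one you pick is largely a matter of preference.

Below is the first method, which uses toDF. Start by loading the data and inspecting the schema.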

>>> df=spark.read.csv("/employees/employees.csv",header=True,inferSchema=True)
>>> df.printSchema()
root
 |-- Emp ID: integer (nullable = true)
 |-- Name Prefix: string (nullable = true)
 |-- First Name: string (nullable = true)
 |-- Middle Initial: string (nullable = true)
 |-- Last Name: string (nullable = true)
 |-- Gender: string (nullable = true)
 |-- E Mail: string (nullable = true)
 |-- Father's Name: string (nullable = true)
 |-- Mother's Name: string (nullable = true)
 |-- Mother's Maiden Name: string (nullable = true)
 |-- Date of Birth: string (nullable = true)
 |-- Date of Joining: string (nullable = true)
 |-- Salary: integer (nullable = true)
 |-- Phone No. : string (nullable = true)
 |-- Place Name: string (nullable = true)
 |-- County: string (nullable = true)
 |-- City: string (nullable = true)
 |-- State: string (nullable = true)
 |-- Zip: integer (nullable = true)
 |-- Region: string (nullable = true)

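The column names in the schema above contain spaces, periods (".") and possessive "'s" suffixes.

Here is a one-liner that renames all the columns. It replaces spaces with underscores, drops any "._" left behind, turns "'s_" into "_", and lowercases the result.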

>>> df=df.toDF(*(x.replace(" ","_").replace("._","").replace("'s_","_").lower() for x in df.columns))
>>> df.printSchema()
root
 |-- emp_id: integer (nullable = true)
 |-- name_prefix: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- middle_initial: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- e_mail: string (nullable = true)
 |-- father_name: string (nullable = true)
 |-- mother_name: string (nullable = true)
 |-- mother_maiden_name: string (nullable = true)
 |-- date_of_birth: string (nullable = true)
 |-- date_of_joining: string (nullable = true)
 |-- salary: integer (nullable = true)
 |-- phone_no: string (nullable = true)
 |-- place_name: string (nullable = true)
 |-- county: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- zip: integer (nullable = true)
 |-- region: string (nullable = true)

All the column names are now lowercase, with spaces replaced by underscores and the "." and "'s" removed.
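The order of the replace calls matters: spaces have to become underscores first, otherwise "._" and "'s_" never match. The sketch below traces the same chain in plain Python, so it runs without a Spark session. The helper name clean_name and the sample list are only for illustration and are not part of the original code.

```python
def clean_name(name):
    """Apply the same replace chain as the one-liner to a single column name."""
    return (
        name.replace(" ", "_")      # "Phone No. "    -> "Phone_No._"
            .replace("._", "")      # "Phone_No._"    -> "Phone_No"
            .replace("'s_", "_")    # "Father's_Name" -> "Father_Name"
            .lower()                # "Phone_No"      -> "phone_no"
    )

# A few of the column names from the schema (hypothetical sample)
sample = ["Emp ID", "Father's Name", "Mother's Maiden Name", "Phone No. ", "Zip"]
cleaned = [clean_name(c) for c in sample]
print(cleaned)
```

With a real DataFrame, the same helper could be passed to toDF, for example df.toDF(*[clean_name(c) for c in df.columns]).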

Here is the second method. It renames the columns one at a time with withColumnRenamed.

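def col_rename(df):
    # Rename each column in turn. Spaces must become underscores first,
    # otherwise the "._" and "'s_" patterns never match.
    for old_col in df.columns:
        new_col = old_col.replace(" ", "_").replace("._", "").replace("'s_", "_").lower()
        df = df.withColumnRenamed(old_col, new_col)
    return df

df = col_rename(df)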
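With either method, two different source names can collapse to the same cleaned name, which leaves the DataFrame with ambiguous columns. A minimal sketch of a check that could run before renaming is shown here in plain Python. The helpers clean_name and build_rename_map are hypothetical names introduced for this example.

```python
from collections import Counter

def clean_name(name):
    # Same replace chain as in the renaming code
    return name.replace(" ", "_").replace("._", "").replace("'s_", "_").lower()

def build_rename_map(columns):
    """Return {old_name: new_name}; raise if two columns clean to the same name."""
    mapping = {old: clean_name(old) for old in columns}
    counts = Counter(mapping.values())
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"cleaned names collide: {dupes}")
    return mapping

rename_map = build_rename_map(["Emp ID", "E Mail", "Phone No. "])
print(rename_map)
```

Each (old, new) pair in the resulting map could then be passed to df.withColumnRenamed(old, new), as in the loop in col_rename.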