The magic of join

Often we want to compare or connect two lists and see what the overlap is like. While this is possible in spreadsheet software (Excel and alike) it is often overly complicated and does not produce the results we would like.

UNIX join to the rescue.

With the following example we can join two files easily:

 cat students.csv
 > #studentID,name,semester
 > 0,Peter,3
 > 1,Anna,2
 > 2,Sonja,7
 
 cat grades.csv
 > #studentID,course,grade
 > 2,Physics,89
 > 0,Math,40
 > 0,Physics,30
 
 join students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 2,Sonja,7,Physics,89

What happened to Anna? She is not in both files and thus omitted from the output. If we want to include all entries from one file we can do so.

 join -a 1 students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 1,Anna,2
 > 2,Sonja,7,Physics,89
 

Better! But if we want to use this table to sort by grade or course there are no entries for Anna. In fact most parsers will complain, that row 4 has less fields than the others. We can include empty fields with added separators with the auto output format.

 join -a 1 -o auto students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 1,Anna,2,,
 > 2,Sonja,7,Physics,89
 

These two added commas will save us a lot of headache down the line.