How to make movies with VMD, imagemagick and ffmpeg

Text will follow. First the resulting video as a GIF.

2017/09/22 08:00 · thoelken

How to clean text of HTML tags and UTF-8 symbols

The mailing list I use for the bike project I'm involved in is set to forward plain text only, and in some cases it spits out unintelligible HTML/UTF-8 gibberish. To clean up the mess, I found a fairly robust converter provided by MailChimp.

2017/05/02 07:45 · thoelken

The magic of join

Often we want to compare or connect two lists and see how much they overlap. While this is possible in spreadsheet software (Excel and the like), it is often overly complicated and does not produce the results we want.

UNIX join to the rescue.

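join pairs up lines from two files that share the same value in a join field (the first field by default). Two things to keep in mind: join splits fields on whitespace unless told otherwise, so comma-separated files need the option -t, and both files have to be sorted on the join field. With the following example we can join two files easily: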

 cat students.csv
 > #studentID,name,semester
 > 0,Peter,3
 > 1,Anna,2
 > 2,Sonja,7
 
 cat grades.csv
 > #studentID,course,grade
 > 0,Math,40
 > 0,Physics,30
 > 2,Physics,89
 
 join -t, students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 2,Sonja,7,Physics,89
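
If you want to try this yourself, here is a minimal, self-contained sketch, assuming GNU coreutils and a POSIX shell. It recreates the two example files with the grades deliberately left unsorted, sorts grades.csv on the ID column and then joins with a comma as separator. The file name grades.sorted.csv is just a placeholder. Setting LC_ALL=C is an extra precaution on my part: in other locales sort and join may collate the "#" of the header line differently, so the header would not necessarily end up first.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils join and sort.
# C locale: plain byte order, so the "#" header line sorts before the digits.
export LC_ALL=C

printf '%s\n' '#studentID,name,semester' '0,Peter,3' '1,Anna,2' '2,Sonja,7' > students.csv
# Grades in unsorted order, as they might come out of some other tool.
printf '%s\n' '#studentID,course,grade' '2,Physics,89' '0,Math,40' '0,Physics,30' > grades.csv

# join merges both files line by line, so sort on the first comma-separated
# field; -s keeps lines with equal IDs in their original order.
sort -t, -k1,1 -s grades.csv > grades.sorted.csv

# -t, makes join split fields on commas instead of whitespace.
join -t, students.csv grades.sorted.csv
```

With the stable sort, Peter's Math grade stays in front of his Physics grade, so the result matches the output listed above.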

What happened to Anna? She does not appear in both files and is therefore omitted from the output. If we want to keep all entries from one file, we can ask for them with -a followed by the file number:

 join -t, -a 1 students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 1,Anna,2
 > 2,Sonja,7,Physics,89
 

Better! But if we want to use this table to sort by grade or course, Anna's row has no such fields. In fact, most CSV parsers will complain that row 4 has fewer fields than the others. With the auto output format (-o auto), join fills in the missing fields as empty values and adds the separators:

 join -t, -a 1 -o auto students.csv grades.csv
 > #studentID,name,semester,course,grade
 > 0,Peter,3,Math,40
 > 0,Peter,3,Physics,30
 > 1,Anna,2,,
 > 2,Sonja,7,Physics,89
 

These two added commas will save us a lot of headache down the line.
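
Two related GNU join options fit in here, shown as a sketch under the same assumptions (GNU coreutils, both files recreated with the IDs already sorted): -e replaces missing fields with a string of your choice instead of leaving them empty, and -v 1 prints only the lines of the first file that found no partner, which answers the question about Anna directly. The placeholder NA is an arbitrary choice (handy if the table ends up in R).

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils (-o auto is a GNU extension).
# Both example files are recreated here so the snippet runs on its own.
export LC_ALL=C
printf '%s\n' '#studentID,name,semester' '0,Peter,3' '1,Anna,2' '2,Sonja,7' > students.csv
printf '%s\n' '#studentID,course,grade' '0,Math,40' '0,Physics,30' '2,Physics,89' > grades.csv

# Keep unpaired lines of file 1 and fill their missing fields with "NA".
join -t, -a 1 -o auto -e NA students.csv grades.csv

# Print only the lines of file 1 (students.csv) without a match in file 2.
join -t, -v 1 students.csv grades.csv
```

For the example data, the first command should give Anna the row 1,Anna,2,NA,NA, and the second should print nothing but 1,Anna,2.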

2017/04/06 12:04 · thoelken

DNA binding proteins

I found this very interesting site: Lecture notes on DNA binding proteins

Very nice, if extremely ugly and somewhat dated, course material on DNA binding proteins, covering all the physics relevant for the interactions as well as general DNA folding properties.

2017/03/31 13:08 · thoelken

The fun of GFF parsing

Lately I have been working a lot with genome annotation, especially eukaryotic (-sigh- prokaryotes are sooo much easier), and have to rely on tools that read GFF- or GTF-formatted annotations.

Once you get into the trenches of elaborate exon structures across different isoforms, you will notice that neither GFF nor GTF was ever a good idea for quick parsing or any analysis. To make matters worse, real-world files in both formats often follow some vague (mostly optional) conventions only very loosely -sigh-.
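
To give an idea of what even the simplest case involves, here is a minimal awk sketch that counts exons per transcript. It assumes GFF3-style attributes (key=value pairs separated by ";" in the 9th, tab-separated column) and that exons point to their transcript through the Parent attribute. GTF writes its attributes as key "value" pairs instead, and URL-escaped characters are not decoded here. The file example.gff3 and all IDs in it are made up for illustration.

```shell
#!/bin/sh
# Hypothetical mini GFF3 file: 9 tab-separated columns, one gene with
# two transcripts (tx1 with two exons, tx2 with one).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 demo gene 100 900 . + . 'ID=gene1' \
  chr1 demo mRNA 100 900 . + . 'ID=tx1;Parent=gene1' \
  chr1 demo exon 100 200 . + . 'Parent=tx1' \
  chr1 demo exon 500 900 . + . 'Parent=tx1' \
  chr1 demo mRNA 100 900 . + . 'ID=tx2;Parent=gene1' \
  chr1 demo exon 100 900 . + . 'Parent=tx2' > example.gff3

# Count exons per transcript ID taken from the Parent attribute (column 9).
counts=$(awk -F'\t' '
  /^#/ { next }                          # skip comments and ##directives
  $3 == "exon" {
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) {
      sub(/^ +/, "", attrs[i])           # tolerate "; " as separator
      if (attrs[i] ~ /^Parent=/) {
        # GFF3 allows several comma-separated parents per feature.
        m = split(substr(attrs[i], 8), parents, ",")
        for (j = 1; j <= m; j++) count[parents[j]]++
      }
    }
  }
  END { for (t in count) print t "\t" count[t] }
' example.gff3 | sort)

printf '%s\n' "$counts"
```

Real annotations will still break this in creative ways (missing Parent attributes, exons shared between isoforms, GTF files renamed to .gff), which is rather the point of the rant.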

Some explanations or rather recommendations can be found here:

The last is part of a widely used GFF utility suite including gffread:

And the following site tries to validate any of the arbitrarily pieced-together GFFs:

How come all these will-do “standards” have become so widely adopted, and why does every improvement only lead to even more confusion and more unparsable laissez-faire data junk? </rant>

2017/01/27 10:21 · thoelken
