Indiana University

Running ClustalW on Big Red

On Big Red, you can use ClustalW to align multiple sequences. A parallel implementation of ClustalW 1.82 (ClustalW-MPI) is installed at:

/N/soft/`whatami`/clustalw-mpi-1.14

The README files are in:

/N/soft/`whatami`/clustalw-mpi-1.14/doc

The clustalwjob script submits a job that runs ClustalW. The clustalwjob script should be in your path by default, and its manual page should be in your default path for manual pages. Syntax for clustalwjob is:

clustalwjob options_to_clustalw -CPUS n -wallhours h

Replace options_to_clustalw with ClustalW command-line options, n with the number of processors to use, and h with the maximum number of hours the job should be allowed to run. If you omit the -CPUS option, the job uses four processors. To request more than four processors, specify a multiple of 4; any other value is rounded up to the next multiple of 4. The maximum number of processors is 128 (unless you also specify a larger queue; see the clustalwjob man page). For example, to use 16 processors to align the amino acid sequences in the file aaseqs, run:

clustalwjob -infile=aaseqs -type=protein -align -CPUS 16
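The processor-count rule described above can be sketched with shell arithmetic. This is only an illustration of the rounding behavior: effective_cpus is a hypothetical helper, not part of clustalwjob, and it does not apply the 128-processor limit.

```shell
# Hypothetical helper (not part of clustalwjob): estimate the processor
# count that a given -CPUS request would resolve to.
effective_cpus() {
    requested=${1:-4}                      # four processors when -CPUS is omitted
    echo $(( (requested + 3) / 4 * 4 ))    # round up to the next multiple of 4
}

effective_cpus 16   # already a multiple of 4
effective_cpus 10   # rounded up to 12
effective_cpus      # default of 4
```

Checking a request this way before submitting can help you anticipate how many processors the job will actually occupy.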

If you omit the -wallhours option, your job will be allowed to run for two hours. Use the -wallhours option to request more time, up to 336 hours (14 days). (Queues other than the default have lower time limits; see the clustalwjob man page.)
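As a rough sketch of combining the two job options, the following checks a requested run time against the 336-hour limit of the default queue and prints the resulting command line without submitting it. build_clustalwjob_cmd is a hypothetical helper, and the requirement that h be a whole number of at least 1 is an assumption; the clustalwjob man page is the authority on accepted values.

```shell
# Hypothetical helper (not part of clustalwjob): validate the requested
# hours, then print the command line instead of submitting the job.
build_clustalwjob_cmd() {
    infile=$1
    cpus=$2
    hours=$3
    # Assumption: hours is a whole number between 1 and 336 (default queue).
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 336 ]; then
        echo "wallhours must be between 1 and 336" >&2
        return 1
    fi
    echo "clustalwjob -infile=$infile -type=protein -align -CPUS $cpus -wallhours $hours"
}

# Print the command for a 24-hour, 16-processor protein alignment of aaseqs.
build_clustalwjob_cmd aaseqs 16 24
```

If the printed command looks right, you can run it as shown to submit the job.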

ClustalW normally lists its options when you enter the command with no arguments, but this feature is not available in the parallel version. The options are listed in:

/N/soft/`whatami`/clustalw-mpi-1.14/README.OPTIONS

The same options are reproduced here for your convenience:

CLUSTAL W (1.82) Multiple Sequence Alignments

clustalw option list:-

-help
-check
-options
-align
-newtree=filename
-usetree=filename
-newtree1=filename
-usetree1=filename
-newtree2=filename
-usetree2=filename
-bootstrap
-tree
-quicktree
-convert
-interactive
-batch
-infile=filename
-profile1=filename
-profile2=filename
-type=protein OR dna
-profile
-sequences
-matrix=filename
-dnamatrix=filename
-negative
-noweights
-gapopen=f
-gapext=f
-endgaps
-nopgap
-nohgap
-novgap
-hgapresidues=string
-maxdiv=n
-gapdist=n
-pwmatrix=filename
-pwdnamatrix=filename
-pwgapopen=f
-pwgapext=f
-ktuple=n
-window=n
-pairgap=n
-topdiags=n
-score=percent OR absolute
-transweight=f
-seed=n
-kimura
-tossgaps
-bootlabels=node OR branch
-debug=n
-output=gcg OR gde OR pir OR phylip OR nexus
-outputtree=nj OR phylip OR dist OR nexus
-outfile=filename
-outorder=input OR aligned
-case=lower OR upper
-seqnos=off OR on
-nosecstr1
-nosecstr2
-secstrout=structure OR mask OR both OR none
-helixgap=n
-strandgap=n
-loopgap=n
-terminalgap=n
-helixendin=n
-helixendout=n
-strandendin=n
-strandendout=n

When you run clustalwjob, you'll receive a message when your job is submitted to the queue, and another when the job finishes. To check the status of your job, use the llq command.

In addition to output files that clustalwjob produces, such as .aln files, clustalwjob will produce files with filenames similar to clustalwjob.9999.err and clustalwjob.9999.out, where 9999 is the number of your job. Such files contain information that clustalwjob would print to the screen if you were running it interactively from the command line.
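One way to inspect these files after a job finishes is sketched below. job_logs and show_job_logs are hypothetical helpers, and the exact clustalwjob.<job>.err and clustalwjob.<job>.out naming is an assumption based on the example names above; adjust the pattern if your files are named differently.

```shell
# Hypothetical helper: print the expected log file names for a job number,
# assuming the clustalwjob.<job>.err / clustalwjob.<job>.out pattern.
job_logs() {
    printf '%s\n' "clustalwjob.$1.err" "clustalwjob.$1.out"
}

# Show whichever of the two files exist in the current directory.
show_job_logs() {
    for f in $(job_logs "$1"); do
        if [ -f "$f" ]; then
            echo "== $f =="
            cat "$f"
        fi
    done
}

# Replace 9999 with the number of your job.
show_job_logs 9999
```

Run this from the directory in which you submitted the job; it prints nothing if neither file is present there.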