Here's a starting point:
require 'haml'

haml_doc = <<EOT
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
EOT

engine = Haml::Engine.new(haml_doc)
puts engine.render
Which outputs this when run:
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body></body>
</html>
From there, you can easily write to a file using:
File.write(output, engine.render)
instead of using puts to output it to the console. Here output is the path of the file you want to create.
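As a minimal sketch of that step, the snippet below writes a string to disk with File.write. To keep it runnable without Haml, html is a stand-in for whatever engine.render returns, and the file name signalp_report.html is purely hypothetical.

```ruby
require 'tmpdir'

# Stand-in for the string returned by engine.render (assumption for this sketch).
html = "<html>\n  <body></body>\n</html>\n"

# Hypothetical output location; substitute your own path.
output = File.join(Dir.tmpdir, 'signalp_report.html')

# File.write creates (or truncates) the file and returns the number of bytes written.
bytes_written = File.write(output, html)
```

Because File.write opens, writes and closes the file in one call, there is no file handle to clean up afterwards.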
To use this, you need to flesh out haml_doc with additional Haml that loops over your input data. Massage that data into an array or hash beforehand, so the template can iterate over it cleanly without embedding all sorts of scan and conditional logic. A view should be used primarily to output content, not to manipulate data.
Just above the engine = Haml... line you'd want to read your input data, massage it, and store it in a variable that Haml can iterate over. You have the basic idea in your original code, but instead of trying to output HTML, create an object or sub-hash that you can pass to Haml.
Normally this would all be split into separate files for the model, the view and the controller, like in Rails or big Sinatra apps, but this really isn't a big app, so you can put it all in one file. Keep your logic clean and it'll be fine.
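To illustrate the "prepare first, render second" split in a form that runs with only the standard library, here is a rough sketch that uses ERB in place of Haml. The sample lines, the rows variable and the markup are all made up for the illustration; the point is only that the template loops and prints while the parsing happens beforehand.

```ruby
require 'erb'

# Step 1: all parsing happens before the view (hypothetical sample lines).
raw_lines = ['>seq1 - note one', '>seq2 - note two']
rows = raw_lines.map do |line|
  id, note = line.split(' - ', 2)
  { :id => id.delete_prefix('>'), :note => note }
end

# Step 2: the view only loops and prints; no scanning or conditionals.
template = ERB.new(<<~VIEW, trim_mode: '-')
  <% rows.each do |row| -%>
  <p><span class="id"><%= row[:id] %></span> <%= row[:note] %></p>
  <% end -%>
VIEW

html = template.result_with_hash(rows: rows)
puts html
```

The same shape carries over to Haml: build the array of hashes first, then hand it to the template as a local.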
Without sample input data and an expected output it's hard to do more, but that'll give you a starting point.
Based on the data samples, here's something that gets you in the ballpark. I won't polish it because, after all, you have to do some of it, but this is a reasonable start. The first part mocks up something reasonably like the Bio module you reference in your code, which I've never seen. You don't need this part, but might want to look through it:
module Bio
  FastaFormat = 1

  SAMPLE_DATA = <<-EOT
>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
  EOT

  # Minimal stand-in for Bio::FlatFile, backed by SAMPLE_DATA.
  class FlatFile
    include Enumerable

    class Entry
      attr_reader :definition, :aaseq

      def initialize(definition, aaseq)
        @definition = definition
        @aaseq = aaseq
      end
    end

    def initialize(entries)
      @entries = entries
    end

    # Ignores both arguments and pairs up the definition and sequence lines.
    def self.open(filetype, filename)
      new(SAMPLE_DATA.split("\n").each_slice(2).map { |seq_id, sequence| Entry.new(seq_id, sequence) })
    end

    def each_entry(&block)
      @entries.each(&block)
    end
    alias each each_entry
  end
end
Here's where the fun begins. I modified your get_hash routine to parse the strings how I'd do it. Instead of a hash, it returns an array of hashes. Each sub-hash is ready to be used; in other words, the data is parsed and ready to be output:
include Bio

def make_array_of_hashes(input_file)
  Bio::FlatFile.open(
    Bio::FastaFormat,
    input_file
  ).map { |entry|
    id_start, id_end = entry.definition.split('-').map(&:strip)
    signalp, seq_end = entry.aaseq.split('-')
    motif = seq_end.scan(/(?:WL|KK|RR|KR|R..R|R....R)/)
    {
      :id_start => id_start,
      :id_end => id_end,
      :signalp => signalp,
      :motif => motif
    }
  }
end
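To make the shape of each sub-hash concrete, here is the same parsing applied by hand to a single record from the sample data, without the Bio mock. The variable names definition, aaseq and record are only for this illustration.

```ruby
# One record copied from the sample data.
definition = '>isotig00009_f2_3 - Signal P Cleavage Site => 22:23'
aaseq = 'MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD'

# Split the header into the ID and the description.
id_start, id_end = definition.split('-').map(&:strip)

# Split the sequence at the cleavage marker.
signalp, seq_end = aaseq.split('-')

# Collect every motif found after the cleavage site, in order of appearance.
motif = seq_end.scan(/(?:WL|KK|RR|KR|R..R|R....R)/)

record = {
  :id_start => id_start,
  :id_end => id_end,
  :signalp => signalp,
  :motif => motif
}

p record[:signalp] # prints "MLKCFSIIMGLILLLEIGGGCA"
p record[:motif]   # prints ["KR", "WL"]
```

Note that if a sequence had no '-' separator, seq_end would be nil and scan would raise a NoMethodError; calling seq_end.to_s.scan(...) is one way to guard against that.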
This is a simple way to define the Haml document inside the body of a script. The template only outputs; it contains no logic except the loops. Everything else was handled before the view is processed:
haml_doc = <<EOT
!!!
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
    - data.each do |d|
      %p
        %span.id= d[:id_start]
        %span= d[:id_end]
        %br/
        %span.signalp= d[:signalp]
        - d[:motif].each do |m|
          %span= m
EOT
And here's how to use it:
require 'haml'
data = make_array_of_hashes('sample.txt')
engine = Haml::Engine.new(haml_doc)
puts engine.render(Object.new, :data => data)
Which, when run, outputs:
<!DOCTYPE html>
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body>
    <p>
      <span class='id'>>isotig00001_f4_14</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00001_f4_15</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f2_3</span>
      <span>Signal P Cleavage Site => 22:23</span>
      <br>
      <span class='signalp'>MLKCFSIIMGLILLLEIGGGCA</span>
      <span>KR</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f3_9</span>
      <span>Signal P Cleavage Site => 16:17</span>
      <br>
      <span class='signalp'>MKTGIIIFISTVVVLP</span>
      <span>KR</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_13</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_14</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
  </body>
</html>