Parsing individual lines in a robots.txt file with C#
-
26-09-2019 - |
Question
Working on an application to parse robots.txt. I wrote myself a method that pulled the the file from a webserver, and threw the ouput into a textbox. I would like the output to display a single line of text for every line thats in the file, just as it would appear if you were looking at the robots.txt normally, however the ouput in my textbox is all of the lines of text without carriage returns or line breaks. So I thought I'd be crafty, make a string[] for all the lines, make a foreach loop and all would be well. Alas that did not work, so then I thought I would try System.Enviornment.Newline, still not working. Here's the code as it sounds now....how can I change this so I get all the individual lines of robots.txt as opposed to a bunch of text cobbled together?
public void getRobots()
{
WebClient wClient = new WebClient();
string url = String.Format("http://{0}/robots.txt", urlBox.Text);
try
{
Stream data = wClient.OpenRead(url);
StreamReader read = new StreamReader(data);
string[] lines = new string[] { read.ReadToEnd() };
foreach (string line in lines)
{
textBox1.AppendText(line + System.Environment.NewLine);
}
}
catch (WebException ex)
{
MessageBox.Show(ex.Message, null, MessageBoxButtons.OK);
}
}
Solution
You are reading the entire file into the first element of the lines
array:
string[] lines = new string[] {read.ReadToEnd()};
So all your loop is doing is adding the whole contents of the file into the TextBox, followed by a newline character. Replace that line with these:
string content = read.ReadToEnd();
string[] lines = content.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
And see if that works.
Edit: an alternative and perhaps more efficient way, as per Fish's comment below about reading line by line—replace the code within the try
block with this:
Stream data = wClient.OpenRead(url);
StreamReader read = new StreamReader(data);
while (read.Peek() >= 0)
{
textBox1.AppendText(read.ReadLine() + System.Environment.NewLine);
}
OTHER TIPS
You need to make the textBox1 multiline. Then I think you can simply go
textBox1.Lines = lines;
but let me check that
Try
public void getRobots()
{
WebClient wClient = new WebClient();
string robotText;
string[] robotLines;
System.Text.StringBuilder robotStringBuilder;
robotText = wClient.DownloadString(String.Format("http://{0}/robots.txt", urlBox.Text));
robotLines = robotText.Split(Environment.NewLine);
robotStringBuilder = New StringBuilder();
foreach (string line in robotLines)
{
robotStringBuilder.Append(line);
robotStringBuilder.Append(Environment.NewLine);
}
textbox1.Text = robotStringBuilder.ToString();
}
Try using .Read() in a while loop instead of .ReadToEnd() - I think you're just getting the entire file as one line in your lines array. Debug and check the count of lines[] to verify this.
Edit: Here's a bit of sample code. Haven't tested it, but I think it should work OK;
Stream data = wClient.OpenRead(url);
StreamReader read = new StreamReader(data);
List<string> lines = new List<string>();
string nextLine = read.ReadLine();
while (nextLine != null)
{
lines.Add(nextLine);
nextLine = read.ReadLine();
}
textBox1.Lines = lines.ToArray();