Question

I have a program that reads approximately 2 million rows from a database into a List. Each row is a location containing information such as geographic coordinates.

Once the data is added to the List, I use a foreach loop to grab the coordinates and create a KML file. The loop throws an OutOfMemoryException when the number of rows is large (but works perfectly otherwise).

Any suggestions on how to handle this so that the program can work with very large sets of data? The KML library is SharpKML.

I am still new to C# so please go easy!

This is the loop:

    using (SqlConnection conn = new SqlConnection(connstring))
    {
        conn.Open();
        SqlCommand cmd = new SqlCommand(select, conn);

        using (cmd)
        {
            SqlDataReader reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                double lat = reader.GetDouble(1);
                double lon = reader.GetDouble(2);
                string country = reader.GetString(3);
                string county = reader.GetString(4);
                double TIV = reader.GetDouble(5);
                double cnpshare = reader.GetDouble(6);
                double locshare = reader.GetDouble(7);

                // Add results to list
                results.Add(new data(lat, lon, country, county, TIV, cnpshare, locshare));
            }
            reader.Close();
        }
        conn.Close();
    }

    int count = results.Count;
    Console.WriteLine("number of rows in results = " + count.ToString());

    // This code segment generates the kml point plot
    Document doc = new Document();
    try
    {
        foreach (data l in results)
        {
            Point point = new Point();
            point.Coordinate = new Vector(l.lat, l.lon);

            Placemark placemark = new Placemark();
            placemark.Geometry = point;
            placemark.Name = Convert.ToString(l.tiv);

            doc.AddFeature(placemark);
        }
    }
    catch (OutOfMemoryException)
    {
        throw;   // rethrow without resetting the stack trace
    }

This is the class used in the List:

    public class data
    {
        public double lat { get; set; }
        public double lon { get; set; }
        public string country { get; set; }
        public string county { get; set; }
        public double tiv { get; set; }
        public double cnpshare { get; set; }
        public double locshare { get; set; }

        public data(double lat, double lon, string country, string county, double tiv, double cnpshare,
            double locshare)
        {
            this.lat = lat;
            this.lon = lon;
            this.country = country;
            this.county = county;
            this.tiv = tiv;
            this.cnpshare = cnpshare;
            this.locshare = locshare;
        }
    }

Solution

Since you didn't mention any problems or big delays in populating the list from the database, why not create your Point and Placemark objects immediately as each row is read, instead of buffering the rows first? Code is below.

    var doc = new Document();

    using (SqlConnection conn = new SqlConnection(connstring))
    {
        conn.Open();

        using (SqlCommand cmd = new SqlCommand(select, conn))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                double lat = reader.GetDouble(1);
                double lon = reader.GetDouble(2);
                string country = reader.GetString(3);
                string county = reader.GetString(4);
                double TIV = reader.GetDouble(5);
                double cnpshare = reader.GetDouble(6);
                double locshare = reader.GetDouble(7);

                // Build the placemark straight from the reader row;
                // nothing is buffered in a list.
                var point = new Point();
                point.Coordinate = new Vector(lat, lon);

                var placemark = new Placemark();
                placemark.Geometry = point;
                placemark.Name = Convert.ToString(TIV);

                doc.AddFeature(placemark);
            }
        }
    }

If there is no good reason to keep that much data in memory, try a lazy-loading approach.
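
For illustration, here is a minimal lazy-loading sketch using a C# iterator. Only the current row is ever materialized; the caller can foreach over the sequence and build placemarks as rows stream in. It assumes the same connstring, select statement, and data class as in the question; the method name ReadLocations is just an example.

    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Sketch: stream rows from the database one at a time with yield return,
    // instead of buffering ~2 million rows in a List first.
    static IEnumerable<data> ReadLocations(string connstring, string select)
    {
        using (var conn = new SqlConnection(connstring))
        using (var cmd = new SqlCommand(select, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Same column layout as in the question.
                    yield return new data(
                        reader.GetDouble(1), reader.GetDouble(2),
                        reader.GetString(3), reader.GetString(4),
                        reader.GetDouble(5), reader.GetDouble(6),
                        reader.GetDouble(7));
                }
            }
        }
    }

A caller can then write foreach (data l in ReadLocations(connstring, select)) { ... doc.AddFeature(placemark); } and the connection stays open only while the loop runs.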

OTHER TIPS

Why do you need to store all the data before writing it? Rather than adding each row to a list, you should process each row as it is read, then forget about it.

For instance, try rolling your code together like this:

Document doc = new Document();
while (reader.Read())
{
    // read from db
    double lat = reader.GetDouble(1);
    double lon = reader.GetDouble(2);
    string country = reader.GetString(3);
    string county = reader.GetString(4);
    double TIV = reader.GetDouble(5);
    double cnpshare = reader.GetDouble(6);
    double locshare = reader.GetDouble(7);

    var currentData = new data(lat, lon, country, county, TIV, cnpshare, locshare);

    // write to file
    Point point = new Point();
    point.Coordinate = new Vector(currentData.lat, currentData.lon);

    Placemark placemark = new Placemark();
    placemark.Geometry = point;
    placemark.Name = Convert.ToString(currentData.tiv);

    doc.AddFeature(placemark);
}

This will only work if Document is implemented sensibly though.
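
For completeness, here is a minimal sketch of writing the finished Document to disk with SharpKML once the loop is done (the output path out.kml is just an example):

    using SharpKml.Engine;

    // Wrap the root element and serialize it to a .kml file.
    KmlFile kml = KmlFile.Create(doc, false);
    using (var stream = System.IO.File.Create("out.kml"))
    {
        kml.Save(stream);
    }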

Oliver is right (up-vote from me). Performance-wise you can do some other things as well. First, do not query fields you are not going to use. Then move all the variable declarations (in Oliver's code) out in front of the while statement. Finally, instead of waiting for your SQL server to collect and send back all the records at once, fetch them progressively in steps. For example, if your records have a UID and are ordered by that UID, start with a local C# variable var lastID = 0, change your select statement to something like select top 1000 ... where UID > lastID, and repeat the query until it returns nothing or fewer than 1000 records, as in the sketch below.
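
A hedged sketch of that batching idea, assuming a hypothetical Locations table with an integer (bigint) UID key; adjust the table name, column names, and column indexes to your schema:

    var doc = new Document();
    long lastID = 0;
    int rowsRead;

    do
    {
        rowsRead = 0;
        using (var conn = new SqlConnection(connstring))
        using (var cmd = new SqlCommand(
            "SELECT TOP 1000 UID, lat, lon, TIV FROM Locations " +
            "WHERE UID > @lastID ORDER BY UID", conn))
        {
            cmd.Parameters.AddWithValue("@lastID", lastID);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    rowsRead++;
                    lastID = reader.GetInt64(0);   // remember the last key seen

                    var point = new Point();
                    point.Coordinate = new Vector(reader.GetDouble(1), reader.GetDouble(2));

                    var placemark = new Placemark();
                    placemark.Geometry = point;
                    placemark.Name = Convert.ToString(reader.GetDouble(3));

                    doc.AddFeature(placemark);
                }
            }
        }
    } while (rowsRead == 1000);   // stop once a batch comes back short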

@drdigit,

I would avoid executing queries in a loop. One query should always return as much data as is needed at that moment. In this case you would have 1,000 queries that each return 1,000 rows. That may be better for quickly showing the first 1,000 rows, but I'm not sure it will be faster overall to execute 1,000 smaller queries in a loop than to execute just one. Maybe I'm wrong...

I think your approach is good for lazy loading, if that is needed in this situation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow