Monday, 4 May 2015

r - SpatialPointsDataFrame too slow FOR LOOP


I've run into a small problem where my for loop has been running for 3.5 hrs+. I have a Geodataset in SQL Server with a field called Shape that stores co-ordinates in geometry data type. Firstly, I connect R to my DB via ODBC and retrieve the information I want (also converting the Shape column to something readable)


library(sp)
library(rgeos)
library(RODBC)

ch<-odbcConnect("SpatialAnalysis", rows_at_time=1)
df<-sqlQuery(ch, "select OBJECTID, LOT_NO, Shape.STAsText() as WKT FROM SRC_PLI_QLD")
cnt<-sqlQuery(ch, "select count(OBJECTID) from SRC_PLI_QLD")

This has 2.5 million points. I now read them into a SpatialPointsDataFrame, reading the first element first.


point.sp <- SpatialPointsDataFrame(readWKT(df$WKT[1]),
data=data.frame(OBJECTID=df$OBJECTID[1], LOT_NO=df$LOT_NO[1]))

Now I read the rest of the elements. AND this is where the problem is. It has been 3.5 hrs and still running.


for (n in 2:as.integer(cnt)) {

point.sp <- rbind(point.sp,
SpatialPointsDataFrame(readWKT(df$WKT[n]),
data.frame(OBJECTID=df$OBJECTID[n], LOT_NO=df$LOT_NO[n])))
}

What is the problem in the above mentioned for loop? Is there another way I can do this?



Answer



I solved this in another way and it took less than 10 seconds. The change was in the query. Instead of retrieving Shape.STAsText() as a WKT object from Shape column, I retrieved the Lat and Long value from Shape


df<-sqlQuery(ch, "select OBJECTID, LOT_NO, Shape.STY as Lat, Shape.STX as Lon FROM SRC_PLI_QLD")
cnt<-sqlQuery(ch, "select count(OBJECTID) from SRC_PLI_QLD")


Then I did :


coordinates(df) =~Lat+Lon

and my dataframe df has been converted to a SpatialPointsDataFrame


> class(df)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...