Possible Duplicate:
Select only the first rows for each unique value of a column in R
I have a matrix of the following form:
col1 col2
1 2
1 2
1 2
1 2
1 2
2 5
2 5
2 5
3 7
3 7
3 7
3 7
3 7
3 7
3 7
3 7
4 2
4 2
4 2
I would like to select the unique rows based on ‘col1’, which in this case would be the first row for each unique value in col1:
subset:
col1 col2
1 2
2 5
3 7
4 2
Here’s what I’ve tried:
https://dl.dropbox.com/u/22681355/matrix.csv
mat<-read.csv("matrix.csv")
sub<-unique(mat$V1)
subset(mat, mat==c(sub))
It returns far more rows than I expect, and I get this warning message:
Warning message:
In contacts$V1 == c(g) :
longer object length is not a multiple of shorter object length
You can use the unique() function:
unique(mat$V1) # and not matrix$v1
[1] 44 281 1312
You can also write
unique(mat)
and it will give you the unique rows (I tried it on your file).
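As a rough illustration of how the two calls above differ, here is a small sketch on the sample data from the question instead of the CSV. The data frame name df is an assumption made for this example only.

```r
# Sample data rebuilt from the question; 'df' is a hypothetical name.
df <- data.frame(
  col1 = rep(1:4, times = c(5, 3, 8, 3)),
  col2 = rep(c(2, 5, 7, 2), times = c(5, 3, 8, 3))
)

# On a single column, unique() returns a plain vector of distinct values.
vals <- unique(df$col1)

# On a whole data frame, unique() drops rows that are duplicated in every column.
rows <- unique(df)

print(vals)
print(rows)
```

Note that unique(df) only collapses rows that match in all columns, so it is not the same as picking one row per value of a single column.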
If you want to select rows based on V1's values, keeping the first row for each distinct value, you can do this:
> mat[!duplicated(mat$V1), ]
X V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1547 44 14 1 2 100 17 0 0 0 0
23 5385 281 67 2 10 100 10 0 0 0 0
33 17347 1312 1 2 6 100 8 0 0 0 0
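To show what the duplicated() call is doing, here is a minimal sketch on the sample data from the question. The data frame df and the extra id column are assumptions added for illustration; they are not part of the original file.

```r
# Sample data rebuilt from the question, plus a hypothetical per-row id.
df <- data.frame(
  id   = 1:19,
  col1 = rep(1:4, times = c(5, 3, 8, 3)),
  col2 = rep(c(2, 5, 7, 2), times = c(5, 3, 8, 3))
)

# duplicated() is TRUE for every row whose col1 value already appeared earlier.
dup <- duplicated(df$col1)

# Negating it keeps only the first row for each distinct col1 value.
first_rows <- df[!dup, ]
print(first_rows)

# unique(df) compares whole rows, so the distinct ids keep every row here.
n_unique_rows <- nrow(unique(df))
print(n_unique_rows)
```

The == comparison in the question recycles the shorter vector against the longer one, which is likely what triggers the "longer object length" warning. Indexing with !duplicated() avoids that comparison altogether.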