Last updated: 2019-07-24
This code cleans and standardizes data on the LD50 of pesticides to the honey bee, Apis mellifera from EPA’s ECOTOX database, matches these data to USGS pesticide names using CAS numbers, and exports a slimmed down version of the cleaned data.
library(tidyverse)
library(readxl)
library(scales)
# Create path for most recent ECOTOX file
filename <- "ecotox_ld50_apis_mellifera_20170731"
path <- paste("../data/",filename,".csv",sep="")
# Store date when data were downloaded from ECOTOX
data_date <- substr(filename, (nchar(filename)-7), nchar(filename))
# Read in data
# Convert blanks, NC (not coded), and NR (not reported) to NA (i.e. missing data)
data <- read.csv(path, na.strings=c(""," ","NC","NR"))
# Check structure of the data
str(data)
'data.frame': 1668 obs. of 55 variables:
$ CAS.Number : int 50293 50293 50293 50293 50293 50293 50293 50293 50293 50293 ...
$ Chemical.Name : Factor w/ 456 levels "(1-Methylethyl)phosphoramidic acid ethyl-3-methyl-4-(methylthio)phenyl ester",..: 73 73 73 73 73 73 73 73 73 73 ...
$ Chemical.Formulation : Factor w/ 25 levels "Active ingredient",..: 13 13 20 13 13 10 13 13 13 13 ...
$ Chemical.Comment : Factor w/ 671 levels "10.1","10WP",..: 251 250 249 250 250 248 247 247 247 247 ...
$ Species.Scientific.Name : Factor w/ 5 levels "Apis mellifera",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Species.Common.Name : Factor w/ 5 levels "Carniolan Honey Bee",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Organism.Source : Factor w/ 5 levels "Captive breeding colony",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Organism.Lifestage : Factor w/ 7 levels "Adult(s)","Larva(e)",..: 1 4 1 4 4 1 1 1 1 1 ...
$ Exposure.Duration.Mean.Op..Days.: Factor w/ 1 level ">=": NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Duration.Mean..Days. : num 2 2 2 2 2 2 1 1 1 1 ...
$ Exposure.Duration.Min.Op..Days. : logi NA NA NA NA NA NA ...
$ Exposure.Duration.Min..Days. : num NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Duration.Max.Op..Days. : logi NA NA NA NA NA NA ...
$ Exposure.Duration.Max..Days. : int NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Duration.Unit..Days. : Factor w/ 2 levels "Day(s)","Emergence": 1 1 1 1 1 1 1 1 1 1 ...
$ Test.Type : Factor w/ 3 levels "Acute","Not coded",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Test.Location : Factor w/ 1 level "Lab": 1 1 1 1 1 1 1 1 1 1 ...
$ Test.Method : Factor w/ 5 levels "European and Mediterranean Plant Protection Organization",..: 5 5 5 5 5 5 2 2 2 2 ...
$ Exposure.Type : Factor w/ 11 levels "Dermal","Diet, unspecified",..: 9 11 9 11 9 9 11 11 11 11 ...
$ Dose.Mean.Op : logi NA NA NA NA NA NA ...
$ Dose.Mean : logi NA NA NA NA NA NA ...
$ Dose.Min.Op : logi NA NA NA NA NA NA ...
$ Dose.Min : num NA NA NA NA NA NA NA NA NA NA ...
$ Dose.Max.Op : logi NA NA NA NA NA NA ...
$ Dose.Max : num NA NA NA NA NA NA NA NA NA NA ...
$ Dose.Units : Factor w/ 10 levels "AI mg/L","AI ppm",..: NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Mean.Op : logi NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Mean : int NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Min.Op : logi NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Min : int NA NA NA NA NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Max.Op : logi NA NA NA NA NA NA ...
$ Exposure.Sample.Number.Max : int NA NA NA NA NA NA NA NA NA NA ...
$ Dose.Number : Factor w/ 17 levels "' 4-5","' 5-6",..: NA NA NA NA NA NA 12 12 12 12 ...
$ Endpoint : Factor w/ 1 level "LD50": 1 1 1 1 1 1 1 1 1 1 ...
$ Effect : Factor w/ 1 level "Mortality": 1 1 1 1 1 1 1 1 1 1 ...
$ Effect.Measurement : Factor w/ 1 level "Mortality": 1 1 1 1 1 1 1 1 1 1 ...
$ Observed.Response.Mean.Op : Factor w/ 4 levels "<","<=",">","~": NA NA NA NA NA NA NA NA NA NA ...
$ Observed.Response.Mean : num 7.12 6.4 5.36 3.9 3.7 5.95 9.6 6.9 7.5 20.1 ...
$ Observed.Response.Min.Op : Factor w/ 2 levels "<",">": NA NA NA NA NA NA NA NA NA NA ...
$ Observed.Response.Min : num NA NA NA NA NA NA NA NA NA NA ...
$ Observed.Response.Max.Op : Factor w/ 1 level "<": NA NA NA NA NA NA NA NA NA NA ...
$ Observed.Response.Max : num NA NA NA NA NA NA NA NA NA NA ...
$ Result.Statistical.Method : logi NA NA NA NA NA NA ...
$ Observed.Response.Value : logi NA NA NA NA NA NA ...
$ Observed.Response.Units : Factor w/ 33 levels "%","AI lb/acre",..: 32 32 32 32 32 32 32 32 32 32 ...
$ Conc.1.Type..Author. : Factor w/ 4 levels "Active ingredient",..: 1 3 3 3 3 3 2 2 2 2 ...
$ Conc.1..Author. : Factor w/ 1027 levels "<=170","<=50",..: 945 907 855 737 730 869 1009 914 950 649 ...
$ Conc.1.Units..Author. : Factor w/ 33 levels "%","AI lb/acre",..: 32 32 32 32 32 32 32 32 32 32 ...
$ Result.Comment : Factor w/ 376 levels "//","COLONY 1//",..: 1 1 1 1 1 1 52 85 121 161 ...
$ Reference.Number : int 344 344 344 344 344 344 101387 101387 101387 101387 ...
$ Source : Factor w/ 66 levels "Acta Hortic.288:133-138",..: 27 27 27 27 27 27 39 39 39 39 ...
$ Author : Factor w/ 64 levels "Aliano,N.P., and M.D. Ellis",..: 58 58 58 58 58 58 25 25 25 25 ...
$ Title : Factor w/ 66 levels "Acute Contact Toxicity of Oxalic Acid to Varroa destructor (Acari: Varroidae) and Their Apis mellifera (Hymenop"| __truncated__,..: 44 44 44 44 44 44 56 56 56 56 ...
$ Publication.Year : int 1992 1992 1992 1992 1992 1992 1965 1965 1965 1965 ...
$ Summary.of.Additional.Parameters: Factor w/ 329 levels "Conc 1 (Author): \xe6 Active ingredient >0.04 - <0.05 ug/org | Conc 2 (Author): \xe6 NR (NR - NR) NR | Conc 3 ("| __truncated__,..: 206 322 322 322 322 322 277 277 277 277 ...
# Fix variable names with double periods
# This code replaces '..' with '.'
names(data) <- gsub("..",".",names(data), fixed=TRUE)
# Select variables of interest & rename them
data_slim<-select(data,
cas = CAS.Number,
cmpd_epa = Chemical.Name,
species = Species.Scientific.Name,
sp_common = Species.Common.Name,
org_source = Organism.Source,
stage = Organism.Lifestage,
exp_days = Exposure.Duration.Mean.Days.,
exp_type = Exposure.Type,
form = Conc.1.Type.Author.,
n = Exposure.Sample.Number.Mean,
dose_n = Dose.Number,
ld50_op = Observed.Response.Mean.Op,
ld50 = Observed.Response.Mean,
ld50_min_op = Observed.Response.Min.Op,
ld50_min = Observed.Response.Min,
ld50_max_op = Observed.Response.Max.Op,
ld50_max = Observed.Response.Max,
units = Observed.Response.Units,
method = Test.Method,
ref = Reference.Number
)
# Convert CAS number to factor
data_slim$cas <- as.factor(data_slim$cas)
# Check structure of new data
str(data_slim)
'data.frame': 1668 obs. of 20 variables:
$ cas : Factor w/ 456 levels "50293","51036",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cmpd_epa : Factor w/ 456 levels "(1-Methylethyl)phosphoramidic acid ethyl-3-methyl-4-(methylthio)phenyl ester",..: 73 73 73 73 73 73 73 73 73 73 ...
$ species : Factor w/ 5 levels "Apis mellifera",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sp_common : Factor w/ 5 levels "Carniolan Honey Bee",..: 4 4 4 4 4 4 4 4 4 4 ...
$ org_source : Factor w/ 5 levels "Captive breeding colony",..: 3 3 3 3 3 3 3 3 3 3 ...
$ stage : Factor w/ 7 levels "Adult(s)","Larva(e)",..: 1 4 1 4 4 1 1 1 1 1 ...
$ exp_days : num 2 2 2 2 2 2 1 1 1 1 ...
$ exp_type : Factor w/ 11 levels "Dermal","Diet, unspecified",..: 9 11 9 11 9 9 11 11 11 11 ...
$ form : Factor w/ 4 levels "Active ingredient",..: 1 3 3 3 3 3 2 2 2 2 ...
$ n : int NA NA NA NA NA NA NA NA NA NA ...
$ dose_n : Factor w/ 17 levels "' 4-5","' 5-6",..: NA NA NA NA NA NA 12 12 12 12 ...
$ ld50_op : Factor w/ 4 levels "<","<=",">","~": NA NA NA NA NA NA NA NA NA NA ...
$ ld50 : num 7.12 6.4 5.36 3.9 3.7 5.95 9.6 6.9 7.5 20.1 ...
$ ld50_min_op: Factor w/ 2 levels "<",">": NA NA NA NA NA NA NA NA NA NA ...
$ ld50_min : num NA NA NA NA NA NA NA NA NA NA ...
$ ld50_max_op: Factor w/ 1 level "<": NA NA NA NA NA NA NA NA NA NA ...
$ ld50_max : num NA NA NA NA NA NA NA NA NA NA ...
$ units : Factor w/ 33 levels "%","AI lb/acre",..: 32 32 32 32 32 32 32 32 32 32 ...
$ method : Factor w/ 5 levels "European and Mediterranean Plant Protection Organization",..: 5 5 5 5 5 5 2 2 2 2 ...
$ ref : int 344 344 344 344 344 344 101387 101387 101387 101387 ...
# Make table of reference/source info
ref_key <- data %>%
select(ref = Reference.Number,
pub = Source,
auth = Author,
title = Title,
year = Publication.Year) %>%
group_by(ref, pub, auth, title, year) %>%
summarise(obs = length(ref))
ref_key <- ref_key[order(ref_key$ref),] # order by ref number
# Save reference key with associated date for future reference
ref_path <- paste("../keys/","ecotox_ref_key","_",data_date,".csv",sep="")
write.csv(ref_key, ref_path)
# Code exposure type into fewer categories (contact, oral, other)
# Note: 'Dropwise' occurs only in Torchio (1973) and involves putting a drop of known volume on each bee
exp_type <- levels(data_slim$exp_type)
exp_type
[1] "Dermal"
[2] "Diet, unspecified"
[3] "Direct application"
[4] "Drinking water"
[5] "Dropwise"
[6] "Environmental, unspecified"
[7] "Food"
[8] "Multiple routes between application groups"
[9] "Oral via capsule"
[10] "Spray"
[11] "Topical, general"
exp_grp <- c("contact",
"oral",
"contact",
"oral",
"contact",
"other",
"oral",
"other",
"oral",
"contact",
"contact")
exp_key <- as.data.frame(cbind(exp_type,exp_grp))
exp_key
exp_type exp_grp
1 Dermal contact
2 Diet, unspecified oral
3 Direct application contact
4 Drinking water oral
5 Dropwise contact
6 Environmental, unspecified other
7 Food oral
8 Multiple routes between application groups other
9 Oral via capsule oral
10 Spray contact
11 Topical, general contact
# Use table join to add new, recoded variable
data_slim <- left_join(data_slim, exp_key, by="exp_type")
# Convert number of doses into a numeric value (minimum # doses)
dose_n <- levels(data_slim$dose_n)
dose_n
[1] "' 4-5" "' 5-6" "' 5-8" "' 6-7" "' 6-8" ">=4" ">=5" ">=6"
[9] ">=7" "10" "11" "2" "5" "6" "7" "8"
[17] "9"
dose_min <- c(4,5,5,6,6,4,5,6,7,10,11,2,5,6,7,8,9)
dose_key <- as.data.frame(cbind(dose_n, dose_min))
dose_key
dose_n dose_min
1 ' 4-5 4
2 ' 5-6 5
3 ' 5-8 5
4 ' 6-7 6
5 ' 6-8 6
6 >=4 4
7 >=5 5
8 >=6 6
9 >=7 7
10 10 10
11 11 11
12 2 2
13 5 5
14 6 6
15 7 7
16 8 8
17 9 9
# Use table join to add new, minimum dose variable (and make numeric)
data_slim <- left_join(data_slim, dose_key, by="dose_n")
data_slim$dose_min <- as.numeric(as.character(data_slim$dose_min))
To standardize into common units of ug/bee, I exported all of the units from the dataset, then determined (a) which ones could be standardized, and (b) standardization factors for converting into ug/bee - these are related in the unit_key
. Most cases simply involve unit conversions (i.e. mg/org
to ug/org
). For units like ug
I checked the original studies to interpret the units, making corrections when necessary. Units that were checked in original references are reflected in the ecotox_unit_key.csv
in the column called checked_ref
(y
= yes, n
= no). Some LD50s were reported on a body weight basis; I assumed an individual adult bee weighs 0.12 g, see COLOSS Bee Manual Section 2.2
# Check distribution of observations across units
table(data_slim$units)
% AI lb/acre AI mg/L AI mg/org AI ng
6 40 1 1 3
AI ng/org AI ng/org/d AI oz/acre AI ppm AI ppm food
122 4 1 24 3
AI ug AI ug/g AI ug/g bdwt AI ug/ml AI ug/org
1 2 6 3 50
lb/gal mg mg/bee mg/cm2 mg/org
1 12 3 9 12
ml/1000 org ng/org pg/org ppb ppm
3 54 1 9 11
ug ug/bee ug/eu ug/g ug/g bdwt
74 207 11 12 21
ug/g org ug/org ug/ul
34 923 3
# Fix a few mistaken entries
# Laurino et al. 2011 - Table 2, incorrect units recorded in ECOTOX
which((data_slim == "AI ppm food"), arr.ind=TRUE)
row col
[1,] 1570 18
[2,] 1571 18
[3,] 1572 18
data_slim[1570,18] <- "AI ng/org" # correct dataset
data_slim[1571,18] <- "AI ng/org" # correct dataset
data_slim[1572,18] <- "AI ng/org" # correct dataset
# Lindberg et al. 2000 - for some reason one entry from Fig. 2 was recorded differently from others
data_slim[245,21] <- "other" # correct dataset
data_slim[245,8] <- "Multiple routes between application groups" # correct dataset
# Read in unit key and join to dataset
unit_key <- read.csv("../keys/ecotox_unit_key.csv",
na.strings=c("na"))
unit_key <- select(unit_key, units, units_keep, units_conv) # get rid of unnecessary columns
unit_key
units units_keep units_conv
1 % n NA
2 AI lb/acre n NA
3 AI mg/L n NA
4 AI mg/org y 1.0e+03
5 AI ng y 1.0e-03
6 AI ng/org y 1.0e-03
7 AI ng/org/d n NA
8 AI oz/acre n NA
9 AI ppm n NA
10 AI ppm food n NA
11 AI ug y 1.0e+00
12 AI ug/g y 1.2e-01
13 AI ug/g bdwt y 1.2e-01
14 AI ug/ml n NA
15 AI ug/org y 1.0e+00
16 lb/gal n NA
17 mg y NA
18 mg/bee y 1.0e+03
19 mg/cm2 n NA
20 mg/org y 1.0e+03
21 ml/1000 org n NA
22 ng/org y 1.0e-03
23 pg/org y 1.0e-06
24 ppb n NA
25 ppm n NA
26 ug y 1.0e+00
27 ug/bee y 1.0e+00
28 ug/eu n NA
29 ug/g y 1.2e-01
30 ug/g bdwt y 1.2e-01
31 ug/g org y 1.2e-01
32 ug/org y 1.0e+00
33 ug/ul n NA
data_slim <- left_join(data_slim, unit_key, by="units")
# Generate new outcomes standardized to ug/bee
data_slim <- mutate(data_slim,
ld50_ugbee = ld50*units_conv,
ld50_min_ugbee = ld50_min*units_conv,
ld50_max_ugbee = ld50_max*units_conv)
# Check for obviously wierd values
ggplot(data=data_slim, aes(ld50_ugbee)) +
geom_histogram(bins=100) +
scale_x_log10(breaks = trans_breaks("log10", n=10, function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) +
theme_classic()
# Inspect values on the extremes of the distribution
check_units <- filter(data_slim, ld50_ugbee < 10^-4 | ld50_ugbee > 10^4)
# Ref 70351 (5 extreme obs): diazinon & benomyl; all check out from original paper
# Ref 64900 (1 extreme obs): amitraz; unit error, it was ug/bee but ECOTOX recorded pg/bee (see Table 2 in paper)
# Ref 58587 (1 extreme obs): tau-fluvalinate; not a unit error, but kind of a wierd experimental design with mix of exposure pathways
# Identify location of amitraz outlier
which((data_slim == 2.5500e-06), arr.ind=TRUE)
row col
which((data_slim == 1.57e-06), arr.ind=TRUE)
row col
[1,] 975 26
which((data_slim == 4.32e-06), arr.ind=TRUE)
row col
[1,] 975 27
# Correct dataset
data_slim[975,18] <- "ug/org"
data_slim[975,24] <- 1
data_slim[975,25] <- 2.55
data_slim[975,26] <- 1.57
data_slim[975,27] <- 4.32
print(t(data_slim[975,])) # check that it worked
975
cas "33089611"
cmpd_epa "N'-(2,4-Dimethylphenyl)-N-[[(2,4-dimethylphenyl)imino]methyl]-N-methylmethanimidanide"
species "Apis mellifera ssp. ligustica"
sp_common "Italian Honeybee"
org_source "Not coded"
stage "Adult(s)"
exp_days "1"
exp_type "Topical, general"
form "Formulation"
n NA
dose_n "' 5-8"
ld50_op NA
ld50 "2.55"
ld50_min_op NA
ld50_min "1.57"
ld50_max_op NA
ld50_max "4.32"
units "ug/org"
method "Not coded"
ref "64900"
exp_grp "contact"
dose_min "5"
units_keep "y"
units_conv "1"
ld50_ugbee "2.55"
ld50_min_ugbee "1.57"
ld50_max_ugbee "4.32"
summary(data_slim)
cas
50293 : 79
138261413: 75
1332407 : 73
153719234: 66
210880925: 49
2702729 : 45
(Other) :1281
cmpd_epa
1,1'-(2,2,2-Trichloroethylidene)bis[4-chlorobenzene] : 79
(2E)-1-[(6-Chloro-3-pyridinyl)methyl]-N-nitro-2-imidazolidinimine : 75
Copper chloride oxide hydrate : 73
3-[(2-Chloro-5-thiazolyl)methyl]tetrahydro-5-methyl-N-nitro-4H-1,3,5-oxadiazin-4-imine: 66
[C(E)]-N-[(2-Chloro-5-thiazolyl)methyl]-N'-methyl-N''-nitroguanidine : 49
2-(2,4-Dichlorophenoxy)acetic acid sodium salt (1:1) : 45
(Other) :1281
species sp_common
Apis mellifera :1345 Carniolan Honey Bee: 206
Apis mellifera ssp. carnica : 206 European Dark Bee : 20
Apis mellifera ssp. caucasica: 4 European Honey Bee : 4
Apis mellifera ssp. ligustica: 93 Honey Bee :1345
Apis mellifera ssp. mellifera: 20 Italian Honeybee : 93
org_source stage
Captive breeding colony: 4 Adult(s) :771
Laboratory : 60 Larva(e) : 96
Not coded :1182 New, newly or recent hatch : 4
Not reported : 413 Not coded :257
Wild : 9 Not reported :533
Organisms at different lifestages: 2
Post-emergence : 5
exp_days exp_type
Min. : 0.0139 Topical, general :778
1st Qu.: 1.0000 Food :458
Median : 2.0000 Oral via capsule :146
Mean : 2.0055 Multiple routes between application groups:121
3rd Qu.: 2.0000 Dermal : 46
Max. :21.0000 Direct application : 42
NA's :117 (Other) : 77
form n dose_n ld50_op
Active ingredient:925 Min. :10.00 2 : 115 < : 34
Formulation :407 1st Qu.:10.00 ' 5-6 : 113 <= : 2
Not coded :252 Median :10.00 >=4 : 83 > : 446
Total : 84 Mean :18.24 8 : 83 ~ : 18
3rd Qu.:30.00 5 : 61 NA's:1168
Max. :60.00 (Other): 172
NA's :1526 NA's :1041
ld50 ld50_min_op ld50_min ld50_max_op
Min. :0.000e+00 < : 1 Min. : 0.00 < : 29
1st Qu.:1.000e+00 > : 30 1st Qu.: 0.12 NA's:1639
Median :1.200e+01 NA's:1637 Median : 2.08
Mean :6.276e+05 Mean : 90.65
3rd Qu.:1.000e+02 3rd Qu.: 22.30
Max. :1.020e+09 Max. :7480.00
NA's :39 NA's :1375
ld50_max units
Min. : 0.001 ug/org :924
1st Qu.: 0.190 ug/bee :207
Median : 3.900 AI ng/org:125
Mean : 143.196 ug : 74
3rd Qu.: 43.890 ng/org : 54
Max. :12000.000 (Other) :283
NA's :1375 NA's : 1
method
European and Mediterranean Plant Protection Organization: 2
Not coded :512
Not reported :435
Organization for Economic Cooperation and Development : 19
United States Environmental Protection Agency :700
ref exp_grp dose_min units_keep
Min. : 344 contact:886 Min. : 2.000 n : 126
1st Qu.: 344 oral :619 1st Qu.: 4.000 y :1541
Median : 39264 other :163 Median : 5.000 NA's: 1
Mean : 52142 Mean : 5.242
3rd Qu.: 91623 3rd Qu.: 7.000
Max. :173554 Max. :11.000
NA's :1041
units_conv ld50_ugbee ld50_min_ugbee
Min. : 0.001 Min. :0.00e+00 Min. : 0.0000
1st Qu.: 1.000 1st Qu.:0.00e+00 1st Qu.: 0.0346
Median : 1.000 Median :7.00e+00 Median : 0.1400
Mean : 11.292 Mean :6.80e+05 Mean : 29.5882
3rd Qu.: 1.000 3rd Qu.:9.10e+01 3rd Qu.: 5.8350
Max. :1000.000 Max. :1.02e+09 Max. :1489.0000
NA's :139 NA's :165 NA's :1398
ld50_max_ugbee
Min. : 0.0011
1st Qu.: 0.0639
Median : 0.2500
Mean : 51.2546
3rd Qu.: 9.1475
Max. :2962.7200
NA's :1398
Notes
# Are some of these patterns explained by source/method?
method <- data_slim %>%
group_by(method) %>%
summarise(obs = length(ld50_ugbee),
n_doses = sum(!is.na(dose_min)),
n_orgs = sum(!is.na(n)),
n_exp_days = sum(!is.na(exp_days)),
n_ld50_min = sum(!is.na(ld50_min)))
as.data.frame(t(method))
V1
method European and Mediterranean Plant Protection Organization
obs 2
n_doses 0
n_orgs 0
n_exp_days 2
n_ld50_min 0
V2 V3
method Not coded Not reported
obs 512 435
n_doses 373 235
n_orgs 0 142
n_exp_days 397 435
n_ld50_min 130 53
V4
method Organization for Economic Cooperation and Development
obs 19
n_doses 19
n_orgs 0
n_exp_days 19
n_ld50_min 19
V5
method United States Environmental Protection Agency
obs 700
n_doses 0
n_orgs 0
n_exp_days 698
n_ld50_min 91
Notes
This section is intended to aid in the interpretation of the min/max values reported in ECOTOX. I identified references with min/max values, then consulted the original papers to determine the units in which these values were reported, and then added them back to the dataset.
# First, create table of refs that report min/max
min_max <- data_slim %>%
filter(ld50_min_ugbee >= 0|ld50_max_ugbee >= 0) %>%
group_by(ref) %>%
summarise(obs_range = length(ref)) %>%
left_join(ref_key, by='ref') %>%
arrange(desc(obs_range))
# Export min/max table for editing, with associated date
minmax_path <- paste("../keys/","ecotox_min_max","_",data_date,".csv",sep="")
write.csv(min_max, minmax_path, row.names=FALSE)
# Next, check original references and create new column for the range units (google scholar search, refs saved in EndNote, csv filed edited outside of R)
# CI_95 = 95% confidence interval
# FL = fiducial limits
# Load revised min/max table
minmax_key_path <- paste("../keys/","ecotox_min_max_units","_",data_date,".csv",sep="")
min_max_key <- read.csv(minmax_key_path)
range_key <- select(min_max_key,
ref,range_units)
# Add column to dataset indicating range units
data_slim <- data_slim %>%
left_join(range_key, by="ref")
data_slim <- within(data_slim, range_units[is.na(ld50_min_ugbee)] <- NA) # clean up so only rows w/ range values have units
This section fills in missing life stage information. I identified references with missing life stage values, then consulted the original papers to determine life stages, and added them back into the dataset.
# First, create table of refs that do not report life stage
stage_check <- data_slim %>%
filter(stage=="Not coded"|stage=="Not reported") %>%
group_by(ref, stage) %>%
summarise(stage_missing = length(stage)) %>%
left_join(ref_key, by='ref') %>%
arrange(desc(stage_missing)) %>%
select(ref,stage,obs,stage_missing,auth,year, pub, title)
# Export life stage table for editing, with associated date
stage_path <- paste("../keys/","ecotox_stage","_",data_date,".csv",sep="")
write.csv(stage_check, stage_path, row.names=FALSE)
# Next, check original references and create new column of 'confirmed' life stage info (google scholar search, refs saved in EndNote)
# When studies included multiple methodologies (e.g. adults + larvae), check the individual data points to see what they correspond to in the paper
# Load revised life stage lookup table
stage_key_path <- paste("../keys/","ecotox_stage_fixed","_",data_date,".csv",sep="")
stage_new <- read.csv(stage_key_path)
stage_key <- stage_new %>%
select(ref, stage, stage_new)
stage_key$stage_new <- as.factor(stage_key$stage_new)
# Fix the stage column with updated information
data_slim_stage_new <- data_slim %>%
filter(stage=="Not coded"|stage=="Not reported") %>%
left_join(stage_key, by=c("ref","stage")) %>%
select(-stage) %>%
rename(stage = stage_new)
Warning: Column `stage` joining factors with different levels, coercing to
character vector
# Save records that already had stage info
data_slim_stage_old <- data_slim %>%
filter((stage!="Not coded" & stage!="Not reported")|is.na(stage)==TRUE)
# Save corrected full dataframe
data_slim_fixed <- data_slim_stage_new %>%
rbind(data_slim_stage_old)
summary(data_slim_fixed)
cas
50293 : 79
138261413: 75
1332407 : 73
153719234: 66
210880925: 49
2702729 : 45
(Other) :1281
cmpd_epa
1,1'-(2,2,2-Trichloroethylidene)bis[4-chlorobenzene] : 79
(2E)-1-[(6-Chloro-3-pyridinyl)methyl]-N-nitro-2-imidazolidinimine : 75
Copper chloride oxide hydrate : 73
3-[(2-Chloro-5-thiazolyl)methyl]tetrahydro-5-methyl-N-nitro-4H-1,3,5-oxadiazin-4-imine: 66
[C(E)]-N-[(2-Chloro-5-thiazolyl)methyl]-N'-methyl-N''-nitroguanidine : 49
2-(2,4-Dichlorophenoxy)acetic acid sodium salt (1:1) : 45
(Other) :1281
species sp_common
Apis mellifera :1345 Carniolan Honey Bee: 206
Apis mellifera ssp. carnica : 206 European Dark Bee : 20
Apis mellifera ssp. caucasica: 4 European Honey Bee : 4
Apis mellifera ssp. ligustica: 93 Honey Bee :1345
Apis mellifera ssp. mellifera: 20 Italian Honeybee : 93
org_source exp_days
Captive breeding colony: 4 Min. : 0.0139
Laboratory : 60 1st Qu.: 1.0000
Not coded :1182 Median : 2.0000
Not reported : 413 Mean : 2.0055
Wild : 9 3rd Qu.: 2.0000
Max. :21.0000
NA's :117
exp_type form
Topical, general :778 Active ingredient:925
Food :458 Formulation :407
Oral via capsule :146 Not coded :252
Multiple routes between application groups:121 Total : 84
Dermal : 46
Direct application : 42
(Other) : 77
n dose_n ld50_op ld50
Min. :10.00 2 : 115 < : 34 Min. :0.000e+00
1st Qu.:10.00 ' 5-6 : 113 <= : 2 1st Qu.:1.000e+00
Median :10.00 >=4 : 83 > : 446 Median :1.200e+01
Mean :18.24 8 : 83 ~ : 18 Mean :6.276e+05
3rd Qu.:30.00 5 : 61 NA's:1168 3rd Qu.:1.000e+02
Max. :60.00 (Other): 172 Max. :1.020e+09
NA's :1526 NA's :1041 NA's :39
ld50_min_op ld50_min ld50_max_op ld50_max
< : 1 Min. : 0.00 < : 29 Min. : 0.001
> : 30 1st Qu.: 0.12 NA's:1639 1st Qu.: 0.190
NA's:1637 Median : 2.08 Median : 3.900
Mean : 90.65 Mean : 143.196
3rd Qu.: 22.30 3rd Qu.: 43.890
Max. :7480.00 Max. :12000.000
NA's :1375 NA's :1375
units
ug/org :924
ug/bee :207
AI ng/org:125
ug : 74
ng/org : 54
(Other) :283
NA's : 1
method
European and Mediterranean Plant Protection Organization: 2
Not coded :512
Not reported :435
Organization for Economic Cooperation and Development : 19
United States Environmental Protection Agency :700
ref exp_grp dose_min units_keep
Min. : 344 contact:886 Min. : 2.000 n : 126
1st Qu.: 344 oral :619 1st Qu.: 4.000 y :1541
Median : 39264 other :163 Median : 5.000 NA's: 1
Mean : 52142 Mean : 5.242
3rd Qu.: 91623 3rd Qu.: 7.000
Max. :173554 Max. :11.000
NA's :1041
units_conv ld50_ugbee ld50_min_ugbee
Min. : 0.001 Min. :0.00e+00 Min. : 0.0000
1st Qu.: 1.000 1st Qu.:0.00e+00 1st Qu.: 0.0346
Median : 1.000 Median :7.00e+00 Median : 0.1400
Mean : 11.292 Mean :6.80e+05 Mean : 29.5882
3rd Qu.: 1.000 3rd Qu.:9.10e+01 3rd Qu.: 5.8350
Max. :1000.000 Max. :1.02e+09 Max. :1489.0000
NA's :139 NA's :165 NA's :1398
ld50_max_ugbee range_units
Min. : 0.0011 CI : 16
1st Qu.: 0.0639 CI_95 : 138
Median : 0.2500 EPA/unknown: 87
Mean : 51.2546 FL : 16
3rd Qu.: 9.1475 FL_95 : 12
Max. :2962.7200 unknown : 1
NA's :1398 NA's :1398
stage
Adult(s) :1269
EPA/unknown : 282
Larva(e) : 105
Post-emergence : 5
New, newly or recent hatch : 4
Organisms at different lifestages: 2
(Other) : 1
Select only those observations that meet the following criteria:
Also re-select variables of interest to remove non-standardized LD50 values and re-organize columns.
data_clean <- data_slim_fixed %>%
filter(exp_days<=4,
(exp_grp=="contact"|exp_grp=="oral"),
(stage=="Adult(s)"|stage=="EPA/unknown"),
ld50_ugbee>0) %>%
select(cas,
cmpd_epa,
species,
sp_common,
org_source,
stage,
exp_days,
form,
n,
dose_min,
exp_type,
exp_grp,
ld50_op,
ld50_min_op,
ld50_max_op,
ld50_ugbee,
ld50_min_ugbee,
ld50_max_ugbee,
range_units,
method,
ref)
# Load key relating USGS names and CAS numbers
# See metadata in excel file for how it was generated
cas_key <- read_excel("../keys/USGS_Pesticide-CASRN.xlsx",1)
head(cas_key, 10)
# A tibble: 10 x 5
compound_state compound_cty cas cmpd_mult cas_mult
<chr> <chr> <chr> <chr> <chr>
1 1METHYLCYCLOPROPENE 1-METHYLCYCLOPROPENE 3100047 n n
2 24D 2,4-D 94757 n y
3 24D 2,4-D 2702729 n y
4 24DB 2,4-DB 94826 n n
5 6BENZYLADENINE 6-BENZYLADENINE 1214397 n n
6 ABAMECTIN ABAMECTIN 71751412 n n
7 ACEPHATE ACEPHATE 30560191 n n
8 ACEQUINOCYL ACEQUINOCYL 57960197 n n
9 ACETAMIPRID ACETAMIPRID 135410207 n n
10 ACETOCHLOR ACETOCHLOR 34256821 n n
cas_key_slim <- select(cas_key,
cmpd_usgs = compound_state,
cas = cas)
# Load category key & add combined category/class column
# See metadata in excel file for how it was generated
cat_key <- read_excel("../keys/USGS_Pesticide-Category.xlsx",1)
head(cat_key, 10)
# A tibble: 10 x 7
compound compound_cty compound_2017 category class group source
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1METHYLCYC… 1-METHYLCYCLO… 1-METHYLCYCLO… PGR unclass… NC none
2 24D 2,4-D 2,4-D H Phenoxy… O HRAC
3 24DB 2,4-DB 2,4-DB H Phenoxy… O HRAC
4 6BENZYLADE… 6-BENZYLADENI… 6-BENZYLADENI… PGR biologi… NC none
5 ABAMECTIN ABAMECTIN ABAMECTIN I avermec… 6 IRAC
6 ACEPHATE ACEPHATE ACEPHATE I organop… 1B IRAC
7 ACEQUINOCYL ACEQUINOCYL ACEQUINOCYL I unclass… 20B IRAC
8 ACETAMIPRID ACETAMIPRID ACETAMIPRID I neonico… 4A IRAC
9 ACETOCHLOR ACETOCHLOR ACETOCHLOR H Chloroa… K3 HRAC
10 ACIBENZOLAR ACIBENZOLAR ACIBENZOLAR F benzo-t… P01 FRAC
# Create new variable that is Category-Group
cat_key$cat_grp <- sub('-NA$', '',
paste(cat_key$category, cat_key$group,
sep = "-"))
# Remove unnecessary columns
cat_key_slim <- select(cat_key,
cmpd_usgs = compound,
cat = category,
cat_grp = cat_grp,
class = class,
grp_source = source)
# Create combined key w/ USGS names and categories
comb_key <- left_join(cas_key_slim, cat_key_slim, by="cmpd_usgs")
comb_key$cat_grp <- as.factor(comb_key$cat_grp)
head(comb_key, n=10) # view sample of key
# A tibble: 10 x 6
cmpd_usgs cas cat cat_grp class grp_source
<chr> <chr> <chr> <fct> <chr> <chr>
1 1METHYLCYCLOPROPE… 3100047 PGR PGR-NC unclassified none
2 24D 94757 H H-O Phenoxy-carboxylic… HRAC
3 24D 2702729 H H-O Phenoxy-carboxylic… HRAC
4 24DB 94826 H H-O Phenoxy-carboxylic… HRAC
5 6BENZYLADENINE 1214397 PGR PGR-NC biological none
6 ABAMECTIN 71751412 I I-6 avermectin IRAC
7 ACEPHATE 30560191 I I-1B organophosphate IRAC
8 ACEQUINOCYL 57960197 I I-20B unclassified IRAC
9 ACETAMIPRID 1354102… I I-4A neonicotinoid IRAC
10 ACETOCHLOR 34256821 H H-K3 Chloroacetamide HRAC
# Merge USGS names and categories with LD50 dataset
# Ignore warning - just converting factor to character vector
# Note this is a full join - i.e. USGS compounds that don't match any ECOTOX value will be added to the dataset w/ NAs for missing data
data_full <- full_join(data_clean, comb_key, by="cas")
Warning: Column `cas` joining factor and character vector, coercing into
character vector
str(data_full)
'data.frame': 1643 obs. of 26 variables:
$ cas : chr "50293" "50293" "50293" "50293" ...
$ cmpd_epa : Factor w/ 456 levels "(1-Methylethyl)phosphoramidic acid ethyl-3-methyl-4-(methylthio)phenyl ester",..: 73 73 73 73 257 404 400 434 425 425 ...
$ species : Factor w/ 5 levels "Apis mellifera",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sp_common : Factor w/ 5 levels "Carniolan Honey Bee",..: 4 4 4 4 4 4 4 4 4 4 ...
$ org_source : Factor w/ 5 levels "Captive breeding colony",..: 3 3 3 4 3 3 3 4 3 3 ...
$ stage : Factor w/ 9 levels "Adult(s)","EPA/unknown",..: 2 2 2 1 2 2 2 1 1 1 ...
$ exp_days : num 2 2 2 1 2 2 2 1 1 1 ...
$ form : Factor w/ 4 levels "Active ingredient",..: 3 3 3 1 1 1 1 1 2 1 ...
$ n : int NA NA NA NA NA NA NA NA NA NA ...
$ dose_min : num NA NA NA NA NA NA NA NA 8 5 ...
$ exp_type : Factor w/ 11 levels "Dermal","Diet, unspecified",..: 11 11 9 1 11 11 11 1 2 11 ...
$ exp_grp : Factor w/ 3 levels "contact","oral",..: 1 1 2 1 1 1 1 1 2 1 ...
$ ld50_op : Factor w/ 4 levels "<","<=",">","~": NA NA NA NA 3 NA NA NA NA NA ...
$ ld50_min_op : Factor w/ 2 levels "<",">": NA NA NA NA NA NA NA NA NA NA ...
$ ld50_max_op : Factor w/ 1 level "<": NA NA NA NA NA NA NA NA NA NA ...
$ ld50_ugbee : num 6.4 3.9 3.7 13.7 11 ...
$ ld50_min_ugbee: num NA NA NA NA NA ...
$ ld50_max_ugbee: num NA NA NA NA NA ...
$ range_units : Factor w/ 6 levels "CI","CI_95","EPA/unknown",..: NA NA NA NA NA NA NA NA NA 2 ...
$ method : Factor w/ 5 levels "European and Mediterranean Plant Protection Organization",..: 5 5 5 2 5 5 5 2 2 2 ...
$ ref : int 344 344 344 35123 344 344 344 35123 101175 119503 ...
$ cmpd_usgs : chr NA NA NA NA ...
$ cat : chr NA NA NA NA ...
$ cat_grp : Factor w/ 105 levels "F-1","F-11","F-12",..: NA NA NA NA 104 80 NA 80 NA NA ...
$ class : chr NA NA NA NA ...
$ grp_source : chr NA NA NA NA ...
new_filename<-paste(filename,"_clean", sep="")
new_path<-paste("../output/",new_filename,".csv", sep="")
write.csv(data_full, file=new_path)
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.0.0 readxl_1.3.1 forcats_0.4.0 stringr_1.4.0
[5] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
[9] tibble_2.1.3 ggplot2_3.2.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 cellranger_1.1.0 pillar_1.4.2 compiler_3.6.1
[5] tools_3.6.1 zeallot_0.1.0 digest_0.6.20 lubridate_1.7.4
[9] jsonlite_1.6 evaluate_0.14 nlme_3.1-140 gtable_0.3.0
[13] lattice_0.20-38 pkgconfig_2.0.2 rlang_0.4.0 cli_1.1.0
[17] rstudioapi_0.10 yaml_2.2.0 haven_2.1.1 xfun_0.8
[21] withr_2.1.2 xml2_1.2.0 httr_1.4.0 knitr_1.23
[25] vctrs_0.2.0 hms_0.5.0 generics_0.0.2 fs_1.3.1
[29] grid_3.6.1 tidyselect_0.2.5 glue_1.3.1 R6_2.4.0
[33] fansi_0.4.0 rmarkdown_1.14 modelr_0.1.4 magrittr_1.5
[37] backports_1.1.4 htmltools_0.3.6 rvest_0.3.4 assertthat_0.2.1
[41] colorspace_1.4-1 labeling_0.3 utf8_1.1.4 stringi_1.4.3
[45] lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2 crayon_1.3.4
This R Markdown site was created with workflowr