xml2: grab text in array after specified text in same row

Question

I have XML with a bunch of envelope elements. Inside of each one is an array. Each row in the array has 2 elements. The first is an identifier and the second is the text I want to grab. I need the first value of the row to identify the correct row so that I can grab the correct value.

In the example below, 'food' rows are identified by the code 610954. I want to grab the two values that follow this code, c('pizza', 'burger'). Likewise, 'drinks' rows are identified by the code 605380, and there I want to grab c('coke', 'pepsi'). How can I use the xml2 package to do this?

library(xml2)
library(magrittr)

myxml <- read_xml('
<inside>
 <envelope>
  <card-entries type="array">
     <card-entry>
       <card-id type="integer">605380</card-id>
       <value>coke</value>
     </card-entry>
     <card-entry>
       <card-id type="integer">610954</card-id>
       <value>pizza</value>
     </card-entry>  
   </card-entries>
 </envelope>
 <envelope>
  <card-entries type="array">
     <card-entry>
       <card-id type="integer">605380</card-id>
       <value>pepsi</value>
     </card-entry>
     <card-entry>
       <card-id type="integer">610954</card-id>
       <value>burger</value>
     </card-entry>  
   </card-entries>
 </envelope>
</inside>
'
)

## as far as I can parse it (but not specific enough)
myxml %>%
    xml_find_all('//envelope/card-entries[@type="array"]/card-entry') %>%
    xml_text()

food <- -CODE THAT GIVES HERE c('pizza', 'burger')- # 610954
drinks <- -CODE THAT GIVES HERE c('coke', 'pepsi')- # 605380

Tags: R, xml, xml2 | Asked 2017-01-06 20:01 | 2 Answers

Answers (2)

  1. 2017-01-06 22:01

    Your original approach could be modified like this to get the drinks:

    myxml %>%
      xml_find_all('//envelope/card-entries[@type="array"]/card-entry[card-id = "605380"]/value') %>%
      xml_text()
    #[1] "coke"  "pepsi"
    

    But you could also use a variety of other approaches:

    # get following sibling called value
    myxml %>% 
      # foods
      xml_find_all('//card-id[text()="610954"]/following-sibling::value') %>%
      xml_text()
    #[1] "pizza"  "burger"
    
    # get following::value[1] - Specify [1] or you would get all following values, 
    # including "pepsi".  With value[1] you get only the following value.
    myxml %>% 
      # foods
      xml_find_all('//card-id[text()="610954"]/following::value[1]') %>%
      xml_text()
    #[1] "pizza"  "burger"
    
    # look for value nodes with a preceding sibling with the appropriate card-id
    myxml %>% 
      # drinks
      xml_find_all('//value[preceding-sibling::card-id[text()="605380"]]') %>%
      xml_text()
    #[1] "coke"  "pepsi"
    
    # Get the value child of each card-entry that has the matching card-id,
    # searching only inside envelope elements.
    myxml %>% 
      # drinks
      xml_find_all('//envelope/card-entries/card-entry[card-id = "605380"]/value') %>%
      xml_text()
    #[1] "coke"  "pepsi"
    
    # less specific
    myxml %>% 
      xml_find_all('//card-entry[card-id = "605380"]/value') %>%
      xml_text()
    #[1] "coke"  "pepsi"
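
    The XPath variants above differ only in the card-id they match, so the lookup can be wrapped in a small function. The following is a minimal sketch, not part of the original answer: `values_for_card` is a hypothetical helper name, the sample document is a condensed copy of the XML from the question, and ids are passed as strings so that they are pasted into the XPath unchanged.

    ```r
    library(xml2)

    # Condensed copy of the sample document from the question
    doc <- read_xml('<inside>
      <envelope><card-entries type="array">
        <card-entry><card-id type="integer">605380</card-id><value>coke</value></card-entry>
        <card-entry><card-id type="integer">610954</card-id><value>pizza</value></card-entry>
      </card-entries></envelope>
      <envelope><card-entries type="array">
        <card-entry><card-id type="integer">605380</card-id><value>pepsi</value></card-entry>
        <card-entry><card-id type="integer">610954</card-id><value>burger</value></card-entry>
      </card-entries></envelope>
    </inside>')

    # Hypothetical helper: return the text of every <value> whose parent
    # <card-entry> has a <card-id> equal to card_id
    values_for_card <- function(doc, card_id) {
      xpath <- sprintf('//card-entry[card-id = "%s"]/value', card_id)
      xml_text(xml_find_all(doc, xpath))
    }

    food   <- values_for_card(doc, "610954")
    drinks <- values_for_card(doc, "605380")
    ```

    With the sample document, `food` should hold the pizza and burger values and `drinks` the coke and pepsi values, in document order.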
    
  2. 2017-01-06 22:01

    How about something like:

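    library(tidyverse)
    library(stringr)
    myxml %>%
      # one node per card-entry
      xml_find_all('//envelope/card-entries[@type="array"]/card-entry') %>%
      # text of each entry: the card-id followed by the value
      xml_text() %>% 
      # split at a fixed width; this assumes every card-id is exactly 6 characters long
      map(.f = str_sub, start = c(1, 7), end = c(6, 1000000L)) %>% 
      # stack the (id, value) pairs into a two-column matrix, then a tibble (columns V1, V2)
      reduce(rbind) %>% 
      as_tibble() %>% 
      # label each row by its card-id
      mutate(type = ifelse(V1 == "605380", yes = "drinks", no = "food"))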
    

    And then you can easily subset drinks and food separately.
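
    The fixed-width split only works while every card-id has six characters. As a hedged alternative sketch (not the answerer's code), the id and the value can be read from each card-entry separately with `xml_find_first()`, which avoids string slicing. The names `cards` and `labels` are illustrative, the sample document is a condensed copy of the XML from the question, and base R is used here only to keep the example self-contained.

    ```r
    library(xml2)

    # Condensed copy of the sample document from the question
    doc <- read_xml('<inside>
      <envelope><card-entries type="array">
        <card-entry><card-id type="integer">605380</card-id><value>coke</value></card-entry>
        <card-entry><card-id type="integer">610954</card-id><value>pizza</value></card-entry>
      </card-entries></envelope>
      <envelope><card-entries type="array">
        <card-entry><card-id type="integer">605380</card-id><value>pepsi</value></card-entry>
        <card-entry><card-id type="integer">610954</card-id><value>burger</value></card-entry>
      </card-entries></envelope>
    </inside>')

    # One node per row of the "array"
    entries <- xml_find_all(doc, "//card-entry")

    # xml_find_first() returns one result per entry, so the columns stay aligned
    cards <- data.frame(
      card_id = xml_text(xml_find_first(entries, "./card-id")),
      value   = xml_text(xml_find_first(entries, "./value")),
      stringsAsFactors = FALSE
    )

    # Illustrative lookup table from card-id to a readable label
    labels <- c("605380" = "drinks", "610954" = "food")
    cards$type <- unname(labels[cards$card_id])

    food   <- cards$value[cards$type == "food"]
    drinks <- cards$value[cards$type == "drinks"]
    ```

    Subsetting `cards` by `type` then gives the food and drinks vectors the question asks for.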
