US20140122511A1 - Framework for generating programs to process beacons - Google Patents
Framework for generating programs to process beacons
- Publication number
- US20140122511A1 (US13/660,788)
- Authority
- US
- United States
- Prior art keywords
- beacons
- information
- objects
- beacon
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
Definitions
- a user may view a video in a media player.
- the companies often seek to improve their service by analyzing events that occur while the users are using their client devices. For example, while viewing the video, the user performs different actions, such as seeking to different times in the video, stopping the video, hovering over icons, etc.
- Web requests (also referred to as “beacons”) are generated to document the actions taken at the client devices.
- a server may aggregate information, such as the IP address of the computer being used; the time the material was viewed; the type of browser that was used; the type of action taken by the user; etc.
- the beacons are logged and aggregated for the company.
- the beacons include information that is in an unstructured format.
- the unstructured format is not in a pre-defined data model that a company can easily store in a structured database. For example, many analysis applications are keyed to retrieve data in fields in a structured database.
- the beacons do not include data that can easily be stored in the correct fields. Thus, if a company is going to analyze the information in the beacons, the company needs to transform the unstructured data into structured data.
- the structured data organizes the data in a format desired by the company where the company can then analyze the structured data.
- a method receives a specification for processing beacons.
- the beacons are associated with an event occurring at a client while a user is interacting with a web application and include unstructured data.
- the method then parses the specification to determine an object model including objects determined from the specification where different specifications are parsed into a format of the object model.
- a generator is determined from a set of generators. Each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model.
- the method then runs the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- a non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- an apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- FIG. 1 depicts a simplified system for processing beacons according to one embodiment.
- FIG. 2 shows an example of a compiler according to one embodiment.
- FIG. 3 depicts a simplified flowchart for generating target programs according to one embodiment.
- FIG. 4 shows a specification according to one embodiment.
- FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects.
- FIG. 6 shows an example of a target program according to one embodiment.
- Described herein are techniques for a framework for processing beacons.
- numerous examples and specific details are set forth in order to provide a thorough understanding of particular embodiments.
- Particular embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
- FIG. 1 depicts a simplified system 100 for processing beacons according to one embodiment.
- System 100 includes clients 102, a server 104, beacon target programs 106, and a beacon target program generation compiler 108.
- the beacons may include unicode strings and URL encoded binary strings. To obtain any further semantic meaning of the beacon data, the beacon data needs to be interpreted and transformed by target programs.
- Although beacons are described, which may be web event logs for events that occur while users use clients 102, other types of unstructured data may also be processed.
- beacons may also include extensible markup language (XML) specifications, hypertext markup language (HTML) code, and other human-readable documentation.
- Users interact with clients 102 to produce events. For example, users may interact with websites on the worldwide web (WWW), such as through mouse clicks, hovering over objects, and other user interactions with web pages.
- Beacons are created based on the events and include information for the actions taken by the users and may also include other metadata about the event.
- the metadata may include user identification information, what platform (e.g., device type or operating system) is being used, what application is being used, etc.
- the beacons may be unstructured data. Also, different clients 102 and different web sites may generate beacons in different formats.
- a server 104 receives and stores the beacons for later processing.
- server 104 may aggregate beacons from multiple network devices.
- server 104 may be a distributed system of servers that are storing the beacons.
- server 104 stores the beacons, but other storage devices may store the beacons.
- target programs 106 may be executed to process the beacons.
- target programs 106 may determine beacons that are of interest and then transform the unstructured data of the beacons into structured data that can be used by a company. For example, different target programs 106 may be interested in different types of beacons. Each target program 106 would identify the applicable beacons. Then, target programs 106 transform the unstructured data into structured data.
- the structured data may be stored in a database for later querying, such as to generate reports.
- compiler 108 receives a specification and uses the specification to automatically generate a target program 106 .
- Using the specification allows users to declaratively specify what beacons are of interest and what structured data is desired.
- Compiler 108 then generates target programs 106 that can process the beacons and perform the desired transformations from unstructured data to structured data.
- FIG. 2 shows a more detailed example of compiler 108 according to one embodiment.
- Specifications 202 may be written using a specific grammar that declares what beacons are of interest and what structured data is desired. Users may write different specifications 202 to generate different structured data from different beacons.
- an abstract syntax tree generator 203 first converts specifications 202 into abstract syntax trees 204 .
- the abstract syntax tree is an abstract way of representing the syntax of different specifications 202 .
- an abstract syntax tree is a tree representation of the syntactic structure of the input program. The syntax tree is built through the use of a parser, which produces a tree representation of the input program based on a grammar specification.
- An object model generator 205 uses the abstract syntax trees to generate object models 206 .
- Object models 206 convert nodes of the abstract syntax tree into objects that are in the object model.
- the object model is used such that generators 208 can be written to read a specific format defined in the object model. This allows generators 208 to be reused to process different specifications 202 .
- specifications 202 may be written and parsed into object models 206 .
- object models 206 with different objects may be generated, but the same generators 208 may be used.
- the same generator 208 may be used because each generator 208 is configured to parse the same format of an object model 206 .
- the object model is a simplified and generalized view of the input specification based on the abstract syntax tree. The object model is generated by passing over the abstract syntax tree multiple times. Specification correctness checks may be performed (semantic analysis), symbols may be resolved (e.g., various references that must be resolved and disambiguated), and a simplified structure is created (called the object model) so that generators 208 can be written more concisely.
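The multi-pass construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the `SpecError` exception, and the rule that beacons refer to composites while basefacts refer to beacons are assumptions chosen to mirror the composite, beacon, and basefact sections of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: one declaration from a specification."""
    kind: str                                   # "composite", "beacon", or "basefact"
    name: str
    refs: list = field(default_factory=list)    # names this declaration refers to


class SpecError(Exception):
    """Raised when a specification fails a semantic check (assumed name)."""


# Assumed reference rule: beacons point at composites, basefacts at beacons.
EXPECTED_TARGET = {"beacon": "composite", "basefact": "beacon"}


def build_object_model(ast):
    # Pass 1: collect every declaration into a symbol table.
    symbols = {}
    for node in ast:
        key = (node.kind, node.name)
        if key in symbols:
            raise SpecError(f"duplicate {node.kind} '{node.name}'")
        symbols[key] = node

    # Pass 2: check and resolve references, building the simplified model.
    model = {"composite": {}, "beacon": {}, "basefact": {}}
    for node in ast:
        resolved = []
        for ref in node.refs:
            target_kind = EXPECTED_TARGET.get(node.kind)
            target = symbols.get((target_kind, ref))
            if target is None:
                raise SpecError(
                    f"{node.kind} '{node.name}' references "
                    f"undefined {target_kind} '{ref}'"
                )
            resolved.append(target)
        model[node.kind][node.name] = resolved
    return model
```

With this sketch, a basefact that names an undeclared beacon is rejected in the second pass, before any generator runs.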
- Object models 206 are in a format that can be read by different generators 208-1 through 208-N.
- Each generator 208-1 through 208-N may generate target programs #1 through #N, respectively.
- some generators 208 may generate MapReduce source code, structured query language (SQL) queries, representational state transfer (REST) requests, HTML documentation, and other target programs.
- Each generator 208 may be written to process the formats of object models 206 and thus multiple generators 208 do not need to be written for different specifications 202 . That is, if MapReduce code is desired, the same MapReduce generator 208 is used for multiple specifications 202 .
- the objects in object model 206 may change, but the same generator 208 may be used.
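The reuse of generators across specifications can be illustrated with a small registry. This is a sketch under stated assumptions: the dictionary layout of the object model, the `GENERATORS` registry, and both output formats are hypothetical and are not taken from the patent.

```python
def generate_sql(model):
    """Emit a SQL-style aggregation query from the shared model format."""
    dims = ", ".join(model["dimensions"])
    facts = ", ".join(
        f"SUM({source}) AS {alias}" for alias, source in model["facts"].items()
    )
    return f"SELECT {dims}, {facts} FROM {model['beacon']} GROUP BY {dims}"


def generate_docs(model):
    """Emit plain-text documentation from the same model format."""
    lines = [f"Basefact {model['name']} (beacon: {model['beacon']})"]
    lines += [f"  dimension: {d}" for d in model["dimensions"]]
    lines += [f"  fact: {a} = sum({s})" for a, s in model["facts"].items()]
    return "\n".join(lines)


# Each generator reads the same format, so one registry serves every specification.
GENERATORS = {"sql": generate_sql, "docs": generate_docs}


def generate(model, target):
    return GENERATORS[target](model)
```

Because both functions depend only on the model's format, a second specification parsed into the same format needs no new generator.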
- FIG. 3 depicts a simplified flowchart for generating target programs 106 according to one embodiment.
- compiler 108 receives a specification 202 .
- Specification 202 specifies which beacons to process and what transformations of the unstructured data to specified structured data are desired.
- specification 202 does not include code that is used to process beacons and transform the unstructured data to structured data.
- compiler 108 may check the specification for correctness. For example, compiler 108 may check the specification for semantic correctness, such as determining that a basefact references a beacon that is not defined.
- compiler 108 parses specification 202 into an abstract syntax tree 204 .
- the abstract syntax tree organizes the elements of specification 202 into a tree structure.
- compiler 108 converts abstract syntax tree 204 into an object model 206 .
- compiler 108 parses nodes of abstract syntax tree 204 to generate object model 206 .
- Object model 206 organizes specification 202 into objects.
- compiler 108 determines a generator 208 for a target program 106 .
- compiler 108 may receive a user selection of a generator 208 .
- the selected generator 208 is configured to produce a specific type of target program 106 .
- compiler 108 generates target program 106 for generator 208 based on object model 206 .
- FIG. 4 shows a specification 202 according to one embodiment.
- Specification 202 produces a target program 106 to convert a video ID to a video name, transform a browser name for the browser used to play a video to a browser name, and count the number of times the video was played. It should be noted that specification 202 may not be a complete specification and has parts redacted, such as when a “ . . . ” is shown.
- Specification 202 includes three sections of “composite”, “beacon”, and “basefact”.
- a composite defines what is in the beacon, such as the raw data that is in the beacon, and how to transform the raw data in the beacon.
- three composite objects of “Video”, “Browser”, and “Count” are shown. Composites may have any number of input fields and one or more output fields.
- the Video composite object has an input parameter object named “video_id”. This is what the beacon parameter name is in a raw log line.
- the unstructured data may include the term “video_id”.
- the Video composite object includes an output field object called “video_name”. This is the field name after video_id is transformed.
- a mapper object for “MapReduceJob” includes transformational logic for the output field object video_name.
- the mapper object includes details for performing the transformation that is specified in the mapper definition located at conversionMethod. Additional mappers may also be included in a composite object that may perform other transformations.
- other composite objects of “Browser” and “Count” are included. Details have not been provided, but would be similar to those found in the Video composite object. It will be understood that specification 202 may include any number of composite objects 402 . For example, specification 202 may include additional composite objects (not shown) that may be used by other beacon objects.
- a beacon object is identified as “playback_start” and uniquely identifies the beacon within specification 202 . Because specification 202 may include multiple composite objects, the beacon object identifies which composite objects are part of this beacon object.
- the beacon includes three field objects: “selected_video”, which references the Video composite object; “user_browser”, which references the Browser composite object; and “count”, which references the Count composite object. The field objects are used to refer back to composite objects.
- specification 202 defines a basefact object of “start_by_video_and_browser”.
- the basefact object is used to define what structured data is desired and what unstructured data should be used to populate the structured data.
- the basefact object may use multiple beacon objects. For example, this basefact object uses the “playback_start” beacon object to determine applicable data. That is, this basefact ignores all other beacon objects that are not named “playback_start” in specification 202.
- the basefact object includes three structured data field objects for the “playback_start” beacon.
- the structured data fields may be different types, such as dimension or fact fields.
- a dimension maps a field in the beacon to a structured data field.
- a fact may perform a function (e.g., an aggregation function) on a field in the beacon to determine a result that is mapped to a structured data field.
- a first structured data field of “videoName” is defined as a dimension of the video_name field object in the composite object referenced by the selected_video field object in the beacon object and a second structured data field of “browserName” is defined as a dimension from the name field object in the composite object referenced by the user_browser field object in the beacon object.
- a third structured data field of “totalCount” is defined as a fact that is the aggregation of the count field object in the composite object referenced by the count field object in the beacon object.
- compiler 108 selects a generator 208 that is used to generate a target program 106 .
- compiler 108 converts specification 202 into object model 206 .
- Generator 208 takes object model 206 and generates code in a software language that is used to process beacons.
- compiler 108 generates MapReduce job code as a target program 106 .
- Target program 106 is configured to receive unstructured data, such as raw web event log lines, and generate structured data specified by the starts_by_video_and_browser basefact definition. That is, transformed data from the beacons is stored in structured data fields of videoName, browserName, and totalCount.
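One way to picture the object model for this specification is as plain data classes. The sketch below is illustrative only: the class names and the `resolve` helper are assumptions, the input parameters of the Browser and Count composites are left empty because the specification redacts them, and the basefact name follows the spelling used in the sentence above.

```python
from dataclasses import dataclass


@dataclass
class Composite:
    name: str
    inputs: tuple     # raw beacon parameter names
    outputs: tuple    # field names after transformation


@dataclass
class Beacon:
    name: str
    fields: dict      # beacon field name -> composite name


@dataclass
class StructuredField:
    name: str
    kind: str              # "dimension" or "fact"
    beacon_field: str
    composite_output: str


@dataclass
class Basefact:
    name: str
    beacon: str
    fields: list


composites = {
    "Video": Composite("Video", ("video_id",), ("video_name",)),
    "Browser": Composite("Browser", (), ("name",)),   # inputs not given in the source
    "Count": Composite("Count", (), ("count",)),      # inputs not given in the source
}
beacons = {
    "playback_start": Beacon(
        "playback_start",
        {"selected_video": "Video", "user_browser": "Browser", "count": "Count"},
    )
}
basefact = Basefact(
    "starts_by_video_and_browser",
    "playback_start",
    [
        StructuredField("videoName", "dimension", "selected_video", "video_name"),
        StructuredField("browserName", "dimension", "user_browser", "name"),
        StructuredField("totalCount", "fact", "count", "count"),
    ],
)


def resolve(basefact, beacons, composites):
    """Follow basefact -> beacon field -> composite output for each field."""
    beacon = beacons[basefact.beacon]
    resolved = {}
    for f in basefact.fields:
        composite = composites[beacon.fields[f.beacon_field]]
        if f.composite_output not in composite.outputs:
            raise KeyError(f"{composite.name} has no output '{f.composite_output}'")
        resolved[f.name] = (f.kind, composite.name, f.composite_output)
    return resolved
```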
- FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects that generator 208 analyzes to generate code for target program 106 .
- generator 208 identifies the beacon object for the basefact object.
- specification 202 may include multiple beacon objects and the beacon object for this basefact object is the playback_start beacon object.
- Generator 208 generates filtering code that determines which beacons should be processed by target program 106 .
- the structured data field objects in the basefact object point to field objects in the beacon object at 504 .
- selected_video, user_browser, and count are referenced in both the basefact and the beacon objects.
- the field objects in the beacon object are associated with composite objects.
- Generator 208 uses the referenced composite objects from the beacon object to generate instructions on how to map unstructured data to structured data. For example, generator 208 generates instructions on how to tokenize (breaking the text of the beacon into words or phrases) and transform raw web log data to structured data.
- the basefact object defines the structured data by the terms videoName, browserName, and totalCount, which are structured data fields that can be defined in a database.
- the transformations for the field objects in the basefact object are specified in the composite object that each beacon field object references as was discussed with respect to 506 . Also, for the fact field object, generator 208 generates instructions to aggregate rows based on the count composite object.
- Target program 106 can then be used to process beacons and produce the transformed data as specified in the basefact definition.
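The filter, tokenize, transform, and aggregate steps that a generator emits can be sketched as a function that writes source text. Everything concrete here is an assumption made for illustration: the beacon is treated as a query string with an `event` parameter naming the beacon, the raw parameter name `browser` is invented, and the conversions are passed in as callables rather than taken from a mapper definition.

```python
def generate_target_source(beacon_name, dimensions, fact):
    """Return Python source for a target program.

    dimensions maps a structured field name to a raw beacon parameter
    (hypothetical layout); fact is the structured field holding the count.
    """
    key_expr = ", ".join(
        f"convert[{name!r}](params.get({raw!r}, ''))"
        for name, raw in dimensions.items()
    )
    lines = [
        "from collections import Counter",
        "from urllib.parse import parse_qs",
        "",
        "def process(log_lines, convert):",
        "    totals = Counter()",
        "    for line in log_lines:",
        "        # Tokenize: split the raw beacon into named parameters.",
        "        params = {k: v[0] for k, v in parse_qs(line).items()}",
        "        # Filter: keep only the beacon this basefact uses.",
        f"        if params.get('event') != {beacon_name!r}:",
        "            continue",
        "        # Transform each dimension, then aggregate by the key tuple.",
        f"        totals[({key_expr},)] += 1",
        "    return [",
        f"        dict(zip({list(dimensions)!r}, key), **{{{fact!r}: n}})",
        "        for key, n in totals.items()",
        "    ]",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file or loaded with `exec`; calling the generated `process` function with raw log lines and a dictionary of conversion callables yields one row per distinct combination of dimension values.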
- FIG. 6 shows an example of target program 106 according to one embodiment.
- Generator 208 may generate target program 106 based on specification 202 and object model 206 .
- the function “Map” defines the aggregator/reducer based on the MapReduce paradigm. Dimensions correspond to Keys, and Facts correspond to Values.
- the term “playback_start” is based on which beacons were defined by specification 202 . In this case, only events defined by playback_start beacons are reviewed.
- the conversion found in the composite Video is found, and at 612 , the conversion found in the composite Browser is found.
- the functions “Identity<Long>( )” and “StaticInputAction<Long>(1L)” are determined based on the fact “sum” in the basefact in specification 202. The above information is determined by reviewing object model 206 to generate the target program 106.
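The correspondence between dimensions and keys, and between facts and values, can be shown with a small map and reduce pair. This is a sketch in Python rather than the generated code of FIG. 6: the beacon dictionaries with `event`, `video_name`, and `browser_name` entries are assumed to be already tokenized and converted, and the local `run` driver stands in for a MapReduce framework.

```python
from collections import defaultdict


def map_fn(beacon):
    """Emit (key, value) pairs: dimensions form the key, the fact input is 1."""
    if beacon.get("event") != "playback_start":
        return  # only playback_start beacons are reviewed
    yield (beacon["video_name"], beacon["browser_name"]), 1


def reduce_fn(key, values):
    """Sum the values for one key, matching the 'sum' fact."""
    video_name, browser_name = key
    return {
        "videoName": video_name,
        "browserName": browser_name,
        "totalCount": sum(values),
    }


def run(beacons):
    """Tiny in-process stand-in for the shuffle step between map and reduce."""
    groups = defaultdict(list)
    for beacon in beacons:
        for key, value in map_fn(beacon):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]
```

Emitting a constant 1 per matching beacon and summing in the reducer plays the role that the static input and the “sum” fact play in the description above.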
- compiler 108 generates target program 106 , which can map unstructured data to structured data.
- a user can declare the structured data that was desired and the transformations needed to transform unstructured data to structured data.
- Compiler 108 then generates the software code to perform the desired transformations. A user thus does not need to write software code for target program 106 .
- particular embodiments leverage object model 206 that allows different generators 208 to operate on the object model.
- different specifications 202 may be parsed into an object model 206 that can be operated on by the same generators 208 .
- Particular embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine.
- the computer-readable storage medium contains instructions for controlling a computer system to perform a method described by particular embodiments.
- the instructions when executed by one or more computer processors, may be operable to perform that which is described in particular embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
Description
- Companies provide services that users access using client devices. For example, a user may view a video in a media player. The companies often seek to improve their service by analyzing events that occur while the users are using their client devices. For example, while viewing the video, the user performs different actions, such as seeking to different times in the video, stopping the video, hovering over icons, etc. Web requests (also referred to as “beacons”) are generated to document the actions taken at the client devices. For example, when a user's browser requests information from a website, a server may aggregate information, such as the IP address of the computer being used; the time the material was viewed; the type of browser that was used; the type of action taken by the user; etc. The beacons are logged and aggregated for the company.
- The beacons include information that is in an unstructured format. The unstructured format is not in a pre-defined data model that a company can easily store in a structured database. For example, many analysis applications are keyed to retrieve data in fields in a structured database. The beacons do not include data that can easily be stored in the correct fields. Thus, if a company is going to analyze the information in the beacons, the company needs to transform the unstructured data into structured data. The structured data organizes the data in a format desired by the company where the company can then analyze the structured data.
- Programs need to be written to perform the transformation of the unstructured data of the beacons into structured data. However, each type of beacon has different types of information. Thus, for each type of beacon that the company wants to analyze, a programmer needs to write a program to transform the unstructured data for the beacon to the desired type of structured data. Writing the programs to perform these transformations may be a tedious process. Also, having to write code for the programs limits the number of users that can write the programs because most users are not programmers.
- In one embodiment, a method receives a specification for processing beacons. The beacons are associated with an event occurring at a client while a user is interacting with a web application and include unstructured data. The method then parses the specification to determine an object model including objects determined from the specification where different specifications are parsed into a format of the object model. A generator is determined from a set of generators. Each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model. The method then runs the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- In one embodiment, a non-transitory computer-readable storage medium is provided containing instructions, that when executed, control a computer system to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- In one embodiment, an apparatus is provided comprising: one or more computer processors; and a computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
- The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of particular embodiments.
-
FIG. 1 depicts a simplified system for processing beacons according to one embodiment. -
FIG. 2 shows an example of a compiler according to one embodiment. -
FIG. 3 depicts a simplified flowchart for generating target programs according to one embodiment. -
FIG. 4 shows a specification according to one embodiment. -
FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects. -
FIG. 6 shows an example of a target program according to one embodiment. - Described herein are techniques for a framework for processing beacons. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of particular embodiments. Particular embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
-
FIG. 1 depicts a simplified system 100 for processing beacons according to one embodiment. System 100 includes clients 102, a server 104, beacon target programs 106, and a beacon target program generation compiler 108. The beacons may include Unicode strings and URL-encoded binary strings. To obtain any further semantic meaning from the beacon data, the beacon data needs to be interpreted and transformed by target programs. Although beacons are described, which may be web event logs for events that occur while users use clients 102, other types of unstructured data may be appreciated. For example, beacons may also include extensible markup language (XML) specifications, hypertext markup language (HTML) code, and other human-readable documentation.
Users interact with clients 102 to produce events. For example, users may interact with websites on the World Wide Web (WWW), such as through mouse clicks, hovering over objects, and other user interactions with web pages. Beacons are created based on the events and include information for the actions taken by the users, and may also include other metadata about the event. For example, the metadata may include user identification information, what platform (e.g., device type or operating system) is being used, what application is being used, etc. The beacons may be unstructured data. Also, different clients 102 and different websites may generate beacons in different formats.
A server 104 receives and stores the beacons for later processing. In one example, server 104 may aggregate beacons from multiple network devices. Also, server 104 may be a distributed system of servers that are storing the beacons. In this example, server 104 stores the beacons, but other storage devices may store the beacons.
In one example, target programs 106 may be executed to process the beacons. When executed, target programs 106 may determine beacons that are of interest and then transform the unstructured data of the beacons into structured data that can be used by a company. For example, different target programs 106 may be interested in different types of beacons. Each target program 106 would identify the applicable beacons. Then, target programs 106 transform the unstructured data into structured data. The structured data may be stored in a database for later querying, such as to generate reports.
Conventionally, users would have to write target programs 106 for each type of beacon that a company wanted to process. However, particular embodiments automatically generate target programs 106. For example, as will be described in more detail below, compiler 108 receives a specification and uses the specification to automatically generate a target program 106. Using the specification allows users to declaratively specify what beacons are of interest and what structured data is desired. Compiler 108 then generates target programs 106 that can process the beacons and perform the desired transformations from unstructured data to structured data. By using the specification to declare what is wanted, users do not have to write a program that is used to process the beacons. This may allow more users to specify how to process beacons.
The process of generating a target program 106 from a specification will now be described in more detail. FIG. 2 shows a more detailed example of compiler 108 according to one embodiment. Specifications 202 may be written using a specific grammar that declares what beacons are of interest and what structured data is desired. Users may write different specifications 202 to generate different structured data from different beacons.
In one embodiment, an abstract syntax tree generator 203 first converts specifications 202 into abstract syntax trees 204. The abstract syntax tree is an abstract way of representing the syntax of different specifications 202. In one embodiment, an abstract syntax tree is a tree representation of the syntactic structure of the input program. The syntax tree is built through the use of a parser, which produces a tree representation of the input program based on a grammar specification.
An object model generator 205 uses the abstract syntax trees to generate object models 206. Object models 206 convert nodes of the abstract syntax tree into objects that are in the object model. The object model is used such that generators 208 can be written to read a specific format defined in the object model. This allows generators 208 to be reused to process different specifications 202. Because beacons may have similar formats of data, specifications 202 may be written and parsed into object models 206. Thus, to process different types of beacons, object models 206 with different objects may be generated, but the same generators 208 may be used. Also, even though the information that is being transformed from unstructured data to structured data may be different, the same generator 208 may be used because each generator 208 is configured to parse the same format of an object model 206. In one embodiment, the object model is a simplified and generalized view of the input specification based on the abstract syntax tree. The object model is generated by passing over the abstract syntax tree multiple times. Specification correctness checks may be performed (semantic analysis), symbols may be resolved (e.g., various references that must be resolved and disambiguated), and a simplified structure (called the object model) is created so that generators 208 can be written more concisely.
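The multi-pass construction of the object model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: the tuple-based AST layout, the node kinds, and the names `build_object_model`, `ObjectModel`, `BeaconObject`, and `SpecError` are all hypothetical.

```python
from dataclasses import dataclass, field


class SpecError(Exception):
    """Raised when a specification fails a semantic check (hypothetical name)."""


@dataclass
class BeaconObject:
    name: str
    fields: dict = field(default_factory=dict)  # field name -> composite name


@dataclass
class ObjectModel:
    composites: set = field(default_factory=set)
    beacons: dict = field(default_factory=dict)  # beacon name -> BeaconObject


def build_object_model(ast):
    """Pass over a toy AST twice to produce a simplified object model.

    The AST is assumed to be a list of (kind, name, children) tuples.
    """
    model = ObjectModel()

    # Pass 1: collect the declared composite symbols.
    for kind, name, _children in ast:
        if kind == "composite":
            model.composites.add(name)

    # Pass 2: build beacon objects and resolve their composite references.
    for kind, name, children in ast:
        if kind != "beacon":
            continue
        beacon = BeaconObject(name)
        for field_name, composite_name in children:
            if composite_name not in model.composites:
                # Semantic check: every reference must resolve to a symbol.
                raise SpecError(
                    f"beacon {name!r} field {field_name!r} references "
                    f"undefined composite {composite_name!r}"
                )
            beacon.fields[field_name] = composite_name
        model.beacons[name] = beacon

    return model


# The beacon is listed before the composite it references; the first pass
# has already collected all symbols, so the reference still resolves.
toy_ast = [
    ("beacon", "playback_start", [("selected_video", "Video")]),
    ("composite", "Video", []),
]
toy_model = build_object_model(toy_ast)
```

Separating symbol collection from reference resolution is one common reason for passing over a tree more than once: declarations can then appear in any order.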
Object models 206 are in a format that can be read by different generators 208-1 through 208-N. Each generator 208-1 through 208-N may generate target programs #1 through #N, respectively. For example, some generators 208 may generate MapReduce source code, structured query language (SQL) queries, representational state transfer (REST) requests, HTML documentation, and other target programs. Each generator 208 may be written to process the formats of object models 206, and thus multiple generators 208 do not need to be written for different specifications 202. That is, if MapReduce code is desired, the same MapReduce generator 208 is used for multiple specifications 202. The objects in object model 206 may change, but the same generator 208 may be used.
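The reuse of generators across object models can be illustrated with a short sketch. Everything here is an assumption made for the example: the dictionary layout of the object model, the `Generator` protocol, and the two generator classes are hypothetical, and the emitted SQL and HTML are only indicative of what such generators might produce. The field names are borrowed from the FIG. 4 example.

```python
from typing import Protocol


class Generator(Protocol):
    """Anything that turns an object model into target-program text."""

    def generate(self, model: dict) -> str: ...


class SqlGenerator:
    """Emits an SQL-like aggregation query from the object model."""

    def generate(self, model: dict) -> str:
        dims = ", ".join(model["dimensions"])
        facts = ", ".join(
            f"{func.upper()}({source}) AS {name}"
            for name, (func, source) in model["facts"].items()
        )
        return f"SELECT {dims}, {facts} FROM {model['beacon']} GROUP BY {dims}"


class HtmlDocGenerator:
    """Emits simple HTML documentation listing the structured data fields."""

    def generate(self, model: dict) -> str:
        names = [*model["dimensions"], *model["facts"]]
        items = "".join(f"<li>{name}</li>" for name in names)
        return f"<h1>{model['basefact']}</h1><ul>{items}</ul>"


# Hypothetical object model derived from one specification.
playback_model = {
    "basefact": "start_by_video_and_browser",
    "beacon": "playback_start",
    "dimensions": ["videoName", "browserName"],
    "facts": {"totalCount": ("sum", "count")},
}

# Both generators read the same object model format.
generators = [SqlGenerator(), HtmlDocGenerator()]
outputs = [g.generate(playback_model) for g in generators]
```

Because both generators depend only on the object model format, a model built from a different specification could be passed to the same two objects without changing either class.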
FIG. 3 depicts a simplified flowchart for generating target programs 106 according to one embodiment. At 302, compiler 108 receives a specification 202. Specification 202 specifies which beacons to process and what transformations of the unstructured data to specified structured data are desired. In one embodiment, specification 202 does not include code that is used to process beacons and transform the unstructured data to structured data. Also, compiler 108 may parse the specification for correctness. For example, compiler 108 may parse the specification for semantic correctness, such as by determining that a basefact is referencing a beacon that is not defined.
At 304, compiler 108 parses specification 202 into an abstract syntax tree 204. The abstract syntax tree organizes the elements of specification 202 into a tree structure.
At 306, compiler 108 converts abstract syntax tree 204 into an object model 206. For example, compiler 108 parses nodes of abstract syntax tree 204 to generate object model 206. Object model 206 organizes specification 202 into objects.
At 308, compiler 108 determines a generator 208 for a target program 106. For example, compiler 108 may receive a user selection of a generator 208. The selected generator 208 is configured to produce a specific type of target program 106.
At 310, compiler 108 generates target program 106 for generator 208 based on object model 206.
To illustrate the above process of generating target program 106 from specification 202, an example specification 202 will be described. FIG. 4 shows a specification 202 according to one embodiment. Specification 202 produces a target program 106 to convert a video ID to a video name, transform a browser name for the browser used to play a video to a browser name, and count the number of times the video was played. It should be noted that specification 202 may not be a complete specification and has parts redacted, such as where a “ . . . ” is shown.
Specification 202 includes three sections: “composite”, “beacon”, and “basefact”. A composite defines what is in the beacon, such as the raw data that is in the beacon, and how to transform the raw data in the beacon. At 402, three composite objects of “Video”, “Browser”, and “Count” are shown. Composites may have any number of input fields and one or more output fields. At 404, the Video composite object has an input parameter object named “video_id”. This is the name of the beacon parameter in a raw log line. For example, the unstructured data may include the term “video_id”. At 408, the Video composite object includes an output field object called “video_name”. This is the field name after video_id is transformed. At 410, a mapper object for “MapReduceJob” includes transformational logic for the output field object video_name. The mapper object includes details for performing the transformation that is specified in the mapper definition located at conversionMethod. Additional mappers that perform other transformations may also be included in a composite object. At 412, other composite objects of “Browser” and “Count” are included. Details have not been provided, but would be similar to those found in the Video composite object. It will be understood that specification 202 may include any number of composite objects 402. For example, specification 202 may include additional composite objects (not shown) that may be used by other beacon objects.
At 412, a beacon object is identified as “playback_start” and uniquely identifies the beacon within specification 202. Because specification 202 may include multiple composite objects, the beacon object identifies which composite objects are part of this beacon object. At 414, the beacon includes three field objects: “selected_video”, which references the Video composite object; “user_browser”, which references the Browser composite object; and “count”, which references the Count composite object. The field objects are used to refer back to composite objects.
At 416, specification 202 defines a basefact object of “start_by_video_and_browser”. The basefact object is used to define what structured data is desired and what unstructured data should be used to populate the structured data. The basefact object may use multiple beacon objects. For example, this basefact object uses the “playback_start” beacon object to determine applicable data. That is, this basefact ignores all other beacon objects in specification 202 that are not named “playback_start”. At 418, the basefact object includes three structured data field objects for the “playback_start” beacon. The structured data fields may be of different types, such as dimension or fact fields. A dimension maps a field in the beacon to a structured data field. A fact may perform a function (e.g., an aggregation function) on a field in the beacon to determine a result that is mapped to a structured data field.
A first structured data field of “videoName” is defined as a dimension of the video_name field object in the composite object referenced by the selected_video field object in the beacon object, and a second structured data field of “browserName” is defined as a dimension from the name field object in the composite object referenced by the user_browser field object in the beacon object. A third structured data field of “totalCount” is defined as a fact that is the aggregation of the count field object in the composite object referenced by the count field object in the beacon object.
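The relationships among the composite, beacon, and basefact sections can be made concrete with a small data-structure sketch. This is not the grammar of FIG. 4, which is not reproduced here; the nested-dictionary layout and the `resolve` helper are hypothetical, and only names stated in the description are used. Input parameters for the Browser and Count composites are not given in the description, so they are left unset rather than guessed.

```python
# Hypothetical in-memory rendering of the three sections of the example
# specification; the real specification uses its own grammar.
SPEC = {
    "composites": {
        "Video": {"input": "video_id", "outputs": ["video_name"]},
        # Inputs for Browser and Count are not stated in the description.
        "Browser": {"input": None, "outputs": ["name"]},
        "Count": {"input": None, "outputs": ["count"]},
    },
    "beacons": {
        # Each beacon field object refers back to a composite object.
        "playback_start": {
            "selected_video": "Video",
            "user_browser": "Browser",
            "count": "Count",
        }
    },
    "basefacts": {
        "start_by_video_and_browser": {
            "beacon": "playback_start",
            # structured field -> (type, beacon field, composite output field)
            "fields": {
                "videoName": ("dimension", "selected_video", "video_name"),
                "browserName": ("dimension", "user_browser", "name"),
                "totalCount": ("fact", "count", "count"),
            },
        }
    },
}


def resolve(spec, basefact_name, structured_field):
    """Follow basefact -> beacon field -> composite -> output field."""
    basefact = spec["basefacts"][basefact_name]
    kind, beacon_field, output_field = basefact["fields"][structured_field]
    composite_name = spec["beacons"][basefact["beacon"]][beacon_field]
    composite = spec["composites"][composite_name]
    if output_field not in composite["outputs"]:
        raise KeyError(f"{composite_name} has no output field {output_field!r}")
    return kind, composite_name, output_field


video_target = resolve(SPEC, "start_by_video_and_browser", "videoName")
```

Here `video_target` identifies videoName as a dimension backed by the video_name output of the Video composite, which mirrors the chain of references described for the first structured data field.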
Upon receiving specification 202, compiler 108 selects a generator 208 that is used to generate a target program 106. As discussed above, compiler 108 converts specification 202 into object model 206. Generator 208 takes object model 206 and generates code in a software language that is used to process beacons. In one embodiment, compiler 108 generates MapReduce job code as a target program 106. Target program 106 is configured to receive unstructured data, such as raw web event log lines, and generate structured data specified by the starts_by_video_and_browser basefact definition. That is, transformed data from the beacons is stored in structured data fields of videoName, browserName, and totalCount.
FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects that generator 208 analyzes to generate code for target program 106. At 502, generator 208 identifies the beacon object for the basefact object. For example, specification 202 may include multiple beacon objects and the beacon object for this basefact object is the playback_start beacon object. Generator 208 generates filtering code that determines which beacons should be processed by target program 106.
The structured data field objects in the basefact object point to field objects in the beacon object at 504. For example, selected_video, user_browser, and count are referenced in both the basefact and the beacon objects. To determine which composite objects these structured data field objects are associated with, at 506, the field objects in the beacon object are associated with composite objects.
Generator 208 then uses the referenced composite objects from the beacon object to generate instructions on how to map unstructured data to structured data. For example, generator 208 generates instructions on how to tokenize (i.e., break the text of the beacon into words or phrases) and transform raw web log data into structured data. For example, at 508, the basefact object defines the structured data by the terms videoName, browserName, and totalCount, which are structured data fields that can be defined in a database. The transformations for the field objects in the basefact object are specified in the composite object that each beacon field object references, as was discussed with respect to 506. Also, for the fact field object, generator 208 generates instructions to aggregate rows based on the count composite object.
Generator 208 then outputs the final software code that is compiled into target program 106. Target program 106 can then be used to process beacons and produce the transformed data as specified in the basefact definition.
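The behavior expected of such a target program (filter the beacons of interest, transform their fields, and aggregate) can be sketched in plain Python. This is an illustration only and not generated MapReduce code: the query-string beacon format, the `event` and `browser` parameter names, and the two lookup tables are assumptions made for the example, and only `video_id`, `playback_start`, and the three structured data fields come from the description.

```python
from collections import defaultdict
from urllib.parse import parse_qs

# Hypothetical lookups standing in for the conversions that the Video and
# Browser composites would specify.
VIDEO_NAMES = {"42": "Pilot Episode"}
BROWSER_NAMES = {"ff": "Firefox"}


def map_beacon(raw_line):
    """Return ((videoName, browserName), 1) for playback_start beacons."""
    # Tokenize the raw line, assumed here to be a URL-encoded query string.
    params = {key: values[0] for key, values in parse_qs(raw_line).items()}
    if params.get("event") != "playback_start":
        return None  # filtering step: other beacon types are ignored
    video_name = VIDEO_NAMES.get(params.get("video_id"), "unknown")
    browser_name = BROWSER_NAMES.get(params.get("browser"), "unknown")
    # The dimensions form the key; the fact contributes a value of 1.
    return (video_name, browser_name), 1


def reduce_counts(pairs):
    """Sum the values per key to produce totalCount."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines):
    mapped = (map_beacon(line) for line in lines)
    return reduce_counts(pair for pair in mapped if pair is not None)


sample_lines = [
    "event=playback_start&video_id=42&browser=ff",
    "event=playback_start&video_id=42&browser=ff",
    "event=pause&video_id=42&browser=ff",
]
result = run(sample_lines)
```

With the sample lines above, the two playback_start beacons collapse into a single row keyed by video name and browser name, and the pause beacon is dropped by the filter.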
FIG. 6 shows an example of target program 106 according to one embodiment. Generator 208 may generate target program 106 based on specification 202 and object model 206. At 602, the function “Map” defines the aggregator/reducer based on the MapReduce paradigm. Dimensions correspond to Keys, and Facts correspond to Values. At 604, the field “totalCount” corresponds to the structured data field defined in the basefact object of specification 202. Also, at 606, the “+=” symbol is determined based on the “sum” function in specification 202, which is an aggregator.
At 608, the term “playback_start” is based on which beacons were defined by specification 202. In this case, only events defined by playback_start beacons are reviewed. At 610, the conversion defined in the composite Video appears, and at 612, the conversion defined in the composite Browser appears. Further, at 614, the functions “Identity<Long>( )” and “StaticInputAction<Long>(1L)” are determined based on the fact “sum” in the basefact in specification 202. The above information is determined by reviewing object model 206 to generate the target program 106.
Accordingly, compiler 108 generates target program 106, which can map unstructured data to structured data. A user can declare the structured data that is desired and the transformations needed to transform unstructured data to structured data. Compiler 108 then generates the software code to perform the desired transformations. A user thus does not need to write software code for target program 106.
Further, particular embodiments leverage object model 206, which allows different generators 208 to operate on the object model. Thus, different specifications 202 may be parsed into an object model 206 that can be operated on by the same generators 208.
Particular embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine. The computer-readable storage medium contains instructions for controlling a computer system to perform a method described by particular embodiments. The instructions, when executed by one or more computer processors, may be operable to perform that which is described in particular embodiments.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope hereof as defined by the claims.
Claims (22)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/660,788 US8725750B1 (en) | 2012-10-25 | 2012-10-25 | Framework for generating programs to process beacons |
US14/228,003 US9305032B2 (en) | 2012-10-25 | 2014-03-27 | Framework for generating programs to process beacons |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/660,788 US8725750B1 (en) | 2012-10-25 | 2012-10-25 | Framework for generating programs to process beacons |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/228,003 Continuation US9305032B2 (en) | 2012-10-25 | 2014-03-27 | Framework for generating programs to process beacons |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140122511A1 (en) | 2014-05-01 |
US8725750B1 (en) | 2014-05-13 |
Family
ID=50548393
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/660,788 Active US8725750B1 (en) | 2012-10-25 | 2012-10-25 | Framework for generating programs to process beacons |
US14/228,003 Active 2033-01-05 US9305032B2 (en) | 2012-10-25 | 2014-03-27 | Framework for generating programs to process beacons |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/228,003 Active 2033-01-05 US9305032B2 (en) | 2012-10-25 | 2014-03-27 | Framework for generating programs to process beacons |
Country Status (1)
Country | Link |
---|---|
US (2) | US8725750B1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8392452B2 (en) * | 2010-09-03 | 2013-03-05 | Hulu Llc | Method and apparatus for callback supplementation of media program metadata |
US8868648B2 (en) | 2012-05-14 | 2014-10-21 | Business Objects Software Ltd. | Accessing open data using business intelligence tools |
US20140214897A1 (en) * | 2013-01-31 | 2014-07-31 | Yuankai Zhu | SYSTEMS AND METHODS FOR ACCESSING A NoSQL DATABASE USING BUSINESS INTELLIGENCE TOOLS |
US10803083B2 (en) | 2015-08-27 | 2020-10-13 | Infosys Limited | System and method of generating platform-agnostic abstract syntax tree |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8015143B2 (en) * | 2002-05-22 | 2011-09-06 | Estes Timothy W | Knowledge discovery agent system and method |
CA2528492A1 (en) * | 2003-06-04 | 2005-01-06 | The Trustees Of The University Of Pennsylvania | Ndma db schema dicom to relational schema translation and xml to sql query translation |
US20050234973A1 (en) * | 2004-04-15 | 2005-10-20 | Microsoft Corporation | Mining service requests for product support |
US7822768B2 (en) * | 2004-11-23 | 2010-10-26 | International Business Machines Corporation | System and method for automating data normalization using text analytics |
US20060173865A1 (en) * | 2005-02-03 | 2006-08-03 | Fong Joseph S | System and method of translating a relational database into an XML document and vice versa |
US7849048B2 (en) * | 2005-07-05 | 2010-12-07 | Clarabridge, Inc. | System and method of making unstructured data available to structured data analysis tools |
US20070011183A1 (en) * | 2005-07-05 | 2007-01-11 | Justin Langseth | Analysis and transformation tools for structured and unstructured data |
US7613996B2 (en) * | 2005-08-15 | 2009-11-03 | Microsoft Corporation | Enabling selection of an inferred schema part |
WO2007127424A2 (en) * | 2006-04-28 | 2007-11-08 | Efunds Corporation | Methods and systems for opening and funding a financial account online |
US7849030B2 (en) * | 2006-05-31 | 2010-12-07 | Hartford Fire Insurance Company | Method and system for classifying documents |
US8271429B2 (en) * | 2006-09-11 | 2012-09-18 | Wiredset Llc | System and method for collecting and processing data |
US8160977B2 (en) * | 2006-12-11 | 2012-04-17 | Poulin Christian D | Collaborative predictive model building |
US20090276403A1 (en) * | 2008-04-30 | 2009-11-05 | Pablo Tamayo | Projection mining for advanced recommendation systems and data mining |
US20100100439A1 (en) * | 2008-06-12 | 2010-04-22 | Dawn Jutla | Multi-platform system apparatus for interoperable, multimedia-accessible and convertible structured and unstructured wikis, wiki user networks, and other user-generated content repositories |
US9460189B2 (en) * | 2010-09-23 | 2016-10-04 | Microsoft Technology Licensing, Llc | Data model dualization |
US9111018B2 (en) * | 2010-12-30 | 2015-08-18 | Cerner Innovation, Inc | Patient care cards |
US9092802B1 (en) * | 2011-08-15 | 2015-07-28 | Ramakrishna Akella | Statistical machine learning and business process models systems and methods |
US8311973B1 (en) * | 2011-09-24 | 2012-11-13 | Zadeh Lotfi A | Methods and systems for applications for Z-numbers |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10951685B1 (en) * | 2018-07-19 | 2021-03-16 | Poetic Systems, Llc | Adaptive content deployment |
US11991237B1 (en) * | 2018-07-19 | 2024-05-21 | Poetic Digital, Llc | Adaptive content deployment |
US11288448B2 (en) * | 2019-07-26 | 2022-03-29 | Arista Networks, Inc. | Techniques for implementing a command line interface |
Also Published As
Publication number | Publication date |
---|---|
US9305032B2 (en) | 2016-04-05 |
US8725750B1 (en) | 2014-05-13 |
US20140214867A1 (en) | 2014-07-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: HULU LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WAYE, LUCAS; SENG, KEVIN; BAJARIA, VIRAL; AND OTHERS; SIGNING DATES FROM 20121022 TO 20121024; REEL/FRAME: 029194/0325 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551). Year of fee payment: 4 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |