CN112256959A - Method for analyzing information collected by WeChat official accounts and mini programs
- Publication number
- CN112256959A
- Authority
- CN
- China
- Prior art keywords
- module
- interface
- information
- click
- simulation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Stored Programmes (AREA)
Abstract
The invention provides a method for analyzing the information collected by WeChat official accounts and mini programs, belonging to the technical field of network data analysis. The method uses an automated information acquisition tool to determine what user information is collected. The tool comprises an automatic simulated click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module, and a collected information analysis module. Using an emulator and interface layout recognition, the method automatically operates and logs in to WeChat, clicks through and crawls all events and interfaces, recognizes and analyzes those interfaces, and determines what user information is collected. The invention automates the analysis and processing of the information collected by official accounts and mini programs, saving substantial manual effort while classifying the data and identifying the collected information efficiently and accurately.
Description
Technical Field
The invention belongs to the technical field of network data analysis and relates to a method for analyzing the information collected by WeChat official accounts and mini programs.
Background
With the spread of the internet, many fields have changed substantially. Industries such as education, transportation, healthcare, news, and government services have undergone digital transformation, driving social development and change, and a large number of enterprises have developed business applications to serve users. To analyze the information these applications collect efficiently, to cluster that information, and to tag applications with attributes, a technique for analyzing application-collected information is required.
Most existing approaches analyze the collection of user information on websites and rely on mature crawler technology. WeChat official accounts and mini programs, however, are served through the WeChat public platform, so existing crawlers cannot crawl and analyze them directly. Analyzing the information collected by WeChat mini programs therefore requires combining automated simulated operation of the application with traffic capture and analysis. Emulators for various application systems already exist in the prior art, but techniques for automatically gathering the information collected by WeChat official accounts and mini programs still need to be developed.
Disclosure of Invention
To solve these problems, the invention provides a method for analyzing the information collected by WeChat official accounts and mini programs. Built on existing tools, the method actively discovers an application, obtains and analyzes the information the application collects, and tags the application accordingly.
The method relies on an automated information acquisition tool comprising an automatic simulated click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module, and a collected information analysis module. The method starts the tool to gather information and comprises the following steps:
(1) The automatic simulated click module starts an Android emulator application, identifies and starts the WeChat application inside the emulator, and starts a packet capture tool, which begins capturing network traffic.
(2) The name of the WeChat official account or mini program to be analyzed is recorded in a text file at a preset path; the automatic simulated click module reads the file to obtain the name.
(3) The automatic simulated click module sends the name to the interface recognition module. The interface recognition module locates the WeChat search box in the running WeChat application, enters the received name, and obtains a list of search results. It then recognizes the search results, finds the official account or mini program with the matching name, and sends its position to the automatic simulated click module.
(4) The automatic simulated click module performs a simulated click at that position to open the official account's follow page, then calls the interface recognition module to recognize the follow page and performs a simulated click to follow the official account or mini program.
(5) The interface recognition module recognizes and obtains the menus on the main interface of the official account or mini program, and calls the automatic simulated click module to click each menu.
(6) The interface analysis module analyzes the interface elements of the functional interface opened by the click, finds events that may involve user information, and calls the automatic simulated click module to click and trigger each event. The interface analysis module then recognizes and analyzes the interface of the triggered event, gathers the user information, and sends it to the collected information analysis module.
(7) The method checks whether every menu on the main interface of the official account or mini program has been clicked. If so, it proceeds to step (8); otherwise it clicks the next menu and returns to step (6).
(8) The traffic capture and parsing module analyzes the captured network traffic in real time, extracts the element information in each linked interface and the information contained in the source code of that interface, and sends this information to the collected information analysis module.
(9) The collected information analysis module cleans and integrates the received information, determines the attribute classification of the data, and outputs the user information collected by the official account or mini program, classified by attribute. The user information includes, for example, the user's geographic location and registered name.
In step (6), when the interface analysis module encounters a geographic location authorization interface, it obtains the user's geographic location and sends it to the collected information analysis module. When it encounters a registration or login interface of the official account or mini program, it calls the simulated login module, which fills in the form automatically, records the submitted form information, and sends it to the collected information analysis module.
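The control flow of steps (5) to (7) can be sketched as a loop over menus and the events found under each menu. This is a minimal Python sketch under stated assumptions, not the patented implementation: the four callables (`list_menus`, `open_menu`, `find_events`, `trigger_event`) are hypothetical stand-ins for the interface recognition, simulated click, and interface analysis modules.

```python
from typing import Callable, Dict, Iterable, List


def crawl_menus(
    list_menus: Callable[[], Iterable[str]],      # hypothetical: step (5), recognize menus
    open_menu: Callable[[str], None],             # hypothetical: simulated click on one menu
    find_events: Callable[[], Iterable[str]],     # hypothetical: step (6), events on the open interface
    trigger_event: Callable[[str], List[str]],    # hypothetical: click an event, return fields it collects
) -> Dict[str, List[str]]:
    """Visit every menu once and record the user-information fields found under it."""
    collected: Dict[str, List[str]] = {}
    for menu in list_menus():                     # step (7): stop when no menu is left
        open_menu(menu)
        fields: List[str] = []
        for event in find_events():
            fields.extend(trigger_event(event))
        collected[menu] = fields
    return collected
```

In a real tool each callable would wrap emulator interaction; passing them in as parameters keeps the traversal logic testable without a device.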
Compared with the prior art, the invention has the following advantages and positive effects. It automates the analysis and processing of the information collected by official accounts and mini programs. The automated tool discovers the relevant applications, obtains and analyzes both the information contained in each application and the information in its network traffic, and tags and classifies the data. This saves substantial manual effort and allows the data to be classified, and the information collected by each application to be identified, efficiently and accurately.
Drawings
FIG. 1 is a flowchart of the method of the present invention for analyzing information collected by WeChat official accounts and mini programs.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and examples.
At present, the information collected by WeChat official accounts and mini programs is gathered manually, which is time-consuming, laborious, and inefficient. The invention acquires user information more comprehensively and can uncover user information that a WeChat official account or mini program collects covertly, enabling further analysis of that official account or mini program.
The invention provides a method for analyzing the information collected by WeChat official accounts and mini programs. The method is implemented by designing an automated information acquisition tool that calls an existing emulator and uses interface layout recognition to operate and log in to WeChat automatically. The tool clicks through and crawls all events and interfaces in the official account or mini program, such as registration and login, recognizes and analyzes those interfaces, and determines what user information is collected. At the same time, the invention captures the network traffic that the official account or mini program generates during the simulated operation, obtains the relevant page code, and identifies user information fields. The gathered user information is then integrated to give the final picture of what user information the WeChat mini program or official account collects.
This embodiment illustrates the method by analyzing the information collected by the official account "Ping An Bank Credit Card Center". To implement the method, the invention designs an automated information acquisition tool comprising an automatic simulated click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module, a collected information analysis module, and so on. With this tool, the user information collected by the account is gathered and analyzed automatically. The tool is installed on the smart device on which the method is to run, and the following steps are then executed. The flow is shown in FIG. 1.
Step 1: The automatic simulated click module calls an Android emulator application (this embodiment starts the Nox Android emulator), identifies and starts the WeChat application inside the emulator, and at the same time starts a packet capture tool for the emulator to begin capturing network traffic.
In this embodiment, the automatic simulated click module is implemented with the command-line tool provided in the basic Android development tools. After the automated information acquisition tool is started, the automatic simulated click module executes command lines to start each application. Alternatively, the Android emulator and the WeChat application can be started manually with the mouse, in which case the automatic simulated click module is not used for this step.
In this embodiment, the packet capture tool Fiddler is started for the emulator and a private HTTPS proxy is set up so that the endpoints and request parameters of HTTPS requests can be captured.
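As a rough illustration of driving the emulator from the command line, the sketch below builds `adb` argument lists for launching WeChat and for routing the emulator's traffic through a capture proxy. It only constructs the commands; running them requires `adb` on the PATH and a connected emulator. The serial `emulator-5554`, the proxy host, and port `8888` are placeholder assumptions, not values from the patent.

```python
import subprocess
from typing import List, Optional

WECHAT_PACKAGE = "com.tencent.mm"  # WeChat's Android package name


def adb_command(args: List[str], serial: Optional[str] = None) -> List[str]:
    """Build an adb argument list, optionally targeting one device by serial."""
    base = ["adb"]
    if serial:
        base += ["-s", serial]
    return base + args


def launch_app_command(package: str, serial: Optional[str] = None) -> List[str]:
    """Command that starts the package's launcher activity via `monkey`."""
    return adb_command(
        ["shell", "monkey", "-p", package,
         "-c", "android.intent.category.LAUNCHER", "1"],
        serial,
    )


def set_proxy_command(host: str, port: int, serial: Optional[str] = None) -> List[str]:
    """Command that points the device's global HTTP proxy at the capture tool."""
    return adb_command(
        ["shell", "settings", "put", "global", "http_proxy", f"{host}:{port}"],
        serial,
    )


def run(cmd: List[str]) -> str:
    """Execute a command and return its stdout (needs adb and a running emulator)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Placeholder serial, host, and port; adjust to the actual emulator and proxy.
    print(" ".join(set_proxy_command("10.0.2.2", 8888, serial="emulator-5554")))
    print(" ".join(launch_app_command(WECHAT_PACKAGE, serial="emulator-5554")))
```

Capturing HTTPS request parameters additionally requires the proxy's root certificate to be trusted inside the emulator, which this sketch does not cover.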
Step 2: The automatic simulated click module automatically reads the official account name from external data: "Ping An Bank Credit Card Center".
The external data is recorded in a text file at a preset path, and the automatic simulated click module reads the contents of that file from the preset path.
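Reading the target names can be as simple as the following sketch. The one-name-per-line layout and UTF-8 encoding are assumptions; the patent only states that the names are recorded in a text file at a preset path.

```python
from pathlib import Path
from typing import List


def read_target_names(path: Path) -> List[str]:
    """Return the account names listed in the file, one per line.

    Blank lines are skipped and duplicates are dropped while keeping order.
    """
    names: List[str] = []
    seen = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```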
Step 3: The automatic simulated click module sends the name it has read to the interface recognition module. The interface recognition module locates the WeChat search box in the running WeChat application and enters the received name, "Ping An Bank Credit Card Center", to obtain a list of search results.
Step 4: The interface recognition module recognizes the search results, obtains the text of each result, finds the entry for the official account named "Ping An Bank Credit Card Center", and sends its exact position to the automatic simulated click module.
The interface recognition module is a layout recognition tool, implemented for example with uiautomatorviewer, the Android element-locating tool.
Step 5: The automatic simulated click module performs a simulated click at that position and opens the official account's follow page.
Step 6: The automatic simulated click module calls the Android element-locating tool to recognize the follow page, finds the button for following the official account, and performs a simulated click to follow the account.
In this embodiment, the adb command-line tool obtains the current layout file of the Android application; a deep-learning-based method is used to refine the location of the target component within the layout file; the key attribute values in the interface are obtained; and simulated clicks are performed with adb.
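The layout-based clicking described above can be sketched as follows. The sketch parses the XML produced by `adb shell uiautomator dump`, finds a node by its visible text, and computes the centre of its `bounds` for `adb shell input tap`. Note that it substitutes plain text matching for the deep-learning-based locating step mentioned in the embodiment, and the sample layout and the label "Follow" are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# uiautomator writes bounds as "[left,top][right,bottom]"
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_tap_point(layout_xml: str, label: str) -> Optional[Tuple[int, int]]:
    """Return the centre of the first node whose text or content-desc equals label."""
    root = ET.fromstring(layout_xml)
    for node in root.iter("node"):
        if node.get("text") == label or node.get("content-desc") == label:
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                left, top, right, bottom = map(int, match.groups())
                return (left + right) // 2, (top + bottom) // 2
    return None


def tap_command(x: int, y: int) -> List[str]:
    """adb command that performs a simulated tap at screen coordinates (x, y)."""
    return ["adb", "shell", "input", "tap", str(x), str(y)]


# Hypothetical, heavily trimmed layout dump used only to exercise the functions.
SAMPLE_LAYOUT = """<hierarchy rotation="0">
  <node index="0" text="" class="android.widget.FrameLayout" bounds="[0,0][1080,1920]">
    <node index="0" text="Follow" class="android.widget.Button" bounds="[400,1500][680,1600]" />
  </node>
</hierarchy>"""

if __name__ == "__main__":
    point = find_tap_point(SAMPLE_LAYOUT, "Follow")
    if point is not None:
        print(" ".join(tap_command(*point)))
```

On a device, the layout would first be written with `adb shell uiautomator dump /sdcard/window_dump.xml` and copied to the host with `adb pull` before parsing.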
Step 7: On the main interface of the official account, the interface recognition module recognizes and obtains the account's menus.
Step 8: The automatic simulated click module clicks each menu. This embodiment takes clicking [Apply for a card online] to enter the corresponding functional interface as an example.
Step 9: In the opened functional interface, the interface analysis module analyzes the interface elements and finds events that may involve user information, for example an [Apply] button. It calls the automatic simulated click module to perform a simulated click and enters the card application interface.
Step 10: After entry, a geographic location authorization interface appears. By recognizing and analyzing this interface, the interface analysis module detects and records that the user's geographic location is collected. It then calls the automatic simulated click module to click the [Confirm] button, and the Ping An Bank credit card application interface appears.
Step 11: The interface is recognized and analyzed, and the data in it is parsed and stored. The simulated login module is called to register automatically: it fills in the form elements in the specified format, the automatic simulated click module clicks the [Apply] button, and the submitted form information is stored and sent to the collected information analysis module.
When a click event leads to an interface with an embedded HTML5 page, the page content is recognized by identifying and monitoring the layout of the HTML interface. The interface analysis module analyzes the elements on the interface using an interface recognition and analysis algorithm and discovers the public information in the application.
Step 12: Steps 8 to 11 are repeated for all menus of the WeChat official account to discover the other events and user information contained in its interfaces.
Step 13: The traffic capture and parsing module analyzes the captured traffic in real time. For each HTTP or HTTPS link, such as the Ping An Bank credit card application link, it uses the self-built HTTPS proxy to obtain the linked content, parses the source code containing the interface HTML, and extracts the name attribute (name) and the corresponding text (text) of each input element on the interface. The traffic capture and parsing module also extracts the other information contained in the interface source code.
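Extracting the name and associated text of input elements from captured HTML can be sketched with Python's standard `html.parser`. How a field's "text" is determined is an assumption here: the sketch uses the `<label for="...">` that points at the input's `id`, falling back to the `placeholder` attribute. The sample form is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class InputFieldParser(HTMLParser):
    """Collect <input> elements and the label text that refers to them."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Dict[str, str]] = []
        self.labels: Dict[str, str] = {}
        self._label_for: Optional[str] = None  # id targeted by the open <label>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "label":
            self._label_for = attr.get("for")
        elif tag == "input":
            self.fields.append({
                "name": attr.get("name") or "",
                "id": attr.get("id") or "",
                "placeholder": attr.get("placeholder") or "",
            })

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        if self._label_for and data.strip():
            self.labels[self._label_for] = data.strip()


def extract_fields(html: str) -> List[Dict[str, str]]:
    """Return [{'name': ..., 'text': ...}] for every input element in the page."""
    parser = InputFieldParser()
    parser.feed(html)
    parser.close()
    return [
        {"name": f["name"], "text": parser.labels.get(f["id"]) or f["placeholder"]}
        for f in parser.fields
    ]


# Hypothetical form standing in for a captured application page.
SAMPLE_HTML = """<form>
<label for="n">Name</label><input id="n" name="name">
<label for="m">Mobile</label><input id="m" name="mobile" placeholder="11 digits">
<input name="city" placeholder="City">
</form>"""
```

Labels are resolved after the whole page has been parsed, so a label that appears after its input is still matched.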
Step 14: The collected information analysis module integrates and analyzes the information gathered in the preceding steps and compiles the detailed data relating to the official account. This detailed data is classified by attribute and output as each attribute type together with its corresponding data.
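The cleaning, integration, and attribute classification of step 14 might look like the sketch below. The attribute categories and keyword lists are hypothetical; the patent does not specify the classification scheme. Records are `(source, field)` pairs, where the source is, for example, the interface analysis or the captured traffic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical attribute categories and the keywords that map a field to them.
ATTRIBUTE_KEYWORDS = {
    "identity": ("name", "idcard", "id_no"),
    "contact": ("mobile", "phone", "email"),
    "location": ("location", "address", "city", "latitude", "longitude"),
}


def classify(field: str) -> str:
    """Return the first attribute category whose keyword occurs in the field name."""
    key = field.strip().lower()
    for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
        if any(keyword in key for keyword in keywords):
            return attribute
    return "other"


def integrate(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge fields from all sources, drop duplicates, and group them by attribute."""
    grouped = defaultdict(set)
    for _source, field in records:
        grouped[classify(field)].add(field.strip().lower())
    return {attribute: sorted(fields) for attribute, fields in grouped.items()}
```

For example, a field named `name` seen both on the form and in the traffic is reported once under `identity`.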
The method markedly improves data acquisition capacity. For example, to gather the information collected by the official accounts and mini programs of 100,000 enterprises, manual processing by 4 people at 300 accounts per person per day would take about three months. With the method, tasks are distributed across multiple devices that crawl cooperatively; with 10 terminals at 1,000 accounts per terminal per day, the work is completed in about 10 days. The method therefore greatly increases the speed and capacity of data acquisition, and it can be customized as needed to meet diverse requirements.
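The throughput comparison can be checked with a few lines of arithmetic. The sketch reads the quoted rates as accounts processed per worker (or per terminal) per day, which is an interpretation of the figures above.

```python
import math


def days_needed(total: int, workers: int, per_worker_per_day: int) -> int:
    """Whole days required for `workers` to process `total` items at the given daily rate."""
    return math.ceil(total / (workers * per_worker_per_day))


manual_days = days_needed(100_000, workers=4, per_worker_per_day=300)
automated_days = days_needed(100_000, workers=10, per_worker_per_day=1_000)
print(manual_days, automated_days)  # 84 days (about three months) versus 10 days
```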
Claims (3)
1. A method for analyzing information collected by WeChat official accounts and mini programs, characterized in that an automated information acquisition tool is started to gather information, the tool comprising an automatic simulated click module, an interface recognition module, a simulated login module, a traffic capture and parsing module, an interface analysis module, and a collected information analysis module; the method comprises the following steps:
step 1, the automatic simulated click module starts an Android emulator application, identifies and starts the WeChat application inside the emulator, and starts a packet capture tool; the packet capture tool begins capturing network traffic;
step 2, the name of the WeChat official account or mini program to be analyzed is recorded in a text file at a preset path, and the automatic simulated click module reads the file to obtain the name of the WeChat official account or mini program;
step 3, the automatic simulated click module sends the name to the interface recognition module; the interface recognition module locates the WeChat search box in the running WeChat application, enters the received name, and obtains a list of search results; the interface recognition module recognizes the search results, finds the official account or mini program with the matching name, and sends its position to the automatic simulated click module;
step 4, the automatic simulated click module performs a simulated click at that position to open the official account's follow page, then calls the interface recognition module to recognize the follow page, and performs a simulated click to follow the official account or mini program;
step 5, on the main interface of the official account or mini program, the interface recognition module recognizes and obtains the menus, and calls the automatic simulated click module to click each menu;
step 6, the interface analysis module analyzes the interface elements of the functional interface opened by the click, finds events involving user information, and calls the automatic simulated click module to click and trigger each event; the interface analysis module recognizes and analyzes the interface of the triggered event, gathers the user information, and sends it to the collected information analysis module;
step 7, it is determined whether every menu on the main interface of the official account or mini program has been clicked; if so, step 8 is executed; otherwise the next menu is clicked and step 6 is executed again;
step 8, the traffic capture and parsing module analyzes the captured network traffic in real time, extracts the element information in each linked interface and the information contained in the source code of that interface, and sends this information to the collected information analysis module;
step 9, the collected information analysis module cleans and integrates the received information and outputs the user information collected by the official account or mini program.
2. The method according to claim 1, wherein in step 6, when the interface analysis module encounters a geographic location authorization interface, it obtains the user's geographic location and sends it to the collected information analysis module; when the interface analysis module encounters a registration or login interface of the official account or mini program, the simulated login module is called, the form is filled in automatically, and the submitted form information is recorded and sent to the collected information analysis module.
3. The method according to claim 1, wherein in step 8, the traffic capture and parsing module parses the linked interface, extracts the name and text content of each input element in the interface, and sends them to the collected information analysis module.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010530327 | 2020-06-11 | ||
CN2020105303275 | 2020-06-11 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256959A (en) | 2021-01-22 |
CN112256959B (en) | 2022-11-08 |
Family
ID=74233334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011044049.9A Active CN112256959B (en) | 2020-06-11 | 2020-09-28 | Method for analyzing information collected by WeChat public number small program |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256959B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6078924A (en) * | 1998-01-30 | 2000-06-20 | Aeneid Corporation | Method and apparatus for performing data collection, interpretation and analysis, in an information platform |
CN105320740A (en) * | 2015-09-22 | 2016-02-10 | 清华大学 | WeChat article and official account acquisition method and acquisition system |
CN106384249A (en) * | 2016-09-13 | 2017-02-08 | 四川长虹电器股份有限公司 | WeChat official account platform management system |
CN108833264A (en) * | 2018-06-25 | 2018-11-16 | 厦门理工学院 | Data acquisition management system, method and application based on wechat small routine |
CN110177139A (en) * | 2019-05-23 | 2019-08-27 | 中国搜索信息科技股份有限公司 | A kind of ostensible mobile APP data grab method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704435A (en) * | 2021-09-02 | 2021-11-26 | 北京声智科技有限公司 | Data extraction method and device and data acquisition terminal |
CN113822036A (en) * | 2021-09-28 | 2021-12-21 | 百度在线网络技术(北京)有限公司 | Privacy policy content generation method and device and electronic equipment |
CN113822036B (en) * | 2021-09-28 | 2022-07-12 | 百度在线网络技术(北京)有限公司 | Privacy policy content generation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112256959B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106982150B (en) | Hadoop-based mobile internet user behavior analysis method | |
CN109656792A (en) | Applied performance analysis method, apparatus, computer equipment and storage medium based on network call log | |
CN111882367B (en) | Method for monitoring and tracking online advertisements through analysis of user surfing behavior | |
CN102968494B (en) | The system and method for transport information is gathered by microblogging | |
CN112256959B (en) | Method for analyzing information collected by WeChat public number small program | |
CN111930868A (en) | Big data behavior trajectory analysis method based on multi-dimensional data acquisition | |
CN107894889A (en) | Bury point methods, equipment and computer-readable recording medium | |
CN114817968B (en) | Method, device and equipment for tracing path of featureless data and storage medium | |
CN108429747A (en) | A kind of extensive Web server information collecting method | |
CN111581067A (en) | Data acquisition method and device | |
CN111355628A (en) | Model training method, business recognition device and electronic device | |
CN104376021A (en) | File recommending system and method | |
CN111882368B (en) | On-line advertisement DPI encryption buried point and transparent transmission tracking method | |
CN111917848A (en) | Data processing method based on edge computing and cloud computing cooperation and cloud server | |
CN116049808B (en) | Equipment fingerprint acquisition system and method based on big data | |
CN115935323A (en) | Characteristic variable acquisition method and device | |
CN115296892B (en) | Data information service system | |
CN115357656A (en) | Information processing method and device based on big data and storage medium | |
CN113434404B (en) | Automatic service verification method and device for verifying reliability of disaster recovery system | |
CN112528104A (en) | Traceability system and traceability method based on sensitive data | |
CN113190458A (en) | Method and device for automatically analyzing buried point data, computer equipment and storage medium | |
CN107295087B (en) | System and method for realizing data aggregation between network systems | |
CN113626673A (en) | Page data acquisition method, system, terminal and storage medium | |
CN113076308A (en) | Space-time big data service system | |
CN117896732B (en) | APP privacy data use purpose consistency analysis method based on large language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||