WO2021093968A1 - Computerized system and method of using word embedding for generating a list of words personalized to the learning needs of a user - Google Patents
- Publication number
- WO2021093968A1 (application PCT/EP2019/081498)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- words
- ord
- mass
- list
- center
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/02—Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
Definitions
- the present disclosure relates to a computerized system, computerized method and computer program product using word embedding for automatically generating a list of words personalized to the learning needs of a user, selected from a corpus of words represented as vectors in an M-dimensional word embedding space.
- the above presented methods have limitations, one of them being that the content is more or less static and that the learning process is not well adapted/personalized to the needs of the individual learner. There is a need for providing a customized/personalized learning path for each individual student, as each of them has different needs and progresses differently.
- An object of the present disclosure is to address at least one of the issues described above.
- the inventors have realized that in order to provide a customized/personalized learning path for each individual student, an improved computerized system and method must be provided that enable users/learners to expand their knowledge outside of the currently studied learning points, so that the users not only learn from their mistakes, but are also enabled to learn something new.
- previous solutions using e.g. spaced repetition or making learners repeat assignments for only the words that they have already studied before moving on to a "new chapter" or the like will not suffice. These previous solutions do not enable the learner to learn something new, and specifically do not introduce any new vocabulary information/words personalized to the learning needs of the user.
- What is needed, the inventors realized, is to enable at each time instance the generation of a list of words, a recommendation, personalized to the learner that will be the natural next step to take in terms of expanding their vocabulary and the understanding of the words in it.
- Word embedding may be used for calculating how similar a piece of text is to another piece of text, or how similar a word is to another word, in a high dimensional word embedding space, wherein each dimension represents a property of the word and the word is represented in the word embedding space as a vector comprising a set of numeric values, one for each dimension of the word embedding space.
- Such a high dimensional word embedding space typically comprises hundreds, or more, dimensions. It is hence not possible for the human mind to produce the data of the word embedding space.
- the inventors used a large corpus of language learning information to train a machine learning algorithm to perform the word embedding. Based on the training data provided, the machine learning algorithm was configured to set the distance between words in the high dimensional word embedding space dependent on the similarity of the words according to the property represented in that dimension. In other words, the closer two words (vectors representing words) are in the word embedding space, the more similar they are deemed to be.
- the similarity may mean that they are related in meaning, appear in a similar context in the training data, etc.
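As a minimal sketch of how closeness in the embedding space can stand in for similarity, the following Python snippet compares toy 3-dimensional vectors using Euclidean distance and cosine similarity. The words, the vector values and the choice of these two measures are illustrative assumptions; the disclosure does not prescribe a specific distance measure here.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values; real embedding
# spaces typically have hundreds of dimensions).
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # "cat" lies closer to "dog" than to "car" under both measures.
    print(euclidean_distance(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))
    print(euclidean_distance(EMBEDDINGS["cat"], EMBEDDINGS["car"]))
```

Under either measure the two animal words end up nearer to each other than to "car", which is the property the recommendation logic relies on.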
- the word embedding space thus generated may suitably be used for dynamically generating personalized recommendations for vocabulary training, e.g. in the form of a list of words suggested for study, if the vocabulary comprehension of the individual learner could be determined, and possibly tracked, in relation to the words represented in the word embedding space.
- the list of words may then be presented to the user for self-study or digitally assisted study, or used as input to the same or a different computerized system configured to provide digital language learning assignments based on the recommended words on the list.
- the task essential to enabling any of these aims is to determine, and possibly track, the vocabulary comprehension of the individual learner in relation to the words represented in the word embedding space and to use this knowledge for generating the recommended word list.
- this object is achieved by an end-to-end specialized adaptive system, and corresponding computerized method, using word embedding in a high dimensional word space to not only remediate on the vocabulary that is not being mastered, but also adaptively progress the learners towards new parts of the vocabulary, wherein the new parts are selected personally for the learner, based on the learner's preferences and personalized by the specialized adaptive system based on knowledge on the workings of the human brain.
- a computerized system using word embedding for generating a list of words personalized to the learning needs of a user of the system at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space, the system comprising processing circuitry and a memory configured to communicate with the processing circuitry.
- the processing circuitry is configured to obtain, via a first interface, a first input signal indicative of user specific system initialization settings and to initialize the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules obtained from the memory and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space.
- the processing circuitry is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass.
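The initialization and list generation described above can be sketched as follows. This is an illustrative Python sketch, not the claimed implementation: the function names, the toy 2D vectors and the per-word masses are hypothetical, and the rule that turns a score value into a mass is deliberately left out, since it depends on the predetermined set of rules.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean position of all word vectors.

    positions maps each word to its coordinate list; masses maps each
    word to a non-negative weight derived from its score value.
    """
    total = sum(masses[w] for w in positions)
    if total == 0:
        raise ValueError("total mass must be non-zero")
    dims = len(next(iter(positions.values())))
    return [
        sum(masses[w] * positions[w][k] for w in positions) / total
        for k in range(dims)
    ]


def words_by_distance(positions, point):
    """Words sorted by increasing Euclidean distance to the given point."""
    return sorted(positions, key=lambda w: math.dist(positions[w], point))


if __name__ == "__main__":
    positions = {"apple": [0.0, 0.0], "pear": [1.0, 0.0], "train": [4.0, 4.0]}
    masses = {"apple": 1.0, "pear": 1.0, "train": 2.0}
    p_cm = center_of_mass(positions, masses)
    print(p_cm, words_by_distance(positions, p_cm))
```

Sorting every word by distance is the simplest possible ranking; a larger vocabulary would normally be narrowed down first.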
- the processing circuitry may further be configured to, repeatedly: obtain, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjust the settings of the system by updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.
- Embodiments described herein thereby solve the limitation of spaced repetition, by enabling the learners to expand their vocabulary by providing new relevant vocabulary information based on the vocabulary comprehension of the learner.
- this enables learners to learn vocabulary faster and more efficiently, by providing personalized recommendations of words to focus on next, adapted to the individual learner/user of the system.
- the processing circuitry is configured to, before generating the list of words, or the updated list of words, apply a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space and determine a subset of numeric vectors comprising the numeric vectors that are inside the filter mask.
- the processing circuitry may be configured to set the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.
- the memory may be configured to, for each time instance, store information on the calculated position of the center of mass and the respective associated time instance, at which the position of the center of mass was calculated.
- the processing circuitry may further be configured to, for two or more of the time instances for which information has been stored: retrieve information on the calculated position of the center of mass and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instances at which the positions were calculated.
- the processing circuitry may further be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time.
- the processing circuitry may further be configured to present a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time via the first user interface or the second user interface.
- embodiments herein thereby provide the possibility to represent the vocabulary comprehension of a learner/user of the system, and possibly also to represent and/or track the progression of the vocabulary comprehension.
- the representation may be fed back into the system and be used as basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system.
- the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented to different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.
- a method in a computerized system, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space.
- the method comprises obtaining, via a first interface, a first input signal indicative of user specific system initialization settings and initializing, using processing circuitry, the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules; and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space.
- the method further comprises generating, using the processing circuitry, a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass.
- the method further comprises, repeatedly: obtaining, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjusting, using the processing circuitry, the settings of the system by: updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules; and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and finally generating, using the processing circuitry, an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.
- the method comprises, before generating the list of words, or the updated list of words: applying, using the processing circuitry, a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space and determining, using the processing circuitry, a subset of numeric vectors comprising the numeric vectors that are inside the filter mask.
- generating the list of words, or generating the updated list of words comprises generating the list to only comprise words represented by numeric vectors in the determined subset of numeric vectors.
- the method may further comprise setting, using the processing circuitry, the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.
- the method comprises storing, in a memory of the system, information on the calculated position of the center of mass and the respective associated time instance at which the position of the center of mass was calculated.
- the method may further comprise, for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry, information on the calculated position of the center of mass and the respective associated time instance at which the position was calculated; and determining, using the processing circuitry, the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instances at which the positions were calculated.
- the generating the list of words, or the updated list of words, using the processing circuitry may in these embodiments further be based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time.
- the method of these embodiments may further comprise presenting, via the first user interface or the second user interface, a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time.
- a computer program loadable into a memory communicatively connected or coupled to at least one data processor, comprising software for executing the method according to any of the method embodiments described herein when the program is run on the at least one data processor.
- a processor-readable medium having a program recorded thereon, where the program is to make at least one data processor execute the method according to any of the method embodiments described herein when the program is loaded into the at least one data processor.
- Fig. 1 shows a schematic overview of a system according to one or more embodiments.
- Fig. 2 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments.
- Fig. 3 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments.
- Fig. 4 is a flow chart of a computerized method for determining and possibly using information on a change in position of the center of mass over time, according to one or more embodiments.
- Fig. 5 shows an oversimplified 2D representation of a word embedding space.
- Figs. 6 to 7 show an illustrative example of center of mass calculation and updating in an oversimplified 2D representation of a word embedding space.
- M is in a non-limiting example an integer around 300, but it may in different implementations range from 50 or 100 to several thousands, depending on factors such as the number of properties relevant to describe the word in the embedding space and the computational capabilities of the system used.
- an oversimplified 2D representation of a word embedding space (unmarked axes) is shown in Figure 5.
- in the word embedding space, words that are similar to each other, based on the properties defined for the words (included in the word vectors) and pre-set rules and conditions, are positioned close to each other, while words that are less similar by the same standards are positioned far from each other.
- the center of mass is the unique point at the center of a distribution of mass in space, here the word embedding space, that has the property that the weighted position vectors relative to this point sum to zero.
- the center of mass is the mean location of a distribution of mass in space.
- the mass of a vector V_WORD ("particle") in the word embedding space of this disclosure corresponds to a numeric "weight" determined based on the initial or current score value S_MASTER assigned to the vector V_WORD.
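Written out, the definition above amounts to the following, where m_i denotes the mass assigned to the i-th word vector and P_{WORD,i} its position. The index i and the symbol m_i are notation introduced here for illustration; the disclosure only states that the mass is a weight determined from the score value.

```latex
\sum_{i} m_i \left( P_{\mathrm{WORD},i} - P_{\mathrm{CM}} \right) = 0
\quad\Longrightarrow\quad
P_{\mathrm{CM}} = \frac{\sum_{i} m_i \, P_{\mathrm{WORD},i}}{\sum_{i} m_i}
```

The right-hand form is the mass-weighted mean location, which is why the two descriptions of the center of mass are equivalent whenever the total mass is non-zero.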
- a non-limiting example is illustrated in Fig.
- the score value S_MASTER assigned based on the first input signal S_INPUT_1 and a predetermined set of rules according to embodiments herein may be selected as one of the following values:
- S_MASTER = 0: meaning that the learner/user of the system has not been presented with the word before.
- S_MASTER = MIN: a preset minimum value > 0 representing a minimum score of mastery of the word.
- MIN < S_MASTER < MASTER: representing that the learner/user of the system is learning the word.
- a suitable number of internal levels between MIN and MAX may be applied, for example being represented as integers or float numbers.
- S_MASTER = MASTER: a preset maximum value meaning the learner/user of the system has mastered the word.
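One possible encoding of these score values, and of a rule that turns a score into the "mass" used in the center-of-mass calculation, is sketched below in Python. The numeric values chosen for MIN and MASTER and the linear rule `MASTER - score + 1` are assumptions made purely for illustration; the disclosure states that the weight is determined from the score value and that higher scores correspond to lower mass, but this excerpt gives no formula.

```python
NOT_PRESENTED = 0  # S_MASTER = 0: word never presented to the learner
MIN = 1            # assumed value for the preset minimum score
MASTER = 5         # assumed value for the preset maximum score


def describe(score):
    """Map a score value to the learning state it represents."""
    if score == NOT_PRESENTED:
        return "not presented"
    if score == MIN:
        return "minimum mastery"
    if MIN < score < MASTER:
        return "learning"
    if score == MASTER:
        return "mastered"
    raise ValueError(f"score {score} is outside the allowed range")


def mass_from_score(score):
    """Hypothetical weighting rule: the higher the score, the lower the mass."""
    describe(score)  # raises ValueError for scores outside the range
    return MASTER - score + 1
```

With this particular rule a mastered word keeps a small positive mass of 1, so it still contributes slightly to the center of mass; a rule that assigns mastered words zero mass would be another reasonable reading.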
- Figure 1 shows a schematic overview of a computerized system 100 using word embedding for generating a list of words personalized to the learning needs of a user of the system 100 at a given time instance t.
- the words on the list are selected from a plurality of words each represented as an M-dimensional numeric vector V_WORD having a position P_WORD in an M-dimensional word embedding space.
- the system 100 comprises processing circuitry 110 and a memory 120 configured to communicate with the processing circuitry 110.
- the processing circuitry 110 is configured to obtain, via a first interface 130, a first input signal S_INPUT_1
- the processing circuitry is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated position of the center of mass P_CM.
- the processing circuitry 110 is configured to, repeatedly: obtain, via the first interface 130, a second input signal S_INPUT_2
- the processing circuitry 110 is configured to, for each time the second input signal S_INPUT_2
- the personalized recommendations of words to focus on next are continuously adapted to the individual learner/user of the system, which further increases the relevance of the recommended words on the generated list to the user.
- the processing circuitry 110 is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass P_CM of the M-dimensional word embedding space; and determine a subset of numeric vectors V_WORD comprising the numeric vectors V_WORD that are inside the filter mask, wherein the processing circuitry 110 is further configured to generate the list of words, or generate the updated list of words, to only comprise words represented by numeric vectors V_WORD in the determined subset of numeric vectors V_WORD.
- the filter mask has the same dimension as the word embedding space and hence filters in all dimensions, using the same value/search radius for all dimensions, or differentiated values/search radii for different dimensions.
- the filter mask is pre-defined/pre-calculated. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and faster.
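A filter mask of this kind can be sketched as an axis-aligned box around the center of mass, with either one shared search radius or one radius per dimension. The Python below is illustrative only: the box shape, the function names and the sample values are assumptions, and a spherical mask would be an equally valid reading of the text.

```python
def inside_mask(position, center, radii):
    """True if the position lies within the search radius in every dimension.

    radii is either a single number (same search radius for all
    dimensions) or a sequence holding one radius per dimension.
    """
    if isinstance(radii, (int, float)):
        radii = [radii] * len(center)
    if not (len(position) == len(center) == len(radii)):
        raise ValueError("position, center and radii must have the same dimension")
    return all(abs(p - c) <= r for p, c, r in zip(position, center, radii))


def apply_filter_mask(positions, center, radii):
    """Subset of word vectors that fall inside the mask."""
    return {w: p for w, p in positions.items() if inside_mask(p, center, radii)}
```

Only the words returned by `apply_filter_mask` then need a full distance calculation, which is where the saving in computation comes from.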
- the processing circuitry 110 may be configured to set the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100. Thereby, the user is enabled to select the length of the list of words to focus on and hence control the pace at which the learning progresses to suit the needs of the user.
- the memory 120 may be configured to, for each time instance t
- the memory 120 may in these embodiments be configured to, for each time instance t
- the memory may further be configured to store the score values S_MASTER assigned to each or a selection of the words represented by numeric vectors V_WORD in the word embedding system, or the determined subset of numeric vectors V_WORD, at the respective associated time instance t
- if the memory is configured to store the score values S_MASTER for more than one learner/user 155 connected to the system in this manner, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.
- the processing circuitry 110 may further be configured to, for two or more time instances for which information has been stored: retrieve information on the calculated position of the center of mass P_CM and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass P_CM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass P_CM and the respective associated time instances at which the positions were calculated.
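Determining the change in position from stored records could look like the sketch below. It is a hedged Python illustration: the `(time, position)` record layout, the function name and the choice to summarize the change as a displacement vector, a distance and an average speed are all assumptions rather than details given in the disclosure.

```python
import math


def center_of_mass_drift(history):
    """Change between the earliest and latest stored center-of-mass positions.

    history is a list of (time, position) pairs; at least two are needed.
    Returns the displacement vector, the distance moved and the average
    speed (distance per unit of time).
    """
    if len(history) < 2:
        raise ValueError("need at least two stored time instances")
    ordered = sorted(history, key=lambda entry: entry[0])
    (t_first, p_first), (t_last, p_last) = ordered[0], ordered[-1]
    if t_last == t_first:
        raise ValueError("time instances must differ")
    displacement = [b - a for a, b in zip(p_first, p_last)]
    distance = math.hypot(*displacement)
    return displacement, distance, distance / (t_last - t_first)
```

The displacement vector indicates which region of the vocabulary the learner is moving towards, while the speed gives a rough measure of how quickly that happens.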
- the processing circuitry 110 may in these embodiments be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time.
- the processing circuitry 110 may be configured to present a visualization of the determined change in the position of the center of mass P_CM in the M-dimensional word embedding space over time via the first user interface 130 or the second user interface 140.
- embodiments herein thereby provide the possibility to represent the vocabulary comprehension of the learner/user of the system 100, and possibly also to represent and/or track the progression of the vocabulary comprehension.
- the representation may be fed back into the system and be used as basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system.
- the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented to different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.
- in Fig. 2 there is shown a method, in a computerized system 100, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance t, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector V_WORD having a position P_WORD in an M-dimensional word embedding space, the method comprising:
- in step 210: obtaining, via a first interface 130, a first input signal S_INPUT_1
- the user specific system initialization settings may comprise results of a test performed by the user and input into the system via a digital learning environment (program application or the like).
- the user specific system initialization settings may be input to the system as manual input from the user, a teacher, or another interested party - e.g. by enabling selection of learning preferences in a displayed menu via a user interface or input as a signal from the system, or an external program application communicatively connected to the system, wherein the signal represents results of a placement test or the like.
- the initialisation setting may comprise pre-set default values.
- in step 220: initializing, using processing circuitry 110, the system 100.
- step 220 includes two sub-steps 222, 224, comprising:
- in sub-step 222: assigning a respective score value S_MASTER to each numeric vector V_WORD, based on the first input signal S_INPUT_1
- S_INPUT_1 may comprise the respective score values S_MASTER and the predetermined set of rules defines that the respective score values are to be assigned to the numeric vectors V_WORD.
- S_INPUT_1 may comprise score values S_MASTER for some of the numeric vectors V_WORD in the M-dimensional word embedding space and the rules further comprise how to approximate score values of numeric vectors V_WORD for groups/clusters of words based on the provided score values S_MASTER.
- S_INPUT_1 may comprise an estimated "mastery level" for one or more of the numeric vectors V_WORD and the rules may comprise how the words/vectors in the M-dimensional word embedding space are to be
- in sub-step 224: calculating the position of the center of mass P_CM of the M-dimensional word embedding space, at the initial time instance t
- the method shown in Figure 2 further comprises:
- in step 240: generating, using the processing circuitry 110, a list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each numeric vector V_WORD to the calculated position of the center of mass P_CM.
- generating the list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the calculated position of the center of mass P_CM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors V_WORD with a respective position P_WORD closest to the position of the center of mass P_CM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.
- generating the list of words personalized to the learning needs of a user based on the respective distances from the position P_WORD of each of the numeric vectors V_WORD to the calculated position of the center of mass P_CM e.g. comprises generating a list comprising all words represented by a numeric vector V_WORD with a position P_WORD less than a pre-set distance d from the position of the center of mass P_CM in the M-dimensional word embedding space.
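The two selection rules just described, a fixed-length list of the N nearest words and an open-ended list of all words within a distance d, can be compared side by side in a short Python sketch. The function names and the use of Euclidean distance are assumptions for illustration; the disclosure does not fix the distance measure in this passage.

```python
import math


def nearest_n(positions, center, n):
    """The n words whose vectors lie closest to the center of mass."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    ranked = sorted(positions, key=lambda w: math.dist(positions[w], center))
    return ranked[:n]


def within_distance(positions, center, d):
    """All words whose vectors lie less than distance d from the center of mass."""
    return [w for w in positions if math.dist(positions[w], center) < d]
```

The first rule always yields a list of predictable length (at most N), whereas the second can return anything from no words to the whole vocabulary depending on how densely the region around the center of mass is populated.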
- the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user.
- the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.
- the method shown in Figure 2 further comprises, before step 240 of generating the list of words:
- in step 230: applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass P_CM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors V_WORD comprising the numeric vectors V_WORD that are inside the filter mask.
- step 240 of generating the list of words comprises generating the list to only comprise words represented by numeric vectors V_WORD in the determined subset of numeric vectors V_WORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.
- the method of Figure 2 may further comprise, as shown in Figure 3, performing the following steps, repeatedly at selected (pre-set or user input) time intervals or points in time:
- in step 310: obtaining, via the first interface 130, a second input signal S_INPUT_2
- S_INPUT_2 may comprise results of a test, assignment, one or more actions, or the like performed by the user and input into the system via a digital learning environment (program application or the like).
- the input may be made via a user interface 130, 140 or an input device 150 and may comprise text input, voice input, selections, and/or other information.
- the input to the system may be manual input performed by a teacher, or another interested party, via a user interface 130, 140 or input device 150, relating to the learning process of the learner.
- in step 320: adjusting, using the processing circuitry 110, the settings of the system 100.
- the adjusting of step 320 comprises two sub-steps 322, 324, comprising: In sub-step 322: updating the respective score value S_MASTER assigned to each of the one or more numeric vectors V_WORD, based on the second input signal S_INPUT_2
- the predetermined set of rules may include that a score value S_MASTER assigned to a numeric vector V_WORD should be increased (e.g. updated to the next, higher, level or a defined number of levels closer to the maximum mastery level) if the second input signal S_INPUT_2
- the predetermined set of rules may further include that a score value S_MASTER assigned to a numeric vector V_WORD should be decreased (e.g. updated to the next, lower, level or a defined number of levels closer to the minimum mastery level) if the time since the learner/user of the system was last presented with the word represented by the numeric vector V_WORD in an assignment, as indicated by the second input signal S_INPUT_2
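The two kinds of rule described here, raising a score after a successful interaction and lowering it after a long gap, can be sketched as below. The Python is a hedged illustration: the level bounds, the default step of one level, the `max_gap` threshold and the idea that the second input signal yields a timestamp for the last presentation are all assumptions, because this excerpt does not spell out the exact conditions.

```python
MIN_SCORE = 1  # assumed preset minimum mastery level
MAX_SCORE = 5  # assumed preset maximum mastery level


def raise_score(score, levels=1):
    """Move a score up by a number of levels, capped at the maximum."""
    return min(MAX_SCORE, score + levels)


def decay_score(score, last_seen, now, max_gap, levels=1):
    """Lower a score if the word has not been presented for longer than max_gap.

    Scores of never-presented words (0) are left unchanged, and a decayed
    score never drops below the minimum mastery level.
    """
    if score == 0 or now - last_seen <= max_gap:
        return score
    return max(MIN_SCORE, score - levels)
```

Keeping both rules as small pure functions makes it straightforward to swap in a different rule set without touching the center-of-mass calculation.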
- sub-step 324 calculating an updated position of the center of mass PCM of the M-dimensional word embedding space, at the current time instance tCURRENT, based on the respective positions PWORD and updated score value SMASTER of the numeric vectors VWORD comprised in the M-dimensional word embedding space.
- the position of PCM will move within the M-dimensional word embedding space. If the mastery level and score values SMASTER related to a certain word or group of words (e.g. neighbouring words in the word embedding space) increase, the position of PCM will move closer to un-initialized parts of the vocabulary, as higher score values mean that the "mass" of the numeric vector VWORD being assigned the score values decreases.
- one or more method embodiments shown in Figure 3 further comprises:
- step 340 generating, using the processing circuitry 110, an updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each numeric vector VWORD to the calculated updated position of the center of mass PCM.
- generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the updated position of the center of mass PCM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors VWORD with a respective position PWORD closest to the updated position of the center of mass PCM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.
- generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the updated position of the center of mass PCM e.g. comprises generating a list comprising all words represented by a numeric vector VWORD with a position PWORD less than a pre-set distance d from the updated position of the center of mass PCM in the M-dimensional word embedding space.
- the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user.
- the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.
- the method shown in Figure 3 may in some embodiments further comprise, before step 340 of generating the updated list of words:
- step 330 applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass PCM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors VWORD comprising the numeric vectors VWORD that are inside the filter mask.
- step 340 of generating the updated list of words comprises generating the list to only comprise words represented by numeric vectors VWORD in the determined subset of numeric vectors VWORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.
- the method described in connection with Figure 2 and/or 3 may further comprise setting, using the processing circuitry 110, the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100.
- the user of the system, i.e. the learner or another user influencing the learning process such as a teacher, instructor, supervisor, or other, is enabled to control the number of words recommended for study.
- the length of the list may be pre-set in the system, and/or be dynamically adjusted by the system depending on pre-set rules and conditions.
- the vocabulary knowledge status of the learner, or each of several learners, connected to the system 100 can be determined for a given time instance by determining the current position of the center of mass PCM for each specified learner.
- the learning progress of the learner, or each of several learners, connected to the system 100 can be monitored over time, by determining the current position of the center of mass PCM for each specified learner at more than one time instance, e.g. at a number of consecutive time instances, and the information thus gathered can be presented visually via a user interface and/or be fed back into the system to further enhance future recommendations.
- the information may also be used for training the machine learning algorithm generating the word embedding system so that the accuracy of the word embedding system may be continuously improved.
- step 410 storing, in a memory 120 of the system 100, information on the calculated position of the center of mass PCM and the respective associated time instance t
- the vocabulary knowledge status of the learner can be determined at one or more given time instances.
- if step 410 is performed for more than one learner/user 155 connected to the system, comparison of the vocabulary knowledge status of the learners at one or more given time instances is enabled.
- the storing of step 410 may further comprise storing the score values SMASTER assigned to each or a selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD, at the respective associated time instance t
- if step 410 is performed for more than one learner/user 155 connected to the system, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.
- step 420 For two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the calculated position of the center of mass PCM and the respective associated time instance the position was calculated.
- step 420 may further comprise, for the two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD at the respective associated time instance t
- step 430 for the two or more time instances in step 420: determining, using the processing circuitry 110, the change in the position of the center of mass PCM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass
- the learning progress of the learner, over time may be determined.
- if steps 410 and 420 are performed for more than one learner/user 155 connected to the system, the learning progress of more than one learner may be determined in step 430, and a comparison of the learning progress of the learners, at one or more given time instances, is enabled.
- generating the list of words in Step 240, and/or generating the updated list of words in Step 340 may further be based on the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
- step 440 presenting, via the first user interface 130 or the second user interface 140, a visualization of the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
- the learner may view and thereby better understand the vocabulary knowledge status of the learner at a given time, and/or the learning progress of the learner over time. Any anomalies, such as decreasing knowledge or progress slowing down may hence easily be detected and action taken.
- step 420 comprises retrieving, using the processing circuitry 110, information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD
- the presenting of step 440 may further comprise presenting information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD.
- step 440 may comprise presenting a visualization of the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time, for the more than one learner/user 155.
- a visualization of the learning progress of the learners, at one or more given time instances, is provided.
- step 450 Feeding back to the system the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
- step 450 may also be performed with regard to more than one learner/user 155 connected to the system.
- information on the vocabulary knowledge status of a learner, or several learners connected to the system, at a given time, and/or the learning progress of the learner, or learners, over time may be used to further enhance future recommendations and/or to be used for training the machine learning algorithm generating the word embedding system so that the accuracy of the word embedding system is continuously improved.
- the processing circuitry 110 may further be configured to perform the steps and functions according to any of the method embodiments described herein.
- All of the process steps, as well as any sub-sequence of steps, described with reference to Fig. 2 above may be controlled by means of a programmed data processor.
- the embodiments of the invention described above with reference to the drawings comprise processing circuitry, the invention thus also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice.
- the program may be in the form of source code, object code, a code intermediate source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the process according to the invention.
- the program may either be a part of an operating system or be a separate application.
- the carrier may be any entity or device capable of carrying the program.
- the carrier may comprise a storage medium, such as a Flash memory, a ROM (Read Only Memory), an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read only Memory), or a magnetic recording medium, for example a floppy disc or hard disc.
- the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or by other means.
- the carrier may be constituted by such cable or device or means.
- the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.
- Program code which, when run by the processing circuitry 110, causes the system 100 to perform the method according to any of the method embodiments herein may already be pre-stored in an internal memory 120 of the system 100.
- the processor 110 is in such embodiments communicably connected to the memory 120.
- a computer program loadable into a memory communicatively connected or coupled to at least one data processor, e.g. the processor 110, comprising software for executing the method according to any of the embodiments herein when the program is run on the at least one processor 110.
- a processor-readable medium having a program recorded thereon, where the program is to make at least one data processor, e.g. the processor 110, execute the method according to any of the embodiments herein when the program is loaded into the at least one data processor.
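The score-update rules listed above (raise SMASTER on a positive result, lower it when a word has not been presented for some time) can be sketched roughly as follows. This is a minimal illustration only: the 0 to 5 mastery scale, the 30-day threshold, the interpretation of the second input signal as "answered correctly" and "days since last seen", and the function name `update_score` are all assumptions, since the passages above do not fix any of them.

```python
# Assumed mastery scale and forgetting threshold; the source specifies neither.
MIN_LEVEL, MAX_LEVEL = 0, 5
FORGET_AFTER_DAYS = 30


def update_score(score, answered_correctly=None, days_since_last_seen=0, step=1):
    """Return an updated mastery score, clamped to [MIN_LEVEL, MAX_LEVEL].

    answered_correctly and days_since_last_seen stand in for information
    carried by the second input signal (hypothetical field names).
    """
    if answered_correctly:
        # Move `step` levels closer to the maximum mastery level.
        score += step
    if days_since_last_seen > FORGET_AFTER_DAYS:
        # Move `step` levels closer to the minimum mastery level.
        score -= step
    return max(MIN_LEVEL, min(MAX_LEVEL, score))


raised = update_score(2, answered_correctly=True)
decayed = update_score(2, days_since_last_seen=45)
```

The `step` parameter corresponds to the "defined number of levels" option mentioned in the rules; a value of 1 gives the "next level" behaviour.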
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Image Processing (AREA)
Abstract
There is provided a computerized system (100) and method using word embedding for generating a list of words personalized to the learning needs of a user of the system (100) at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (VWORD) having a position (PWORD) in an M-dimensional word embedding space, by obtaining a first input signal (SINPUT_1) indicative of user specific system initialization settings; initializing the system (100), by assigning a respective score value (SMASTER) to each numeric vector (VWORD), based on the first input signal (SINPUT_1) and a predetermined set of rules obtained from the memory (120); calculating the position of the center of mass (PCM) of the M-dimensional word embedding space, at the initial time instance (tINITIAL), based on the respective positions (PWORD) and score value (SMASTER) of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space; and generating a list of words personalized to the learning needs of a user based on the respective distances from the position (PWORD) of each numeric vector (VWORD) to the calculated position of the center of mass (PCM).
Description
COMPUTERIZED SYSTEM AND METHOD OF USING WORD EMBEDDING FOR GENERATING A LIST OF WORDS PERSONALIZED TO THE LEARNING NEEDS OF A USER
TECHNICAL FIELD
The present disclosure relates to a computerized system, computerized method and computer program product using word embedding for automatically generating a list of words personalized to the learning needs of a user, selected from a corpus of words represented as vectors in an M-dimensional word embedding space.
BACKGROUND
In learning a language, whether it is a person's first language, a second language or other, learning the vocabulary of the language is crucial to being able to understand and use the language. As of the past decades, the study patterns of learners have evolved from being in a stationary classroom performing assignments using pen and paper, to including or completely transitioning to digital learning applications accessible via computers or smart devices, e.g. smart phones. With this change there has also been a change from the previous approach of studying with the aid of a single teacher to having access to a combination of one or more human instructors and specifically developed and often adaptive computer systems, based on knowledge obtained by language learning experts.
The common way for such a specifically developed adaptive computer system to assist a learner in his/her learning process is to make the learner aware of, and remediate, the mistake areas of the learning point currently studied in a certain language. The nature of such systems is thus to prompt a learner to "redo what you have not done well" by, e.g., going back in a provided static list of words and repeating the assignment with regard to the words where an erroneous answer was given by the user/learner. An example of a known repetition method implemented in computerized language learning systems is presenting the learner with the same word at certain intervals with the goal of the learner eventually memorizing the word. This is commonly referred to as "spaced repetition".
The above presented methods have limitations, one of them being that the content is more or less static and that the learning process is not well adapted/personalized to the needs of the individual learner.
There is a need for providing a customized/personalized learning path for each individual student, as each of them has different needs and progresses differently.
SUMMARY
An object of the present disclosure is to address at least one of the issues described above.
The inventors have realized that in order to provide a customized/personalized learning path for each individual student, an improved computerized system and method must be provided that enables users/learners to expand their knowledge outside of the currently studied learning points, so that the users not only learn from their mistakes, but are also enabled to learn something new. For this purpose, previous solutions using e.g. spaced repetition or making learners repeat assignments for only the words that they have already studied before moving on to a "new chapter" or the like will not suffice. These previous solutions do not enable the learner to learn something new, and specifically do not introduce any new vocabulary information/words personalized to the learning needs of the user. What is needed, the inventors realized, is to enable at each time instance the generation of a list of words, a recommendation, personalized to the learner that will be the natural next step to take in terms of expanding their vocabulary and the understanding of the words in it.
The previously known systems do not provide any satisfying solution to this problem. From this realization the inventors, having good knowledge in the science of the human brain, pondered the prospect of using word embedding to achieve an improved computerized vocabulary learning system and method.
Word embedding may be used for calculating how similar a piece of text is to another piece of text, or how similar a word is to another word, in a high dimensional word embedding space wherein each dimension represents a property of the word. In the word embedding space, each word is represented as a vector comprising a set of numeric values, one for each dimension of the word embedding space.
Such a high dimensional word embedding space typically comprises hundreds of dimensions, or more. It is hence not possible for the human mind to produce the data of the word embedding space. To obtain the needed word embedding space, the inventors used a large corpus of language learning information to train a machine learning algorithm to perform the word embedding. Based on the training data provided, the machine learning algorithm was configured to set the distance between words in the high dimensional word embedding space dependent on the similarity of the words according to the property represented in that dimension. In other words, the closer two words (vectors representing words) are in the word embedding space, the more similar they are deemed to be. The similarity may mean that they are related in meaning, appear in a similar context in the training data, etc.
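To make the notion of "closeness" concrete, the following minimal sketch compares toy word vectors using two common measures, Euclidean distance and cosine similarity. The three-dimensional vectors and their values are invented for illustration, and the disclosure does not state which distance measure is used, so both functions are assumptions rather than the described implementation.

```python
import math

# Hypothetical 3-dimensional toy vectors; a real embedding space would have
# hundreds of dimensions with values learned by a trained model.
embedding = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


d_cat_dog = euclidean_distance(embedding["cat"], embedding["dog"])
d_cat_car = euclidean_distance(embedding["cat"], embedding["car"])
```

With these toy values, "cat" lies closer to "dog" than to "car" under both measures, which is the behaviour the paragraph above describes.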
For the same reason, the sheer vastness of information available, it is not possible for the human mind to process the data in the word embedding space, especially not to comprehend the relationship between words in the large number of dimensions available, to generate a comprehensive list of words recommended for study by, i.e. personalized to the learning needs of, a learner based on the word embedding information.
Having come this far in the inventive process, the inventors further realised that the word embedding space thus generated may suitably be used for dynamically generating personalized recommendations for vocabulary training, e.g. in the form of a list of words suggested for study, if the vocabulary comprehension of the individual learner could be determined, and possibly tracked, in relation to the words represented in the word embedding space.
The list of words may then be presented to the user for self-study or digitally assisted study, or used as input to the same or a different computerized system configured to provide digital language learning assignments based on the recommended words on the list. However, the task essential to enabling any of these aims is to determine, and possibly track, the vocabulary comprehension of the individual learner in relation to the words represented in the word embedding space and to use this knowledge for generating the recommended word list.
In embodiments described herein, this object is achieved by an end-to-end specialized adaptive system, and corresponding computerized method, using word embedding in a high dimensional word space to not only remediate on the vocabulary that is not being mastered, but also adaptively progress the learners towards new parts of the vocabulary, wherein the new parts are selected personally for the learner, based on the learner's preferences and personalized by the specialized adaptive system based on knowledge on the workings of the human brain.
The invention is defined by the appended claims.
According to a first aspect of the invention, there is provided a computerized system using word embedding for generating a list of words personalized to the learning needs of a user of the system at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space, the system comprising processing circuitry and a memory configured to communicate with the processing circuitry. The processing circuitry is configured to obtain, via a first interface, a first input signal indicative of user specific system initialization settings and to initialize the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules obtained from the memory and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space. The processing circuitry is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass.
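The initialization described above (assign scores, compute the center of mass, pick the words nearest to it) can be sketched as follows. The mapping from score to "mass" (`MAX_LEVEL + 1 - score`), the 0 to 5 scale, the Euclidean distance, and all function names are assumptions made for illustration; the text only states that the center of mass depends on positions and score values.

```python
import math

MAX_LEVEL = 5  # assumed top mastery level; the source does not fix a scale


def mass_from_score(score):
    """Assumed rule: a higher mastery score gives a smaller, still positive, mass."""
    return MAX_LEVEL + 1 - score


def center_of_mass(positions, scores):
    """Mass-weighted mean of the word positions."""
    masses = {w: mass_from_score(scores[w]) for w in positions}
    total = sum(masses.values())
    dims = len(next(iter(positions.values())))
    return [
        sum(masses[w] * positions[w][k] for w in positions) / total
        for k in range(dims)
    ]


def closest_words(positions, point, n):
    """The n words whose positions lie nearest to the given point."""
    return sorted(positions, key=lambda w: math.dist(positions[w], point))[:n]


# Toy 2D example: "cat" and "dog" are mastered, "car" is not.
positions = {"cat": [0.0, 0.0], "dog": [1.0, 0.0], "car": [10.0, 0.0]}
scores = {"cat": 5, "dog": 5, "car": 0}
p_cm = center_of_mass(positions, scores)
word_list = closest_words(positions, p_cm, n=2)
```

In this toy case the unmastered word carries most of the mass, so the center of mass sits near it and it heads the recommended list.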
The processing circuitry may further be configured to, repeatedly: obtain, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjust the settings of the system by updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.
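Because the update is performed repeatedly, one possible design (an assumption, not something the text prescribes) is to keep a running total mass and mass-weighted sum, so that a single score change adjusts the center of mass without revisiting every vector. The class name `CenterOfMassTracker` and the score-to-mass rule are hypothetical.

```python
MAX_LEVEL = 5  # assumed mastery scale


def mass_from_score(score):
    """Assumed rule: a higher mastery score gives a smaller, still positive, mass."""
    return MAX_LEVEL + 1 - score


class CenterOfMassTracker:
    """Maintains total mass and mass-weighted position sum incrementally."""

    def __init__(self, positions, scores):
        self.positions = positions
        self.scores = dict(scores)
        dims = len(next(iter(positions.values())))
        self.total_mass = 0.0
        self.weighted_sum = [0.0] * dims
        for word, pos in positions.items():
            m = mass_from_score(self.scores[word])
            self.total_mass += m
            for k, x in enumerate(pos):
                self.weighted_sum[k] += m * x

    def update_score(self, word, new_score):
        """Apply one score change; cost is proportional to the dimension M only."""
        delta = mass_from_score(new_score) - mass_from_score(self.scores[word])
        self.scores[word] = new_score
        self.total_mass += delta
        for k, x in enumerate(self.positions[word]):
            self.weighted_sum[k] += delta * x

    def position(self):
        return [s / self.total_mass for s in self.weighted_sum]


tracker = CenterOfMassTracker(
    {"cat": [0.0, 0.0], "car": [10.0, 0.0]}, {"cat": 0, "car": 0}
)
before = tracker.position()
tracker.update_score("cat", 5)  # the learner has now mastered "cat"
after = tracker.position()
```

After "cat" is mastered, its mass shrinks and the center of mass shifts toward "car", i.e. toward vocabulary not yet mastered.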
Embodiments described herein thereby solve the limitation of spaced repetition, by enabling the learners to expand their vocabulary by providing new relevant vocabulary information based on the vocabulary comprehension of the learner. Suitably, this enables learners to learn vocabulary faster and more efficiently by providing personalized recommendations of words to focus on next, adapted to the individual learner/user of the system.
In one or more embodiments, the processing circuitry is configured to, before generating the list of words, or the updated list of words, apply a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space and determine a subset of numeric vectors comprising the numeric vectors that are inside the filter mask.
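A minimal sketch of such a filter mask is shown below, assuming the mask is a hypersphere of a given radius around the center of mass. The shape of the mask, the radius value, and the function names are assumptions; the text does not define them.

```python
import math


def apply_filter_mask(positions, center, radius):
    """Keep only the vectors lying inside an assumed spherical mask."""
    return {w: p for w, p in positions.items() if math.dist(p, center) <= radius}


def list_from_subset(subset, center, n):
    """Rank only the vectors that survived the mask and return the n nearest."""
    return sorted(subset, key=lambda w: math.dist(subset[w], center))[:n]


# Toy 2D positions; "c" lies far outside the mask.
positions = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0], "d": [0.5, 0.0]}
p_cm = [0.0, 0.0]
subset = apply_filter_mask(positions, p_cm, radius=2.0)
words = list_from_subset(subset, p_cm, n=2)
```

In this naive form the mask itself still visits every vector once. Any real saving would presumably come from a cheaper membership test or a spatial index, neither of which is specified in the source.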
The processing circuitry may be configured to set the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.
The memory may be configured to, for each time instance, store information on the calculated position of the center of mass and the respective associated time instance, at which the position of the center of mass was calculated. In these embodiments, the processing circuitry may further be configured to, for two or more of the time instances for which information has been stored: retrieve information on the calculated position of the center of mass and the respective associated time instance the position was calculated; and determine the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instance the position was calculated. The processing circuitry may further be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time.
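The store, retrieve and determine-change sequence described above might look like the following sketch. The in-memory list standing in for the memory, the numeric time stamps, and the choice to report a displacement vector plus a rate of movement are illustrative assumptions.

```python
import math

history = []  # stands in for the memory: (time_instance, center_of_mass) pairs


def store(history, t, p_cm):
    """Record a calculated center-of-mass position with its time instance."""
    history.append((t, list(p_cm)))


def change_over_time(history):
    """For each consecutive pair of stored entries, return
    (t_start, t_end, displacement_vector, distance_per_time_unit).
    Assumes all stored time instances are distinct."""
    ordered = sorted(history, key=lambda entry: entry[0])
    changes = []
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        displacement = [b - a for a, b in zip(p0, p1)]
        rate = math.dist(p0, p1) / (t1 - t0)
        changes.append((t0, t1, displacement, rate))
    return changes


store(history, 0.0, [0.0, 0.0])
store(history, 2.0, [3.0, 4.0])
changes = change_over_time(history)
```

A rate that drops toward zero over several intervals would be one way to flag the "progress slowing down" case, though how the change is actually interpreted is left open by the text.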
The processing circuitry may further be configured to present a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time via the first user interface or the second user interface.
Advantageously, embodiments herein thereby provide the possibility to represent the vocabulary comprehension of a learner/user of the system, and possibly also to represent and/or track the progression of the vocabulary comprehension. The representation may be fed back into the system and be used as a basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system. If tracking is performed, the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented with different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.
According to a second aspect of the invention, there is provided a method, in a computerized system, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector having a position in an M-dimensional word embedding space. The method comprises obtaining, via a first interface, a first input signal indicative of user specific system initialization settings and initializing, using processing circuitry, the system, by assigning a respective score value to each numeric vector, based on the first input signal and a predetermined set of rules; and calculating the position of the center of mass of the M-dimensional word embedding space, at the initial time instance, based on the respective positions and score value of the numeric vectors comprised in the M-dimensional word embedding space. The method further comprises generating, using the processing circuitry, a list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated position of the center of mass.
In one or more embodiments, the method further comprises, repeatedly: obtaining, via the first interface, a second input signal indicative of user input related to one or more of the numeric vectors comprised in the M-dimensional word embedding space, at a current time instance; adjusting, using the processing circuitry, the settings of the system by: updating the respective score value assigned to each of the one or more numeric vectors, based on the second input signal and the predetermined set of rules; and calculating an updated position of the center of mass of the M-dimensional word embedding space, at the current time instance, based on the respective positions and updated score value of the numeric vectors comprised in the M-dimensional word embedding space; and finally generating, using the processing circuitry, an updated list of words personalized to the learning needs of a user based on the respective distances from the position of each numeric vector to the calculated updated position of the center of mass.
In one or more embodiments the method comprises, before generating the list of words, or the updated list of words: applying, using the processing circuitry, a filter mask centered at the position of the calculated center of mass of the M-dimensional word embedding space and determining, using the processing circuitry, a subset of numeric vectors comprising the numeric vectors that are inside the filter mask. In these embodiments, generating the list of words, or generating the updated list of words, comprises generating the list to only comprise words represented by numeric vectors in the determined subset of numeric vectors.
In some embodiments the method may further comprise setting, using the processing circuitry, the length of the list of words based on user input received via the first user interface or a second user interface or an input device connected to the system.
The method according to some embodiments comprises storing, in a memory of the system, information on the calculated position of the center of mass and the respective associated time instance at which the position of the center of mass was calculated. In these embodiments, the method may further comprise, for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry, information on the calculated position of the center of mass and the respective associated time instance the position was calculated and determining, using the processing circuitry, the change in the position of the center of mass in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass and the respective associated time instance the position was calculated. Generating the list of words, or the updated list of words, using the processing circuitry, may in these embodiments further be based on the determined change in the position of the center of mass in the M-dimensional word embedding space over time. The method of these embodiments may further comprise presenting, via the first user interface or the second user interface, a visualization of the determined change in the position of the center of mass in the M-dimensional word embedding space over time.
According to a third aspect of the invention, there is provided a computer program loadable into a memory communicatively connected or coupled to at least one data processor, comprising software for executing the method according to any of the method embodiments described herein when the program is run on the at least one data processor.
According to a fourth aspect of the invention, there is provided a processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor execute the method according to any of the method embodiments described herein when the program is loaded into the at least one data processor.
The effects and/or advantages presented in the present disclosure for embodiments of the first aspect also apply to corresponding embodiments of the second, third and fourth aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is now to be explained more closely by means of preferred embodiments, which are disclosed as examples, and with reference to the attached drawings.
Fig. 1 shows a schematic overview of a system according to one or more embodiments;
Fig. 2 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments;
Fig. 3 is a flow chart of a computerized method for using word embedding for generating a list of words personalized to the learning needs of a user, according to one or more embodiments;
Fig. 4 is a flow chart of a computerized method for determining and possibly using information on a change in position of the center of mass over time, according to one or more embodiments;
Fig. 5 shows an oversimplified 2D representation of a word embedding space; and
Figs. 6 to 7 show an illustrative example of center of mass calculation and updating in an oversimplified 2D representation of a word embedding space.
DETAILED DESCRIPTION
Introduction
Firstly, we provide definitions of some terms used herein.
In the M-dimensional word embedding space described for embodiments herein, M is in a non-limiting example an integer around 300, but it may in different implementations range from 50 or 100 to several thousand, depending on factors such as the number of properties relevant to describe the word in the embedding space and the computational capabilities of the system used.
An oversimplified 2D representation of a word embedding space (with unmarked axes) is shown in Figure 5. In the word embedding space, words that are similar to each other, based on the properties defined for the words (included in the word vectors) and pre-set rules and conditions, are positioned close to each other, while words that are less similar by the same standards are positioned far from each other.
The center of mass is the unique point at the center of a distribution of mass in space, here the word embedding space, that has the property that the weighted position vectors relative to this point sum to zero. In analogy to statistics, the center of mass is the mean location of a distribution of mass in space. In the case of a system of particles P_i, i = 1, ..., n, each with mass m_i, that are located in space with coordinates r_i, i = 1, ..., n, the coordinates R of the center of mass satisfy the condition:
$$\sum_{i=1}^{n} m_i \left(\mathbf{r}_i - \mathbf{R}\right) = \mathbf{0}$$
Solving this equation for R yields the formula:
$$\mathbf{R} = \frac{1}{M} \sum_{i=1}^{n} m_i \mathbf{r}_i$$
where M is the sum of the masses of all of the particles (in this formula only, M denotes the total mass and not the dimensionality of the word embedding space).
The mass of a vector VWORD ("particle") in the word embedding space of this disclosure corresponds to a numeric "weight" determined based on the initial or current score value SMASTER assigned to the vector VWORD. The mass may e.g. be determined as mass = 1/log2(SMASTER), where SMASTER is larger than or equal to 2 after initialization, to make sure that the mass is always positive and smaller than or equal to 1, but any suitable conversion function may be used that fulfils the condition that as the value of SMASTER increases, the mass of a vector VWORD having an assigned score value SMASTER decreases. A non-limiting example is illustrated in Fig. 6, showing an oversimplified 2D representation of a word embedding system wherein the dots and circles represent the initialized words with different mastery score values. As described above, the higher the score value for a word is, the smaller its mass is. The triangle is the center of mass. When the system receives information that the learner/user of the system has learnt a new word, improved his/her understanding of a word, or has decayed in his/her knowledge of a word, the score values are updated according to embodiments herein, whereby the center of mass will move to a new position. This change in position of the center of mass is illustrated by the dashed arrow in Fig. 7.
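The mass conversion and the center of mass formula above can be sketched as follows. This is a minimal illustration and not the claimed implementation: the function names `mass_from_score` and `center_of_mass` and the 2D sample data are assumptions introduced here, and the conversion mass = 1/log2(SMASTER) is only the non-limiting example given in the text.

```python
import math
from typing import List, Sequence


def mass_from_score(score: float) -> float:
    """Example conversion from the text: mass = 1 / log2(score).

    The text requires score >= 2 after initialization, which keeps
    the mass positive and at most 1.
    """
    if score < 2:
        raise ValueError("score must be >= 2 for this conversion")
    return 1.0 / math.log2(score)


def center_of_mass(positions: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Return R = (1 / total_mass) * sum(m_i * r_i), one value per dimension."""
    masses = [mass_from_score(s) for s in scores]
    total_mass = sum(masses)
    dims = len(positions[0])
    return [
        sum(m * p[k] for m, p in zip(masses, positions)) / total_mass
        for k in range(dims)
    ]


# Hypothetical 2D data: two word vectors with scores 2 and 4,
# i.e. masses 1.0 and 0.5.
positions = [(0.0, 0.0), (4.0, 0.0)]
scores = [2.0, 4.0]
com = center_of_mass(positions, scores)
```

With these sample values the center of mass lies at one third of the way from the first word to the second, because the first word (lower score) carries twice the mass.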
In a non-limiting example, the score value SMASTER assigned to each numeric vector VWORD, based on the first input signal SINPUT_1 and a predetermined set of rules according to embodiments herein, may be selected as one of the following values:
SMASTER = 0: meaning that the learner/user of the system has not been presented with the word before.
SMASTER = MIN: a preset minimum value > 0 representing a minimum score of mastery of the word.
MIN < SMASTER < MASTER: representing that the learner/user of the system is learning the word. A suitable number of internal levels between MIN and MASTER may be applied, for example being represented as integers or floating-point numbers.
SMASTER = MASTER: a preset maximum value meaning the learner/user of the system has mastered the word.
System architecture
Figure 1 shows a schematic overview of a computerized system 100 using word embedding for generating a list of words personalized to the learning needs of a user of the system 100 at a given time instance t.
The words on the list are selected from a plurality of words each represented as an M-dimensional numeric vector VWORD having a position PWORD in an M-dimensional word embedding space. The system 100 comprises processing circuitry 110 and a memory 120 configured to communicate with the processing circuitry 110. The processing circuitry 110 is configured to obtain, via a first interface 130, a first input signal SINPUT_1 indicative of user specific system initialization settings and to initialize the system 100 by: assigning a respective score value SMASTER to each numeric vector VWORD, based on the first input signal SINPUT_1 and a predetermined set of rules obtained from the memory 120; and calculating the position of the center of mass PCM of the M-dimensional word embedding space, at the initial time instance tINITIAL, based on the respective positions PWORD and score values SMASTER of the numeric vectors VWORD comprised in the M-dimensional word embedding space. The processing circuitry 110 is further configured to generate a list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each numeric vector VWORD to the calculated position of the center of mass PCM.
This solves the limitation of spaced repetition by enabling learners to expand their vocabulary, providing new relevant vocabulary information based on the vocabulary comprehension of the learner. Suitably, this enables learners to learn vocabulary faster and more efficiently, by providing personalized recommendations of words to focus on next, adapted to the individual learner/user of the system.
In one or more embodiments, the processing circuitry 110 is configured to, repeatedly: obtain, via the first interface 130, a second input signal SINPUT_2 indicative of user input related to one or more of the numeric vectors VWORD comprised in the M-dimensional word embedding space, at a current time instance tCURRENT; and adjust the settings of the system 100 by updating the respective score value SMASTER assigned to each of the one or more numeric vectors VWORD, based on the second input signal SINPUT_2 and the predetermined set of rules, and calculating an updated position of the center of mass PCM of the M-dimensional word embedding space, at the current time instance tCURRENT, based on the respective positions PWORD and updated score values SMASTER of the numeric vectors VWORD comprised in the M-dimensional word embedding space. Thereafter the processing circuitry 110 is configured to, each time the second input signal SINPUT_2 is obtained and the system settings are adjusted, generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each numeric vector VWORD to the calculated updated position of the center of mass PCM.
Thereby, the personalized recommendations of words to focus on next are continuously adapted to the individual learner/user of the system, which further increases the relevance of the recommended words on the generated list to the user.
In one or more embodiments, the processing circuitry 110 is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass PCM of the M-dimensional word embedding space; and determine a subset of numeric vectors VWORD comprising the numeric vectors VWORD that are inside the filter mask, wherein the processing circuitry 110 is further configured to generate the list of words, or generate the updated list of words, to only comprise words represented by numeric vectors VWORD in the determined subset of numeric vectors VWORD. The filter mask has the same dimension as the word embedding space and hence filters in all dimensions, using the same value/search radius for all dimensions, or differentiated values/search radii for different dimensions. The filter mask is pre-defined/pre-calculated.
As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and faster.
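One way to picture the filter mask is as an axis-aligned box around the center of mass, with either a single search radius or one radius per dimension. The sketch below makes that assumption explicit; the text does not specify the shape of the mask, and the function name `apply_filter_mask` and the sample words are hypothetical.

```python
from typing import Dict, List, Sequence, Union


def apply_filter_mask(positions: Dict[str, Sequence[float]],
                      center: Sequence[float],
                      radii: Union[float, Sequence[float]]) -> List[str]:
    """Return the words whose vectors lie inside a box mask around `center`.

    `radii` is either one search radius used for every dimension or a
    sequence with one radius per dimension (assumed box-shaped mask).
    """
    if isinstance(radii, (int, float)):
        radii = [float(radii)] * len(center)
    if len(radii) != len(center):
        raise ValueError("need one radius per dimension")
    return [
        word
        for word, pos in positions.items()
        if all(abs(x - c) <= r for x, c, r in zip(pos, center, radii))
    ]


# Hypothetical 2D word vectors and center of mass.
words = {"cat": (1.0, 1.0), "dog": (1.5, 0.5), "tax": (5.0, 5.0)}
center = (1.0, 1.0)

same_radius = apply_filter_mask(words, center, 1.0)
per_dimension = apply_filter_mask(words, center, (1.0, 0.25))
```

Only the words returned by the mask need to be ranked by distance afterwards, which is where the saving in computation comes from.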
The processing circuitry 110 may be configured to set the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100. Thereby, the user is enabled to select the length of the list of words to focus on and hence control the pace at which the learning progresses to suit the needs of the user.
The memory 120 may be configured to, for each time instance tINITIAL, tCURRENT, store information on the calculated position of the center of mass PCM and the respective associated time instance tINITIAL, tCURRENT at which the position of the center of mass PCM was calculated. Thereby, the vocabulary knowledge status of the learner can be determined at one or more given time instances.
The memory 120 may in these embodiments be configured to, for each time instance tINITIAL, tCURRENT, store information on the calculated position of the center of mass PCM and the respective associated time instance tINITIAL, tCURRENT at which the position of the center of mass PCM was calculated for more than one learner/user 155 connected to the system, whereby comparison of the vocabulary knowledge status of the learners at one or more given time instances is enabled.
In some embodiments, the memory may further be configured to store the score values SMASTER assigned to each or a selection of the words represented by numeric vectors VWORD in the word embedding system, or the determined subset of numeric vectors VWORD, at the respective associated time instance tINITIAL, tCURRENT. Thereby a more granular determination of the vocabulary knowledge status of the learner can be made at one or more given time instances. If the memory is configured to store the score values SMASTER for more than one learner/user 155 connected to the system in this manner, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.
The processing circuitry 110 may further be configured to, for two or more time instances for which information has been stored: retrieve information on the calculated position of the center of mass PCM and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass PCM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass PCM and the respective associated time instances at which the positions were calculated. The processing circuitry 110 may in these embodiments be configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time. Alternatively, or additionally, the processing circuitry 110 may be configured to present a visualization of the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time via the first user interface 130 or the second user interface 140. Advantageously, embodiments herein thereby provide the possibility to represent the vocabulary comprehension of the learner/user of the system 100, and possibly also to represent and/or track the progression of the vocabulary comprehension. The representation may be fed back into the system and be used as a basis for further personalization of future recommendations, and/or it may be visualized via a user interface comprised in or connected to the system. If tracking is performed, the system may be configured to determine, based on the tracking of a number of learners, optimal paths for learning for an individual learner, i.e. an optimal order of being presented with different parts of the vocabulary and/or suitable activities to perform, in order to optimize the learning progress/vocabulary comprehension progress of the learner.
Method embodiments
Turning now to Figure 2, there is shown a method, in a computerized system 100, of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance t, the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector VWORD having a position PWORD in an M-dimensional word embedding space, the method comprising:
In step 210: obtaining, via a first interface 130, a first input signal SINPUT_1 indicative of user specific system initialization settings.
The user specific system initialization settings may comprise results of a test performed by the user and input into the system via a digital learning environment (program application or the like). Alternatively, the user specific system initialization settings may be input to the system as manual input from the user, a teacher, or another interested party - e.g. by enabling selection of learning preferences in a displayed menu via a user interface or input as a signal from the system, or an external program application communicatively connected to the system, wherein the signal represents results of a placement test or the like. Alternatively, if no specific input has been made, the initialisation setting may comprise pre-set default values.
In step 220: initializing, using processing circuitry 110, the system 100.
The initialization of step 220 includes two sub-steps 222, 224, comprising:
In sub-step 222: assigning a respective score value SMASTER to each numeric vector VWORD, based on the first input signal SINPUT_1 and a predetermined set of rules.
In some embodiments, the first input signal SINPUT_1 may comprise the respective score values SMASTER and the predetermined set of rules define that the respective score values are to be assigned to the numeric vectors VWORD. In some embodiments, the first input signal SINPUT_1 may comprise score values SMASTER for some of the numeric vectors VWORD in the M-dimensional word embedding space and the rules further comprise how to approximate score values for numeric vectors VWORD for groups/clusters of words based on the provided score values SMASTER. Alternatively, or in combination, the first input signal SINPUT_1 may comprise an estimated "mastery level" for one or more of the numeric vectors VWORD and the rules may comprise how the words/vectors in the M-dimensional word embedding space are to be scored for users of different mastery levels. Alternatively, the first input signal SINPUT_1 may comprise only default values (if no specific values are available for the user) and the rules may comprise how to score the words/vectors based on the default values.
In sub-step 224: calculating the position of the center of mass PCM of the M-dimensional word embedding space, at the initial time instance tINITIAL, based on the respective positions PWORD and score values SMASTER of the numeric vectors VWORD comprised in the M-dimensional word embedding space.
After initialization of the system settings, the method shown in Figure 2 further comprises:
In step 240: generating, using the processing circuitry 110, a list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each numeric vector VWORD to the calculated position of the center of mass PCM.
In one or more embodiments, generating the list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the calculated position of the center of mass PCM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors VWORD with a respective position PWORD closest to the position of the center of mass PCM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.
In other embodiments, generating the list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the calculated position of the center of mass PCM e.g. comprises generating a list comprising all words represented by a numeric vector VWORD with a position PWORD less than a pre-set distance d from the position of the center of mass PCM in the M-dimensional word embedding space.
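The two selection variants of step 240 (the N closest words, or all words closer than a pre-set distance d) can be sketched as below. Euclidean distance is an assumption made for this illustration, since the text does not fix a distance metric; the function names and sample words are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def n_closest_words(positions: Dict[str, Sequence[float]],
                    center: Sequence[float],
                    n: int) -> List[str]:
    """Return the n words whose vectors are closest to the center of mass."""
    if n <= 0:
        raise ValueError("n must be an integer > 0")
    ranked = sorted(positions, key=lambda w: math.dist(positions[w], center))
    return ranked[:n]


def words_within_distance(positions: Dict[str, Sequence[float]],
                          center: Sequence[float],
                          d: float) -> List[str]:
    """Return all words strictly closer than d to the center of mass."""
    return [w for w, p in positions.items() if math.dist(p, center) < d]


# Hypothetical 2D word vectors and center of mass.
positions = {"apple": (0.0, 0.0), "pear": (1.0, 0.0), "bank": (5.0, 5.0)}
center = (0.2, 0.0)

top_two = n_closest_words(positions, center, 2)
nearby = words_within_distance(positions, center, 1.0)
```

The first variant always yields a list of fixed length N, while the second yields a list whose length depends on how densely words are placed around the center of mass.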
After generation of the list of words personalized to the learning needs of a user, the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user. Alternatively, or in combination, the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.
In some embodiments, the method shown in Figure 2 further comprises, before step 240 of generating the list of words:
In an optional step 230: applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass PCM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors VWORD comprising the numeric vectors VWORD that are inside the filter mask.
In embodiments wherein step 230 is performed, the method step 240 of generating the list of words comprises generating the list to only comprise words represented by numeric vectors VWORD in the determined subset of numeric vectors VWORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.
The method of Figure 2 may further comprise, as shown in Figure 3, performing the following steps, repeatedly at selected (pre-set or user input) time intervals or points in time:
In step 310: obtaining, via the first interface 130, a second input signal SINPUT_2 indicative of user input related to one or more of the numeric vectors VWORD comprised in the M-dimensional word embedding space, at a current time instance tCURRENT.
The second input signal SINPUT_2 may comprise results of a test, assignment, one or more actions, or the like performed by the user and input into the system via a digital learning environment (program application or the like). The input may be made via a user interface 130, 140 or an input device 150 and may comprise text input, voice input, selections, and/or other information. Alternatively, the input to the system may be manual input performed by a teacher, or another interested party, via a user interface 130, 140 or input device 150, relating to the learning process of the learner.
In step 320: adjusting, using the processing circuitry 110, the settings of the system 100.
In the embodiments shown in Figure 3, the adjusting of step 320 comprises two sub-steps 322, 324, comprising:
In sub-step 322: updating the respective score value SMASTER assigned to each of the one or more numeric vectors VWORD, based on the second input signal SINPUT_2 and the predetermined set of rules.
The predetermined set of rules may include that a score value SMASTER assigned to a numeric vector VWORD should be increased (e.g. updated to the next, higher, level or a defined number of levels closer to the maximum mastery level) if the second input signal SINPUT_2 comprises information indicating that the learner/user 155 of the system has e.g. performed and/or correctly answered an exercise including the word represented by the numeric vector VWORD.
The predetermined set of rules may further include that a score value SMASTER assigned to a numeric vector VWORD should be decreased (e.g. updated to the next, lower, level or a defined number of levels closer to the minimum mastery level) if the time since the learner/user of the system was last presented with the word represented by the numeric vector VWORD in an assignment, as indicated by the second input signal SINPUT_2 or based on one or more previously received second input signals SINPUT_2, exceeds a preset threshold.
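The two example rules of sub-step 322 (raise the score after a correct answer, lower it when the word has not been presented for longer than a threshold) can be sketched as follows. All concrete values here are assumptions chosen for illustration: the integer levels `MIN_SCORE = 2` and `MASTER_SCORE = 10`, the 30-day threshold, the one-level step, and the names `WordState`, `update_on_correct_answer` and `apply_decay` do not come from the text.

```python
from dataclasses import dataclass

# Hypothetical presets; the text only requires MIN > 0 and MASTER as maximum.
MIN_SCORE = 2
MASTER_SCORE = 10
DECAY_THRESHOLD_DAYS = 30.0


@dataclass(frozen=True)
class WordState:
    score: int            # current SMASTER level of the word
    last_seen_day: float  # day the word was last presented in an assignment


def update_on_correct_answer(state: WordState, today: float,
                             step: int = 1) -> WordState:
    """Move the score `step` levels towards MASTER_SCORE and record the day."""
    return WordState(min(state.score + step, MASTER_SCORE), today)


def apply_decay(state: WordState, today: float, step: int = 1) -> WordState:
    """Move the score `step` levels towards MIN_SCORE if the time since the
    word was last presented exceeds the preset threshold."""
    if today - state.last_seen_day > DECAY_THRESHOLD_DAYS:
        return WordState(max(state.score - step, MIN_SCORE),
                         state.last_seen_day)
    return state


state = WordState(score=5, last_seen_day=0.0)
after_correct = update_on_correct_answer(state, today=10.0)
after_decay = apply_decay(state, today=31.0)
not_yet_decayed = apply_decay(state, today=30.0)
```

Clamping to the preset minimum and maximum keeps every score inside the range of levels, so a mass conversion that depends on the score stays well defined.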
In sub-step 324: calculating an updated position of the center of mass PCM of the M-dimensional word embedding space, at the current time instance tCURRENT, based on the respective positions PWORD and updated score value SMASTER of the numeric vectors VWORD comprised in the M-dimensional word embedding space.
In other words, when the mastery level and score values SMASTER change, the position of PCM will move within the M-dimensional word embedding space. If the mastery level and score values SMASTER related to a certain word or group of words (e.g. neighbouring words in the word embedding space) increase, the position of PCM will move closer to un-initialized parts of the vocabulary, as higher score values mean that the "mass" of the numeric vectors VWORD being assigned the score values decreases.
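A small numeric illustration of this movement, under stated assumptions: a one-dimensional space with two hypothetical words, the example conversion mass = 1/log2(SMASTER), and a second word that keeps the lowest score of 2 as a stand-in for a less familiar part of the vocabulary. The helper name `center_of_mass_1d` is introduced here for illustration only.

```python
import math
from typing import Sequence


def center_of_mass_1d(positions: Sequence[float],
                      scores: Sequence[float]) -> float:
    """Weighted mean position with mass = 1 / log2(score), score >= 2."""
    masses = [1.0 / math.log2(s) for s in scores]
    return sum(m * p for m, p in zip(masses, positions)) / sum(masses)


# Hypothetical words at positions 0 and 10 on a single axis.
positions = [0.0, 10.0]

# Both words start at score 2 (mass 1.0 each).
before = center_of_mass_1d(positions, [2.0, 2.0])

# The learner improves on the first word: its score rises to 16,
# so its mass drops to 1 / log2(16) = 0.25.
after = center_of_mass_1d(positions, [16.0, 2.0])
```

With these values the center of mass shifts from the midpoint towards the word that is still at the lowest score, so a distance-based list generated afterwards favours words around that less mastered region.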
After adjustment of the system settings, one or more method embodiments shown in Figure 3 further comprise:
In step 340: generating, using the processing circuitry 110, an updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each numeric vector VWORD to the calculated updated position of the center of mass PCM.
In one or more embodiments, generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the updated position of the center of mass PCM e.g. comprises generating a list comprising the N words that are represented by the N numeric vectors VWORD with a respective position PWORD closest to the updated position of the center of mass PCM in the M-dimensional word embedding space, wherein N is an integer > 0 representing the length of the word list.
In other embodiments, generating the updated list of words personalized to the learning needs of a user based on the respective distances from the position PWORD of each of the numeric vectors VWORD to the updated position of the center of mass PCM e.g. comprises generating a list comprising all words represented by a numeric vector VWORD with a position PWORD less than a pre-set distance d from the updated position of the center of mass PCM in the M-dimensional word embedding space.
After generation of the updated list of words personalized to the learning needs of a user, the method may further comprise presenting the list of words to the user/learner via a user interface 130, 140, thereby enabling the user to perform self-study or digitally assisted study of the words selected as optimal for the individual user. Alternatively, or in combination, the method may further comprise inputting the list of words into the system 100, or a different computerized system, wherein the system 100 (or other system) is configured to provide digital language learning assignments or actions to the user based on the words on the list.
In similarity to what is described in connection with Figure 2, the method shown in Figure 3 may in some embodiments further comprise, before step 340 of generating the updated list of words:
In an optional step 330: applying, using the processing circuitry 110, a filter mask centered at the position of the calculated center of mass PCM of the M-dimensional word embedding space; and determining, using the processing circuitry 110, a subset of numeric vectors VWORD comprising the numeric vectors VWORD that are inside the filter mask.
In embodiments wherein step 330 is performed, the method step 340 of generating the updated list of words comprises generating the list to only comprise words represented by numeric vectors VWORD in the determined subset of numeric vectors VWORD. As the number of words/vectors to be considered is lowered by the filtering, the generation of the list of words is less computationally expensive and thereby also faster.
In some embodiments, the method described in connection with Figure 2 and/or 3 may further comprise setting, using the processing circuitry 110, the length of the list of words based on user input received via the first user interface 130 or a second user interface 140 or an input device 150 connected to the system 100. Thereby, the user of the system, i.e. the learner or another user influencing the learning process, such as a teacher, instructor, supervisor, or other, is enabled to control the number of words recommended for study. Alternatively, the length of the list may be pre-set in the system, and/or be dynamically adjusted by the system depending on pre-set rules and conditions.
Using the M-dimensional word embedding system, not only can recommendations be made for future study of vocabulary for optimized learning, but the vocabulary knowledge status of the learner, or each of several learners, connected to the system 100 can be determined for a given time instance by determining the current position of the center of mass PCM for each specified learner.
Furthermore, the learning progress of the learner, or each of several learners, connected to the system 100 can be monitored over time, by determining the current position of the center of mass PCM for each specified learner at more than one time instance, e.g. at a number of consecutive time instances, and the information thus gathered can be presented visually via a user interface and/or be fed back into the system to further enhance future recommendations.
The information may also be used for training the machine learning algorithm generating the word embedding system so that the accuracy of the word embedding system may be continuously improved.
In Figure 4, method embodiments relating to determining the current position of the center of mass PCM for a current learner/user 155 of the system are shown, the method embodiments comprising:
In step 410: storing, in a memory 120 of the system 100, information on the calculated position of the center of mass PCM and the respective associated time instance tINITIAL, tCURRENT at which the position of the center of mass PCM was calculated. Thereby, the vocabulary knowledge status of the learner can be determined at one or more given time instances.
If step 410 is performed for more than one learner/user 155 connected to the system, comparison of the vocabulary knowledge status of the learners at one or more given time instances is enabled.
The storing of step 410 may further comprise storing the score values SMASTER assigned to each or a selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD, at the respective associated time instance tINITIAL, tCURRENT. Thereby a more granular determination of the vocabulary knowledge status of the learner can be made at one or more given time instances. If step 410 is performed for more than one learner/user 155 connected to the system, a more granular comparison of the vocabulary knowledge status of the learners at one or more given time instances is correspondingly enabled.
In step 420: For two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the calculated position of the center of mass PCM and the respective associated time instance the position was calculated.
In embodiments wherein step 410 comprises storing the score values SMASTER assigned to each or a selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD, at the respective associated time instance tINITIAL, tCURRENT, step 420 may further comprise, for the two or more of the time instances for which information has been stored: retrieving, using the processing circuitry 110, information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD at the respective associated time instance tINITIAL, tCURRENT.
In step 430: for the two or more time instances in step 420: determining, using the processing circuitry 110, the change in the position of the center of mass PCM in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass PCM and the respective associated time instances at which the positions were calculated.
Thereby, the learning progress of the learner, over time, may be determined.
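Steps 410 to 430 can be pictured as keeping a history of (time instance, center of mass position) pairs and deriving the displacement between consecutive entries. The sketch below is one possible reading under stated assumptions: Euclidean distance, a rate defined as distance per unit of time, and the names `Snapshot` and `center_of_mass_changes` are all introduced here for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A stored record: (time instance, position of the center of mass).
Snapshot = Tuple[float, Sequence[float]]


def center_of_mass_changes(history: Sequence[Snapshot]) -> List[Dict]:
    """For each pair of consecutive snapshots, return the displacement
    vector, its Euclidean length and the length per unit of time."""
    ordered = sorted(history, key=lambda snap: snap[0])
    changes = []
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        distance = math.dist(p0, p1)
        elapsed = t1 - t0
        rate: Optional[float] = distance / elapsed if elapsed > 0 else None
        changes.append({
            "t_start": t0,
            "t_end": t1,
            "displacement": [b - a for a, b in zip(p0, p1)],
            "distance": distance,
            "rate": rate,
        })
    return changes


# Hypothetical stored history for one learner (2D for readability).
history = [(0.0, (0.0, 0.0)), (2.0, (3.0, 4.0))]
changes = center_of_mass_changes(history)
```

Keeping one such history per learner would allow the displacements of several learners to be compared at the same time instances.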
If steps 410 and 420 are performed for more than one learner/user 155 connected to the system, the learning progress of more than one learner may be determined in step 430, and a comparison of the learning progress of the learners, at one or more given time instances, is enabled.
In some embodiments, generating the list of words in step 240, and/or generating the updated list of words in step 340, may further be based on the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
In an optional step 440: presenting, via the first user interface 130 or the second user interface 140, a visualization of the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
Thereby, the learner, or any other interested party, may view and thereby better understand the vocabulary knowledge status of the learner at a given time, and/or the learning progress of the learner over time. Any anomalies, such as decreasing knowledge or progress slowing down may hence easily be detected and action taken.
In embodiments wherein step 420 comprises retrieving, using the processing circuitry 110, information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD, the presenting of step 440 may further comprise presenting information on the score values SMASTER assigned to each or the selection of the words represented by numeric vectors VWORD in the word embedding system or the determined subset of numeric vectors VWORD.
If steps 410 and 420, and optionally also step 430, are performed for more than one learner/user 155 connected to the system, step 440 may comprise presenting a visualization of the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time, for the more than one learner/user 155. Thereby, a visualization of the learning progress of the learners, at one or more given time instances, is provided.
In an optional step 450: feeding back to the system the calculated position of the center of mass PCM in the M-dimensional word embedding space, and/or the determined change in the position of the center of mass PCM in the M-dimensional word embedding space over time.
Of course, step 450 may also be performed with regard to more than one learner/user 155 connected to the system.
Thereby, information on the vocabulary knowledge status of a learner, or of several learners connected to the system, at a given time, and/or the learning progress of the learner, or learners, over time may be used to further enhance future recommendations and/or to train the machine learning algorithm generating the word embedding system, so that the accuracy of the word embedding system is continuously improved.
The processing circuitry 110 may further be configured to perform the steps and functions according to any of the method embodiments described herein.
Further embodiments
All of the process steps, as well as any sub-sequence of steps, described with reference to Fig. 2 above may be controlled by means of a programmed data processor. Moreover, although the embodiments of the invention described above with reference to the drawings comprise processing circuitry, the invention thus also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, a code intermediate source and object code such as in partially compiled form, or in any other form suitable for use in the implementation of the process according to the invention. The program may either be a part of an operating system or be a separate application. The carrier may be any entity or device capable of carrying the program. For example, the carrier may comprise a storage medium, such as a Flash memory, a ROM (Read-Only Memory), an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), or a magnetic recording medium, for example a floppy disc or hard disc. Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or by other means. When the program is embodied in a signal which may be conveyed directly by a cable or other device or means, the carrier may be constituted by such cable or device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.
Program code, which, when run by the processing circuitry 110, causes the system 100 to perform the method according to any of the method embodiments herein may already be pre-stored in an internal memory 120 of the system 100. The processor 110 is in such embodiments communicably connected to the memory 120.
In one or more embodiments, there may be provided a computer program loadable into a memory communicatively connected or coupled to at least one data processor, e.g. the processor 110, comprising software for executing the method according to any of the embodiments herein when the program is run on the at least one processor 110.
In one or more further embodiments, there may be provided a processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor, e.g. the processor 110, execute the method according to any of the embodiments herein when the program is loaded into the at least one data processor.
The invention is not restricted to the described embodiments in the figures but may be varied freely within the scope of the claims.
Claims
1) Computerized system (100) using word embedding for generating a list of words personalized to the learning needs of a user of the system (100) at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (VWORD) having a position (PWORD) in an M-dimensional word embedding space, the system (100) comprising: processing circuitry (110); and a memory (120) configured to communicate with the processing circuitry (110); wherein the processing circuitry (110) is configured to: obtain, via a first interface (130), a first input signal (SINPUT_1) indicative of user specific system initialization settings; initialize the system (100), by: assigning a respective score value (SMASTER) to each numeric vector (VWORD), based on the first input signal (SINPUT_1) and a predetermined set of rules obtained from the memory (120); and calculating the position of the center of mass (PCM) of the M-dimensional word embedding space, at the initial time instance (tINITIAL), based on the respective positions (PWORD) and score value (SMASTER) of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space; and generate a list of words personalized to the learning needs of a user based on the respective distances from the position (PWORD) of each numeric vector (VWORD) to the calculated position of the center of mass (PCM).
2) The system (100) of claim 1, wherein the processing circuitry (110) is further configured to, repeatedly: obtain, via the first interface (130), a second input signal (SINPUT_2) indicative of user input related to one or more of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space, at a current time instance (tCURRENT); adjust the settings of the system (100) by: updating the respective score value (SMASTER) assigned to each of the one or more numeric vectors (VWORD), based on the second input signal (SINPUT_2) and the predetermined set of rules; and calculating an updated position of the center of mass (PCM) of the M-dimensional word embedding space, at the current time instance (tCURRENT), based on the respective positions (PWORD) and updated score value (SMASTER) of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space; and generate an updated list of words personalized to the learning needs of a user based on the respective distances from the position (PWORD) of each numeric vector (VWORD) to the calculated updated position of the center of mass (PCM).
3) The system (100) of claim 1 or 2, wherein the processing circuitry (110) is configured to, before generating the list of words, or the updated list of words: apply a filter mask centered at the position of the calculated center of mass (PCM) of the M-dimensional word embedding space; and determine a subset of numeric vectors (VWORD) comprising the numeric vectors (VWORD) that are inside the filter mask, wherein the processing circuitry (110) is configured to generate the list of words, or generate the updated list of words, to only comprise words represented by numeric vectors (VWORD) in the determined subset of numeric vectors (VWORD).
4) The computerized system (100) of any of the preceding claims, wherein the processing circuitry (110) is configured to set the length of the list of words based on user input received via the first user interface (130) or a second user interface (140) or an input device (150) connected to the system (100).
5) The system (100) of any of the preceding claims, wherein the memory (120) is configured to, for each time instance (tINITIAL, tCURRENT), store information on the calculated position of the center of mass (PCM) and the respective associated time instance (tINITIAL, tCURRENT) at which the position of the center of mass (PCM) was calculated.
6) The system (100) of claim 5, wherein the processing circuitry (110) is configured to, for two or more of the time instances for which information has been stored: retrieve information on the calculated position of the center of mass (PCM) and the respective associated time instance at which the position was calculated; and determine the change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass (PCM) and the respective associated time instances at which the positions were calculated.
7) The system (100) of claim 6, wherein the processing circuitry (110) is configured to generate the list of words, or the updated list of words, also based on the determined change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time.
8) The system (100) of claim 6, wherein the processing circuitry (110) is configured to present a visualization of the determined change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time via the first user interface (130) or the second user interface (140).
9) A method, in a computerized system (100), of using word embedding for generating a list of words personalized to the learning needs of a user at a given time instance (t), the words on the list being selected from a plurality of words each represented as an M-dimensional numeric vector (VWORD) having a position (PWORD) in an M-dimensional word embedding space, the method comprising: obtaining, via a first interface (130), a first input signal (SINPUT_1) indicative of user specific system initialization settings; initializing, using processing circuitry (110), the system (100), by: assigning a respective score value (SMASTER) to each numeric vector (VWORD), based on the first input signal (SINPUT_1) and a predetermined set of rules; and calculating the position of the center of mass (PCM) of the M-dimensional word embedding space, at the initial time instance (tINITIAL), based on the respective positions (PWORD) and score value (SMASTER) of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space; and generating, using the processing circuitry (110), a list of words personalized to the learning needs of a user based on the respective distances from the position (PWORD) of each numeric vector (VWORD) to the calculated position of the center of mass (PCM).
10) The method of claim 9, further comprising, repeatedly: obtaining, via the first interface (130), a second input signal (SINPUT_2) indicative of user input related to one or more of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space, at a current time instance (tCURRENT); adjusting, using the processing circuitry (110), the settings of the system (100) by: updating the respective score value (SMASTER) assigned to each of the one or more numeric vectors (VWORD), based on the second input signal (SINPUT_2) and the predetermined set of rules; and calculating an updated position of the center of mass (PCM) of the M-dimensional word embedding space, at the current time instance (tCURRENT), based on the respective positions (PWORD) and updated score value (SMASTER) of the numeric vectors (VWORD) comprised in the M-dimensional word embedding space; and generating, using the processing circuitry (110), an updated list of words personalized to the learning needs of a user based on the respective distances from the position (PWORD) of each numeric vector (VWORD) to the calculated updated position of the center of mass (PCM).
11) The method of claim 9 or 10, further comprising, before generating the list of words, or the updated list of words: applying, using the processing circuitry (110), a filter mask centered at the position of the calculated center of mass (PCM) of the M-dimensional word embedding space; and determining, using the processing circuitry (110), a subset of numeric vectors (VWORD) comprising the numeric vectors (VWORD) that are inside the filter mask, wherein the method step of generating the list of words, or generating the updated list of words, comprises generating the list to only comprise words represented by numeric vectors (VWORD) in the determined subset of numeric vectors (VWORD).
12) The method of any of the claims 9 to 11, further comprising setting, using the processing circuitry (110), the length of the list of words based on user input received via the first user interface (130) or a second user interface (140) or an input device (150) connected to the system (100).
13) The method of any of the claims 9 to 12, further comprising storing, in a memory (120) of the system (100), information on the calculated position of the center of mass (PCM) and the respective associated time instance (tINITIAL, tCURRENT) at which the position of the center of mass (PCM) was calculated.
14) The method of claim 13, further comprising, for two or more of the time instances for which information has been stored: retrieving, using the processing circuitry (110), information on the calculated position of the center of mass (PCM) and the respective associated time instance at which the position was calculated; and determining, using the processing circuitry (110), the change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time, based on the information on the two or more calculated positions of the center of mass (PCM) and the respective associated time instances at which the positions were calculated.
15) The method of claim 14, wherein generating the list of words, or the updated list of words, using the processing circuitry (110), is further based on the determined change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time.
16) The method of claim 14, further comprising presenting, via the first user interface (130) or the second user interface (140), a visualization of the determined change in the position of the center of mass (PCM) in the M-dimensional word embedding space over time.
17) A computer program loadable into a memory communicatively connected or coupled to at least one data processor, comprising software for executing the method according to any of the method claims 9 to 16 when the program is run on the at least one data processor.
18) A processor-readable medium, having a program recorded thereon, where the program is to make at least one data processor execute the method according to any of the method claims 9 to 16 when the program is loaded into the at least one data processor.
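By way of a non-limiting illustration of the claimed steps, the sketch below calculates a score-weighted center of mass, optionally applies a filter mask, and generates a list ordered by distance to the center of mass. Several points are assumptions made for the example and are not taken from the claims: the center of mass is taken to be the standard weighted mean of the positions (PWORD) with the score values (SMASTER) as weights, the filter mask is taken to be a sphere of a given radius, distance is Euclidean, and the list is ordered nearest first. The function names and the toy data are hypothetical.

```python
import numpy as np

def center_of_mass(positions, scores):
    """Score-weighted mean of the word positions (assumed definition of PCM)."""
    positions = np.asarray(positions, dtype=float)   # shape (N, M)
    scores = np.asarray(scores, dtype=float)         # shape (N,)
    total = scores.sum()
    if total == 0:
        raise ValueError("at least one score value must be non-zero")
    return scores @ positions / total

def words_in_mask(positions, p_cm, radius):
    """Indices of vectors inside a spherical filter mask centered at PCM."""
    dist = np.linalg.norm(np.asarray(positions, dtype=float) - p_cm, axis=1)
    return np.flatnonzero(dist <= radius)

def generate_word_list(words, positions, scores, list_length, radius=None):
    """Return (list of words, PCM); nearest-first ordering is an assumption."""
    positions = np.asarray(positions, dtype=float)
    p_cm = center_of_mass(positions, scores)
    dist = np.linalg.norm(positions - p_cm, axis=1)
    if radius is None:
        candidates = np.arange(len(words))
    else:
        candidates = words_in_mask(positions, p_cm, radius)
    # Stable sort keeps the original word order among equal distances.
    order = candidates[np.argsort(dist[candidates], kind="stable")]
    return [words[i] for i in order[:list_length]], p_cm

# Toy two-dimensional example (M = 2); all values are made up.
words = ["cat", "dog", "car", "bus"]
positions = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
scores = [1.0, 1.0, 0.0, 0.0]
word_list, p_cm = generate_word_list(words, positions, scores, list_length=3)
masked_list, _ = generate_word_list(words, positions, scores, list_length=3, radius=1.0)
```

After user input updates the score values, calling `generate_word_list` again with the new scores yields the updated center of mass and the updated list.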
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2019/081498 WO2021093968A1 (en) | 2019-11-15 | 2019-11-15 | Computerized system and method of using word embedding for generating a list of words personalized to the learning needs of a user |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021093968A1 (en) | 2021-05-20 |
Family ID: 68610226
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2019/081498 WO2021093968A1 (en) | 2019-11-15 | 2019-11-15 | Computerized system and method of using word embedding for generating a list of words personalized to the learning needs of a user |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021093968A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116011456A (en) * | 2023-03-17 | 2023-04-25 | 北京建筑大学 | Chinese building specification text entity identification method and system based on prompt learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140134576A1 (en) * | 2012-11-09 | 2014-05-15 | Microsoft Corporation | Personalized language learning using language and learner models |
EP3514783A1 (en) * | 2018-01-17 | 2019-07-24 | Signum International AG | Contextual language learning device, system and method |
2019-11-15: WO application PCT/EP2019/081498 (WO2021093968A1), status: active, Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19805636; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 19805636; Country of ref document: EP; Kind code of ref document: A1 |