Times Change and Your Training Data Should Too: The Effect of Training Data Recency on Twitter Classifiers

Paper Type:  Article review
Pages:  7
Wordcount:  1791 Words
Date:  2022-10-21

Introduction

Sophisticated adversaries are moving their botnet command and control systems onto social media platforms such as Twitter. As Ryan (2018) argues in "Times Change and Your Training Data Should Too: The Effect of Training Data Recency on Twitter Classifiers," security practitioners who are responsible for finding new techniques to detect and disrupt such botnets, machine-learning approaches among them, need to understand how the recency of training data affects the performance of their classifiers. The article also reports the performance of several binary classifiers and their ability to distinguish verified from non-verified tweets as the time offset between the training data and the test data changes. Ryan (2018) holds that everyone concerned should develop a better understanding of how classifiers separate 'trusted' from 'untrusted' tweets, because botnets are using increasingly sophisticated techniques not only to communicate but also to conceal their presence.
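The idea of varying the offset between training data and test data can be made concrete with a small sketch. The code below is not Ryan's method or data: it assumes a hypothetical list of (day, text, label) tuples, a tiny hand-written naive Bayes classifier, and an invented helper named accuracy_at_offset that trains on everything up to a cut-off day and tests on a window that starts a given number of days later.

```python
import math
from collections import Counter

def train_nb(samples):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of tokens seen with that label
    doc_counts = Counter()  # label -> number of training documents
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        # The extra +1 reserves probability mass for tokens never seen in training.
        denom = sum(counts.values()) + len(vocab) + 1
        score = math.log(doc_counts[label] / total_docs)
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy_at_offset(dated, train_end, offset, span):
    """Train on days <= train_end; test on `span` days starting `offset` days later."""
    train = [(t, y) for d, t, y in dated if d <= train_end]
    start = train_end + offset
    test = [(t, y) for d, t, y in dated if start <= d < start + span]
    if not train or not test:
        return None
    model = train_nb(train)
    hits = sum(predict_nb(model, t) == y for t, y in test)
    return hits / len(test)

# Made-up tweets: the vocabulary on day 30 is deliberately different from days 1-3.
DATED = [
    (1, "election rally tonight", "trusted"),
    (1, "free followers click link", "untrusted"),
    (2, "election results update", "trusted"),
    (2, "click link win prize", "untrusted"),
    (3, "election update tonight", "trusted"),
    (3, "win free prize link", "untrusted"),
    (30, "click here for results link", "trusted"),
    (30, "election giveaway tonight", "untrusted"),
]

near = accuracy_at_offset(DATED, train_end=2, offset=1, span=1)
far = accuracy_at_offset(DATED, train_end=2, offset=28, span=1)
print(f"offset 1 day: {near:.2f}, offset 28 days: {far:.2f}")
```

On this toy data the classifier scores lower on the distant window than on the adjacent one, but only because the vocabulary was constructed to drift. The numbers illustrate the mechanics of an offset experiment and say nothing about the size of the effect Ryan (2018) measured.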

Recency on Twitter refers to retrieving and ranking messages according to both their relevance and their freshness. It is a relatively new capability that has emerged with the broader technological changes in social media. Fresh messages have few in-links and clicks, which is one of the greatest challenges for recency ranking, and this is why understanding the training data is likely to lead to more effective classifiers (Chang et al., 2013). Fresh content is now the norm on Twitter: a user is shown an endless stream of new information based on their previous searches and indicated areas of interest. The platform draws on this history to select content on the assumption that it reflects the user's interests. According to Fang et al. (2015), accurate classifiers can be designed by leveraging differences in how a feature is used across topics, which in turn would lead to better methods for detecting and identifying covert botnet networks. Botnets are generally regarded as massive, distributed networks of zombies, or bots: hosts infected by malware, or unwitting participants, that depend on command and control channels to receive, respond to, and execute commands from a botmaster (Tao, Abel, Hauff & Houben, 2012).

As technology advances, social media platforms such as Twitter are becoming increasingly important sources of information about public opinion on matters such as political elections, and of business information such as stock market trends. Many organizations use Twitter as a critical communication tool and as a point of contact with clients and partners. It remains an important marketing tool for organizations, political groups, and individuals alike. The complexity of Twitter data, however, makes such estimates difficult, because Twitter data shares some traits of time-series data and some traits of static data (Peng, Jiang, Wang & Sipei, 2014). Few people understand the details of how Twitter data is used, and few learning institutions teach them to students of information technology. Time-series data is a collection of observations taken at successive points in time, each generally dependent on the observations that precede it. On Twitter, predictions are based on a user's previous actions and activity online, so users tend to see a high proportion of content that matches their interests; this is the recency technology found on Twitter and other social media platforms (Peng, Jiang, Wang & Sipei, 2014). Even though individual Twitter posts can be assumed to be independent of earlier posts, there is a potential indirect dependence that shows up in significant trends and in events driven by influential users (Mozetic et al., 2018). An appropriate approach to classifying tweets and microblogs, such as training a machine with attention to data recency, is therefore important for automatically transferring labels from one social site to another, for instance through the link between a tweet and a YouTube video (Magdy et al., 2015).
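One simple way to check whether a sequence of observations depends on the observations before it is the lag-1 autocorrelation. The sketch below is an illustration only: the function name lag1_autocorrelation and the hourly tweet counts are hypothetical, and nothing here is taken from the cited studies.

```python
def lag1_autocorrelation(series):
    """Correlation between each observation and the one immediately before it."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no information about dependence
    covariance = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance

# Hypothetical hourly tweet counts for a topic that is steadily trending upward.
trending = [1, 2, 3, 4, 5, 6]
# Hypothetical counts that swing back and forth with no sustained trend.
alternating = [1, 5, 1, 5, 1, 5]

print(lag1_autocorrelation(trending))
print(lag1_autocorrelation(alternating))
```

A clearly positive value suggests that each count tends to resemble the previous one, which is the time-series trait described above; a value near zero would be closer to the static-data assumption.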

According to Burnap et al. (2015), online social media platforms with millions of active users are also being used by cybercriminals to distribute malicious software (malware) at an increasing rate, with the intention of exploiting vulnerabilities on users' machines for personal gain. Virus attacks that aim to defraud unsuspecting online consumers are nothing new. Such malware typically appears as pop-ups that prompt the user to follow a link or to respond to a message or application. Twitter is among the social media platforms most susceptible to this kind of cybercrime despite its 140-character limit, because it has become common for users to include links in their tweets that point to more detailed information or news reports. Modern online platforms store the passwords and security details of millions of users, which makes it nearly impossible for users to guard their details online with certainty or to shield themselves from the streams of attacks that leave society exposed. Because URLs are routinely shortened and the endpoint cannot be determined until someone clicks the link, cybercriminals have an opportunity to distribute malicious links on Twitter that can perform malicious actions on an active user's machine (Nguyen, Woon & Ng, 2015). Training a classifier on machine activity logs generated by interacting with URLs extracted from Twitter, as in the article by Ryan, therefore supports effective training with respect to data recency, which helps classifiers distinguish verified from non-verified data and explain the relationship between machine activity and malicious software behavior (Mozetic, Torgo, Cerqueira & Smailovic, 2018).

Ryan's (2018) effort to develop a Twitter classifier that can differentiate verified tweets, which he treats as trusted, from non-verified tweets, which he treats as untrusted, is significant because real-time search, which depends on the ability to retrieve fresh content, is increasingly in demand. Although Twitter maintains its own specialized search engine, various other search engines also index real-time content (Mozetic, Torgo, Cerqueira & Smailovic, 2018). In the case of a breaking news alert, for example, search queries can return fresh information, although they often face coverage problems. Such search tools draw the latest items from massive streams of stored data so that the user sees only the most recent information. The article investigates Twitter as a research topic in social network analysis, using it to address the issue of training data recency. Effectively disrupting a botnet, however, means denying the botmaster the ability to hide command and control traffic among the noise of legitimate traffic on Twitter, a concept Ryan takes into account in developing a classifier and training it to distinguish botnets on Twitter (Magdy, Sajjad, El-Ganainy & Sebastiani, 2015).

The effect of training data recency described by Ryan (2018) bears on adaptive predictive modelling in real time from high-volume data streams, one of the most challenging areas of big data analytics. Compared with predictive data mining on batch data, data streams pose unique challenges because the patterns embedded in the stream may change over time. The growing value of data stream classification methods is evident in commercial applications such as internet traffic management and weblog analysis. Weblog analysis is among the more reliable ways of describing and predicting Twitter users and their trends (Gupta, Aggarwal & Kumaraguru, 2014). The research has nonetheless shown that ML classifiers can differentiate verified from non-verified tweets, a finding that matters when accounting for training data recency (Magdy & Elsayed, 2014).
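The difficulty that changing patterns pose for stream classification can be sketched with a test-then-train (prequential) loop. Everything below is an assumption made for illustration: the function name prequential_accuracy, the token-overlap voting rule, and the synthetic stream in which the meaning of each phrase flips halfway through. It is not the procedure used in Ryan (2018) or in any of the cited work.

```python
from collections import Counter, deque

def prequential_accuracy(stream, window=None):
    """Test-then-train: predict each item first, then add it to the model's memory.

    `window` limits how many recent examples are remembered;
    window=None keeps every past example.
    """
    memory = deque(maxlen=window)
    hits = 0
    for text, label in stream:
        tokens = set(text.split())
        votes = Counter()
        for past_text, past_label in memory:
            overlap = len(tokens & set(past_text.split()))
            if overlap:
                votes[past_label] += overlap  # shared tokens vote for the past label
        guess = votes.most_common(1)[0][0] if votes else None
        hits += int(guess == label)
        memory.append((text, label))
    return hits / len(stream)

# Synthetic stream: after six items the association between text and label flips.
phase_one = [("promo link", "untrusted"), ("news update", "trusted")] * 3
phase_two = [("promo link", "trusted"), ("news update", "untrusted")] * 3
STREAM = phase_one + phase_two

recent_only = prequential_accuracy(STREAM, window=2)
all_history = prequential_accuracy(STREAM, window=None)
print(f"window of 2: {recent_only:.2f}, full history: {all_history:.2f}")
```

On this synthetic stream the model that remembers only recent examples recovers after the change, while the model that keeps its full history stays anchored to the old pattern. Real streams drift far less cleanly, so the window size here is a placeholder rather than a recommendation.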

The research has important implications for future investigations into detecting botnet traffic on Twitter. In particular, it emphasizes that researchers must be sensitive to the timeliness of the training data they use, which would make it possible to test whether a similar effect exists when differentiating genuine traffic from botnet command and control channels (Atefeh & Khreich, 2015). Further delay in adopting such a training plan would leave Twitter users exposed in the future, which only serves the interests of malicious actors. The research also offers a viable approach that social media platforms could implement to identify traffic that is part of a botnet command and control network and to disable the network by removing the information. Removing the information may also protect users against malware attacks and unauthorized access to their personal information (Gupta & Kumaraguru, 2012). The approach is significant, too, because network owners can implement it within their own network boundary, together with an SSL forward proxy, to examine outbound information sent to social media networks and to identify command and control messages (Gupta & Kumaraguru, 2012).

The main obstacles to implementing the approach proposed in the article are related to cost. Network owners would incur relatively low processing costs by applying recency-aware training on their own networks, because the volume of social media traffic is small compared with other traffic, but obtaining buy-in from many different network owners is very challenging. Social media platforms could instead implement the approach themselves to identify suspicious traffic, although this would incur higher processing costs (Atefeh & Khreich, 2015). In general, adding training data adds information and is likely to improve the fit, although a classifier's performance is difficult to evaluate on the same data that was used to train it.

Generally, training with attention to data recency helps in building an algorithm capable of identifying tweets relevant to a particular topic, which has been an important research challenge because Twitter search relies on short queries that return an enormous number of matching tweets (Fang, Ounis, Habel, Macdonald & Limsopatham, 2015). The research findings show that the classifiers are likely to distinguish between non-verified an...

Cite this page

Times Change and Your Training Data Should Too: The Effect of Training Data Recency on Twitter Classifiers. (2022, Oct 21). Retrieved from https://proessays.net/essays/times-change-and-your-training-data-should-too-the-effect-of-training-data-recency-on-twitter-classifiers
