Recently, YouTube has started working with content providers (known as YouTube partners) to promote users’ watching and sharing activities. The main benefit is that YouTube can further expand its service and monetize more videos, which is crucial to YouTube and its partners, as well as to providers of related services.
In this paper, our main contribution is an analysis of massive amounts of video data from a YouTube partner’s point of view. We make effective use of Insight, a new YouTube analytics service that offers simple data analysis for partners. To derive practical guidance from the raw Insight data, we carry out more complex investigations of the inherent features that affect video popularity. Our findings help YouTube partners redesign their current video publishing strategies and give them more opportunities to attract views.
There have been significant studies of user-generated data on YouTube. Besides the content shared by ordinary users, YouTube has also introduced the Partner Program, through which premium content owners, motivated by advertising revenue, can upload high-quality copyrighted videos and reach an even larger user base. Notable partners include industry giants such as EA, ESPN, and Warner Brothers. More and more small businesses and individuals have also partnered with YouTube to monetize their videos, and their revenue has doubled for four years in a row. Machinima, one of the most popular YouTube partners, has also received significant investment from Google to produce more appealing videos, which further underlines the key role of YouTube partners.
Using Hadoop to Analyze Stock Market Data
Because stock markets generate a wide variety of unstructured data, this data can be analyzed using the Hadoop framework. One stock market data analysis project used a sample New York Stock Exchange data set. Using the Hadoop framework, the project calculated the covariance of the stock data, with the aim of solving both the storage and the processing problems that come with a huge volume of data. The data set was a comma-separated values (CSV) file containing stock information such as daily quotes, opening prices, and highest prices on the New York Stock Exchange. A Hive table was created using Hive commands.
Once the table was created, the CSV data was loaded into it. Hive SELECT queries were then used to calculate the covariance of the stock data for a year supplied as input. From the covariance results, stockbrokers could make key recommendations, such as whether stock prices were likely to move in the same direction or in opposite directions.
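The covariance step can be illustrated outside Hive with a minimal Java sketch. The text does not say which columns the project compared, so the choice of two closing-price series, the class name `StockCovariance`, and the sample values are all assumptions made for illustration. The sketch computes the sample covariance (dividing by n - 1).

```java
public class StockCovariance {

    // Sample covariance of two equally long price series (divides by n - 1).
    // Which columns to compare is an assumption; the source does not specify them.
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // Sum the products of the deviations from each mean.
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (x.length - 1);
    }

    public static void main(String[] args) {
        // Made-up closing prices for two hypothetical stocks.
        double[] stockA = {10.0, 11.0, 12.0, 13.0};
        double[] stockB = {20.0, 22.0, 24.0, 26.0};
        System.out.println("covariance = " + covariance(stockA, stockB));
    }
}
```

A positive result indicates that the two series tended to move in the same direction over the period, and a negative result indicates that they tended to move in opposite directions.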
Sentiment Analysis of Twitter Data Using Hadoop
Sentiment analysis, or opinion mining, is the task of categorizing opinions expressed on a social media platform about a given subject. This project was undertaken to understand writers’ attitudes towards a particular product or subject. Sentiment analysis can determine whether the general attitude of people towards a specific subject is positive, negative, or neutral.
The core objective of this project was to analyze Twitter data and classify it based on the polarity of its words. The data was collected from Twitter over a period of time using the Twitter Streaming API from Java, and was stored in HDFS (Hadoop Distributed File System) in a fixed format. It was then fed to the mapper of a MapReduce job and processed using Java, the MapReduce programming model, and the Apache Hive framework. The output of the reducer phase was analyzed further and presented to management as pie charts. The final result of the project was a breakdown of the tweets into Positive, Negative, and Neutral. This helped companies determine whether the marketing campaign for a specific product had succeeded or failed.
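The classification by word polarity can be sketched in plain Java without a Hadoop cluster. This is not the project's actual code: the word lists, the scoring rule (positive words minus negative words), and the names `TweetPolarity`, `classify`, and `countByPolarity` are hypothetical. `classify` plays the role of the mapper, and `countByPolarity` plays the role of the reducer that totals each class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TweetPolarity {

    // Hypothetical word lists, for illustration only.
    static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("good", "great", "love", "excellent"));
    static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("bad", "poor", "hate", "terrible"));

    // "Map" step: label one tweet by counting positive and negative words.
    static String classify(String tweet) {
        int score = 0;
        for (String word : tweet.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(word)) {
                score++;
            } else if (NEGATIVE.contains(word)) {
                score--;
            }
        }
        if (score > 0) {
            return "Positive";
        }
        return score < 0 ? "Negative" : "Neutral";
    }

    // "Reduce" step: total the number of tweets in each polarity class.
    static Map<String, Integer> countByPolarity(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tweet : tweets) {
            counts.merge(classify(tweet), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "I love this great phone",
                "Terrible battery, bad screen",
                "It arrived today");
        System.out.println(countByPolarity(tweets));
    }
}
```

The per-class totals are the kind of figures that could feed the pie charts described above.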
- Problem Definition:
The main objective of this project is to show how data generated from YouTube can be mined and used by companies to make targeted, real-time, and informed decisions about their products, and so increase their market share. This can be done using Hadoop.
The project has multiple applications. Companies can use it to understand how effective and far-reaching their marketing programs are. In addition to view counts, subscribers, shares, and audience retention, companies can also evaluate views over a date range. This shows when viewership slows or spikes, and lets them attribute the change to a particular marketing campaign.
- Modules Description:
YoutubeCategory.java – Mapper Reducer code to get top 5 categories
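The source of YoutubeCategory.java is not shown here, so the following is only a sketch of the same idea in plain Java, without the Hadoop API. It assumes tab-separated input records with the category in the fourth field; that layout, the class name `TopCategories`, and the sample records are assumptions. `countCategories` mirrors the map and reduce phases (emit one count per category, then sum), and `topN` selects the most frequent categories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopCategories {

    // Assumed position of the category in a tab-separated record (0-based).
    static final int CATEGORY_FIELD = 3;

    // Map + reduce in one pass: count one occurrence per record, summed by category.
    static Map<String, Integer> countCategories(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length <= CATEGORY_FIELD) {
                continue; // skip malformed records
            }
            counts.merge(fields[CATEGORY_FIELD].trim(), 1, Integer::sum);
        }
        return counts;
    }

    // Return up to n category names, highest count first; ties are broken by name.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up records: video id, uploader, views, category.
        List<String> lines = Arrays.asList(
                "v1\tu1\t10\tMusic",
                "v2\tu2\t20\tComedy",
                "v3\tu3\t30\tMusic");
        System.out.println(topN(countCategories(lines), 5));
    }
}
```

In a real MapReduce job the counting would be split between a Mapper and a Reducer class, and the top-five selection would run over the reducer output.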
Big data analysis is not only important but a necessity. Many organizations that have implemented Big Data are realizing a significant competitive advantage over organizations with no Big Data efforts. This project is intended to analyze YouTube Big Data and produce significant insights that could not be obtained otherwise.
The results of the YouTube data analysis project show key insights that can be extrapolated to other use cases as well. One result shows how many likes a specific video ID received. According to YouTube Analytics, the number of likes (or “thumbs-up”) a video has received is directly significant to the video’s ranking. So if a company posts its video on YouTube, the number of likes it receives could determine whether the company or its competitors appear more prominently in YouTube search results.