CS246: Mining Massive Data Sets

Category: IT Technology · Published: 4 years ago


Content

What is this course about? [Info Handout]

The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce and Spark as tools for creating parallel algorithms that can process very large amounts of data.
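As a rough illustration of the MapReduce pattern mentioned above, the following is a minimal single-machine sketch in plain Python, not course material. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example; a real MapReduce or Spark job would distribute these steps across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a single key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data mining", "mining big graphs"]
    print(word_count(docs))
```

Because each call to map_phase depends only on its own record and each call to reduce_phase depends only on its own key, both phases can in principle run in parallel, which is what makes the pattern suitable for very large inputs.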

Topics include: Frequent itemsets and Association rules, Near Neighbor Search in High Dimensional Data, Locality Sensitive Hashing (LSH), Dimensionality reduction, Recommendation Systems, Clustering, Link Analysis, Large-scale Supervised Machine Learning, Data streams, Mining the Web for Structured Data, Web Advertising.

Previous offerings

The previous version of the course is CS345A: Data Mining, which also included a course project. CS345A has now been split into two courses: CS246 (Winter, 3-4 Units, homework, final, no project) and CS341 (Spring, 3 Units, project-focused).

You can access class notes and slides of previous versions of the course here:

CS246 Websites: CS246: Winter 2019 / CS246: Winter 2018 / CS246: Winter 2017 / CS246: Winter 2016 / CS246: Winter 2015 / CS246: Winter 2014 / CS246: Winter 2013 / CS246: Winter 2012 / CS246: Winter 2011
CS345a Website: CS345a: Winter 2010

Prerequisites

Students are expected to have the following background:

  • Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended).
  • Good knowledge of Java and Python will be extremely helpful since most assignments will require the use of Spark.
  • Familiarity with basic probability theory (CS109 or Stat116 or equivalent is sufficient but not necessary).
  • Familiarity with writing rigorous proofs (at a minimum, at the level of CS 103).
  • Familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary).
  • Familiarity with algorithmic analysis (e.g., CS 161 would be much more than necessary).

The recitation sessions in the first weeks of the class will give an overview of the expected background.

Reference Text

The following text is useful, but not required. It can be downloaded for free, or purchased from Cambridge University Press.

Leskovec-Rajaraman-Ullman: Mining of Massive Datasets
