The Best (FREE) Data Repositories for Aspiring Data Scientists in 2020

A Quick Reference guide to data for any and every industry imaginable

Earlier this week, Google announced that its Dataset Search engine is now out of beta. This is a great accomplishment, and the engine is an invaluable tool for any aspiring Data Scientist in 2020.

In honor of the news, I thought I'd put together a list of my favorite data repositories, ones I've used in the past, as a quick reference guide for any and all aspiring Data Scientists. No matter what industry you want to get into, there's definitely a dataset for it here :)

Awesome Public Datasets

Awesome Public Datasets is a GitHub repository of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Almost all of them are free, with a few exceptions here and there.

Data is Plural

Data is Plural is a weekly newsletter of useful/curious datasets. You can find a huge archive of datasets in its Google Doc. Just hit Ctrl + F, search for a topic you'd like to look into, and see the dozens of results that pop up.

Data World

Data World is an open data repository containing data contributed by thousands of users and organizations all across the world.

What I love about this site is that it contains data that is really hard to find elsewhere. In particular, healthcare is one of the more difficult industries to get publicly available data from (due to privacy concerns). Luckily, Data World has 3667 free health datasets you can use for your next project.

Google Dataset Search

A data set search engine… powered by Google. No further explanation needed.

Kaggle

Kaggle enables data scientists and other developers to run machine learning contests, write and share code, and host datasets. The data science problems posted on Kaggle can be anything from predicting cancer occurrence by examining patient records to analyzing the sentiment evoked by movie reviews and how it affects audience reaction.

Makeover Monday

This repository is mostly for data visualizations, but I think what they do is a lot of fun.

Makeover Monday was an initiative started in the first week of 2016 by Andy Kriebel (Head Coach, The Information Lab UK, @vizwizbi) and Andy Cotgreave (Tableau Evangelist, @acotgreave).

Every week, usually on a Sunday, Andy K posts (via blog and Twitter) an original visualization to be "made over." Some are awful; others are already great, in which case the challenge is to present a different angle on the original.

Once you've finished, post a link to your visualization and/or a picture using the hashtag #MakeoverMonday. All the individual screenshots are compiled into one big Pinterest collage of combined visualizations.

r/datasets/

A place to share, find, and discuss datasets. You can request datasets from other subscribers, as well as share and contribute your own.

UCI Machine Learning Repository

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited “papers” in all of computer science.
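The following is a small sketch of what working with the archive can look like. The repository's classic Iris file (iris.data) is plain comma-separated text: four numeric measurements followed by a class label on each line, with no header row. The minimal Python example below parses a few rows copied in that layout. They are inlined so that it runs without a download. The column names and the `load_uci_rows` helper are hypothetical choices for illustration and are not part of the repository. Other datasets in the archive use different layouts, so check each dataset's description page before reusing this.

```python
import csv
import io
from collections import Counter

# A few rows in the layout of the UCI Iris file: four measurements and a
# class label, comma-separated, no header row. Inlined for illustration.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica

"""

# Hypothetical column names: the file itself carries none, so we supply them.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_uci_rows(text: str) -> list[dict]:
    """Parse headerless, comma-separated UCI-style text into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # csv.reader yields [] for blank lines; skip them
            continue
        row = dict(zip(COLUMNS, record))
        # Convert the four measurement columns from strings to floats.
        for name in COLUMNS[:-1]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_uci_rows(SAMPLE)
print(len(rows))  # 4
print(Counter(r["species"] for r in rows))
```

Supplying the column names in code, rather than expecting a header, mirrors how these older files are distributed. The names usually come from the dataset's accompanying description.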

United States Government

Under the terms of the 2013 Federal Open Data Policy, newly generated government data is required to be made available in open, machine-readable formats, while continuing to ensure privacy and security.

That’s going to be all for now. Please feel free to bookmark this article and use it as a quick reference for your data pursuits.

Did I miss your favorite repository? Let me know below so I can add it to the guide. Until next time everyone, happy coding.

My name is Kishen Sharma and I am a Data Scientist based in the Bay Area. I create content to educate and motivate aspiring Data Scientists all across the world.

Links to my blog and social media: https://linktr.ee/keesh_codes