Sunday, April 12, 2020

Meeting Charleston

Due to the ongoing COVID-19 pandemic, I was unable to attend a physical meeting to fulfill the requirements for this blog post. I was originally planning on going to either the Data Science meetup or the Linux Users Group. Being the avid Linux user that I am, I was disappointed that I was not able to attend the meeting and potentially learn interesting things about the Linux operating system. I am also greatly disappointed by the missed networking opportunity that the Data Science meetup could have given me. This unforeseen pandemic has definitely disrupted the end of this semester and presented me with various challenges that I had to overcome. Although I was not able to attend an event this year, I do have a past experience with the Linux Users Group. When I was a child, my father actually took me to one of the meetings while he was attending classes at the College of Charleston. Although I do not remember much from the meeting, I will say that my interest in Linux was sparked because of it. The experience is one of the earliest memories I have of the operating system, and it had a great influence on me at such a pivotal, formative age. The meeting showed me the passion of the open source community, and how such people find joy in open source computing.

Wednesday, April 8, 2020

Chapter 9

With this project coming to a close, I am grateful for all of the concepts and skills that I have learned from this class. Although it was not the capstone that I was expecting, being responsible for contributing to an open source project has proved to be widely beneficial for my professional development. This class has broadened my horizons and shown me the diverse world of open source development. I find beauty in this community-driven software development. It exemplifies a passionate community that is looking out for the good of its people. They produce and maintain good software that will, in turn, be used to create more good software and products that everyone can use. It is a beautiful cycle of code production that anyone can be involved with. Familiarity with this type of software development process, although not conventionally related to data science, is still a skill that every prospective tech professional should have. I am glad that I was required to be exposed to this different curriculum, as I feel that it has given me invaluable experience for my career in the industry. Data science is not just about finding meaning in large data sets and performing analytical tasks; a data scientist who is familiar with software development can prove to be an irreplaceable team member and a great asset to the community. Chapter 9 outlines the end of a software project and explains various concepts to ensure a smooth transition to the client. This chapter provides meaningful information about the hand-off to the client, and discusses the different choices a developer has for providing the client with a working product for years to come, from transitioning to professional support to hosting the code on the client's server relatively cheaply.

Tuesday, March 24, 2020

Chapter 6

The database is arguably the most essential part of many software projects. It is where the bulk of the important user data is stored and reused when needed. Part of the data science curriculum at the College of Charleston is centered on databases and the management of large data sets. With that being said, I have some experience with SQL queries and other database techniques from the DATA 210 class. The command line is a useful tool when querying data sets and trying to find trends in data. One interesting thing about databases is that the information stored inside "persists", meaning it outlives the program that created it. The data can still be used and modified even if the program is no longer viable. Databases are stand-alone systems that work alongside programs to serve as data storage. One is able to store and organize vast amounts of data with relatively simple commands. One such form of organization is the table, which has a fixed number of columns and a varying number of rows. An important concept to consider when designing tables is normalization, which organizes a table's columns so that redundant data is minimized and updates cannot introduce inconsistencies. Tables can also be queried quickly because each row has a key that is referenced when searching. This chapter has served as a good review of database management and the specific queries that we learned in DATA 210 at the College of Charleston. I have had many conversations with my father about the great importance of databases, and how a company will pay good money for a database specialist. This is something I have considered as a career, citing the need for more database professionals in the field. I feel as though data science and expertise in database management go hand in hand, and the combination would make for an essential member of any team.
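The persistence and key-based lookup described above can be sketched with Python's built-in sqlite3 module. The students table and its contents here are hypothetical, purely for illustration; a real program would connect to a database file on disk rather than an in-memory database.

```python
import sqlite3

# An in-memory database keeps this sketch self-contained; swapping
# ":memory:" for a filename makes the data persist beyond the program.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table has a fixed number of columns and a varying number of rows;
# the primary key uniquely identifies each row and speeds up lookups.
cur.execute("""
    CREATE TABLE students (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        major TEXT NOT NULL
    )
""")

cur.executemany(
    "INSERT INTO students (id, name, major) VALUES (?, ?, ?)",
    [(1, "Ada", "Data Science"), (2, "Linus", "Computer Science")],
)
conn.commit()

# Query by key -- the index on the primary key makes this fast.
cur.execute("SELECT name FROM students WHERE id = ?", (1,))
result = cur.fetchone()[0]
print(result)  # Ada
conn.close()
```

The `?` placeholders are the standard sqlite3 way to pass parameters safely, rather than building query strings by hand.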

Monday, March 23, 2020

Chapter 5

Domain classes are a vital part of any software project and should be considered carefully during the software development process. This object-oriented approach to client solutions is what generally makes larger projects more manageable, and oftentimes what makes them possible at all. It is always a good idea to break your program up into various classes that contain specific instance variables and functions relevant to that class, and also to implement some kind of inheritance hierarchy. There are two approaches to coding the domain classes in one's program: reusing legacy code, i.e. code that has already been established and tested by other developers, or the more labor-intensive alternative of starting from scratch. Downloading and modifying previously written code seems the simplest and safest option when trying to code these classes, but sometimes this proves difficult if the existing classes do not match your specific needs. That is the job of the developer: to determine what the client's needs are and to best serve them with the tools at hand. This chapter has been a good refresher for me in terms of the classes in a project and their specific attributes. Another big concept posed in this chapter is the idea of unit testing and maintaining an effective testing strategy for your code. I have the most experience with testing throughout the process, or in other words, testing each piece of code and each function before moving on. Building on this, the chapter introduced the idea of test-driven development, in which the tests for a piece of code are written before the code itself. This implementation of testing seems the most rigorous and time-consuming, but allows for the most success. Test suites, which are collections of unit tests, play a pivotal role in the testing process. They allow for the individual testing of specific modules in a program to determine whether everything is working as it should.
This chapter has provided me with beneficial review in terms of classes and unit testing, and has introduced me to new concepts and specific frameworks, continuing to build my skills as a software developer.
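As a sketch of the unit-testing and test-suite ideas above, here is a small example using Python's standard unittest framework. The average function and its tests are my own hypothetical illustration, not from the book: each test method is one unit test, and the loader collects them into a suite that can be run as a group.

```python
import unittest

def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)

class AverageTests(unittest.TestCase):
    """Each method below is one unit test for the average function."""

    def test_typical_input(self):
        self.assertEqual(average([2, 4, 6]), 4)

    def test_single_value(self):
        self.assertEqual(average([5]), 5)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            average([])

# A test suite is a collection of unit tests; the loader gathers
# every test_* method from the class, and the runner executes them all.
suite = unittest.TestLoader().loadTestsFromTestCase(AverageTests)
result = unittest.TextTestRunner().run(suite)
```

In test-driven development, the three test methods would be written first, fail, and then drive the implementation of average until the whole suite passes.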

Monday, March 9, 2020

Release early and often

Proper documentation is an extremely vital part of good software. The longevity and viability of software greatly depend on good documentation practices, so that future contributors can read and fully comprehend the code in order to maintain and alter it. In a perfect world, as chapter 8 says, there would be no need for documentation: every individual would have the same coding conventions and would be able to understand any piece of code written by anybody. But sadly that is not the case; everybody has their own way of doing things. Naming conventions, spacing, and indentation are a few areas that allow for stylistic independence and expression (in Java specifically). Although many are taught to name their variables and methods with clarity and obvious purpose, time constraints or other factors sometimes prevent developers from doing so. There is legacy code out there with cryptic variable and method names chosen just to save time and get the code working. The idea of getting the code to work first and cleaning up later is prevalent in the field, but many projects, such as government contracting, do not allow for this to occur. These contracting jobs are in the market for code that does what they need it to do; no government project will pay a team to go back and clean up the code to make it more readable. Harsh deadlines and requirements make it hard for developers to take the time to write readable code; they are just looking to meet the deadlines and complete the sprints on time. Although this is the reality of some projects, it is still imperative that developers take that extra time to document their code, making the lives of future maintainers, or of the developers themselves, much easier in the long run. Proper documentation practices can go a long way in this field, and many people will be greatly appreciative of them.
One must write code that is relatively easy to understand, or have documentation that adequately describes the purpose of the software. Developers should maintain good developer documentation in the code base, ensuring the future understanding of every function and module. Technical writing is the same as any other kind of writing: know your audience and be as clear and concise as possible.
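As a small illustration of the developer documentation discussed above, here is a hypothetical Python function whose docstring answers the questions a future maintainer would actually ask: what the parameters mean, what units they are in, and what comes back. The function and its parameters are my own invention for the sketch.

```python
def monthly_payment(principal, annual_rate, months):
    """Return the fixed monthly payment for an amortized loan.

    Args:
        principal: amount borrowed, in dollars.
        annual_rate: yearly interest rate as a decimal (e.g. 0.05 for 5%).
        months: number of monthly payments.

    Returns:
        The payment amount per month, in dollars.
    """
    r = annual_rate / 12  # convert the yearly rate to a monthly rate
    if r == 0:
        return principal / months  # no interest: just split the principal
    return principal * r / (1 - (1 + r) ** -months)
```

A maintainer who has never seen this code can use it correctly from the docstring alone, which is the whole point of writing documentation for a known audience.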

Wednesday, February 19, 2020

Stupid or Solid?

Reading both of these articles has been greatly insightful for me and beneficial for my professional development. As a Data Science major, the curriculum goes to great lengths in exposing us to data gathering techniques and what to do with large data sets. There is not much focus on writing SOLID code and the many factors at play when trying to produce code that everyone can maintain. The Data Science curriculum is good at showing students how to manage the vast amount of data that is out there, and how to extract meaningful information and make predictions or decisions from it. We mostly use built-in Python libraries or other machine learning tools such as scikit-learn or pandas. My first two programming classes gave me some experience in writing neat and readable code; however, the main focus of my degree was data management and finding insights in large data sets. These articles opened up a new world for me and showed me that writing good code is as much an art form as it is a technical skill. There is a great amount of effort that goes into writing SOLID code, as the article describes. One must be aware and mindful of the people who will have to read and maintain the code down the road. It is not enough to just write functioning code; a good developer writes code that their team, and possibly other people in the world, will be able to comprehend and add to. One important concept that I was not aware of before reading these articles is that when writing classes, you want high cohesion and low coupling: you want to keep together code that is related in function, but you do not want to design your classes in such a way that many of them depend on one another. The goal is to have your code work towards a common goal while keeping its different parts independent where possible. I also learned that it is not ideal to prematurely optimize your code before it is even a working product.
Working code is far better than optimized code that does not do what it is intended to. Both articles exposed me to concepts that I was not completely aware of. These software engineering principles will assist me in my career and make me a more versatile data scientist. It is always a good idea to continue your education and broaden your horizons, especially in the fast-growing and constantly changing field of technology and software.
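The high-cohesion, low-coupling idea above can be sketched in Python. The Report and source classes are hypothetical examples of my own: each class does one related job (cohesion), and Report only assumes its source has a rows() method rather than depending on any concrete class (loose coupling), so a new data source can be added without touching Report.

```python
class CsvSource:
    """One cohesive job: supply (name, score) rows, e.g. parsed from a CSV."""
    def rows(self):
        return [("Ada", 90), ("Linus", 85)]

class DictSource:
    """Same job, different backing data -- interchangeable with CsvSource."""
    def __init__(self, data):
        self.data = data
    def rows(self):
        return list(self.data.items())

class Report:
    """Depends only on 'some object with a .rows() method', not on a
    concrete source class -- this keeps the coupling low."""
    def __init__(self, source):
        self.source = source
    def highest(self):
        # Pick the name with the highest score.
        return max(self.source.rows(), key=lambda row: row[1])[0]

print(Report(CsvSource()).highest())                # Ada
print(Report(DictSource({"Grace": 99})).highest())  # Grace
```

If Report instead constructed a CsvSource internally, every change to the CSV format would ripple into the reporting code; passing the source in keeps the two parts independent while they still work toward a common goal.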

Thursday, February 13, 2020

What's Happening?

There are a wide variety of interesting technological articles in this Association for Computing Machinery magazine. One that piqued my interest and is relevant to my field was the Computing Ethics column, "Engaging the Ethics of Data Science in Practice." The authors wished to seek more common ground between data scientists and their critics, and to discuss the possible issues that arise from the growing field of Data Science and its practitioners. They explain that there exists critical commentary of the field, and that these critics proclaim that data scientists do not recognize the power they wield and oftentimes use such power in a reckless and unethical manner. These critiques are not new, and they are not based in much truth. There are some instances of data scientists and their firms abusing their analytical powers, such as the Cambridge Analytica and Facebook controversies, but as a whole, Data Science is no more unethical than other computer science fields. It is the personal morals and end goals of specific people that lead to possible unethical situations. These accusations are based in ignorance of the field, and they overlook the routine, deliberate ethical activities that practitioners already engage in. Solon Barocas and Danah Boyd, the authors of this article, provide examples of data scientists practicing ethics, much like many other fields. They explain that practitioners engage in countless acts of implicit ethical deliberation while in the process of creating a meaningful machine learning model. Data scientists have to deal with incomplete data, which the authors argue is as much a moral concern as it is a practical one. Choosing what data to use, and determining whether it is useful based on where it came from, is a common situation data scientists find themselves in. Validating a model, and asking how that model will perform when deployed, are also ethical concerns that are often overlooked by the outside community.
There is a great need for careful judgement in this field, often having to take into consideration the ramifications a model will bring humanity and how it will ultimately affect the world. Even when attempting to address these ethical issues explicitly, practitioners face trade-offs that must be considered. The article then describes a model with gender bias, where fixing the issue would require sacrificing privacy. The authors want a collaborative and constructive dialogue between Data Science practitioners and their critics. They want the critics to realize the effort put in by these people, and the small ethical decisions that go into making their analyses. The commentary on the field is often created by people who are unaware of the actual practice of it. The authors argue that we need to make an effort to work collectively to deliberate appropriately about the field, which will reveal common ground between the two groups and lessen the gap in understanding.
