Key Principles for Professional Coding
If you are a software developer, you know the importance of writing reusable, professional code and how much time it saves in the long run. It not only saves time but also makes programs easier to maintain and optimize. In fact, the single best thing you can do to make your code more professional is to make it reusable.
But what does “reusable” mean? At some point in your career, you are going to write code that will be used more than just once or twice. Maybe you’re running the same data pipeline or process on several different sets of data. Most people start by copying and pasting the same code, but once you find yourself copying the same code more than once or twice, it’s time to sink some time into making your code reusable. Reusing well-written code is an efficient use of your time and it’s considered a best practice in software engineering.
There are six central principles that make it easy for you or your colleagues to write professional code, making your code look really polished and above all saving you time. Here they are:
- 📦 Modular: Code is broken into small, independent parts (like functions) that each do one thing. Code you’re reusing lives in a single central place.
- ✔️ Correct: Your code does what you say/think it does.
- 📖 Readable: It’s easy to read the code and understand what it does. Variable names are informative and code has up-to-date comments.
- 💅 Stylish: Code follows a single, consistent style.
- 🛠️ Versatile: Solves a problem that will happen more than once and anticipates variation in the data.
- 💡 Creative: Solves a problem that hasn’t already been solved or is a clear improvement over an existing solution.
Let’s go through each of these principles in a bit more detail, with some sample code, and see how they work in practice. Please note that the examples here are written in Python, but the same philosophy applies to all programming languages.
📦 Modular
Modular code means that your code is broken into small, independent parts (like functions) that each do one thing. Each function (in Python) has several parts:
- A name for the function.
- Arguments for your function. This is the information you’ll pass into your function.
- The body of your function. This is where you define what your function does. Generally, I’ll write the code for my function and test with an existing data structure first and then put the code into a function.
- A return value. This is what your function will send back after it’s finished running. In Python, you’ll need to specify what you want to return by adding return thing_to_return at the bottom of your function.
Let’s look at an example.
import collections

# define a function
def find_most_common(values):
    list_counts = collections.Counter(values)
    most_common_values = list_counts.most_common(1)
    return most_common_values[0][0]

# use the function
find_most_common([1, 2, 2, 3])
Pretty straightforward, right? You can use this general principle of writing little functions that do one thing each to break your code up into smaller pieces.
If you write data pipelines, breaking your code apart into functions can save you time, particularly if each function just transforms the data that gets passed into it. You can reuse the functions and combine them into compact data pipelines.
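As a minimal sketch of that idea, here is a tiny pipeline built from small functions that each transform the data passed into them. The function names (remove_missing, to_lowercase, run_pipeline) and the sample data are made up for illustration.

```python
def remove_missing(values):
    """Drop None entries from a list."""
    return [value for value in values if value is not None]


def to_lowercase(values):
    """Convert every string in a list to lower case."""
    return [value.lower() for value in values]


def run_pipeline(values, steps):
    """Apply each step to the data in order and return the result."""
    for step in steps:
        values = step(values)
    return values


# hypothetical sample data
raw_fruit = ["Apple", None, "BANANA"]
cleaned_fruit = run_pipeline(raw_fruit, [remove_missing, to_lowercase])
print(cleaned_fruit)  # prints ['apple', 'banana']
```

Because every step takes a list and returns a list, you can reorder the steps or reuse them in a different pipeline without rewriting them.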
✔️ Correct
By “correct”, I mean that your code does what you say/think it does. This can be tricky to check. One way to make sure your code is correct is through code review.
Code review is a process where one of your colleagues carefully checks over your code to make sure that it works the way you think it does.
Unfortunately, that’s not always practical. If you are the only developer in a start-up, for example, it is tough to get expert feedback on your code from someone without any programming experience. As the field grows, it may become more common for code to undergo code review, but in the meantime you can help make sure your code is correct by including some tests.
Tests are little pieces of code you use to check that your code is working correctly.
Writing tests doesn’t have to be complex! Here, we will see how to test a function with just a single line of code.
In the Python function we saw above, we returned the most common value… but what if more than one value is tied for the most common? Currently our function will just return one of them, so if I really need to know whether there’s a tie, it won’t tell me.
So let’s include a test to let us know if there’s a tie! assert is a statement built into Python that lets us check that something is true. If it is, nothing happens. If it isn’t, our function will stop and give us an error message.
import collections

def find_most_common(values):
    """Return the value of the most common item in a list"""
    list_counts = collections.Counter(values)
    top_two_values = list_counts.most_common(2)

    # make sure we don't have a tie for most common
    assert top_two_values[0][1] != top_two_values[1][1]\
        ,"There's a tie for most common value"

    return(top_two_values[0][0])
Calling the function on a list that contains a tie now stops with an error:
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-3944475aef87> in <module>
      1 values_list = [1, 2, 3, 4, 4, 4, 5, 5, 5]
      2
----> 3 find_most_common(values_list)

<ipython-input-1-9c10e5923d14> in find_most_common(values)
      8     # make sure we don't have a tie for most common
      9     assert top_two_values[0][1] != top_two_values[1][1]\
---> 10         ,"There's a tie for most common value"
     11
     12     return(top_two_values[0][0])

AssertionError: There's a tie for most common value
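One way to check both behaviours is to add a few one-line tests after the function. The sketch below is a variation on the function above, not a replacement for it: it only checks for a tie when there are two distinct values, on the assumption that a list like [7, 7, 7] should return 7 rather than fail with an IndexError. The helper tie_is_detected is hypothetical and exists only to make the last test a single line.

```python
import collections


def find_most_common(values):
    """Return the most common item in a list, stopping on a tie."""
    list_counts = collections.Counter(values)
    top_two_values = list_counts.most_common(2)

    # a list with only one distinct value cannot contain a tie
    if len(top_two_values) == 2:
        assert top_two_values[0][1] != top_two_values[1][1], \
            "There's a tie for most common value"

    return top_two_values[0][0]


def tie_is_detected(values):
    """Hypothetical helper: True if find_most_common stops on a tie."""
    try:
        find_most_common(values)
    except AssertionError:
        return True
    return False


# one-line tests: nothing happens if they pass
assert find_most_common([1, 2, 2, 3]) == 2
assert find_most_common([7, 7, 7]) == 7
assert tie_is_detected([1, 2, 3, 4, 4, 4, 5, 5, 5])
```

If every assert passes, nothing is printed. For larger projects, a test framework such as pytest can collect and run checks like these for you.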
📖 Readable
“Readable” code is code that is easy to read, even if it’s the first time you’ve seen it. In general, the better your variable and function names describe what they do or are, the easier the code is to read. In addition, comments that describe what the code does at a high level, or why you made specific choices, can help you and anyone else reading it.
Some examples of not-very-readable variable names are:
- Single characters, like x or q. There are a few exceptions, like using i for index or x for the x axis.
- All lower case names with no spaces between words, like thisexample or somedatafromsomewhere.
- Uninformative or ambiguous names. data2 doesn’t tell you what’s in the data or how it’s different from data1. df tells you that something’s a dataframe, but if you have multiple dataframes how can you tell which one?
You can improve names by following a couple of rules:
- Use some way to indicate the spaces between words in variable names. Since you can’t use actual spaces, some common ways to do this are snake_case and camelCase. Your style guide will probably recommend one.
- Use the names to describe what’s in the variable or what a function does. For example, sales_data_jan is more informative than just data, and z_score_calculator is more informative than just calc or norm.
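To make the difference concrete, here is a small sketch of the same calculation written twice, once with hard-to-read names and once with descriptive ones. The function names and the sales figures are invented for illustration.

```python
import statistics


# hard to read: what are d, m and s?
def calc(d):
    m = statistics.mean(d)
    s = statistics.stdev(d)
    return [(x - m) / s for x in d]


# easier to read: the names say what each thing is
def calculate_z_scores(values):
    mean_value = statistics.mean(values)
    standard_deviation = statistics.stdev(values)
    return [(value - mean_value) / standard_deviation for value in values]


# hypothetical January sales figures
sales_data_jan = [120.0, 135.0, 150.0]
z_scores_jan = calculate_z_scores(sales_data_jan)
```

Both functions return the same numbers, but only the second one tells a new reader what it is for without making them trace through the body.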
Comments are blocks of natural language text in your code. Some tips for writing better comments:
- While some style guides recommend not including information on what a bit of code is doing, I actually think that it’s often warranted. I personally include comments describing what my code is doing.
- If you change the code, remember to update the comment.
- If you’re using an uncommon way of doing something, it’s worth adding a comment to explain why it’s done that way.
- Some style guides will recommend only ever writing comments in English, but if you’re working with folks who use another language I’d personally suggest that you write comments in whichever language will be easiest for everyone using the code to understand.
- Docstring: In Python, a docstring is a string that’s the first bit of text in a function or class. If your functions will be imported and used elsewhere, you should include a docstring. This lets you, and anyone else using the function, quickly learn more about what the function does.
def function(argument):
    """This is the docstring. It should describe what the function will do when run."""
To check the docstring for a function, you can use the syntax function_name.__doc__.
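Here is a short sketch that puts a docstring and a “why” comment together. The function find_longest_word is a made-up example, not something from a library.

```python
def find_longest_word(words):
    """Return the longest word in a list of strings."""
    # max() with key=len returns the first word with the greatest length,
    # so ties go to whichever word appears earliest in the list
    return max(words, key=len)


# look up the docstring without opening the source file
print(find_longest_word.__doc__)
# prints: Return the longest word in a list of strings.
```

The docstring says what the function does, while the comment records a choice (how ties are handled) that a reader could not easily guess from the code alone.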
Readable code is faster to read. This saves you time when you need to go back to a project or when you’re encountering new code for the first time and need to understand what’s going on.
💅 Stylish
Styles are described and defined in documents called “style guides”. If you haven’t used a style guide before, they’re very handy! Following a specific style guide makes your code easier to read and helps you avoid common mistakes. It can also help you avoid writing code with code smells.
Style guides will offer guidance on things like where to put white spaces, how to organize the structure of code within a file and how to name things like functions and files. Code that doesn’t consistently follow a style guide may still run perfectly fine, but it will look a little odd and generally be hard to read.
Pro tip: You can actually use a program called a “linter” to automatically check if your code follows a specific style guide. Pylint for Python is a popular linter.
Once you’ve picked a style guide to follow, you should do your best to follow it consistently within your code. There are, of course, differences across style guides, but here are a few examples of typical rules:
- You should have all of your imports (import module_name) at the top of your code file and only have one import per line.
- Whether you indent with tabs or spaces will depend on your style guide, but you should never mix tabs and spaces (e.g. have some lines indented with two spaces and some lines indented with a tab).
- Avoid having spaces at the ends of your lines.
- Function and variable names should all be lowercase and have words separated_by_underscores (unless you’re working with existing code that follows another standard, in which case use that).
- Try to keep your lines of code fairly short, ideally less than 80 characters long.
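As a rough illustration, the snippet below follows the rules listed above: imports at the top with one per line, spaces only for indentation, lowercase names with underscores, and lines under 80 characters. The function summarize_order_sizes and its data are hypothetical.

```python
# one import per line, all at the top of the file
import collections
import statistics


def summarize_order_sizes(order_sizes):
    """Return the mean and most common value of a list of order sizes."""
    mean_size = statistics.mean(order_sizes)
    # a long expression is split across lines to stay under 80 characters
    size_counts = collections.Counter(order_sizes)
    most_common_size = size_counts.most_common(1)[0][0]
    return mean_size, most_common_size


mean_size, most_common_size = summarize_order_sizes([2, 2, 5])
```

A linter will flag most departures from these rules automatically, so you don’t have to check them by eye.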
Style guides can be a little overwhelming at first, but don’t stress too much. As you read and write more code it will become easier and easier to follow a specific style guide. In the meantime, even a few small improvements will make your code easier to follow and use.
🛠️ Versatile
“Versatile” means useful in a variety of situations. Versatile code solves a problem that will happen more than once and anticipates variation in the data.
So, a natural question arises: should I only ever write code if I’m planning to reuse it?
Of course not. There’s nothing wrong with writing new code to solve a unique problem. Maybe you need to rename a batch of files quickly or someone’s asked you to make a new, unique visualization for a one-off presentation.
In those cases, you probably don’t want to go through all the trouble of making every single line of code totally polished and reusable. Data scientists have to do and know about a lot of different things: you’ve probably got a better use for your time than carefully polishing every line of code you ever write. Investing time in polishing your code starts to make sense when you know the code is going to be reused. A little time spent making everything easier to follow and use while it’s still fresh in your head can save a lot of time down the line.
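When you do decide that a piece of code is worth reusing, “anticipating variation in the data” can be as simple as deciding what should happen with empty or missing input. The sketch below is one possible take on the earlier find_most_common function. Returning None for empty input and skipping None entries are assumptions made for this example, not the only reasonable choices.

```python
import collections


def find_most_common(values):
    """Return the most common item, or None if there is nothing to count."""
    # accept any iterable (list, tuple, generator) and skip missing entries
    present_values = [value for value in values if value is not None]
    if not present_values:
        return None
    return collections.Counter(present_values).most_common(1)[0][0]


print(find_most_common([None, "a", "a", None, None]))  # prints a
print(find_most_common([]))  # prints None
```

The original version would raise an IndexError on an empty list and would report None as the most common value in the first call, so a couple of extra lines make the function usable on messier data.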
💡 Creative
By “creative”, I mean code that solves a problem that hasn’t already been solved or is a clear improvement over an existing solution. The reason that I include this is to encourage you to seek out existing libraries or modules that already solve your problem. If someone has already written the code you need, and it’s under a license that allows you to use it, then you should probably just do that.
I would only suggest writing a library that replicates the functionality of another one if you’re making a clear improvement. One example is the Python library flashtext. It lets you do some of the same things you can do with regular expressions, like finding, extracting and replacing keywords in text, but it can be much faster when you have a large number of keywords.
Writing new code only when there’s no existing solution saves you time, because you can build on existing work rather than starting over from scratch.
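A small illustration from Python’s standard library: counting items is a problem that collections.Counter already solves, so there is rarely a reason to write the loop yourself. The two function names below are hypothetical and exist only to show the comparison.

```python
import collections


def count_items_by_hand(values):
    """Count items with a hand-written loop (reinventing the wheel)."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts


def count_items(values):
    """Count items by reusing the standard library."""
    return collections.Counter(values)


colours = ["red", "blue", "red"]
# both approaches give the same counts
assert count_items_by_hand(colours) == count_items(colours)
```

The second version is shorter, and it also gives you extras such as most_common() for free.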
So, there they are: six key principles that will help you not only become a more professional software developer but also have a positive impact on the developers you work with.